09. Agent Anatomy — Components and Their Interaction¶
Why This Chapter?¶
An agent is a simple system. At its core: a loop "LLM responds → calls tools → gets results → responds again", plus a linear message history and a stable system prompt. No "meta-architectures", state graphs, or multi-layer memory are needed in the base case (see Preface: Mental Model).
But for that loop to be reliable in production, it helps to understand what it actually consists of: where messages live, what the system prompt is and why you can't mutate it on every step, how to defend against infinite loops, how to isolate a subagent. Without this anatomy it's easy to "fix" the problem in the wrong place and bolt on trendy patterns that complicate the agent without solving anything.
In this chapter we'll go through the minimal set of components and how they connect, so you can:
- Properly manage context and dialogue history (linear history + a single condense).
- Implement an autonomous agent loop with predictable safeguards.
- Optimize token usage via prompt cache, not via "smart" strategies.
- Build an extensible system through tools, not through over-engineered abstractions inside the core.
Real-World Case Study¶
Situation: You've created a DevOps agent. After 20 messages, the agent "forgets" the beginning of the conversation and stops remembering the task context.
Problem: Dialogue history overflows the model's context window. Old messages are "pushed out", and the agent loses important information.
Solution: Once you understand memory, on overflow you can condense the old part of the history into a short summary via the LLM — without breaking tool_call ↔ tool_result pairs and without touching the stable system prefix. Details in Ch. 13: Context Engineering. You do not need to reorder messages "by importance": that's an anti-pattern that kills prompt cache (see Ch. 12: Agent Memory).
Theory in Simple Terms¶
Simplified formula:
Memory¶
An agent must "remember" conversation context and action history.
Short-term Memory¶
This is the message history (messages array). Limited by the context window.
Message structure:
type ChatCompletionMessage struct {
Role string // "system", "user", "assistant", "tool"
Content string // Message text
ToolCalls []ToolCall // Set on assistant messages that request tool calls
ToolCallID string // If this is a tool result
}
History example:
messages := []ChatCompletionMessage{
{Role: "system", Content: "You are a DevOps engineer"},
{Role: "user", Content: "Check server status"},
{Role: "assistant", Content: "", ToolCalls: [...]}, // Tool call
{Role: "tool", Content: "Server is ONLINE", ToolCallID: "call_123"},
{Role: "assistant", Content: "Server is running normally"},
}
Problem: If history is too long, it doesn't fit in the context window.
Problem example:
// Model context window: 128 000 tokens
// System Prompt: 500 tokens
// Dialogue history: 127 000 tokens // 30 iterations with large tool_results
// New request: 1 000 tokens
// TOTAL: 128 500 tokens > 128 000 [ERROR: context overflow]
Modern models have windows from 128k (most) to 1M+ (Gemini 2.5). The problem hasn't disappeared — it's just shifted: in long agent sessions with large tool_result payloads (logs, dumps, web pages), overflow shows up around iterations 20-50.
Two rules that make this memory work (details in Ch. 12 and Ch. 13):
- History is immutable. Messages are only appended at the end; nothing is reordered or deleted "by importance", old messages are not edited. Any prefix mutation kills the provider's prompt cache, which means 5-10× more expensive input and higher latency.
- The system prompt stays stable within a Run. Any "dynamic" update of the system prompt between iterations (live state, facts, plan progress) is an anti-pattern. Dynamic data lives in tool_result and the latest user message, not in the system prefix.
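The append-only rule can be checked mechanically. The sketch below is one possible guard, not part of any provider SDK: `Message` and `IsAppendOnly` are hypothetical names introduced for illustration. It compares the history sent on the previous iteration with the one about to be sent and reports whether the old part is untouched.

```go
package main

import "fmt"

// Message is a simplified stand-in for a chat message (hypothetical type).
type Message struct {
	Role    string
	Content string
}

// IsAppendOnly reports whether next extends prev without removing,
// reordering, or editing any message that was already sent.
func IsAppendOnly(prev, next []Message) bool {
	if len(next) < len(prev) {
		return false // something was deleted
	}
	for i := range prev {
		if prev[i] != next[i] {
			return false // an old message was edited or moved
		}
	}
	return true
}

func main() {
	prev := []Message{
		{Role: "system", Content: "You are a DevOps engineer"},
		{Role: "user", Content: "Check server status"},
	}

	// Allowed: a new message at the end.
	appended := append(append([]Message{}, prev...),
		Message{Role: "assistant", Content: "Checking"})

	// Not allowed: live state written into the system prompt.
	mutated := append([]Message{}, appended...)
	mutated[0].Content = "You are a DevOps engineer. CPU is at 95%."

	fmt.Println(IsAppendOnly(prev, appended)) // true
	fmt.Println(IsAppendOnly(prev, mutated))  // false
}
```

A check like this fits naturally into a debug build or a unit test around the loop; the single permitted exception is the condense step, which rewrites the old part on purpose.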
When the window overflows, there is one reaction — condense: call the LLM and ask it to summarize the old part of the history, then replace that part with a single user message "Context of previous work: ...". Limit: one condense per Run; the token count comes from usage.PromptTokens in the provider's response, not from your own counters. Details in Ch. 13.
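A minimal sketch of the condense step, under stated assumptions: `Message`, `ShouldCondense`, and `Condense` are hypothetical names, the `reserve` headroom is an illustrative parameter, and `summarize` stands in for the real LLM call. The point it illustrates is the cut position: the kept tail must never begin with a tool result whose assistant tool call was summarized away.

```go
package main

import "fmt"

// Message is a simplified stand-in for a chat message (hypothetical type).
type Message struct {
	Role       string // "system", "user", "assistant", "tool"
	Content    string
	ToolCallID string // set on tool results
}

// ShouldCondense compares the provider-reported prompt size with the window.
// reserve is headroom for the next reply; its size is an assumption.
func ShouldCondense(promptTokens, window, reserve int) bool {
	return promptTokens+reserve > window
}

// Condense keeps the system message, replaces the old middle of the history
// with one summary message, and keeps at least keepTail recent messages.
// summarize stands in for the LLM call that writes the summary.
func Condense(history []Message, keepTail int, summarize func([]Message) string) []Message {
	if len(history) == 0 || history[0].Role != "system" || keepTail < 1 {
		return history
	}
	cut := len(history) - keepTail
	// Never start the kept tail on a tool result: move the cut back until the
	// assistant message that issued the call is inside the tail as well.
	for cut > 1 && history[cut].Role == "tool" {
		cut--
	}
	if cut <= 1 {
		return history // nothing old enough to condense
	}
	out := []Message{
		history[0], // stable system prefix, untouched
		{Role: "user", Content: "Context of previous work: " + summarize(history[1:cut])},
	}
	return append(out, history[cut:]...)
}

func main() {
	history := []Message{
		{Role: "system", Content: "You are a DevOps engineer"},
		{Role: "user", Content: "Check server status"},
		{Role: "assistant"}, // tool call
		{Role: "tool", Content: "Server is ONLINE", ToolCallID: "call_1"},
		{Role: "assistant", Content: "Server is running normally"},
		{Role: "user", Content: "Now check the disk"},
		{Role: "assistant"}, // tool call
		{Role: "tool", Content: "Disk 91% full", ToolCallID: "call_2"},
	}
	stub := func(old []Message) string {
		return fmt.Sprintf("%d earlier messages summarized", len(old))
	}

	fmt.Println(ShouldCondense(127500, 128000, 1000)) // true
	for _, m := range Condense(history, 1, stub) {
		fmt.Println(m.Role, "|", m.Content)
	}
}
```

With `keepTail` set to 1 the naive cut would land on the last tool result, so the function moves it back by one message and keeps the assistant tool call together with its result.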
Long-term Memory¶
This is a vector database (RAG). It can store gigabytes of documents and find what's needed by meaning (Semantic Search).
How it works:
- Documents are split into chunks
- Each chunk is converted to a vector (embedding)
- When the agent issues a query, the query is embedded too and the most similar chunk vectors are retrieved
- Relevant chunks are added to context
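The retrieval step can be illustrated with a toy in-memory search. This is a sketch under assumptions: the three-dimensional vectors are written by hand, whereas real embeddings come from an embedding model and have hundreds of dimensions, and a vector database would replace the linear scan. `Chunk`, `Cosine`, and `TopK` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Chunk is a piece of a document together with its embedding.
type Chunk struct {
	Text   string
	Vector []float64
}

// Cosine returns the cosine similarity of two vectors of equal length.
func Cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// TopK returns the k chunks most similar to the query vector.
func TopK(query []float64, chunks []Chunk, k int) []Chunk {
	ranked := append([]Chunk{}, chunks...) // do not reorder the caller's slice
	sort.SliceStable(ranked, func(i, j int) bool {
		return Cosine(query, ranked[i].Vector) > Cosine(query, ranked[j].Vector)
	})
	if k < 0 {
		k = 0
	}
	if k > len(ranked) {
		k = len(ranked)
	}
	return ranked[:k]
}

func main() {
	chunks := []Chunk{
		{Text: "Runbook: restarting nginx", Vector: []float64{0.9, 0.1, 0.0}},
		{Text: "Runbook: Postgres backup", Vector: []float64{0.1, 0.9, 0.1}},
		{Text: "Policy: incident escalation", Vector: []float64{0.0, 0.2, 0.9}},
	}
	query := []float64{0.8, 0.2, 0.1} // pretend embedding of "nginx is down"

	for _, c := range TopK(query, chunks, 2) {
		fmt.Println(c.Text) // these texts would be added to the context
	}
}
```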
Usage examples:
- DevOps: Storing protocols, runbooks, service documentation
- Support: Knowledge base with solutions to common problems
- Data: Database schemas, API documentation
- Security: Incident playbooks, security policies
See Chapter 06: RAG
Planning¶
Planning is the agent's ability to break down a complex task into a sequence of simple steps and execute them in the correct order.
Why Is Planning Needed?¶
Imagine a task: "We have database issues, investigate"
Without planning:
- Agent may try to do everything at once and get confused
- Agent may skip important steps
- Agent may execute steps in wrong order
With planning:
- Agent first analyzes the task
- Agent breaks it into subtasks
- Agent executes subtasks sequentially
- Agent checks result of each step
Planning Levels¶
Level 1: Implicit Planning (ReAct)¶
Agent doesn't create an explicit plan, but plans "on the fly" during execution.
How it works:
- Agent receives a task
- Agent thinks: "What do I need to do first?"
- Agent performs an action
- Agent receives result and thinks: "What to do next?"
- Repeats until task is complete
Example (DevOps):
User: "Check why server is slow"
Iteration 1:
Thought: "Need to check server metrics"
Action: get_cpu_metrics()
Observation: CPU 95%
Iteration 2:
Thought: "CPU is high. Need to see which process is consuming resources"
Action: get_top_processes()
Observation: Process: ffmpeg, CPU: 90%
Iteration 3:
Thought: "ffmpeg is consuming CPU. Need to check what this process is"
Action: get_process_info(pid=12345)
Observation: This is video conversion started by user
Iteration 4:
Thought: "This is a legitimate process, but it's blocking the system. I'll suggest user limit priority"
Action: [Final response to user]
Advantages:
- Flexibility: agent can adapt to results
- Simple implementation
- Suitable for medium complexity tasks
Disadvantages:
- May "get lost" in details
- May skip important steps
- Hard to track progress
Level 2: Explicit Planning (Plan-and-Solve)¶
Agent first creates a complete plan, then executes it step by step.
How it works:
- Agent receives a task
- Agent generates complete plan (list of steps)
- Agent executes steps in order
- Agent marks completed steps
- Agent completes when all steps are done
Example (DevOps - Incident):
User: "Payment service unavailable (502). Investigate"
Step 1: Planning
Agent thinks: "Need to create investigation plan"
Agent generates plan:
1. Check HTTP status of service
2. If not 200 — read logs
3. Analyze errors in logs
4. Determine cause (config error / connection error / memory)
5. Apply fix (rollback / restart / scale)
6. Verify service is restored
Step 2: Execution
Agent: "Executing step 1: Checking HTTP status"
Action: check_http_status("payment-service")
Observation: 502 Bad Gateway
Agent: "Executing step 2: Reading logs"
Action: read_logs("payment-service", lines=50)
Observation: ERROR: Config syntax error in line 42
Agent: "Executing step 3: Analyzing errors"
Thought: "Logs show syntax error in config. This is not a temporary issue."
Agent: "Executing step 4: Determining cause"
Thought: "Cause: Config error. Solution: Rollback"
Agent: "Executing step 5: Applying fix"
Action: rollback_deploy("payment-service", version="v1.9")
Observation: Rollback complete. Service restarted.
Agent: "Executing step 6: Verifying"
Action: check_http_status("payment-service")
Observation: 200 OK
Agent: "All steps completed. Incident resolved."
Advantages:
- Guarantee all steps are executed
- Easy to track progress
- Suitable for complex tasks (5+ steps)
Disadvantages:
- Less flexible: plan may be suboptimal
- Requires more tokens (plan generation)
- May be excessive for simple tasks
Level 3: Hierarchical Planning¶
For very complex tasks, the plan is broken into subplans.
Example (Security - Incident investigation):
Main task: "Investigate suspicious activity on host 192.168.1.10"
Top-level plan:
1. Alert triage
2. Evidence collection
3. Threat analysis
4. Action (containment)
5. Report generation
Subplan for step 2 (Evidence collection):
2.1. Request SIEM logs for last hour
2.2. Check network traffic
2.3. Check system metrics
2.4. Check running processes
2.5. Check filesystem for changes
Subplan for step 3 (Threat analysis):
3.1. Determine attack type
3.2. Assess criticality
3.3. Determine scope (affected systems)
3.4. Assess damage
When to use:
- Very complex tasks (10+ steps)
- Tasks with multiple dependencies
- Tasks requiring coordination of multiple specialists
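One way to hold such a plan in code is a small tree whose leaves are the executable steps. The sketch below is illustrative only: `Step`, `Flatten`, and `ExamplePlan` are hypothetical names, and the plan content is the security example from this section.

```go
package main

import "fmt"

// Step is a node of a hierarchical plan. A step with Sub entries is a
// subplan; a step without them is executed directly.
type Step struct {
	Title string
	Sub   []Step
}

// Flatten walks the tree depth-first and returns the executable (leaf)
// steps in order, numbered like "2.1.", "2.2.", ...
func Flatten(steps []Step, prefix string) []string {
	var out []string
	for i, s := range steps {
		num := fmt.Sprintf("%s%d", prefix, i+1)
		if len(s.Sub) == 0 {
			out = append(out, num+". "+s.Title)
			continue
		}
		out = append(out, Flatten(s.Sub, num+".")...)
	}
	return out
}

// ExamplePlan rebuilds the incident investigation plan from the text.
func ExamplePlan() []Step {
	return []Step{
		{Title: "Alert triage"},
		{Title: "Evidence collection", Sub: []Step{
			{Title: "Request SIEM logs for last hour"},
			{Title: "Check network traffic"},
			{Title: "Check system metrics"},
			{Title: "Check running processes"},
			{Title: "Check filesystem for changes"},
		}},
		{Title: "Threat analysis", Sub: []Step{
			{Title: "Determine attack type"},
			{Title: "Assess criticality"},
			{Title: "Determine scope (affected systems)"},
			{Title: "Assess damage"},
		}},
		{Title: "Action (containment)"},
		{Title: "Report generation"},
	}
}

func main() {
	for _, line := range Flatten(ExamplePlan(), "") {
		fmt.Println(line)
	}
}
```

Keeping the plan as data rather than prose makes it straightforward to hand one subtree to a specialist and to report progress per branch.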
Planning Strategies¶
1. ReAct (Reason + Act)¶
Most popular architecture. Formula: Thought -> Action -> Observation.
sequenceDiagram
participant User
participant Agent
participant Tool
User->>Agent: "Check server"
Note over Agent: Thought: "Need to check status"
Agent->>Tool: check_status()
Tool->>Agent: "ONLINE"
Note over Agent: Thought: "Server is running. I'll check metrics"
Agent->>Tool: get_metrics()
Tool->>Agent: "CPU: 50%"
Agent->>User: "Server is running normally"
ReAct cycle:
- Thought: Model generates CoT (action plan)
- Action: Model generates tool call token
- Runtime: Our code intercepts call, executes function, gets result
- Observation: Result is fed back to model
- Loop: Repeat until model decides task is solved
ReAct examples in different domains:
DevOps:
Thought: User complains about slowness. Need to check CPU.
Action: get_cpu_metrics()
Observation: CPU 99% process: ffmpeg
Thought: ffmpeg is consuming everything. Need to kill or limit it.
Action: kill_process(pid=123)
Support:
Thought: User can't log in. I'll check account status.
Action: check_account_status(user_id="12345")
Observation: Account is locked due to suspicious activity
Thought: Account is locked. Need to unlock, but first check logs.
Action: check_security_logs(user_id="12345")
Observation: Multiple failed login attempts from unknown IP
Thought: This looks like a hack. I'll unlock, but send notification to change password.
Action: unlock_account(user_id="12345")
Data Analytics:
Thought: User asks about sales. Need to understand data structure.
Action: describe_table("sales")
Observation: Table contains: date, region, amount
Thought: Now I'll formulate SQL query.
Action: sql_select("SELECT region, SUM(amount) FROM sales GROUP BY region")
Observation: Region A: 100k, Region B: 150k
Thought: I'll analyze results and formulate conclusions.
2. Plan-and-Solve¶
For complex tasks (Lab 06 Incident), ReAct may "get lost" in details.
Architecture:
- Planner: First generate complete plan
- Solver: Execute plan items in order
When to use Plan-and-Solve instead of ReAct?
- Task is very complex (5+ steps)
- Need guarantee all steps will be executed
- Agent often "forgets" important steps
- Task has clear structure (e.g., SOP for incidents)
Plan-and-Solve implementation:
// planAndSolve asks the model for a plan first, then runs the agent with that
// plan in context. Needs the "context" and "fmt" packages and the go-openai client.
func planAndSolve(ctx context.Context, client *openai.Client, task string) error {
	// Step 1: Plan generation
	planPrompt := fmt.Sprintf(`Break task into steps:
Task: %s
Create action plan. Each step should be specific and executable.`, task)
	planResp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model: "gpt-4o",
		Messages: []openai.ChatCompletionMessage{
			{Role: "system", Content: "You are a task planner. Create detailed plans."},
			{Role: "user", Content: planPrompt},
		},
	})
	if err != nil {
		return fmt.Errorf("plan generation: %w", err)
	}
	if len(planResp.Choices) == 0 {
		return fmt.Errorf("plan generation: response has no choices")
	}
	plan := planResp.Choices[0].Message.Content
	fmt.Printf("Plan:\n%s\n", plan)
	// Step 2: Plan execution
	executionPrompt := fmt.Sprintf(`Execute plan step by step:
Plan:
%s
Execute steps in order. After each step, report result.`, plan)
	// Run agent with plan in context (runAgentWithPlan is assumed to be defined elsewhere)
	runAgentWithPlan(ctx, client, executionPrompt, plan)
	return nil
}
3. Tree-of-Thoughts (ToT)¶
Agent considers several solution options and chooses the best one.
How it works:
- Agent generates several possible solution paths
- Agent evaluates each path
- Agent chooses best path
- Agent executes chosen path
Example (Data Analytics):
Task: "Why did sales drop in region X?"
Option 1: Check sales data directly
- Pros: Fast
- Cons: May not show cause
Option 2: Check sales data + marketing campaigns + competitors
- Pros: More complete picture
- Cons: Longer
Option 3: Check data quality first
- Pros: Ensure data is correct
- Cons: May be excessive
Agent chooses Option 2 (most complete)
When to use:
- Task has several possible approaches
- Need to choose optimal path
- Solution efficiency is important
4. Self-Consistency¶
Agent generates several plans and chooses the most consistent one.
How it works:
- Agent generates N plans (e.g., 5)
- Agent finds common elements in all plans
- Agent creates final plan based on common elements
Example:
Plan 1: [A, B, C, D]
Plan 2: [A, B, E, F]
Plan 3: [A, C, D, G]
Plan 4: [A, B, C, H]
Plan 5: [A, B, D, I]
Common elements: A (all 5), B (in 4 of 5), C (in 3 of 5), D (in 3 of 5)
Final plan: [A, B, C, ...] (based on most frequent elements)
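The counting step can be sketched as a simple vote. This is an illustration rather than a reference implementation: `Vote` and the `minVotes` threshold are hypothetical, and in a real agent the plans would come from N separate LLM samples. Steps are counted once per plan and returned in first-seen order so the result is deterministic.

```go
package main

import "fmt"

// Vote keeps the steps that appear in at least minVotes plans.
// A step is counted once per plan; output follows first-seen order.
func Vote(plans [][]string, minVotes int) []string {
	counts := map[string]int{}
	var order []string
	for _, plan := range plans {
		seen := map[string]bool{}
		for _, step := range plan {
			if seen[step] {
				continue // duplicate inside one plan
			}
			seen[step] = true
			if counts[step] == 0 {
				order = append(order, step)
			}
			counts[step]++
		}
	}
	var final []string
	for _, step := range order {
		if counts[step] >= minVotes {
			final = append(final, step)
		}
	}
	return final
}

func main() {
	plans := [][]string{
		{"A", "B", "C", "D"},
		{"A", "B", "E", "F"},
		{"A", "C", "D", "G"},
		{"A", "B", "C", "H"},
		{"A", "B", "D", "I"},
	}
	// Keep steps that a majority (3 of 5) of the plans agree on.
	fmt.Println(Vote(plans, 3)) // [A B C D]
}
```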
Task Decomposition¶
How to Properly Break Down Tasks?¶
Decomposition principles:
- Atomicity: Each step should be executable with one action
  - Bad: "Check and fix server"
  - Good: "Check server status" → "Read logs" → "Apply fix"
- Dependencies: Steps should execute in correct order
  - Bad: "Apply fix" → "Read logs"
  - Good: "Read logs" → "Analyze" → "Apply fix"
- Verifiability: Each step should have clear success criterion
  - Bad: "Improve performance"
  - Good: "Reduce CPU from 95% to 50%"
Decomposition example (Support):
Original task: "Process user ticket about login problem"
Decomposition:
1. Read ticket completely
- Success criterion: All problem details obtained
2. Gather context
- Success criterion: Software version, OS, browser known
3. Search knowledge base
- Success criterion: Similar cases or solution found
4. Check account status
- Success criterion: Status known (active/locked)
5. Formulate response
- Success criterion: Response ready and contains solution
6. Send response or escalate
- Success criterion: Ticket processed
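A success criterion becomes useful once it is something the runtime can evaluate. The sketch below pairs each step with a check and stops at the first step whose criterion is not met. It is a simplified illustration: `Step`, `RunSteps`, and the stubbed step bodies are hypothetical, and in an agent the `Run` function would be a tool call.

```go
package main

import "fmt"

// Step is one atomic unit of work with an explicit success criterion.
type Step struct {
	Name   string
	Run    func() (string, error)   // performs the action
	Verify func(result string) bool // success criterion for the result
}

// RunSteps executes steps in order and stops at the first failure.
// It returns how many steps completed successfully.
func RunSteps(steps []Step) (int, error) {
	for i, s := range steps {
		result, err := s.Run()
		if err != nil {
			return i, fmt.Errorf("step %d (%s): %w", i+1, s.Name, err)
		}
		if !s.Verify(result) {
			return i, fmt.Errorf("step %d (%s): success criterion not met, got %q", i+1, s.Name, result)
		}
	}
	return len(steps), nil
}

func main() {
	steps := []Step{
		{
			Name:   "Check account status",
			Run:    func() (string, error) { return "locked", nil },
			Verify: func(r string) bool { return r == "active" || r == "locked" },
		},
		{
			Name:   "Search knowledge base",
			Run:    func() (string, error) { return "", nil }, // nothing found
			Verify: func(r string) bool { return r != "" },
		},
	}
	done, err := RunSteps(steps)
	fmt.Println("completed:", done)
	fmt.Println("error:", err)
}
```

In this run the first step passes and the second fails its criterion, which is the signal for the agent to replan or escalate.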
Practical Planning Examples¶
Example 1: DevOps - Incident Investigation¶
// Task: "Service is unavailable. Investigate."
// Plan (generated by agent):
plan := []string{
"1. Check HTTP status of service",
"2. If not 200 — read logs",
"3. Analyze errors",
"4. Determine cause",
"5. Apply fix",
"6. Verify recovery",
}
// Execution:
for i, step := range plan {
fmt.Printf("Executing step %d: %s\n", i+1, step)
result := executeStep(step)
if !result.Success {
fmt.Printf("Step %d failed: %s\n", i+1, result.Error)
// Agent may replan or escalate
break
}
}
Example 2: Data Analytics - Sales Analysis¶
// Task: "Why did sales drop in region X?"
// Plan:
plan := []string{
"1. Check data quality (nulls, duplicates)",
"2. Get sales data for last month",
"3. Compare with previous period",
"4. Check marketing campaigns",
"5. Check competitors",
"6. Analyze trends",
"7. Generate report with conclusions",
}
Example 3: Security - Alert Triage¶
// Task: "Alert: suspicious activity on host 192.168.1.10"
// Plan:
plan := []string{
"1. Determine alert severity",
"2. Gather evidence (logs, metrics, traffic)",
"3. Analyze attack patterns",
"4. Determine scope (affected systems)",
"5. Assess criticality",
"6. Make decision (False Positive / True Positive)",
"7. If True Positive — containment (with confirmation)",
"8. Generate report for SOC",
}
Common Planning Errors¶
Error 1: Plan Too General¶
Bad:
Good:
Plan:
1. Check HTTP status of service (check_http)
2. If 502 — read last 50 log lines (read_logs)
3. Find error keywords in logs
4. If "Syntax error" — perform rollback (rollback_deploy)
5. If "Connection refused" — restart service (restart_service)
6. Verify: check HTTP status again (check_http)
Error 2: Wrong Step Order¶
Bad:
Good:
Error 3: Skipping Important Steps¶
Bad:
Good:
Planning Checklist¶
- Task broken into atomic steps
- Steps execute in correct order
- Each step has clear success criterion
- Plan includes result verification
- Correct planning level chosen (ReAct / Plan-and-Solve / Hierarchical)
- Plan adapts to execution results (for ReAct)
Reflexion (Self-Correction)¶
Agents often make mistakes. Reflexion adds a self-critique step: after a failure, the agent states what went wrong before it plans again.
Cycle: Act -> Observe -> Fail -> REFLECT -> Plan Again
Example:
Action: read_file("/etc/nginx/nginx.conf")
Observation: Permission denied
Reflection: "I tried to read file, but got Permission Denied.
This means I don't have permissions. Next time I should use sudo
or check permissions first."
Action: check_permissions("/etc/nginx/nginx.conf")
Observation: File is readable by root only
Action: read_file_sudo("/etc/nginx/nginx.conf")
Runtime (Execution Environment)¶
Runtime is the code that connects the LLM with tools.
Main Runtime functions:
- Parsing LLM responses: Determining if model wants to call a tool
- Executing tools: Calling real Go functions
- Managing history: Adding results to context
- Managing loop: Determining when to stop
Registry Pattern for Extensibility¶
To make an agent extensible, don't hardcode tool logic in main.go. Use a Registry pattern.
Problem without Registry:
- Adding a new tool requires changes in dozens of places
- Code becomes unreadable
- Hard to test individual tools
Solution: Go Interfaces + Registry
Defining Tool Interface¶
type Tool interface {
Name() string
Description() string
Parameters() json.RawMessage
Execute(args json.RawMessage) (string, error)
}
Any tool (Proxmox, Ansible, SSH) must implement this interface.
Tool Implementation¶
type ProxmoxListVMsTool struct{}
func (t *ProxmoxListVMsTool) Name() string {
return "list_vms"
}
func (t *ProxmoxListVMsTool) Description() string {
return "List all VMs in the Proxmox cluster"
}
func (t *ProxmoxListVMsTool) Parameters() json.RawMessage {
return json.RawMessage(`{
"type": "object",
"properties": {},
"required": []
}`)
}
func (t *ProxmoxListVMsTool) Execute(args json.RawMessage) (string, error) {
// Real Proxmox API call logic
return "VM-100 (Running), VM-101 (Stopped)", nil
}
Registry (Tool Registry)¶
Registry is a tool storage accessible by name.
type ToolRegistry struct {
tools map[string]Tool
}
func NewToolRegistry() *ToolRegistry {
return &ToolRegistry{
tools: make(map[string]Tool),
}
}
func (r *ToolRegistry) Register(tool Tool) {
r.tools[tool.Name()] = tool
}
func (r *ToolRegistry) Get(name string) (Tool, bool) {
tool, exists := r.tools[name]
return tool, exists
}
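// ToOpenAITools returns tool definitions sorted by name. Go map iteration
// order is random, and an unstable tools[] order would change the cached
// request prefix between calls. Requires the "sort" package.
func (r *ToolRegistry) ToOpenAITools() []openai.Tool {
names := make([]string, 0, len(r.tools))
for name := range r.tools {
names = append(names, name)
}
sort.Strings(names)
result := make([]openai.Tool, 0, len(names))
for _, name := range names {
tool := r.tools[name]
result = append(result, openai.Tool{
Type: openai.ToolTypeFunction,
Function: &openai.FunctionDefinition{
Name: tool.Name(),
Description: tool.Description(),
Parameters: tool.Parameters(),
},
})
}
return result
}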
Using Registry¶
// Initialization
registry := NewToolRegistry()
registry.Register(&ProxmoxListVMsTool{})
registry.Register(&AnsibleRunPlaybookTool{})
// Get tool list for LLM
tools := registry.ToOpenAITools()
// Execute tool by name
toolCall := msg.ToolCalls[0]
if tool, exists := registry.Get(toolCall.Function.Name); exists {
result, err := tool.Execute(json.RawMessage(toolCall.Function.Arguments))
if err != nil {
return fmt.Errorf("tool execution failed: %w", err) // %w keeps the original error inspectable
}
// Add result to history
}
Registry advantages:
- Adding new tool — just implement interface and register
- Tool code is isolated and easily testable
- Runtime doesn't know about specific tools, works through interface
- Easy to add validation, logging, metrics at Registry level
Runtime example with Registry:
func runAgent(ctx context.Context, client *openai.Client, registry *ToolRegistry, userInput string) {
messages := []openai.ChatCompletionMessage{
{Role: "system", Content: systemPrompt},
{Role: "user", Content: userInput},
}
tools := registry.ToOpenAITools() // Get tool list from Registry
for i := 0; i < maxIterations; i++ {
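resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: "gpt-4o-mini",
Messages: messages,
Tools: tools,
})
if err != nil {
// Stop here instead of dereferencing an empty response
fmt.Printf("LLM request failed: %v\n", err)
return
}
if len(resp.Choices) == 0 {
fmt.Println("LLM returned no choices")
return
}
msg := resp.Choices[0].Message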
messages = append(messages, msg)
if len(msg.ToolCalls) == 0 {
// Final response
fmt.Println(msg.Content)
break
}
// Execute tools through Registry
for _, toolCall := range msg.ToolCalls {
tool, exists := registry.Get(toolCall.Function.Name)
if !exists {
result := fmt.Sprintf("Error: Unknown tool %s", toolCall.Function.Name)
messages = append(messages, openai.ChatCompletionMessage{
Role: "tool",
Content: result,
ToolCallID: toolCall.ID,
})
continue
}
result, err := tool.Execute(json.RawMessage(toolCall.Function.Arguments))
if err != nil {
result = fmt.Sprintf("Error: %v", err)
}
messages = append(messages, openai.ChatCompletionMessage{
Role: "tool",
Content: result,
ToolCallID: toolCall.ID,
})
}
}
}
Agent Data Flow Diagram¶
sequenceDiagram
participant User
participant Runtime
participant LLM
participant Tool
User->>Runtime: "Check server status"
Runtime->>Runtime: Collects messages[]:<br/>- System Prompt<br/>- User Input<br/>- History (if any)
Runtime->>LLM: ChatCompletionRequest<br/>(messages, tools[])
LLM->>LLM: Generates tool_call:<br/>{name: "check_status", args: {...}}
LLM->>Runtime: Response (tool_calls)
Runtime->>Runtime: Parses tool_call<br/>Validates name and arguments
Runtime->>Tool: Executes checkStatus(hostname)
Tool->>Runtime: "Server is ONLINE"
Runtime->>Runtime: Adds result to messages[]:<br/>{role: "tool", content: "..."}
Runtime->>LLM: ChatCompletionRequest<br/>(updated history)
LLM->>LLM: Receives tool result<br/>Generates final response
LLM->>Runtime: Response (content: "Server is running")
Runtime->>User: "Server is running normally"
Key points:
- LLM doesn't execute code, only generates JSON
- Runtime manages entire loop and executes real functions
- History (messages[]) is collected by Runtime and passed in each request
- Tool results are added to history before next request
Config-based Agent Injection¶
In practice, a single Agent type serves different modes through Config:
type Config struct {
MaxIter int
Prompt string // system prompt is fixed at the start of the Run
Tools []Tool // tool set is fixed at the start of the Run
Hooks Hooks // AfterResponse, BeforeToolExec, AfterToolExec
Condenser Condenser // reaction to context overflow (see Ch. 13)
OnDelta func(delta string) // streaming
OnToolCall func(name, args string) // tool call notification
}
The same Agent works as:
- Main agent — full Config (a prompt with the full role, Hooks for skill selection, Condenser for overflow).
- Subagent — minimal Config (a narrow prompt for the subtask, no Hooks, no Condenser).
- Triage agent — narrow prompt + a trimmed Tools list, no callbacks.
- ChatOnly — Tools = nil, simple dialogue.
Instead of inheritance or if-branching — behavior injection through Config. This makes each component independently unit-testable.
Why are Prompt and Tools fields, not func(state)? Because the system message and the tools[] block sit at the very start of the request and form the prefix that the provider caches. If you return a different prompt text or a different tool list on every iteration, the cache misses on every step, input cost goes up 5-10×, and latency grows. Dynamic data lives in tool_result and user messages — not in the system prefix. If behavior changes radically (a new role, a different tool set), that's a new Run, not a modification of the current one.
A subagent built this way doesn't need any kind of "Working Memory" — it has its own short history for its task. If the main agent wants to pass it context, it does so as text inside task, not via a shared structure.
Event System¶
The agent generates events as it works. UI, logging, and metrics subscribe to them independently:
type EventType string
const (
EventDelta EventType = "delta" // text fragment
EventToolCall EventType = "tool_call" // tool call
EventToolResult EventType = "tool_result" // tool result
EventDone EventType = "done" // completion
EventPlanUpdate EventType = "plan_update" // plan update
EventSubagentEvent EventType = "subagent" // subagent event
)
type Broker[T any] struct {
subscribers []chan T
mu sync.RWMutex
}
func (b *Broker[T]) Subscribe() <-chan T {
ch := make(chan T, 64)
b.mu.Lock()
b.subscribers = append(b.subscribers, ch)
b.mu.Unlock()
return ch
}
func (b *Broker[T]) Publish(event T) {
b.mu.RLock()
defer b.mu.RUnlock()
for _, ch := range b.subscribers {
select {
case ch <- event:
default: // don't block if subscriber is slow
}
}
}
Fan-out pattern: a single event is delivered to all subscribers. UI shows streaming, logger writes to file, metrics count tool calls — all working in parallel.
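A short usage sketch of the broker follows. The `Broker` type is repeated from above so the example compiles on its own (generics need Go 1.18 or later); the subscriber names are illustrative. It also shows the consequence of the non-blocking `default` branch: a subscriber that does not read keeps at most 64 events and silently misses the rest.

```go
package main

import (
	"fmt"
	"sync"
)

// Broker is the fan-out broker from this section.
type Broker[T any] struct {
	subscribers []chan T
	mu          sync.RWMutex
}

// Subscribe registers a new subscriber with a buffer of 64 events.
func (b *Broker[T]) Subscribe() <-chan T {
	ch := make(chan T, 64)
	b.mu.Lock()
	b.subscribers = append(b.subscribers, ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers the event to every subscriber that has buffer space.
func (b *Broker[T]) Publish(event T) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subscribers {
		select {
		case ch <- event:
		default: // subscriber is slow: drop instead of blocking the agent
		}
	}
}

func main() {
	broker := &Broker[string]{}
	ui := broker.Subscribe()
	logger := broker.Subscribe()

	// One event reaches both subscribers.
	broker.Publish("tool_call: check_status")
	fmt.Println("ui:    ", <-ui)
	fmt.Println("logger:", <-logger)

	// Nobody reads while 100 events are published: only 64 fit the buffer.
	for i := 0; i < 100; i++ {
		broker.Publish(fmt.Sprintf("delta %d", i))
	}
	fmt.Println("buffered for ui:", len(ui)) // 64
}
```

Dropping is a reasonable trade-off for UI deltas; a subscriber that must see every event, such as an audit log, would need a larger buffer or a blocking delivery path.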
Skills¶
An agent with a fixed System Prompt is limited. Skills let you dynamically change agent behavior depending on the task.
Agent Skills is an open format for extending agent capabilities with specialized knowledge and workflows. The standard is supported by Cursor, Claude Code, VS Code, GitHub, and other tools. At its core, a skill is a folder with a SKILL.md file. The agent loads its content into context on demand.
Why Skills?¶
Imagine: your DevOps agent works with Docker. Tomorrow a Kubernetes task comes in. Without skills, you'd rewrite the System Prompt and hardcode instructions.
With skills, you just load the right behavior module.
A Skill Is Not a Tool¶
A skill and a tool are different things. A tool performs a concrete action (calls API, reads file). A skill describes how to behave in a given situation.
Analogy:
- Tool — hammer, saw, screwdriver (concrete actions)
- Skill — instruction "how to assemble a cabinet" (sequence of actions and rules)
Skills — Magic vs Reality¶
Wrong (how it's usually explained):
Skills are AI modules the agent "learned" and applies automatically.
Reality (how it actually works):
A skill is a text file with instructions. The agent loads its content into context before the LLM request. The model simply receives additional text in the prompt. No training happens.
Agent Skills Format (SKILL.md)¶
The Agent Skills specification defines a standard file format. A skill is a directory with a SKILL.md file:
docker-debugging/
├── SKILL.md # Required: metadata + instructions
├── scripts/ # Optional: executable code
├── references/ # Optional: additional documentation
└── assets/ # Optional: templates, schemas
The SKILL.md file contains YAML frontmatter and Markdown instructions:
---
name: docker-debugging
description: >
  Debug Docker containers: check status, logs, resources,
  restart loops. Use when user mentions Docker problems.
---
# Docker Debugging
## When to use this skill
Use when the user reports Docker container issues...
## Steps
1. Check container status (docker ps)
2. Read logs (docker logs)
3. Check resources (docker stats)
4. If container is restarting — check exit code
5. Never run docker rm -f without confirmation
Two fields are required: name (identifier, lowercase + hyphens) and description (when to use this skill). The Markdown body contains the actual instructions.
Progressive Disclosure¶
Skills use a three-level loading pattern to manage context efficiently:
- Discovery. At startup, the agent loads only name and description of each available skill. This costs ~100 tokens per skill.
- Activation. When the task matches a skill's description, the agent reads the full SKILL.md body into context.
- Execution. The agent follows the instructions, loading referenced files (scripts/, references/) only when needed.
This way, the agent stays fast while having access to detailed instructions on demand.
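The first two levels can be sketched as a function that splits a SKILL.md file into metadata and body. This is a deliberately small hand-rolled parser, an assumption made to keep the example free of dependencies: it understands only top-level `name` and `description` keys plus indented continuation lines, and a real implementation would use a YAML library. `SkillMeta` and `ParseSkill` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// SkillMeta is what the agent keeps in context at the discovery level.
type SkillMeta struct {
	Name        string
	Description string
}

// ParseSkill splits SKILL.md content into frontmatter metadata and the
// Markdown body. Only "name" and "description" are recognized.
func ParseSkill(content string) (SkillMeta, string, error) {
	var meta SkillMeta
	lines := strings.Split(content, "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return meta, "", errors.New("missing frontmatter")
	}
	end := -1
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "---" {
			end = i
			break
		}
	}
	if end < 0 {
		return meta, "", errors.New("unterminated frontmatter")
	}
	var current *string // field that indented continuation lines belong to
	for _, line := range lines[1:end] {
		if strings.HasPrefix(line, " ") || strings.HasPrefix(line, "\t") {
			if text := strings.TrimSpace(line); current != nil && text != "" {
				if *current != "" {
					*current += " "
				}
				*current += text
			}
			continue
		}
		key, value, ok := strings.Cut(line, ":")
		value = strings.TrimSpace(value)
		if value == ">" || value == "|" {
			value = "" // block scalar: the text follows on indented lines
		}
		current = nil
		if !ok {
			continue
		}
		switch strings.TrimSpace(key) {
		case "name":
			meta.Name, current = value, &meta.Name
		case "description":
			meta.Description, current = value, &meta.Description
		}
	}
	if meta.Name == "" || meta.Description == "" {
		return meta, "", errors.New("name and description are required")
	}
	return meta, strings.TrimSpace(strings.Join(lines[end+1:], "\n")), nil
}

func main() {
	content := `---
name: docker-debugging
description: >
  Debug Docker containers: check status, logs, resources,
  restart loops. Use when user mentions Docker problems.
---
# Docker Debugging
## Steps
1. Check container status (docker ps)
`
	meta, body, err := ParseSkill(content)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("discovery:  %s | %s\n", meta.Name, meta.Description)
	fmt.Printf("activation: %d bytes of instructions\n", len(body))
}
```

At startup only `meta` would go into the prompt; `body` is read into context once the skill is activated.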
Implementing Skill Loading in Runtime¶
Below is one way to implement skill loading in your agent's Runtime. The Instruction field corresponds to the body of a SKILL.md file. Trigger-based matching shown here is a simplified approach — production implementations (Cursor, Claude Code) let the LLM itself decide which skill to activate based on description.
Skill definition¶
// SkillDefinition describes a skill
type SkillDefinition struct {
Name string // Unique skill name
Description string // When to use this skill
Triggers []string // Keywords for automatic loading
Instruction string // Instruction text (added to prompt)
}
Skill Registry¶
Similar to the Tool Registry, skills are stored in a registry. The registry finds matching skills by keywords in the user's request.
// SkillRegistry is the agent's skill registry
type SkillRegistry struct {
skills map[string]SkillDefinition
}
func NewSkillRegistry() *SkillRegistry {
return &SkillRegistry{
skills: make(map[string]SkillDefinition),
}
}
func (r *SkillRegistry) Register(skill SkillDefinition) {
r.skills[skill.Name] = skill
}
// FindByTrigger finds skills by keyword in user input
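func (r *SkillRegistry) FindByTrigger(userInput string) []SkillDefinition {
input := strings.ToLower(userInput) // lowercase once, not once per trigger
var matched []SkillDefinition
for _, skill := range r.skills {
for _, trigger := range skill.Triggers {
// Lowercase the trigger too, so a trigger like "Docker" still matches
if strings.Contains(input, strings.ToLower(trigger)) {
matched = append(matched, skill)
break
}
}
}
// Map iteration order is random; sort so the assembled prompt is
// identical for the same input (requires the "sort" package)
sort.Slice(matched, func(i, j int) bool { return matched[i].Name < matched[j].Name })
return matched
}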
// BuildPrompt assembles System Prompt with loaded skills
func (r *SkillRegistry) BuildPrompt(basePrompt string, skills []SkillDefinition) string {
if len(skills) == 0 {
return basePrompt
}
var sb strings.Builder
sb.WriteString(basePrompt)
sb.WriteString("\n\n## Loaded Skills\n\n")
for _, skill := range skills {
sb.WriteString("### " + skill.Name + "\n")
sb.WriteString(skill.Instruction + "\n\n")
}
return sb.String()
}
Usage example¶
registry := NewSkillRegistry()
// Register skills
registry.Register(SkillDefinition{
Name: "docker-debugging",
Description: "Docker container debugging",
Triggers: []string{"docker", "container"},
Instruction: `When working with Docker:
1. First check container status (docker ps)
2. Read logs (docker logs)
3. Check resources (docker stats)
4. If container is restarting — check exit code
5. Never run docker rm -f without confirmation`,
})
registry.Register(SkillDefinition{
Name: "incident-response",
Description: "Incident response",
Triggers: []string{"incident", "outage", "502", "500", "downtime"},
Instruction: `During an incident:
1. Determine severity (P1/P2/P3)
2. Gather facts, don't guess
3. Check monitoring and logs
4. Apply minimal fix (rollback)
5. Verify recovery
6. Don't optimize during a fire`,
})
// Agent receives a request
userInput := "Docker container keeps restarting"
// Find matching skills
skills := registry.FindByTrigger(userInput)
// Build prompt with skills
prompt := registry.BuildPrompt(baseSystemPrompt, skills)
// prompt now contains base prompt + Docker debugging instructions
Skills extend agent behavior without code changes. A new skill is a new SKILL.md file, not a refactor.
For the full format specification, see Agent Skills Specification. Example skills are available in the GitHub repository.
Subagents¶
Sometimes a task is too complex for a single agent. Or the agent needs to handle a subtask without polluting the main context. For this, an agent can spawn a subagent.
When Subagent, When Tool?¶
Tool is the right choice when:
- Action is simple and deterministic
- Result is predictable
- No "thinking" required
Subagent is the right choice when:
- Subtask requires reasoning and multiple steps
- Separate context is needed (to keep the main one clean)
- Subtask is complex on its own
Example:
Task: "Deploy new service and set up monitoring"
Tool approach (bad for complex tasks):
→ deploy_service("my-app") // Single action, no flexibility
→ setup_monitoring("my-app") // Single action, no flexibility
Subagent approach (good for complex tasks):
→ Subagent 1: "Deploy service" (5-10 steps with reasoning)
→ Subagent 2: "Set up monitoring" (3-5 steps with reasoning)
Parent — Child Relationship¶
The parent agent creates a subagent with:
- Separate System Prompt — specialized for the specific task
- Its own tool set — only what the subtask needs
- Its own context — doesn't inherit the parent's full history
- Iteration limit — subagent can't run forever
When the subagent finishes, the result returns to the parent as plain text.
// SubagentConfig holds subagent configuration
type SubagentConfig struct {
Name string // Name for logging
SystemPrompt string // Instructions for subagent
Tools []Tool // Available tools
MaxIterations int // Iteration limit
}
// SpawnSubagent creates and runs a subagent to execute a subtask.
// The subagent runs its own loop and returns a text result.
func SpawnSubagent(
ctx context.Context,
client *openai.Client,
config SubagentConfig,
task string,
) (string, error) {
messages := []openai.ChatCompletionMessage{
{Role: "system", Content: config.SystemPrompt},
{Role: "user", Content: task},
}
registry := NewToolRegistry()
for _, tool := range config.Tools {
registry.Register(tool)
}
// Subagent runs its own loop
for i := 0; i < config.MaxIterations; i++ {
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: "gpt-4o",
Messages: messages,
Tools: registry.ToOpenAITools(),
})
if err != nil {
return "", fmt.Errorf("subagent %s: %w", config.Name, err)
}
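// Guard against an empty response before indexing Choices
if len(resp.Choices) == 0 {
return "", fmt.Errorf("subagent %s: empty response (no choices)", config.Name)
}
msg := resp.Choices[0].Message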
messages = append(messages, msg)
// No tool_calls — subagent finished
if len(msg.ToolCalls) == 0 {
return msg.Content, nil
}
// Execute subagent's tools
for _, tc := range msg.ToolCalls {
tool, exists := registry.Get(tc.Function.Name)
if !exists {
messages = append(messages, openai.ChatCompletionMessage{
Role: "tool",
Content: fmt.Sprintf("Error: unknown tool %s", tc.Function.Name),
ToolCallID: tc.ID,
})
continue
}
result, err := tool.Execute(json.RawMessage(tc.Function.Arguments))
if err != nil {
result = fmt.Sprintf("Error: %v", err)
}
messages = append(messages, openai.ChatCompletionMessage{
Role: "tool",
Content: result,
ToolCallID: tc.ID,
})
}
}
return "", fmt.Errorf("subagent %s: iteration limit exceeded (%d)", config.Name, config.MaxIterations)
}
Example: Parent Agent Spawns Subagents¶
// Parent agent received a complex task
task := "Deploy payment-api service and set up alerts"
// Subagent for deployment
deployResult, err := SpawnSubagent(ctx, client, SubagentConfig{
Name: "deploy-agent",
SystemPrompt: "You are a deployment specialist. Deploy the service properly.",
Tools: []Tool{&DeployTool{}, &HealthCheckTool{}},
MaxIterations: 10,
}, "Deploy payment-api service to production")
if err != nil {
log.Fatalf("Deploy failed: %v", err)
}
// Subagent for monitoring (receives context from first)
monitorResult, err := SpawnSubagent(ctx, client, SubagentConfig{
Name: "monitoring-agent",
SystemPrompt: "You are a monitoring specialist. Set up alerts.",
Tools: []Tool{&CreateAlertTool{}, &ListMetricsTool{}},
MaxIterations: 5,
}, fmt.Sprintf("Set up alerts for payment-api. Deploy context: %s", deployResult))
Subagents are a special case of multi-agent systems. For more on coordinating multiple agents, delegation patterns, and orchestration, see Chapter 07: Multi-Agent.
Checkpoint and Resume¶
Long-running agents work for minutes or hours. During that time, the network can drop, an API rate limit can be hit, or the process can restart. Without a state-saving mechanism, all work is lost.
Why This Matters¶
Imagine an agent analyzing 50 servers. On server 47, the API connection drops. Without a checkpoint, the agent starts over and the 46 completed analyses are wasted. With a checkpoint, it resumes from server 47.
Saving Strategies¶
Per-iteration — save after each loop iteration:
- Simple to implement
- May be excessive for short tasks
- Good for long tasks with predictable step count
Per-tool-call — save after each tool call:
- More granular control
- Useful when tool calls are expensive or slow
- Avoids repeating already-executed calls
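The per-tool-call strategy can be sketched roughly as follows. This is a minimal illustration, not the chapter's implementation: the ToolCall and ToolResult types, the done map, and the exec and save callbacks are all hypothetical names introduced here. Each result is persisted as soon as it is produced, and calls whose ID already has a saved result are skipped on resume.

```go
package main

import "fmt"

// ToolCall and ToolResult are hypothetical stand-ins for the provider's types.
type ToolCall struct {
	ID   string
	Name string
	Args string
}

type ToolResult struct {
	CallID  string
	Content string
}

// runToolCalls executes calls in order and invokes save after every executed call.
// Calls whose ID is already present in done are not executed again, so a
// resumed run does not repeat work that finished before the restart.
func runToolCalls(
	calls []ToolCall,
	done map[string]ToolResult,
	exec func(ToolCall) (string, error),
	save func(map[string]ToolResult) error,
) ([]ToolResult, error) {
	results := make([]ToolResult, 0, len(calls))
	for _, tc := range calls {
		if r, ok := done[tc.ID]; ok {
			results = append(results, r) // already executed before the restart
			continue
		}
		content, err := exec(tc)
		if err != nil {
			content = fmt.Sprintf("Error: %v", err) // report the failure to the model
		}
		r := ToolResult{CallID: tc.ID, Content: content}
		done[tc.ID] = r
		results = append(results, r)
		if err := save(done); err != nil {
			return results, fmt.Errorf("save checkpoint after %s: %w", tc.ID, err)
		}
	}
	return results, nil
}

func main() {
	// call_1 was completed before a simulated restart; only call_2 runs now.
	done := map[string]ToolResult{"call_1": {CallID: "call_1", Content: "ok"}}
	calls := []ToolCall{{ID: "call_1", Name: "check"}, {ID: "call_2", Name: "check"}}
	executed := 0
	results, err := runToolCalls(calls, done,
		func(tc ToolCall) (string, error) { executed++; return "ok", nil },
		func(map[string]ToolResult) error { return nil },
	)
	fmt.Println(len(results), executed, err)
}
```

Every tool call still ends up with exactly one result, so the tool_call ↔ tool_result pairing in the history stays intact after a resume.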
Checkpoint — Magic vs Reality¶
Wrong (how it's usually explained):
The agent automatically remembers its state and continues from where it stopped.
Reality (how it actually works):
Agent state is a messages[] array plus metadata (iteration number, status). You serialize them to JSON and save to disk. On restart, you load and continue the loop from the same place.
Checkpoint structure¶
// Checkpoint is a snapshot of agent state
type Checkpoint struct {
ID string `json:"id"`
CreatedAt time.Time `json:"created_at"`
Iteration int `json:"iteration"`
Messages []openai.ChatCompletionMessage `json:"messages"`
TaskID string `json:"task_id"`
Status string `json:"status"` // "in_progress", "completed", "failed"
}
Save and load¶
// SaveCheckpoint saves agent state to disk
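func SaveCheckpoint(dir string, cp Checkpoint) error {
data, err := json.MarshalIndent(cp, "", " ")
if err != nil {
return fmt.Errorf("marshal checkpoint: %w", err)
}
// Make sure the checkpoint directory exists before writing
if err := os.MkdirAll(dir, 0755); err != nil {
return fmt.Errorf("create checkpoint dir: %w", err)
}
path := filepath.Join(dir, cp.TaskID+".json")
return os.WriteFile(path, data, 0644)
}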
// LoadCheckpoint loads the last checkpoint for a task.
// Returns nil, nil if no checkpoint found — agent starts from scratch.
func LoadCheckpoint(dir, taskID string) (*Checkpoint, error) {
path := filepath.Join(dir, taskID+".json")
data, err := os.ReadFile(path)
if err != nil {
if os.IsNotExist(err) {
return nil, nil // No checkpoint — start from scratch
}
return nil, fmt.Errorf("read checkpoint: %w", err)
}
var cp Checkpoint
if err := json.Unmarshal(data, &cp); err != nil {
return nil, fmt.Errorf("unmarshal checkpoint: %w", err)
}
return &cp, nil
}
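One caveat with a plain os.WriteFile: it truncates the target before writing, so a crash in the middle of a save can leave a partial JSON file that no longer unmarshals. A common remedy is to write to a temporary file in the same directory and rename it over the target. The sketch below assumes that a rename within one directory is atomic on the target filesystem (true on POSIX systems); writeFileAtomic is a hypothetical helper, not part of the chapter's code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temporary file next to path and then
// renames it over path, so readers see either the old or the new content.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	dir := filepath.Dir(path)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return fmt.Errorf("create dir: %w", err)
	}
	tmp, err := os.CreateTemp(dir, filepath.Base(path)+".tmp-*")
	if err != nil {
		return fmt.Errorf("create temp file: %w", err)
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // cleans up on failure; harmless after a successful rename

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return fmt.Errorf("write temp file: %w", err)
	}
	if err := tmp.Sync(); err != nil { // flush to disk before the rename
		tmp.Close()
		return fmt.Errorf("sync temp file: %w", err)
	}
	if err := tmp.Close(); err != nil {
		return fmt.Errorf("close temp file: %w", err)
	}
	if err := os.Chmod(tmpName, perm); err != nil {
		return fmt.Errorf("chmod temp file: %w", err)
	}
	return os.Rename(tmpName, path)
}

func main() {
	dir, err := os.MkdirTemp("", "checkpoints")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "task-1.json")
	if err := writeFileAtomic(path, []byte(`{"status":"in_progress"}`), 0o644); err != nil {
		panic(err)
	}
	data, err := os.ReadFile(path)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```

In SaveCheckpoint, the final os.WriteFile call could be swapped for a helper like this if partial files are a concern.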
Agent with checkpoint support¶
func runAgentWithCheckpoints(
ctx context.Context,
client *openai.Client,
registry *ToolRegistry,
taskID, userInput string,
) error {
checkpointDir := "./checkpoints"
// Try to restore from checkpoint
cp, err := LoadCheckpoint(checkpointDir, taskID)
if err != nil {
return err
}
var messages []openai.ChatCompletionMessage
startIteration := 0
if cp != nil && cp.Status == "in_progress" {
// Restoring — take history and iteration number from checkpoint
messages = cp.Messages
startIteration = cp.Iteration
log.Printf("Restored checkpoint: iteration %d", startIteration)
} else {
// No checkpoint found — start from scratch
messages = []openai.ChatCompletionMessage{
{Role: "system", Content: systemPrompt},
{Role: "user", Content: userInput},
}
}
for i := startIteration; i < maxIterations; i++ {
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: "gpt-4o",
Messages: messages,
Tools: registry.ToOpenAITools(),
})
if err != nil {
// API error — save checkpoint and exit
SaveCheckpoint(checkpointDir, Checkpoint{
TaskID: taskID,
Iteration: i,
Messages: messages,
Status: "in_progress",
CreatedAt: time.Now(),
})
return fmt.Errorf("API error (checkpoint saved at iteration %d): %w", i, err)
}
msg := resp.Choices[0].Message
messages = append(messages, msg)
if len(msg.ToolCalls) == 0 {
// Task completed — save final checkpoint
SaveCheckpoint(checkpointDir, Checkpoint{
TaskID: taskID,
Iteration: i,
Messages: messages,
Status: "completed",
CreatedAt: time.Now(),
})
return nil
}
// Execute tools
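for _, tc := range msg.ToolCalls {
var result string
if tool, exists := registry.Get(tc.Function.Name); !exists {
// Unknown tool: report it to the model instead of calling a missing tool
result = fmt.Sprintf("Error: unknown tool %s", tc.Function.Name)
} else if out, execErr := tool.Execute(json.RawMessage(tc.Function.Arguments)); execErr != nil {
// Tool failed: pass the error text back so the model can react
result = fmt.Sprintf("Error: %v", execErr)
} else {
result = out
}
messages = append(messages, openai.ChatCompletionMessage{
Role: "tool",
Content: result,
ToolCallID: tc.ID,
})
}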
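// Save checkpoint after each iteration; a failed save is logged, not fatal
if err := SaveCheckpoint(checkpointDir, Checkpoint{
TaskID: taskID,
Iteration: i + 1,
Messages: messages,
Status: "in_progress",
CreatedAt: time.Now(),
}); err != nil {
log.Printf("checkpoint save failed at iteration %d: %v", i+1, err)
}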
}
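// Loop ended without a final answer: report it instead of signalling success
return fmt.Errorf("iteration limit exceeded (%d)", maxIterations)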
}
Checkpoint is insurance. For short tasks (2-3 iterations), it's overkill. For long tasks (10+ iterations) or expensive API calls, it's essential.
Common Errors¶
Error 1: History Overflow¶
Symptom: Agent "forgets" the beginning of the conversation. After N messages it stops remembering task context; or the provider returns context_length_exceeded.
Cause: Dialogue history exceeds the model's context window. Old messages are "pushed out" of context, or the request is rejected outright.
Solution — condense (see Ch. 13):
// BAD: blindly truncating the middle — risks breaking tool_call ↔ tool_result pairs,
// returning a provider validation error, and losing important context.
if len(messages) > maxHistoryLength {
messages = append(
[]openai.ChatCompletionMessage{messages[0]},
messages[len(messages)-maxHistoryLength+1:]...,
)
}
// BAD: reordering messages "by importance" — kills prompt cache
// and almost always produces an "orphan tool_result" error.
// GOOD: condense — ask the LLM to summarize the old part of the history
// and replace it with a single user message. System stays. The last N messages
// (with intact tool_call/tool_result pairs) also stay. Limit: one condense per Run.
condensed, err := condense(ctx, client, messages, keepLastN)
if err != nil {
return err
}
messages = condensed
The trigger for condense is usage.PromptTokens >= 0.80 * contextWindow or a context_length_exceeded error. Don't use your own token counter for the trigger — it drifts. Details in Ch. 13.
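The threshold check itself is small. A minimal sketch, assuming the caller passes in the PromptTokens value from the provider's last response and knows the model's context window; shouldCondense and condensePercent are hypothetical names, and the 128,000-token window in main is only an illustrative figure. Integer arithmetic avoids floating-point edge cases exactly at the 80% boundary.

```go
package main

import "fmt"

// condensePercent is the 80% threshold from the text, expressed as an integer.
const condensePercent = 80

// shouldCondense reports whether the prompt size reported by the provider
// (usage.PromptTokens from the last response) has reached the threshold.
func shouldCondense(promptTokens, contextWindow int) bool {
	if contextWindow <= 0 {
		// Unknown window: rely on the context_length_exceeded error instead.
		return false
	}
	return promptTokens*100 >= contextWindow*condensePercent
}

func main() {
	const contextWindow = 128_000 // assumed window size, for illustration only

	fmt.Println(shouldCondense(90_000, contextWindow))
	fmt.Println(shouldCondense(102_400, contextWindow))
}
```

The one-condense-per-Run limit is not modeled here; in a real loop it would be a flag checked next to this function.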
Error 2: Agent Loops¶
Symptom: Agent repeats the same action infinitely.
Cause: No iteration limit and no detection of repeating actions.
Solution:
// GOOD: Iteration limit + stuck detection
for i := 0; i < maxIterations; i++ {
// ...
// Stuck detection
if lastNActionsAreSame(history, 3) {
break
}
}
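The snippet above leaves lastNActionsAreSame undefined. One possible shape for it, assuming history is a slice of a hypothetical Action record (tool name plus raw arguments) that the loop appends to on every tool call:

```go
package main

import "fmt"

// Action is a hypothetical record of one tool call made by the agent.
type Action struct {
	Name string // tool name
	Args string // raw JSON arguments as sent by the model
}

// lastNActionsAreSame reports whether the last n actions in history are
// identical (same tool, same arguments). With fewer than n actions recorded,
// or n < 2, the agent is never considered stuck.
func lastNActionsAreSame(history []Action, n int) bool {
	if n < 2 || len(history) < n {
		return false
	}
	last := history[len(history)-1]
	for _, a := range history[len(history)-n : len(history)-1] {
		if a != last {
			return false
		}
	}
	return true
}

func main() {
	history := []Action{
		{Name: "check_logs", Args: `{"service":"api"}`},
		{Name: "restart", Args: `{"service":"api"}`},
		{Name: "restart", Args: `{"service":"api"}`},
		{Name: "restart", Args: `{"service":"api"}`},
	}
	fmt.Println(lastNActionsAreSame(history, 3))
}
```

Arguments are compared byte for byte, so semantically equal JSON with a different key order counts as a different action. That is a deliberate simplification in this sketch.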
Error 3: Tool Result Not Added to History¶
Symptom: Agent doesn't see tool result and continues performing the same action.
Cause: Tool execution result is not added to messages[].
Solution:
// BAD: Result not added
result := executeTool(toolCall)
// History not updated!
// GOOD: Result added to history
result := executeTool(toolCall)
messages = append(messages, openai.ChatCompletionMessage{
Role: openai.ChatMessageRoleTool,
Content: result,
ToolCallID: toolCall.ID,
})
Error 4: Monolithic Run()¶
Symptom: Cannot unit-test agent components. The only way to verify routing is to run the entire Run() with a mock provider.
Cause: The Run() function combines 6 responsibilities: prompt assembly, routing, error recovery, tool execution, plan tracking, event publishing. In practice, this easily grows to 240 lines with nested if-branches.
Solution: Separate responsibilities through injection:
// BAD: everything in one function
func (a *Agent) Run(ctx context.Context, msg string) (string, error) {
// 240 lines: routing + prompt + tools + recovery + events + plan
}
// GOOD: injection through Config
type Config struct {
PromptFunc func(state LoopState) string
ToolFilter func(iter int) []Tool
Hooks Hooks
Condenser Condenser
}
Error 5: Subagent Hardcode in executeTools¶
Symptom: Adding a new type of special tool requires modifying the shared tool execution function.
Cause: Inside executeTools() there's a check if tc.Name == "subagent" with special logic (plan step tracking, progress forwarding). Each "special" tool adds another if.
Solution: Use Hooks:
// BAD: hardcode in shared function
func executeTools(calls []ToolCall) {
for _, tc := range calls {
if tc.Name == "subagent" {
// 20 lines of special logic
}
result := registry.Execute(tc)
}
}
// GOOD: Hooks
type Hooks struct {
BeforeToolExec func(name string, args string)
AfterToolExec func(name string, args string, result string)
}
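To show how the shared function stays free of special cases, here is a minimal sketch of executeTools driven by the Hooks struct. The simplified ToolCall type and the exec callback are assumptions made for this illustration; the real registry and call types will differ.

```go
package main

import "fmt"

// ToolCall is a hypothetical, simplified tool call.
type ToolCall struct {
	Name string
	Args string
}

// Hooks mirrors the struct from the text; nil fields are skipped.
type Hooks struct {
	BeforeToolExec func(name string, args string)
	AfterToolExec  func(name string, args string, result string)
}

// executeTools runs every call through exec and notifies the hooks around it.
// It contains no tool-specific branches.
func executeTools(calls []ToolCall, exec func(ToolCall) string, hooks Hooks) []string {
	results := make([]string, 0, len(calls))
	for _, tc := range calls {
		if hooks.BeforeToolExec != nil {
			hooks.BeforeToolExec(tc.Name, tc.Args)
		}
		result := exec(tc)
		if hooks.AfterToolExec != nil {
			hooks.AfterToolExec(tc.Name, tc.Args, result)
		}
		results = append(results, result)
	}
	return results
}

func main() {
	// Subagent-specific behavior lives in the hook, not in executeTools.
	hooks := Hooks{
		AfterToolExec: func(name, args, result string) {
			if name == "subagent" {
				fmt.Println("plan step finished:", result)
			}
		},
	}
	exec := func(tc ToolCall) string { return "done: " + tc.Name }
	executeTools([]ToolCall{{Name: "subagent", Args: `{"task":"deploy"}`}}, exec, hooks)
}
```

Adding another "special" tool then means registering another hook, while executeTools itself stays unchanged.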
Mini-Exercises¶
Exercise 1: Pre-send token estimate¶
Implement an approximate token estimator over the message history, for metrics and for a rough size check before the very first request:
func estimateTokens(messages []openai.ChatCompletionMessage) int {
// Approximate via tiktoken (or any fast tokenizer).
// Account for: Content, ToolCalls, ToolCallID, Role overhead.
// In production this is only an estimate — the exact number always comes
// from usage.PromptTokens.
}
Important rule (see Ch. 13): in a real agent loop the primary source of token counts is the usage.PromptTokens field from the provider's response. Your own counter is good only for an estimate before the very first request, or for cheap metrics. You cannot use it as the condense trigger — it systematically misses by 5-30% (especially with tool_calls and multilingual content).
Expected result:
- The function returns an estimate within ±10% of usage.PromptTokens on a typical history.
- Accounts for all message types (system, user, assistant, tool).
- In agent code it is called only before the first request; after the first response, the condense trigger is computed from usage.PromptTokens.
Exercise 2: Implement condense¶
Note: the theory is in Ch. 13: Context Engineering. Here is the skeleton for self-implementation.
Implement condense — the single history-compression operation the agent applies on context overflow:
// condense replaces the "middle" of the history with a single user message
// containing a summary, by calling the LLM with a request to summarize the
// passed fragment.
//
// Guarantees:
// - the first message (system) is preserved as-is;
// - the last keepLastN messages are preserved as-is, with intact
// tool_call ↔ tool_result pairs (no "orphan tool_result");
// - everything between them is replaced with a single role="user" message
// whose content is "Context of previous work:\n\n" + summary.
//
// Per-Run limit — 1 condense call; a repeated overflow right after a condense
// means the task is too large for a single Run.
func condense(
ctx context.Context,
client *openai.Client,
messages []openai.ChatCompletionMessage,
keepLastN int,
) ([]openai.ChatCompletionMessage, error) {
// 1) if len(messages) < 2 + keepLastN — nothing to compress, return as-is
// 2) split off head (without system and without the last keepLastN) and tail
// 3) call the LLM with a prompt "summarize head into N tokens"
// 4) assemble [system, summary-as-user, ...tail] and return
}
Expected result:
- system always remains first.
- The last keepLastN messages are preserved in full.
- Between them: exactly one user message with the summary.
- No "orphan tool_result": if a tool_result ends up in tail without its matching tool_call, either pull the pair into tail together, or drop both.
- No reordering, no "importance scoring", no mutation of system.
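The orphan rule is the easiest part to get wrong, so here is a hedged sketch of just the boundary calculation, using the "pull the pair into tail" option. It assumes a reduced, hypothetical Message type with only a Role field, that messages[0] is the system message, and that tool messages always directly follow the assistant message that requested them. tailStart is a name introduced for this illustration.

```go
package main

import "fmt"

// Message is a hypothetical, reduced message: only the field needed here.
type Message struct {
	Role string // "system", "user", "assistant", "tool"
}

// tailStart returns the index where the preserved tail begins. It starts from
// len(messages)-keepLastN and moves the boundary back while the tail would
// begin with a tool result, so no result is separated from its tool call.
func tailStart(messages []Message, keepLastN int) int {
	if len(messages) == 0 {
		return 0
	}
	start := len(messages) - keepLastN
	if start > len(messages) {
		start = len(messages) // negative keepLastN: keep nothing
	}
	if start < 1 {
		start = 1 // never cut into the system message at index 0
	}
	for start > 1 && start < len(messages) && messages[start].Role == "tool" {
		start--
	}
	return start
}

func main() {
	messages := []Message{
		{Role: "system"}, {Role: "user"}, {Role: "assistant"},
		{Role: "tool"}, {Role: "tool"}, {Role: "assistant"},
	}
	// Keeping the last 3 would start on a tool result, so the boundary
	// moves back to the assistant message at index 2.
	fmt.Println(tailStart(messages, 3)) // prints 2
}
```

The tail may therefore be slightly longer than keepLastN; the head to summarize is messages[1:tailStart].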
Completion Criteria / Checklist¶
Done:
- Short-term memory is a linear immutable history (messages[]).
- System prompt and tool list are fixed at the start of the Run and don't change between iterations.
- The condense trigger is computed from usage.PromptTokens in the provider's response (your own counter is only for a pre-send estimate).
- On overflow, condense is applied with a 1-per-Run limit; tool_call ↔ tool_result pairs are not torn apart.
- Long-term memory (RAG) is set up if the task needs cross-session knowledge.
- Planning (ReAct/Plan-and-Solve) is implemented.
- Runtime correctly parses LLM responses, executes tools, manages the loop.
- There's protection against loops (iteration limit + repeat detection).
- Subagent is a standalone Agent with its own Config, not a "shared structure with shared memory".
Not done (anti-patterns):
- History overflows and there's no reaction (neither condense nor a hard stop).
- Messages are reordered "by importance" ("important" ones promoted, "unimportant" dropped): prompt cache is killed, tool pairs are broken.
- System prompt is mutated between iterations (live state, facts, plan progress are stitched into it): prompt cache is killed.
- Tool list changes on every iteration without a strong reason: same problem.
- The condense trigger or the budget cap is computed from your own len(content)/3 or tiktoken counter instead of usage.PromptTokens.
- The agent loops (no iteration limit).
- Tool results are not added to history.
- Monolithic Run() with no separation of concerns.
- Hardcoding special tools inside the shared executeTools: every "special" tool adds another if.
Connection with Other Chapters¶
- Preface → Mental Model — why Runtime, tool permissions, and confirmation for dangerous actions are not "special LLM machinery" but "the same as for a new employee".
- Chapter 01: LLM Physics — understanding the context window and tokens.
- Chapter 03: Tools and Function Calling — how Runtime executes tools.
- Chapter 04: Autonomy and Loops — how Planning works in the agent loop; loop detection, recovery state machine.
- Chapter 12: Agent Memory Systems — two memory horizons, linear history as an immutable log, prompt cache, compact vs condense vs recall.
- Chapter 13: Context Engineering — stable system prefix, single threshold and single reaction (condense with a 1/Run cap), usage.PromptTokens as the primary source.
- Agent Skills Specification — open format for agent skills (SKILL.md).
What's Next?¶
After studying architecture, proceed to:
- 10. Planning and Workflow Patterns — how the agent plans complex tasks