09. Agent Anatomy — Components and Their Interaction

Why This Chapter?

An agent is a simple system. At its core: a loop "LLM responds → calls tools → gets results → responds again", plus a linear message history and a stable system prompt. No "meta-architectures", state graphs, or multi-layer memory are needed in the base case (see Preface: Mental Model).

But for that loop to be reliable in production, it helps to understand what it actually consists of: where messages live, what the system prompt is and why you can't mutate it on every step, how to defend against infinite loops, how to isolate a subagent. Without this anatomy it's easy to "fix" the problem in the wrong place and bolt on trendy patterns that complicate the agent without solving anything.

In this chapter we'll go through the minimal set of components and how they connect, so you can:

  • Properly manage context and dialogue history (linear history + a single condense).
  • Implement an autonomous agent loop with predictable safeguards.
  • Optimize token usage via prompt cache, not via "smart" strategies.
  • Build an extensible system through tools, not through over-engineered abstractions inside the core.

Real-World Case Study

Situation: You've created a DevOps agent. After 20 messages, the agent "forgets" the beginning of the conversation and stops remembering the task context.

Problem: Dialogue history overflows the model's context window. Old messages are "pushed out", and the agent loses important information.

Solution: Once you understand how the agent's memory works, the fix is straightforward: on overflow, condense the old part of the history into a short summary via the LLM, without breaking tool_call ↔ tool_result pairs and without touching the stable system prefix. Details are in Ch. 13: Context Engineering. You do not need to reorder messages "by importance": that is an anti-pattern that kills the prompt cache (see Ch. 12: Agent Memory).

Theory in Simple Terms

Simplified formula:

\[ \text{Agent} = \text{LLM} + \text{Memory} + \text{Tools} + \text{Planning} \]

Memory

An agent must "remember" conversation context and action history.

Short-term Memory

This is the message history (messages array). Limited by the context window.

Message structure:

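type ChatCompletionMessage struct {
    Role       string     // "system", "user", "assistant", "tool"
    Content    string     // Message text
    ToolCalls  []ToolCall // Set on assistant messages that request tool calls
    ToolCallID string     // Set on tool messages: ID of the call this result answers
}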

History example:

messages := []ChatCompletionMessage{
    {Role: "system", Content: "You are a DevOps engineer"},
    {Role: "user", Content: "Check server status"},
    {Role: "assistant", Content: "", ToolCalls: [...]},  // Tool call
    {Role: "tool", Content: "Server is ONLINE", ToolCallID: "call_123"},
    {Role: "assistant", Content: "Server is running normally"},
}

Problem: If history is too long, it doesn't fit in the context window.

Problem example:

// Model context window: 128 000 tokens
// System Prompt:            500 tokens
// Dialogue history:     127 000 tokens   // 30 iterations with large tool_results
// New request:            1 000 tokens
// TOTAL:                128 500 tokens > 128 000 [ERROR: context overflow]

Modern models have windows from 128k (most) to 1M+ (Gemini 2.5). The problem hasn't disappeared — it's just shifted: in long agent sessions with large tool_result payloads (logs, dumps, web pages), overflow shows up around iterations 20-50.

Two rules that make this memory work (details in Ch. 12 and Ch. 13):

  1. History is immutable. Messages are only appended at the end; nothing is reordered or deleted "by importance", old messages are not edited. Any prefix mutation kills the provider's prompt cache, which means 5-10× more expensive input and higher latency.
  2. The system prompt stays stable within a Run. Any "dynamic" update of the system prompt between iterations (live state, facts, plan progress) is an anti-pattern. Dynamic data lives in tool_result and the latest user message — not in the system prefix.

When the window overflows, there is one reaction — condense: call the LLM and ask it to summarize the old part of the history, then replace that part with a single user message "Context of previous work: ...". Limit: one condense per Run; the token count comes from usage.PromptTokens in the provider's response, not from your own counters. Details in Ch. 13.
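The condense step described above can be sketched roughly as follows. This is a minimal illustration, not the implementation from Ch. 13: the `Message` type is a simplified stand-in for the provider's message struct, and `shouldCondense`, `safeSplit`, `condense`, the `reserve` parameter and the `summarize` callback (which would wrap the LLM call) are hypothetical names introduced here.

```go
package main

import "fmt"

// Message is a simplified stand-in for the provider's chat message type.
type Message struct {
	Role       string // "system", "user", "assistant", "tool"
	Content    string
	ToolCallID string // set on tool results
}

// shouldCondense reports whether the next request is likely to overflow.
// promptTokens is taken from usage.PromptTokens of the previous response;
// reserve is headroom for the new request and the model's answer.
func shouldCondense(promptTokens, window, reserve int) bool {
	return promptTokens+reserve > window
}

// safeSplit returns the index of the first message that is kept verbatim.
// It moves left while the kept tail would start with a tool result, so a
// tool_call is never separated from its tool_result.
func safeSplit(history []Message, keepLast int) int {
	split := len(history) - keepLast
	if split < 1 {
		return 1 // nothing to condense
	}
	for split > 1 && split < len(history) && history[split].Role == "tool" {
		split--
	}
	return split
}

// condense replaces history[1:split] with one user message that holds a summary.
// The system prompt at index 0 is left untouched.
func condense(history []Message, keepLast int, summarize func([]Message) (string, error)) ([]Message, error) {
	split := safeSplit(history, keepLast)
	if split <= 1 {
		return history, nil
	}
	summary, err := summarize(history[1:split])
	if err != nil {
		return nil, fmt.Errorf("condense: %w", err)
	}
	out := make([]Message, 0, 2+len(history)-split)
	out = append(out, history[0])
	out = append(out, Message{Role: "user", Content: "Context of previous work: " + summary})
	out = append(out, history[split:]...)
	return out, nil
}
```

The sketch assumes a well-formed history in which every tool result directly follows the assistant message that requested it.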

Long-term Memory

Long-term memory is usually a vector database queried through RAG. It can store gigabytes of documents and find what's needed by meaning (semantic search).

How it works:

  1. Documents are split into chunks
  2. Each chunk is converted to a vector (embedding)
  3. When the agent issues a query, the query is embedded too and the most similar chunk vectors are retrieved
  4. Relevant chunks are added to context
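The retrieval step (steps 3 and 4 above) can be illustrated with a small in-memory sketch. It assumes the embeddings already exist as plain float slices; in a real system they would come from an embedding model and live in a vector database. `Chunk`, `cosine` and `topK` are names made up for this example.

```go
package main

import (
	"math"
	"sort"
)

// Chunk is a piece of a document together with its embedding vector.
type Chunk struct {
	Text   string
	Vector []float64
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns the k chunks most similar to the query vector.
// The input slice is not modified.
func topK(chunks []Chunk, query []float64, k int) []Chunk {
	ranked := make([]Chunk, len(chunks))
	copy(ranked, chunks)
	sort.SliceStable(ranked, func(i, j int) bool {
		return cosine(ranked[i].Vector, query) > cosine(ranked[j].Vector, query)
	})
	if k < 0 {
		k = 0
	}
	if k > len(ranked) {
		k = len(ranked)
	}
	return ranked[:k]
}
```

The text of the returned chunks is what gets appended to the context. A linear scan like this is only reasonable for small collections; vector databases use approximate indexes for the same job.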

Usage examples:

  • DevOps: Storing protocols, runbooks, service documentation
  • Support: Knowledge base with solutions to common problems
  • Data: Database schemas, API documentation
  • Security: Incident playbooks, security policies

See Chapter 06: RAG

Planning

Planning is the agent's ability to break down a complex task into a sequence of simple steps and execute them in the correct order.

Why Is Planning Needed?

Imagine a task: "We have database issues, investigate"

Without planning:

  • Agent may try to do everything at once and get confused
  • Agent may skip important steps
  • Agent may execute steps in wrong order

With planning:

  • Agent first analyzes the task
  • Agent breaks it into subtasks
  • Agent executes subtasks sequentially
  • Agent checks result of each step

Planning Levels

Level 1: Implicit Planning (ReAct)

Agent doesn't create an explicit plan, but plans "on the fly" during execution.

How it works:

  1. Agent receives a task
  2. Agent thinks: "What do I need to do first?"
  3. Agent performs an action
  4. Agent receives result and thinks: "What to do next?"
  5. Repeats until task is complete

Example (DevOps):

User: "Check why server is slow"

Iteration 1:
Thought: "Need to check server metrics"
Action: get_cpu_metrics()
Observation: CPU 95%

Iteration 2:
Thought: "CPU is high. Need to see which process is consuming resources"
Action: get_top_processes()
Observation: Process: ffmpeg, CPU: 90%

Iteration 3:
Thought: "ffmpeg is consuming CPU. Need to check what this process is"
Action: get_process_info(pid=12345)
Observation: This is video conversion started by user

Iteration 4:
Thought: "This is a legitimate process, but it's starving the rest of the system. I'll suggest that the user lower its priority"
Action: [Final response to user]

Advantages:

  • Flexibility: agent can adapt to results
  • Simple implementation
  • Suitable for medium complexity tasks

Disadvantages:

  • May "get lost" in details
  • May skip important steps
  • Hard to track progress

Level 2: Explicit Planning (Plan-and-Solve)

Agent first creates a complete plan, then executes it step by step.

How it works:

  1. Agent receives a task
  2. Agent generates complete plan (list of steps)
  3. Agent executes steps in order
  4. Agent marks completed steps
  5. Agent completes when all steps are done

Example (DevOps - Incident):

User: "Payment service unavailable (502). Investigate"

Step 1: Planning
Agent thinks: "Need to create investigation plan"
Agent generates plan:
  1. Check HTTP status of service
  2. If not 200 — read logs
  3. Analyze errors in logs
  4. Determine cause (config error / connection error / memory)
  5. Apply fix (rollback / restart / scale)
  6. Verify service is restored

Step 2: Execution
Agent: "Executing step 1: Checking HTTP status"
Action: check_http_status("payment-service")
Observation: 502 Bad Gateway

Agent: "Executing step 2: Reading logs"
Action: read_logs("payment-service", lines=50)
Observation: ERROR: Config syntax error in line 42

Agent: "Executing step 3: Analyzing errors"
Thought: "Logs show syntax error in config. This is not a temporary issue."

Agent: "Executing step 4: Determining cause"
Thought: "Cause: Config error. Solution: Rollback"

Agent: "Executing step 5: Applying fix"
Action: rollback_deploy("payment-service", version="v1.9")
Observation: Rollback complete. Service restarted.

Agent: "Executing step 6: Verifying"
Action: check_http_status("payment-service")
Observation: 200 OK

Agent: "All steps completed. Incident resolved."

Advantages:

  • Guarantees that all planned steps are executed
  • Easy to track progress
  • Suitable for complex tasks (5+ steps)

Disadvantages:

  • Less flexible: plan may be suboptimal
  • Requires more tokens (plan generation)
  • May be excessive for simple tasks

Level 3: Hierarchical Planning

For very complex tasks, the plan is broken into subplans.

Example (Security - Incident investigation):

Main task: "Investigate suspicious activity on host 192.168.1.10"

Top-level plan:
  1. Alert triage
  2. Evidence collection
  3. Threat analysis
  4. Action (containment)
  5. Report generation

Subplan for step 2 (Evidence collection):
  2.1. Request SIEM logs for last hour
  2.2. Check network traffic
  2.3. Check system metrics
  2.4. Check running processes
  2.5. Check filesystem for changes

Subplan for step 3 (Threat analysis):
  3.1. Determine attack type
  3.2. Assess criticality
  3.3. Determine scope (affected systems)
  3.4. Assess damage

When to use:

  • Very complex tasks (10+ steps)
  • Tasks with multiple dependencies
  • Tasks requiring coordination of multiple specialists
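One possible way to represent such a plan in code is a small tree in which a step with children is itself a subplan. This is only a sketch; `PlanNode` and `leaves` are hypothetical names, not part of any library.

```go
package main

// PlanNode is one step of a plan. A step with Children is itself a subplan.
type PlanNode struct {
	ID       string
	Title    string
	Children []PlanNode
}

// leaves walks the plan depth-first and returns the executable (leaf)
// steps in the order they should run.
func leaves(nodes []PlanNode) []PlanNode {
	var out []PlanNode
	for _, n := range nodes {
		if len(n.Children) == 0 {
			out = append(out, n)
			continue
		}
		out = append(out, leaves(n.Children)...)
	}
	return out
}
```

For the security example above, flattening the top-level plan would yield step 1, then 2.1 through 2.5, then 3.1 through 3.4, and so on, which is the sequence the agent actually executes.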

Planning Strategies

1. ReAct (Reason + Act)

Most popular architecture. Formula: Thought -> Action -> Observation.

sequenceDiagram
    participant User
    participant Agent
    participant Tool

    User->>Agent: "Check server"
    Note over Agent: Thought: "Need to check status"
    Agent->>Tool: check_status()
    Tool->>Agent: "ONLINE"
    Note over Agent: Thought: "Server is running. I'll check metrics"
    Agent->>Tool: get_metrics()
    Tool->>Agent: "CPU: 50%"
    Agent->>User: "Server is running normally"

ReAct cycle:

  1. Thought: Model generates CoT (action plan)
  2. Action: Model generates a tool call (tool name plus JSON arguments)
  3. Runtime: Our code intercepts call, executes function, gets result
  4. Observation: Result is fed back to model
  5. Loop: Repeat until model decides task is solved

ReAct examples in different domains:

DevOps:

Thought: User complains about slowness. Need to check CPU.
Action: get_cpu_metrics()
Observation: CPU 99% process: ffmpeg
Thought: ffmpeg is consuming everything. Need to kill or limit it.
Action: kill_process(pid=123)

Support:

Thought: User can't log in. I'll check account status.
Action: check_account_status(user_id="12345")
Observation: Account is locked due to suspicious activity
Thought: Account is locked. Need to unlock, but first check logs.
Action: check_security_logs(user_id="12345")
Observation: Multiple failed login attempts from unknown IP
Thought: This looks like a hack. I'll unlock, but send notification to change password.
Action: unlock_account(user_id="12345")

Data Analytics:

Thought: User asks about sales. Need to understand data structure.
Action: describe_table("sales")
Observation: Table contains: date, region, amount
Thought: Now I'll formulate SQL query.
Action: sql_select("SELECT region, SUM(amount) FROM sales GROUP BY region")
Observation: Region A: 100k, Region B: 150k
Thought: I'll analyze results and formulate conclusions.

2. Plan-and-Solve

For complex tasks (Lab 06 Incident), ReAct may "get lost" in details.

Architecture:

  1. Planner: First generate complete plan

    Plan:
    1. Check HTTP status
    2. Read logs
    3. Analyze errors
    4. Apply fix
    5. Verify
    

  2. Solver: Execute plan items in order

When to use Plan-and-Solve instead of ReAct?

  • Task is complex (5+ steps)
  • Need guarantee all steps will be executed
  • Agent often "forgets" important steps
  • Task has clear structure (e.g., SOP for incidents)

Plan-and-Solve implementation:

func planAndSolve(ctx context.Context, client *openai.Client, task string) error {
    // Step 1: Plan generation
    planPrompt := fmt.Sprintf(`Break task into steps:
Task: %s

Create action plan. Each step should be specific and executable.`, task)

    planResp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
        Model: "gpt-4o",
        Messages: []openai.ChatCompletionMessage{
            {Role: "system", Content: "You are a task planner. Create detailed plans."},
            {Role: "user", Content: planPrompt},
        },
    })
    if err != nil {
        return fmt.Errorf("plan generation failed: %w", err)
    }
    if len(planResp.Choices) == 0 {
        return fmt.Errorf("plan generation returned no choices")
    }

    plan := planResp.Choices[0].Message.Content
    fmt.Printf("Plan:\n%s\n", plan)

    // Step 2: Plan execution
    executionPrompt := fmt.Sprintf(`Execute plan step by step:
Plan:
%s

Execute steps in order. After each step, report result.`, plan)

    // Run the agent loop with the plan in context.
    // runAgentWithPlan is assumed to be defined elsewhere.
    runAgentWithPlan(ctx, client, executionPrompt, plan)
    return nil
}

3. Tree-of-Thoughts (ToT)

Agent considers several solution options and chooses the best one.

How it works:

  1. Agent generates several possible solution paths
  2. Agent evaluates each path
  3. Agent chooses best path
  4. Agent executes chosen path

Example (Data Analytics):

Task: "Why did sales drop in region X?"

Option 1: Check sales data directly
  - Pros: Fast
  - Cons: May not show cause

Option 2: Check sales data + marketing campaigns + competitors
  - Pros: More complete picture
  - Cons: Longer

Option 3: Check data quality first
  - Pros: Ensure data is correct
  - Cons: May be excessive

Agent chooses Option 2 (most complete)

When to use:

  • Task has several possible approaches
  • Need to choose optimal path
  • Solution efficiency is important
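The generate, evaluate, choose steps can be sketched as below. Note that this is a single-level simplification: a full Tree-of-Thoughts search expands and prunes several levels of candidate thoughts. `Option`, `chooseBest` and the `score` callback are hypothetical; in a real agent the scoring would typically be another LLM call.

```go
package main

import "errors"

// Option is one candidate solution path.
type Option struct {
	Name  string
	Steps []string
}

// chooseBest scores every option with the supplied evaluator and returns
// the highest-scoring one. On a tie the earlier option wins.
func chooseBest(options []Option, score func(Option) float64) (Option, error) {
	if len(options) == 0 {
		return Option{}, errors.New("no options to evaluate")
	}
	best, bestScore := options[0], score(options[0])
	for _, o := range options[1:] {
		if s := score(o); s > bestScore {
			best, bestScore = o, s
		}
	}
	return best, nil
}
```

Keeping the evaluator as a parameter means the selection logic can be unit-tested with a deterministic scoring function before an LLM is plugged in.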

4. Self-Consistency

Agent generates several plans and chooses the most consistent one.

How it works:

  1. Agent generates N plans (e.g., 5)
  2. Agent finds common elements in all plans
  3. Agent creates final plan based on common elements

Example:

Plan 1: [A, B, C, D]
Plan 2: [A, B, E, F]
Plan 3: [A, C, D, G]
Plan 4: [A, B, C, H]
Plan 5: [A, B, D, I]

Common elements: A (in all 5), B (in 4 of 5), C (in 3 of 5), D (in 3 of 5)
Final plan: [A, B, C, D] (elements that appear in a majority of plans)
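Counting common elements is simple enough to sketch directly. The function below is an illustration under stated assumptions: `majorityPlan` and the `minVotes` threshold are names invented here, and the sketch ranks steps by frequency only, ignoring the order in which steps must execute.

```go
package main

import "sort"

// majorityPlan keeps the steps that appear in at least minVotes of the
// candidate plans. Steps are ordered by vote count (descending); ties are
// broken alphabetically so the result is deterministic.
func majorityPlan(plans [][]string, minVotes int) []string {
	votes := make(map[string]int)
	for _, plan := range plans {
		seen := make(map[string]bool) // count a step at most once per plan
		for _, step := range plan {
			if !seen[step] {
				seen[step] = true
				votes[step]++
			}
		}
	}
	var result []string
	for step, n := range votes {
		if n >= minVotes {
			result = append(result, step)
		}
	}
	sort.Slice(result, func(i, j int) bool {
		if votes[result[i]] != votes[result[j]] {
			return votes[result[i]] > votes[result[j]]
		}
		return result[i] < result[j]
	})
	return result
}
```

With the five plans from the example and a threshold of 3, the function keeps A, B, C and D.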

Task Decomposition

How to Properly Break Down Tasks?

Decomposition principles:

  1. Atomicity: Each step should be executable with one action

    • Bad: "Check and fix server"
    • Good: "Check server status" → "Read logs" → "Apply fix"
  2. Dependencies: Steps should execute in correct order

    • Bad: "Apply fix" → "Read logs"
    • Good: "Read logs" → "Analyze" → "Apply fix"
  3. Verifiability: Each step should have clear success criterion

    • Bad: "Improve performance"
    • Good: "Reduce CPU from 95% to 50%"

Decomposition example (Support):

Original task: "Process user ticket about login problem"

Decomposition:
1. Read ticket completely
   - Success criterion: All problem details obtained

2. Gather context
   - Success criterion: Software version, OS, browser known

3. Search knowledge base
   - Success criterion: Similar cases or solution found

4. Check account status
   - Success criterion: Status known (active/locked)

5. Formulate response
   - Success criterion: Response ready and contains solution

6. Send response or escalate
   - Success criterion: Ticket processed
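The verifiability and dependency principles can be checked mechanically before a plan is executed. The sketch below assumes a plan is available as structured data; `Step`, its fields and `validate` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// Step is one atomic unit of work with an explicit success criterion.
type Step struct {
	Name      string
	Criterion string   // how we know the step succeeded
	DependsOn []string // names of steps that must run earlier
}

// validate checks that every step has a success criterion and that each
// dependency appears earlier in the plan.
func validate(plan []Step) error {
	done := make(map[string]bool)
	for i, s := range plan {
		if s.Criterion == "" {
			return fmt.Errorf("step %d (%s): missing success criterion", i+1, s.Name)
		}
		for _, dep := range s.DependsOn {
			if !done[dep] {
				return fmt.Errorf("step %d (%s): depends on %q, which has not run yet", i+1, s.Name, dep)
			}
		}
		done[s.Name] = true
	}
	return nil
}
```

A plan that places "Apply fix" before "Read logs" fails this check, which is exactly the wrong-order case listed under the dependency principle.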

Practical Planning Examples

Example 1: DevOps - Incident Investigation

// Task: "Service is unavailable. Investigate."

// Plan (generated by agent):
plan := []string{
    "1. Check HTTP status of service",
    "2. If not 200 — read logs",
    "3. Analyze errors",
    "4. Determine cause",
    "5. Apply fix",
    "6. Verify recovery",
}

// Execution:
for i, step := range plan {
    fmt.Printf("Executing step %d: %s\n", i+1, step)
    result := executeStep(step)
    if !result.Success {
        fmt.Printf("Step %d failed: %s\n", i+1, result.Error)
        // Agent may replan or escalate
        break
    }
}

Example 2: Data Analytics - Sales Analysis

// Task: "Why did sales drop in region X?"

// Plan:
plan := []string{
    "1. Check data quality (nulls, duplicates)",
    "2. Get sales data for last month",
    "3. Compare with previous period",
    "4. Check marketing campaigns",
    "5. Check competitors",
    "6. Analyze trends",
    "7. Generate report with conclusions",
}

Example 3: Security - Alert Triage

// Task: "Alert: suspicious activity on host 192.168.1.10"

// Plan:
plan := []string{
    "1. Determine alert severity",
    "2. Gather evidence (logs, metrics, traffic)",
    "3. Analyze attack patterns",
    "4. Determine scope (affected systems)",
    "5. Assess criticality",
    "6. Make decision (False Positive / True Positive)",
    "7. If True Positive — containment (with confirmation)",
    "8. Generate report for SOC",
}

Common Planning Errors

Error 1: Plan Too General

Bad:

Plan:
1. Figure out the problem
2. Fix it
3. Check

Good:

Plan:
1. Check HTTP status of service (check_http)
2. If 502 — read last 50 log lines (read_logs)
3. Find error keywords in logs
4. If "Syntax error" — perform rollback (rollback_deploy)
5. If "Connection refused" — restart service (restart_service)
6. Verify: check HTTP status again (check_http)

Error 2: Wrong Step Order

Bad:

1. Apply fix
2. Read logs
3. Check status

Good:

1. Check status
2. Read logs
3. Apply fix

Error 3: Skipping Important Steps

Bad:

1. Read logs
2. Apply fix
(Verification step skipped!)

Good:

1. Read logs
2. Apply fix
3. Verify result

Planning Checklist

  • Task broken into atomic steps
  • Steps execute in correct order
  • Each step has clear success criterion
  • Plan includes result verification
  • Correct planning level chosen (ReAct / Plan-and-Solve / Hierarchical)
  • Plan adapts to execution results (for ReAct)

Reflexion (Self-Correction)

Agents often make mistakes. Reflexion adds a self-critique step: after a failure, the agent analyzes what went wrong before it plans again.

Cycle: Act -> Observe -> Fail -> REFLECT -> Plan Again

Example:

Action: read_file("/etc/nginx/nginx.conf")
Observation: Permission denied
Reflection: "I tried to read file, but got Permission Denied. 
            This means I don't have permissions. Next time I should use sudo 
            or check permissions first."
Action: check_permissions("/etc/nginx/nginx.conf")
Observation: File is readable by root only
Action: read_file_sudo("/etc/nginx/nginx.conf")

Runtime (Execution Environment)

Runtime is the code that connects the LLM with tools.

Main Runtime functions:

  1. Parsing LLM responses: Determining if model wants to call a tool
  2. Executing tools: Calling real Go functions
  3. Managing history: Adding results to context
  4. Managing loop: Determining when to stop

Registry Pattern for Extensibility

To make an agent extensible, don't hardcode tool logic in main.go. Use a Registry pattern.

Problem without Registry:

  • Adding a new tool requires changes in dozens of places
  • Code becomes unreadable
  • Hard to test individual tools

Solution: Go Interfaces + Registry

Defining Tool Interface

type Tool interface {
    Name() string
    Description() string
    Parameters() json.RawMessage
    Execute(args json.RawMessage) (string, error)
}

Any tool (Proxmox, Ansible, SSH) must implement this interface.

Tool Implementation

type ProxmoxListVMsTool struct{}

func (t *ProxmoxListVMsTool) Name() string {
    return "list_vms"
}

func (t *ProxmoxListVMsTool) Description() string {
    return "List all VMs in the Proxmox cluster"
}

func (t *ProxmoxListVMsTool) Parameters() json.RawMessage {
    return json.RawMessage(`{
        "type": "object",
        "properties": {},
        "required": []
    }`)
}

func (t *ProxmoxListVMsTool) Execute(args json.RawMessage) (string, error) {
    // Real Proxmox API call logic
    return "VM-100 (Running), VM-101 (Stopped)", nil
}

Registry (Tool Registry)

The registry stores tools and looks them up by name.

type ToolRegistry struct {
    tools map[string]Tool
}

func NewToolRegistry() *ToolRegistry {
    return &ToolRegistry{
        tools: make(map[string]Tool),
    }
}

func (r *ToolRegistry) Register(tool Tool) {
    r.tools[tool.Name()] = tool
}

func (r *ToolRegistry) Get(name string) (Tool, bool) {
    tool, exists := r.tools[name]
    return tool, exists
}

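func (r *ToolRegistry) ToOpenAITools() []openai.Tool {
    // Go map iteration order is random. Sort by name so the tools[] block
    // is identical between requests and the prompt cache stays valid.
    names := make([]string, 0, len(r.tools))
    for name := range r.tools {
        names = append(names, name)
    }
    sort.Strings(names) // requires the standard "sort" package

    result := make([]openai.Tool, 0, len(names))
    for _, name := range names {
        tool := r.tools[name]
        result = append(result, openai.Tool{
            Type: openai.ToolTypeFunction,
            Function: &openai.FunctionDefinition{
                Name:        tool.Name(),
                Description: tool.Description(),
                Parameters:  tool.Parameters(),
            },
        })
    }
    return result
}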

Using Registry

// Initialization
registry := NewToolRegistry()
registry.Register(&ProxmoxListVMsTool{})
registry.Register(&AnsibleRunPlaybookTool{})

// Get tool list for LLM
tools := registry.ToOpenAITools()

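// Execute tool by name
toolCall := msg.ToolCalls[0]
if tool, exists := registry.Get(toolCall.Function.Name); exists {
    result, err := tool.Execute(json.RawMessage(toolCall.Function.Arguments))
    if err != nil {
        return fmt.Errorf("tool execution failed: %w", err)
    }
    // Add result to history, paired with the call via ToolCallID
    messages = append(messages, openai.ChatCompletionMessage{
        Role:       "tool",
        Content:    result,
        ToolCallID: toolCall.ID,
    })
}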

Registry advantages:

  • Adding new tool — just implement interface and register
  • Tool code is isolated and easily testable
  • Runtime doesn't know about specific tools, works through interface
  • Easy to add validation, logging, metrics at Registry level

Runtime example with Registry:

func runAgent(ctx context.Context, client *openai.Client, registry *ToolRegistry, userInput string) error {
    messages := []openai.ChatCompletionMessage{
        {Role: "system", Content: systemPrompt},
        {Role: "user", Content: userInput},
    }

    tools := registry.ToOpenAITools()  // Get tool list from Registry once per Run

    for i := 0; i < maxIterations; i++ {
        resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
            Model: "gpt-4o-mini",
            Messages: messages,
            Tools: tools,
        })
        if err != nil {
            return fmt.Errorf("chat completion failed: %w", err)
        }
        if len(resp.Choices) == 0 {
            return fmt.Errorf("chat completion returned no choices")
        }

        msg := resp.Choices[0].Message
        messages = append(messages, msg)

        if len(msg.ToolCalls) == 0 {
            // Final response
            fmt.Println(msg.Content)
            return nil
        }

        // Execute tools through Registry
        for _, toolCall := range msg.ToolCalls {
            tool, exists := registry.Get(toolCall.Function.Name)
            if !exists {
                // Report the problem to the model instead of aborting the Run
                result := fmt.Sprintf("Error: Unknown tool %s", toolCall.Function.Name)
                messages = append(messages, openai.ChatCompletionMessage{
                    Role: "tool",
                    Content: result,
                    ToolCallID: toolCall.ID,
                })
                continue
            }

            result, err := tool.Execute(json.RawMessage(toolCall.Function.Arguments))
            if err != nil {
                // Tool errors go back to the model as a tool result so it can react
                result = fmt.Sprintf("Error: %v", err)
            }

            messages = append(messages, openai.ChatCompletionMessage{
                Role: "tool",
                Content: result,
                ToolCallID: toolCall.ID,
            })
        }
    }
    // Safeguard against infinite loops: the iteration budget is exhausted
    return fmt.Errorf("stopped after %d iterations without a final response", maxIterations)
}

Agent Data Flow Diagram

sequenceDiagram
    participant User
    participant Runtime
    participant LLM
    participant Tool

    User->>Runtime: "Check server status"
    Runtime->>Runtime: Collects messages[]:<br/>- System Prompt<br/>- User Input<br/>- History (if any)
    Runtime->>LLM: ChatCompletionRequest<br/>(messages, tools[])
    LLM->>LLM: Generates tool_call:<br/>{name: "check_status", args: {...}}
    LLM->>Runtime: Response (tool_calls)
    Runtime->>Runtime: Parses tool_call<br/>Validates name and arguments
    Runtime->>Tool: Executes checkStatus(hostname)
    Tool->>Runtime: "Server is ONLINE"
    Runtime->>Runtime: Adds result to messages[]:<br/>{role: "tool", content: "..."}
    Runtime->>LLM: ChatCompletionRequest<br/>(updated history)
    LLM->>LLM: Receives tool result<br/>Generates final response
    LLM->>Runtime: Response (content: "Server is running")
    Runtime->>User: "Server is running normally"

Key points:

  • LLM doesn't execute code, only generates JSON
  • Runtime manages entire loop and executes real functions
  • History (messages[]) is collected by Runtime and passed in each request
  • Tool results are added to history before next request

Config-based Agent Injection

In practice, a single Agent type serves different modes through Config:

type Config struct {
    MaxIter    int
    Prompt     string                         // system prompt is fixed at the start of the Run
    Tools      []Tool                         // tool set is fixed at the start of the Run
    Hooks      Hooks                          // AfterResponse, BeforeToolExec, AfterToolExec
    Condenser  Condenser                      // reaction to context overflow (see Ch. 13)
    OnDelta    func(delta string)             // streaming
    OnToolCall func(name, args string)        // tool call notification
}

The same Agent works as:

  • Main agent — full Config (a prompt with the full role, Hooks for skill selection, Condenser for overflow).
  • Subagent — minimal Config (a narrow prompt for the subtask, no Hooks, no Condenser).
  • Triage agent — narrow prompt + a trimmed Tools list, no callbacks.
  • ChatOnly — Tools = nil, simple dialogue.

Instead of inheritance or if-branching — behavior injection through Config. This makes each component independently unit-testable.

Why are Prompt and Tools fields, not func(state)? Because the system message and the tools[] block sit at the very start of the request and form the prefix that the provider caches. If you return a different prompt text or a different tool list on every iteration, the cache misses on every step, input cost goes up 5-10×, and latency grows. Dynamic data lives in tool_result and user messages — not in the system prefix. If behavior changes radically (a new role, a different tool set), that's a new Run, not a modification of the current one.

A subagent built this way doesn't need any kind of "Working Memory" — it has its own short history for its task. If the main agent wants to pass it context, it does so as text inside task, not via a shared structure.
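Passing context to a subagent as plain text might look like the sketch below. Everything here is an assumption for illustration: `Config` is a trimmed-down version of the struct shown earlier, `RunFunc` stands in for whatever method actually runs the agent, and `delegate`, the prompt text and the `read_logs` tool name are made up.

```go
package main

// Config is a trimmed-down version of the Config from this chapter
// (Hooks, Condenser and callbacks are omitted).
type Config struct {
	MaxIter int
	Prompt  string
	Tools   []string // tool names only, to keep the sketch short
}

// RunFunc stands in for the agent's run method: it takes a Config and a
// task and returns the final answer.
type RunFunc func(cfg Config, task string) (string, error)

// delegate runs a subtask in a fresh subagent with a minimal Config.
// Context from the main agent is passed as plain text inside the task,
// not through a shared data structure.
func delegate(run RunFunc, subtask, parentContext string) (string, error) {
	cfg := Config{
		MaxIter: 5,
		Prompt:  "You are a focused assistant. Solve only the task you are given.",
		Tools:   []string{"read_logs"},
	}
	task := subtask
	if parentContext != "" {
		task += "\n\nContext from the main agent:\n" + parentContext
	}
	return run(cfg, task)
}
```

Because the subagent only receives a Config and a string, it can be tested by passing a stub RunFunc that returns a canned answer.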

Event System

The agent generates events as it works. UI, logging, and metrics subscribe to them independently:

type EventType string

const (
    EventDelta         EventType = "delta"       // text fragment
    EventToolCall      EventType = "tool_call"   // tool call
    EventToolResult    EventType = "tool_result" // tool result
    EventDone          EventType = "done"        // completion
    EventPlanUpdate    EventType = "plan_update" // plan update
    EventSubagentEvent EventType = "subagent"    // subagent event
)

type Broker[T any] struct {
    subscribers []chan T
    mu          sync.RWMutex
}

func (b *Broker[T]) Subscribe() <-chan T {
    ch := make(chan T, 64)
    b.mu.Lock()
    b.subscribers = append(b.subscribers, ch)
    b.mu.Unlock()
    return ch
}

func (b *Broker[T]) Publish(event T) {
    b.mu.RLock()
    defer b.mu.RUnlock()
    for _, ch := range b.subscribers {
        select {
        case ch <- event:
        default: // subscriber's buffer is full: drop the event rather than block the agent
        }
    }
}

Fan-out pattern: a single event is delivered to all subscribers. UI shows streaming, logger writes to file, metrics count tool calls — all working in parallel.
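A short usage sketch of the fan-out behavior follows. The `Broker` is repeated from above so the example compiles on its own; the `Event` struct and the `drain` helper are hypothetical additions for the demonstration. Generics require Go 1.18 or later.

```go
package main

import "sync"

// Event is a minimal event payload used for this demonstration.
type Event struct {
	Type    string
	Payload string
}

// Broker is repeated from the chapter so the sketch is self-contained.
type Broker[T any] struct {
	mu          sync.RWMutex
	subscribers []chan T
}

// Subscribe registers a new subscriber with a buffer of 64 events.
func (b *Broker[T]) Subscribe() <-chan T {
	ch := make(chan T, 64)
	b.mu.Lock()
	b.subscribers = append(b.subscribers, ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers the event to every subscriber without blocking.
func (b *Broker[T]) Publish(event T) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subscribers {
		select {
		case ch <- event:
		default: // buffer full: the event is dropped for this subscriber
		}
	}
}

// drain collects everything currently buffered in a subscriber channel
// without blocking.
func drain[T any](ch <-chan T) []T {
	var out []T
	for {
		select {
		case e := <-ch:
			out = append(out, e)
		default:
			return out
		}
	}
}
```

Creating a broker with `&Broker[Event]{}`, subscribing twice and publishing one event leaves a copy of that event in each subscriber's channel. The trade-off of the non-blocking send is that a subscriber that falls more than 64 events behind silently loses events, which is usually acceptable for a UI but may not be for an audit log.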

Skills

An agent with a fixed System Prompt is limited. Skills let you dynamically change agent behavior depending on the task.

Agent Skills is an open format for extending agent capabilities with specialized knowledge and workflows. The standard is supported by Cursor, Claude Code, VS Code, GitHub, and other tools. At its core, a skill is a folder with a SKILL.md file. The agent loads its content into context on demand.

Why Skills?

Imagine: your DevOps agent works with Docker. Tomorrow a Kubernetes task comes in. Without skills, you'd rewrite the System Prompt and hardcode instructions.

With skills, you just load the right behavior module.

A Skill Is Not a Tool

A skill and a tool are different things. A tool performs a concrete action (calls API, reads file). A skill describes how to behave in a given situation.

Analogy:

  • Tool — hammer, saw, screwdriver (concrete actions)
  • Skill — instruction "how to assemble a cabinet" (sequence of actions and rules)

Skills — Magic vs Reality

Wrong (how it's usually explained):

Skills are AI modules the agent "learned" and applies automatically.

Reality (how it actually works):

A skill is a text file with instructions. The agent loads its content into context before the LLM request. The model simply receives additional text in the prompt. No training happens.

Agent Skills Format (SKILL.md)

The Agent Skills specification defines a standard file format. A skill is a directory with a SKILL.md file:

docker-debugging/
├── SKILL.md        # Required: metadata + instructions
├── scripts/        # Optional: executable code
├── references/     # Optional: additional documentation
└── assets/         # Optional: templates, schemas

The SKILL.md file contains YAML frontmatter and Markdown instructions:

---
name: docker-debugging
description: >
  Debug Docker containers — check status, logs, resources,
  restart loops. Use when user mentions Docker problems.
---

# Docker Debugging

## When to use this skill
Use when the user reports Docker container issues...

## Steps
1. Check container status (docker ps)
2. Read logs (docker logs)
3. Check resources (docker stats)
4. If container is restarting — check exit code
5. Never run docker rm -f without confirmation

Two fields are required: name (identifier, lowercase + hyphens) and description (when to use this skill). The Markdown body contains the actual instructions.

Progressive Disclosure

Skills use a three-level loading pattern to manage context efficiently:

  1. Discovery. At startup, the agent loads only name and description of each available skill. This costs ~100 tokens per skill.
  2. Activation. When the task matches a skill's description, the agent reads the full SKILL.md body into context.
  3. Execution. The agent follows the instructions, loading referenced files (scripts/, references/) only when needed.

This way, the agent stays fast while having access to detailed instructions on demand.
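The split between discovery and activation can be sketched with a helper that separates frontmatter from body. This is a simplified illustration: `splitSkill` is a hypothetical function, it only looks for the two `---` delimiter lines, and it does not parse the YAML itself (a real implementation would hand the frontmatter to a YAML parser).

```go
package main

import (
	"errors"
	"strings"
)

// splitSkill separates the YAML frontmatter from the Markdown body of a
// SKILL.md file. Discovery needs only the frontmatter; the body is loaded
// into context on activation.
func splitSkill(content string) (frontmatter, body string, err error) {
	lines := strings.Split(content, "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return "", "", errors.New("SKILL.md must start with a --- line")
	}
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "---" {
			frontmatter = strings.Join(lines[1:i], "\n")
			body = strings.TrimSpace(strings.Join(lines[i+1:], "\n"))
			return frontmatter, body, nil
		}
	}
	return "", "", errors.New("frontmatter is not closed with ---")
}
```

At startup an agent could keep only the frontmatter of each skill in memory and read the body from disk when the skill is activated, which keeps the discovery cost small.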

Implementing Skill Loading in Runtime

Below is one way to implement skill loading in your agent's Runtime. The Instruction field corresponds to the body of a SKILL.md file. Trigger-based matching shown here is a simplified approach — production implementations (Cursor, Claude Code) let the LLM itself decide which skill to activate based on description.

Skill definition

// SkillDefinition describes a skill
type SkillDefinition struct {
    Name        string   // Unique skill name
    Description string   // When to use this skill
    Triggers    []string // Keywords for automatic loading
    Instruction string   // Instruction text (added to prompt)
}

Skill Registry

Similar to the Tool Registry, skills are stored in a registry. The registry finds matching skills by keywords in the user's request.

// SkillRegistry is the agent's skill registry
type SkillRegistry struct {
    skills map[string]SkillDefinition
}

func NewSkillRegistry() *SkillRegistry {
    return &SkillRegistry{
        skills: make(map[string]SkillDefinition),
    }
}

func (r *SkillRegistry) Register(skill SkillDefinition) {
    r.skills[skill.Name] = skill
}

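// FindByTrigger finds skills whose trigger keywords occur in the user input.
// Matching is case-insensitive. Results are sorted by name (standard "sort"
// package) so the prompt built from them is deterministic: Go map iteration
// order is random.
func (r *SkillRegistry) FindByTrigger(userInput string) []SkillDefinition {
    input := strings.ToLower(userInput)
    var matched []SkillDefinition
    for _, skill := range r.skills {
        for _, trigger := range skill.Triggers {
            if strings.Contains(input, strings.ToLower(trigger)) {
                matched = append(matched, skill)
                break
            }
        }
    }
    sort.Slice(matched, func(i, j int) bool { return matched[i].Name < matched[j].Name })
    return matched
}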

// BuildPrompt assembles System Prompt with loaded skills
func (r *SkillRegistry) BuildPrompt(basePrompt string, skills []SkillDefinition) string {
    if len(skills) == 0 {
        return basePrompt
    }

    var sb strings.Builder
    sb.WriteString(basePrompt)
    sb.WriteString("\n\n## Loaded Skills\n\n")
    for _, skill := range skills {
        sb.WriteString("### " + skill.Name + "\n")
        sb.WriteString(skill.Instruction + "\n\n")
    }
    return sb.String()
}

Usage example

registry := NewSkillRegistry()

// Register skills
registry.Register(SkillDefinition{
    Name:        "docker-debugging",
    Description: "Docker container debugging",
    Triggers:    []string{"docker", "container"},
    Instruction: `When working with Docker:
1. First check container status (docker ps)
2. Read logs (docker logs)
3. Check resources (docker stats)
4. If container is restarting — check exit code
5. Never run docker rm -f without confirmation`,
})

registry.Register(SkillDefinition{
    Name:        "incident-response",
    Description: "Incident response",
    Triggers:    []string{"incident", "outage", "502", "500", "downtime"},
    Instruction: `During an incident:
1. Determine severity (P1/P2/P3)
2. Gather facts, don't guess
3. Check monitoring and logs
4. Apply minimal fix (rollback)
5. Verify recovery
6. Don't optimize during a fire`,
})

// Agent receives a request
userInput := "Docker container keeps restarting"

// Find matching skills
skills := registry.FindByTrigger(userInput)

// Build prompt with skills
prompt := registry.BuildPrompt(baseSystemPrompt, skills)
// prompt now contains base prompt + Docker debugging instructions

Skills extend agent behavior without code changes. A new skill is a new SKILL.md file, not a refactor.
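
The description-based activation mentioned above (the LLM picks the skill instead of a keyword match) can be sketched roughly as follows. This is only an illustration under assumptions: Skill is a trimmed-down stand-in for SkillDefinition, and SkillCatalog, LoadSkill, and the idea of exposing them through a load_skill tool are hypothetical names, not part of any specification. Only names and descriptions go into the system prompt; the full instruction is returned on demand.

```go
package main

import (
    "fmt"
    "sort"
    "strings"
)

// Skill is a trimmed-down stand-in for SkillDefinition (assumption).
type Skill struct {
    Name        string
    Description string
    Instruction string
}

// SkillCatalog renders only names and descriptions. Names are sorted so the
// text is identical on every run and the system prompt stays stable.
func SkillCatalog(skills map[string]Skill) string {
    names := make([]string, 0, len(skills))
    for name := range skills {
        names = append(names, name)
    }
    sort.Strings(names)

    var sb strings.Builder
    sb.WriteString("## Available Skills\n")
    for _, name := range names {
        sb.WriteString("- " + name + ": " + skills[name].Description + "\n")
    }
    return sb.String()
}

// LoadSkill is what a hypothetical load_skill tool would return to the model.
func LoadSkill(skills map[string]Skill, name string) (string, error) {
    skill, ok := skills[name]
    if !ok {
        return "", fmt.Errorf("unknown skill %q", name)
    }
    return skill.Instruction, nil
}

func main() {
    skills := map[string]Skill{
        "incident-response": {Name: "incident-response", Description: "Incident response", Instruction: "During an incident: ..."},
        "docker-debugging":  {Name: "docker-debugging", Description: "Docker container debugging", Instruction: "When working with Docker: ..."},
    }
    fmt.Print(SkillCatalog(skills))

    instruction, err := LoadSkill(skills, "docker-debugging")
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    fmt.Println(instruction)
}
```

With this layout the catalog is part of the fixed system prompt, and the loaded instruction arrives as an ordinary tool result in the linear history.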

For the full format specification, see Agent Skills Specification. Example skills are available in the GitHub repository.

Subagents

Sometimes a task is too complex for a single agent. Or the agent needs to handle a subtask without polluting the main context. For this, an agent can spawn a subagent.

When Subagent, When Tool?

Tool is the right choice when:

  • Action is simple and deterministic
  • Result is predictable
  • No "thinking" required

Subagent is the right choice when:

  • Subtask requires reasoning and multiple steps
  • Separate context is needed (to keep the main one clean)
  • Subtask is complex on its own

Example:

Task: "Deploy new service and set up monitoring"

Tool approach (bad for complex tasks):
  → deploy_service("my-app")     // Single action, no flexibility
  → setup_monitoring("my-app")   // Single action, no flexibility

Subagent approach (good for complex tasks):
  → Subagent 1: "Deploy service" (5-10 steps with reasoning)
  → Subagent 2: "Set up monitoring" (3-5 steps with reasoning)

Parent — Child Relationship

The parent agent creates a subagent with:

  • Separate System Prompt — specialized for the specific task
  • Its own tool set — only what the subtask needs
  • Its own context — doesn't inherit the parent's full history
  • Iteration limit — subagent can't run forever

When the subagent finishes, the result returns to the parent as plain text.

// SubagentConfig holds subagent configuration
type SubagentConfig struct {
    Name          string // Name for logging
    SystemPrompt  string // Instructions for subagent
    Tools         []Tool // Available tools
    MaxIterations int    // Iteration limit
}

// SpawnSubagent creates and runs a subagent to execute a subtask.
// The subagent runs its own loop and returns a text result.
func SpawnSubagent(
    ctx context.Context,
    client *openai.Client,
    config SubagentConfig,
    task string,
) (string, error) {
    messages := []openai.ChatCompletionMessage{
        {Role: "system", Content: config.SystemPrompt},
        {Role: "user", Content: task},
    }

    registry := NewToolRegistry()
    for _, tool := range config.Tools {
        registry.Register(tool)
    }

    // Subagent runs its own loop
    for i := 0; i < config.MaxIterations; i++ {
        resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
            Model:    "gpt-4o",
            Messages: messages,
            Tools:    registry.ToOpenAITools(),
        })
        if err != nil {
            return "", fmt.Errorf("subagent %s: %w", config.Name, err)
        }

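        // Guard against an empty response before indexing Choices
        if len(resp.Choices) == 0 {
            return "", fmt.Errorf("subagent %s: empty response", config.Name)
        }
        msg := resp.Choices[0].Message
        messages = append(messages, msg)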

        // No tool_calls — subagent finished
        if len(msg.ToolCalls) == 0 {
            return msg.Content, nil
        }

        // Execute subagent's tools
        for _, tc := range msg.ToolCalls {
            tool, exists := registry.Get(tc.Function.Name)
            if !exists {
                messages = append(messages, openai.ChatCompletionMessage{
                    Role:       "tool",
                    Content:    fmt.Sprintf("Error: unknown tool %s", tc.Function.Name),
                    ToolCallID: tc.ID,
                })
                continue
            }
            result, err := tool.Execute(json.RawMessage(tc.Function.Arguments))
            if err != nil {
                result = fmt.Sprintf("Error: %v", err)
            }
            messages = append(messages, openai.ChatCompletionMessage{
                Role:       "tool",
                Content:    result,
                ToolCallID: tc.ID,
            })
        }
    }

    return "", fmt.Errorf("subagent %s: iteration limit exceeded (%d)", config.Name, config.MaxIterations)
}

Example: Parent Agent Spawns Subagents

// Parent agent received a complex task
task := "Deploy payment-api service and set up alerts"

// Subagent for deployment
deployResult, err := SpawnSubagent(ctx, client, SubagentConfig{
    Name:          "deploy-agent",
    SystemPrompt:  "You are a deployment specialist. Deploy the service properly.",
    Tools:         []Tool{&DeployTool{}, &HealthCheckTool{}},
    MaxIterations: 10,
}, "Deploy payment-api service to production")

if err != nil {
    log.Fatalf("Deploy failed: %v", err)
}

// Subagent for monitoring (receives context from first)
monitorResult, err := SpawnSubagent(ctx, client, SubagentConfig{
    Name:          "monitoring-agent",
    SystemPrompt:  "You are a monitoring specialist. Set up alerts.",
    Tools:         []Tool{&CreateAlertTool{}, &ListMetricsTool{}},
    MaxIterations: 5,
}, fmt.Sprintf("Set up alerts for payment-api. Deploy context: %s", deployResult))

if err != nil {
    log.Fatalf("Monitoring setup failed: %v", err)
}
log.Printf("Monitoring configured: %s", monitorResult)

Subagents are a special case of multi-agent systems. For more on coordinating multiple agents, delegation patterns, and orchestration, see Chapter 07: Multi-Agent.

Checkpoint and Resume

Long-running agents work for minutes or hours. During that time, the network can drop, an API rate limit can be hit, or the process can restart. Without a state-saving mechanism, all work is lost.

Why This Matters

Imagine an agent analyzing 50 servers. On server 47, the API connection drops. Without a checkpoint, the agent starts over and the 46 completed calls are wasted. With a checkpoint, it resumes at server 47.

Saving Strategies

Per-iteration — save after each loop iteration:

  • Simple to implement
  • May be excessive for short tasks
  • Good for long tasks with predictable step count

Per-tool-call — save after each tool call:

  • More granular control
  • Useful when tool calls are expensive or slow
  • Avoids repeating already-executed calls
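
To make the per-tool-call strategy concrete, here is a minimal sketch of the resume side: given a restored history, it finds which tool calls of the last assistant message still have no result, so only those are executed again. ToolCall and Message are simplified stand-ins for the SDK types, and PendingToolCalls is a hypothetical helper name.

```go
package main

import "fmt"

// ToolCall and Message are simplified stand-ins for the provider SDK types.
type ToolCall struct {
    ID   string
    Name string
}

type Message struct {
    Role       string
    Content    string
    ToolCalls  []ToolCall
    ToolCallID string
}

// PendingToolCalls returns the tool calls of the last assistant message that
// have no tool result after them yet.
func PendingToolCalls(messages []Message) []ToolCall {
    // Find the last assistant message.
    last := -1
    for i := len(messages) - 1; i >= 0; i-- {
        if messages[i].Role == "assistant" {
            last = i
            break
        }
    }
    if last < 0 {
        return nil
    }

    // Collect the IDs that were already answered before the crash.
    done := make(map[string]bool)
    for _, m := range messages[last+1:] {
        if m.Role == "tool" {
            done[m.ToolCallID] = true
        }
    }

    var pending []ToolCall
    for _, tc := range messages[last].ToolCalls {
        if !done[tc.ID] {
            pending = append(pending, tc)
        }
    }
    return pending
}

func main() {
    restored := []Message{
        {Role: "system", Content: "You are a DevOps agent."},
        {Role: "user", Content: "Check servers 1 and 2"},
        {Role: "assistant", ToolCalls: []ToolCall{
            {ID: "call_1", Name: "check_server"},
            {ID: "call_2", Name: "check_server"},
        }},
        {Role: "tool", ToolCallID: "call_1", Content: "server 1: ok"},
        // The process stopped here, before call_2 produced a result.
    }
    for _, tc := range PendingToolCalls(restored) {
        fmt.Println("still to run:", tc.ID)
    }
}
```

This only works if the checkpoint is written after every appended tool message, which is the extra cost of the per-tool-call strategy.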

Checkpoint — Magic vs Reality

Wrong (how it's usually explained):

The agent automatically remembers its state and continues from where it stopped.

Reality (how it actually works):

Agent state is a messages[] array plus metadata (iteration number, status). You serialize them to JSON and save to disk. On restart, you load and continue the loop from the same place.

Checkpoint structure

// Checkpoint is a snapshot of agent state
type Checkpoint struct {
    ID        string                          `json:"id"`
    CreatedAt time.Time                       `json:"created_at"`
    Iteration int                             `json:"iteration"`
    Messages  []openai.ChatCompletionMessage  `json:"messages"`
    TaskID    string                          `json:"task_id"`
    Status    string                          `json:"status"` // "in_progress", "completed", "failed"
}

Save and load

// SaveCheckpoint saves agent state to disk
func SaveCheckpoint(dir string, cp Checkpoint) error {
    data, err := json.MarshalIndent(cp, "", "  ")
    if err != nil {
        return fmt.Errorf("marshal checkpoint: %w", err)
    }

    path := filepath.Join(dir, cp.TaskID+".json")
    return os.WriteFile(path, data, 0644)
}

// LoadCheckpoint loads the last checkpoint for a task.
// Returns nil, nil if no checkpoint found — agent starts from scratch.
func LoadCheckpoint(dir, taskID string) (*Checkpoint, error) {
    path := filepath.Join(dir, taskID+".json")

    data, err := os.ReadFile(path)
    if err != nil {
        if os.IsNotExist(err) {
            return nil, nil // No checkpoint — start from scratch
        }
        return nil, fmt.Errorf("read checkpoint: %w", err)
    }

    var cp Checkpoint
    if err := json.Unmarshal(data, &cp); err != nil {
        return nil, fmt.Errorf("unmarshal checkpoint: %w", err)
    }
    return &cp, nil
}
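
One caveat with writing the checkpoint file directly: if the process dies in the middle of the write, the file can be left truncated and the next load fails. A common mitigation, sketched below as an assumption rather than as part of the chapter's code, is to write to a temporary file in the same directory and rename it over the target. WriteFileAtomic and writeTwiceAndRead are hypothetical names. On POSIX filesystems the rename replaces the old file in one step. Note that os.CreateTemp creates the file with mode 0600.

```go
package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// WriteFileAtomic writes data to a temporary file next to path and then
// renames it over path, so readers see either the old or the new content.
func WriteFileAtomic(path string, data []byte) error {
    tmp, err := os.CreateTemp(filepath.Dir(path), ".checkpoint-*.tmp")
    if err != nil {
        return fmt.Errorf("create temp file: %w", err)
    }
    // Cleans up on failure; after a successful rename the temp name is gone
    // and this call fails harmlessly.
    defer os.Remove(tmp.Name())

    if _, err := tmp.Write(data); err != nil {
        tmp.Close()
        return fmt.Errorf("write temp file: %w", err)
    }
    if err := tmp.Sync(); err != nil {
        tmp.Close()
        return fmt.Errorf("sync temp file: %w", err)
    }
    if err := tmp.Close(); err != nil {
        return fmt.Errorf("close temp file: %w", err)
    }
    return os.Rename(tmp.Name(), path)
}

// writeTwiceAndRead overwrites one file twice in a scratch directory and
// returns what ends up on disk.
func writeTwiceAndRead() (string, error) {
    dir, err := os.MkdirTemp("", "checkpoints")
    if err != nil {
        return "", err
    }
    defer os.RemoveAll(dir)

    path := filepath.Join(dir, "task-1.json")
    for _, body := range []string{`{"iteration":1}`, `{"iteration":2}`} {
        if err := WriteFileAtomic(path, []byte(body)); err != nil {
            return "", err
        }
    }
    data, err := os.ReadFile(path)
    return string(data), err
}

func main() {
    got, err := writeTwiceAndRead()
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    fmt.Println(got)
}
```

Inside SaveCheckpoint this helper would take the place of the os.WriteFile call.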

Agent with checkpoint support

func runAgentWithCheckpoints(
    ctx context.Context,
    client *openai.Client,
    registry *ToolRegistry,
    taskID, userInput string,
) error {
    checkpointDir := "./checkpoints"

    // Try to restore from checkpoint
    cp, err := LoadCheckpoint(checkpointDir, taskID)
    if err != nil {
        return err
    }

    var messages []openai.ChatCompletionMessage
    startIteration := 0

    if cp != nil && cp.Status == "in_progress" {
        // Restoring — take history and iteration number from checkpoint
        messages = cp.Messages
        startIteration = cp.Iteration
        log.Printf("Restored checkpoint: iteration %d", startIteration)
    } else {
        // No checkpoint found — start from scratch
        messages = []openai.ChatCompletionMessage{
            {Role: "system", Content: systemPrompt},
            {Role: "user", Content: userInput},
        }
    }

    for i := startIteration; i < maxIterations; i++ {
        resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
            Model:    "gpt-4o",
            Messages: messages,
            Tools:    registry.ToOpenAITools(),
        })
        if err != nil {
            // API error: save checkpoint and exit
            saveErr := SaveCheckpoint(checkpointDir, Checkpoint{
                TaskID:    taskID,
                Iteration: i,
                Messages:  messages,
                Status:    "in_progress",
                CreatedAt: time.Now(),
            })
            if saveErr != nil {
                return fmt.Errorf("API error at iteration %d (checkpoint NOT saved: %v): %w", i, saveErr, err)
            }
            return fmt.Errorf("API error (checkpoint saved at iteration %d): %w", i, err)
        }

        msg := resp.Choices[0].Message
        messages = append(messages, msg)

        if len(msg.ToolCalls) == 0 {
            // Task completed: save final checkpoint
            if err := SaveCheckpoint(checkpointDir, Checkpoint{
                TaskID:    taskID,
                Iteration: i,
                Messages:  messages,
                Status:    "completed",
                CreatedAt: time.Now(),
            }); err != nil {
                log.Printf("final checkpoint save failed: %v", err)
            }
            return nil
        }

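        // Execute tools. Every tool_call must get a tool message back, so
        // failures are reported to the model instead of being dropped.
        for _, tc := range msg.ToolCalls {
            var result string
            tool, exists := registry.Get(tc.Function.Name)
            if !exists {
                result = fmt.Sprintf("Error: unknown tool %s", tc.Function.Name)
            } else if out, err := tool.Execute(json.RawMessage(tc.Function.Arguments)); err != nil {
                result = fmt.Sprintf("Error: %v", err)
            } else {
                result = out
            }
            messages = append(messages, openai.ChatCompletionMessage{
                Role:       "tool",
                Content:    result,
                ToolCallID: tc.ID,
            })
        }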

        // Save checkpoint after each iteration
        if err := SaveCheckpoint(checkpointDir, Checkpoint{
            TaskID:    taskID,
            Iteration: i + 1,
            Messages:  messages,
            Status:    "in_progress",
            CreatedAt: time.Now(),
        }); err != nil {
            log.Printf("checkpoint save failed at iteration %d: %v", i+1, err)
        }
    }

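    // The loop ended without a final answer: report it instead of returning success
    return fmt.Errorf("iteration limit exceeded (%d)", maxIterations)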
}

Checkpoint is insurance. For short tasks (2-3 iterations), it's overkill. For long tasks (10+ iterations) or expensive API calls, it's essential.

Common Errors

Error 1: History Overflow

Symptom: Agent "forgets" the beginning of the conversation. After N messages it stops remembering task context; or the provider returns context_length_exceeded.

Cause: Dialogue history exceeds the model's context window. Old messages are "pushed out" of context, or the request is rejected outright.

Solution — condense (see Ch. 13):

// BAD: blindly truncating the middle — risks breaking tool_call ↔ tool_result pairs,
// returning a provider validation error, and losing important context.
if len(messages) > maxHistoryLength {
    messages = append(
        []openai.ChatCompletionMessage{messages[0]},
        messages[len(messages)-maxHistoryLength+1:]...,
    )
}

// BAD: reordering messages "by importance" — kills prompt cache
// and almost always produces an "orphan tool_result" error.

// GOOD: condense — ask the LLM to summarize the old part of the history
// and replace it with a single user message. System stays. The last N messages
// (with intact tool_call/tool_result pairs) also stay. Limit: one condense per Run.
messages, err = condense(ctx, client, messages, keepLastN) // check err before continuing the loop

The trigger for condense is usage.PromptTokens >= 0.80 * contextWindow or a context_length_exceeded error. Don't use your own token counter for the trigger — it drifts. Details in Ch. 13.
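
The threshold check itself is small. A minimal sketch, assuming the prompt token count is taken from the usage block of the last provider response and the context window size is known from configuration (ShouldCondense and condensePercent are hypothetical names):

```go
package main

import "fmt"

// condensePercent is the share of the context window at which condense
// is triggered (80%, as in the rule above).
const condensePercent = 80

// ShouldCondense reports whether the provider-reported prompt size has
// reached the threshold. Integer arithmetic avoids float rounding at the edge.
func ShouldCondense(promptTokens, contextWindow int) bool {
    if contextWindow <= 0 {
        return false // Unknown window: rely on the provider's overflow error
    }
    return promptTokens*100 >= contextWindow*condensePercent
}

func main() {
    const contextWindow = 128000 // Example value, not tied to a specific model

    for _, promptTokens := range []int{50000, 102399, 102400, 120000} {
        fmt.Printf("%d tokens -> condense: %v\n", promptTokens, ShouldCondense(promptTokens, contextWindow))
    }
}
```

The second trigger, a context_length_exceeded error from the provider, is handled where the API error is checked and is not shown here.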

Error 2: Agent Loops

Symptom: Agent repeats the same action infinitely.

Cause: No iteration limit and no detection of repeating actions.

Solution:

// GOOD: Iteration limit + stuck detection
for i := 0; i < maxIterations; i++ {
    // ...

    // Stuck detection
    if lastNActionsAreSame(history, 3) {
        break
    }
}
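
The lastNActionsAreSame helper is not defined in this chapter. One possible sketch, assuming the loop records each executed tool call as a name plus raw JSON arguments (the Action type is hypothetical):

```go
package main

import "fmt"

// Action is a hypothetical record of one executed tool call.
type Action struct {
    Tool string
    Args string // Raw JSON arguments as sent by the model
}

// lastNActionsAreSame reports whether the last n recorded actions are
// identical (same tool, byte-identical arguments).
func lastNActionsAreSame(history []Action, n int) bool {
    if n < 2 || len(history) < n {
        return false
    }
    last := history[len(history)-1]
    for _, a := range history[len(history)-n : len(history)-1] {
        if a != last {
            return false
        }
    }
    return true
}

func main() {
    history := []Action{
        {Tool: "check_status", Args: `{"service":"nginx"}`},
        {Tool: "restart", Args: `{"service":"nginx"}`},
        {Tool: "restart", Args: `{"service":"nginx"}`},
        {Tool: "restart", Args: `{"service":"nginx"}`},
    }
    fmt.Println(lastNActionsAreSame(history, 3))
    fmt.Println(lastNActionsAreSame(history, 4))
}
```

Arguments are compared byte for byte, so the same JSON with a different key order counts as a different action. That keeps the check cheap at the cost of missing some loops.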

Error 3: Tool Result Not Added to History

Symptom: Agent doesn't see tool result and continues performing the same action.

Cause: Tool execution result is not added to messages[].

Solution:

// BAD: Result not added
result := executeTool(toolCall)
// History not updated!

// GOOD: Result added to history
result := executeTool(toolCall)
messages = append(messages, openai.ChatCompletionMessage{
    Role:       openai.ChatMessageRoleTool,
    Content:    result,
    ToolCallID: toolCall.ID,
})

Error 4: Monolithic Run()

Symptom: Cannot unit-test agent components. The only way to verify routing is to run the entire Run() with a mock provider.

Cause: The Run() function combines 6 responsibilities: prompt assembly, routing, error recovery, tool execution, plan tracking, event publishing. In practice, this easily grows to 240 lines with nested if-branches.

Solution: Separate responsibilities through injection:

// BAD: everything in one function
func (a *Agent) Run(ctx context.Context, msg string) (string, error) {
    // 240 lines: routing + prompt + tools + recovery + events + plan
}

// GOOD: injection through Config
type Config struct {
    PromptFunc func(state LoopState) string
    ToolFilter func(iter int) []Tool
    Hooks      Hooks
    Condenser  Condenser
}

Error 5: Subagent Hardcode in executeTools

Symptom: Adding a new type of special tool requires modifying the shared tool execution function.

Cause: Inside executeTools() there's a check if tc.Name == "subagent" with special logic (plan step tracking, progress forwarding). Each "special" tool adds another if.

Solution: Use Hooks:

// BAD: hardcode in shared function
func executeTools(calls []ToolCall) {
    for _, tc := range calls {
        if tc.Name == "subagent" {
            // 20 lines of special logic
        }
        result := registry.Execute(tc)
    }
}

// GOOD: Hooks
type Hooks struct {
    BeforeToolExec func(name string, args string)
    AfterToolExec  func(name string, args string, result string)
}
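
A rough sketch of how the shared function might call these hooks, so that subagent-specific logic lives in a callback supplied by the caller. ToolCall is simplified and the exec parameter is a stand-in for registry.Execute; both are assumptions made for the illustration.

```go
package main

import "fmt"

// ToolCall is a simplified stand-in for the SDK type.
type ToolCall struct {
    Name string
    Args string
}

// Hooks mirrors the struct above. Nil callbacks are skipped.
type Hooks struct {
    BeforeToolExec func(name string, args string)
    AfterToolExec  func(name string, args string, result string)
}

// executeTools runs every call through the same path. It has no knowledge of
// any particular tool name.
func executeTools(calls []ToolCall, hooks Hooks, exec func(ToolCall) string) []string {
    results := make([]string, 0, len(calls))
    for _, tc := range calls {
        if hooks.BeforeToolExec != nil {
            hooks.BeforeToolExec(tc.Name, tc.Args)
        }
        result := exec(tc)
        if hooks.AfterToolExec != nil {
            hooks.AfterToolExec(tc.Name, tc.Args, result)
        }
        results = append(results, result)
    }
    return results
}

func main() {
    // Subagent-specific bookkeeping is registered by the caller.
    var subagentResults []string
    hooks := Hooks{
        AfterToolExec: func(name, args, result string) {
            if name == "subagent" {
                subagentResults = append(subagentResults, result)
            }
        },
    }
    exec := func(tc ToolCall) string { return "ran " + tc.Name }

    calls := []ToolCall{
        {Name: "check_status", Args: `{"service":"nginx"}`},
        {Name: "subagent", Args: `{"task":"deploy"}`},
    }
    fmt.Println(executeTools(calls, hooks, exec))
    fmt.Println(subagentResults)
}
```

Adding another special tool then means registering another callback, while executeTools stays unchanged.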

Mini-Exercises

Exercise 1: Pre-send token estimate

Implement an approximate token estimator over the message history. Use it for metrics and for a rough size check before the very first request, not as the condense trigger:

func estimateTokens(messages []openai.ChatCompletionMessage) int {
    // Approximate via tiktoken (or any fast tokenizer).
    // Account for: Content, ToolCalls, ToolCallID, Role overhead.
    // In production this is only an estimate — the exact number always comes
    // from usage.PromptTokens.
}

Important rule (see Ch. 13): in a real agent loop the primary source of token counts is the usage.PromptTokens field from the provider's response. Your own counter is good only for an estimate before the very first request, or for cheap metrics. You cannot use it as the condense trigger — it systematically misses by 5-30% (especially with tool_calls and multilingual content).

Expected result:

  • The function returns an estimate within ±10% of usage.PromptTokens on a typical history.
  • Accounts for all message types (system, user, assistant, tool).
  • In agent code it is called only before the first request; after the first response, the condense trigger is computed from usage.PromptTokens.

Exercise 2: Implement condense

Note: the theory is in Ch. 13: Context Engineering. Here is the skeleton for self-implementation.

Implement condense — the single history-compression operation the agent applies on context overflow:

// condense replaces the "middle" of the history with a single user message
// containing a summary, by calling the LLM with a request to summarize the
// passed fragment.
//
// Guarantees:
//   - the first message (system) is preserved as-is;
//   - the last keepLastN messages are preserved as-is, with intact
//     tool_call ↔ tool_result pairs (no "orphan tool_result");
//   - everything between them is replaced with a single role="user" message
//     whose content is "Context of previous work:\n\n" + summary.
//
// Per-Run limit — 1 condense call; a repeated overflow right after a condense
// means the task is too large for a single Run.
func condense(
    ctx context.Context,
    client *openai.Client,
    messages []openai.ChatCompletionMessage,
    keepLastN int,
) ([]openai.ChatCompletionMessage, error) {
    // 1) if len(messages) < 2 + keepLastN — nothing to compress, return as-is
    // 2) split off head (without system and without the last keepLastN) and tail
    // 3) call the LLM with a prompt "summarize head into N tokens"
    // 4) assemble [system, summary-as-user, ...tail] and return
}

Expected result:

  • system always remains first.
  • The last keepLastN messages are preserved in full.
  • Between them — exactly one user message with the summary.
  • No "orphan tool_result": if a tool_result ends up in tail without its matching tool_call, either pull the pair into tail together, or drop both.
  • No reordering, no "importance scoring", no mutation of system.
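
For the "no orphan tool_result" requirement, one possible helper (a sketch of the "pull the pair into tail" option, not a full condense) moves the tail boundary backward until the tail no longer begins with a tool message. It relies on tool messages directly following the assistant message that requested them. Message is a simplified stand-in and SafeTailStart is a hypothetical name.

```go
package main

import "fmt"

// Message is a simplified stand-in for the SDK message type.
type Message struct {
    Role string
}

// SafeTailStart returns the index where the preserved tail begins. It starts
// keepLastN messages from the end and moves backward while the tail would
// begin with a tool message, so the assistant message that issued the calls
// is kept together with its results. Index 0 (system) is never part of the tail.
func SafeTailStart(messages []Message, keepLastN int) int {
    start := len(messages) - keepLastN
    if start < 1 {
        start = 1
    }
    if start > len(messages) {
        start = len(messages)
    }
    for start > 1 && start < len(messages) && messages[start].Role == "tool" {
        start--
    }
    return start
}

func main() {
    history := []Message{
        {Role: "system"},    // 0
        {Role: "user"},      // 1
        {Role: "assistant"}, // 2: issues two tool calls
        {Role: "tool"},      // 3
        {Role: "tool"},      // 4
        {Role: "assistant"}, // 5
    }
    // keepLastN=2 would start the tail at index 4 (a tool message),
    // so the boundary moves back to the assistant message at index 2.
    fmt.Println(SafeTailStart(history, 2))
    fmt.Println(SafeTailStart(history, 1))
}
```

The tail may end up longer than keepLastN. That is the price of keeping pairs intact.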

Completion Criteria / Checklist

Done:

  • Short-term memory is a linear, append-only history (messages[]); the only rewrite is condense.
  • System prompt and tool list are fixed at the start of the Run and don't change between iterations.
  • The condense trigger is computed from usage.PromptTokens in the provider's response (your own counter — only for a pre-send estimate).
  • On overflow, condense is applied with a 1-per-Run limit; tool_call ↔ tool_result pairs are not torn apart.
  • Long-term memory (RAG) is set up if the task needs cross-session knowledge.
  • Planning (ReAct/Plan-and-Solve) is implemented.
  • Runtime correctly parses LLM responses, executes tools, manages the loop.
  • There's protection against loops (iteration limit + repeat detection).
  • Subagent is a standalone Agent with its own Config — not a "shared structure with shared memory".

Not done (anti-patterns):

  • History overflows and there's no reaction (neither condense nor a hard stop).
  • Messages are reordered "by importance" ("important" ones promoted, "unimportant" dropped) — prompt cache is killed, tool pairs are broken.
  • System prompt is mutated between iterations (live state, facts, plan progress are stitched into it) — prompt cache is killed.
  • Tool list changes on every iteration without a strong reason — same problem.
  • The condense trigger or the budget cap is computed from your own len(content)/3 or tiktoken counter instead of usage.PromptTokens.
  • The agent loops (no iteration limit).
  • Tool results are not added to history.
  • Monolithic Run() with no separation of concerns.
  • Hardcoding special tools inside the shared executeTools — every "special" tool adds another if.

Connection with Other Chapters

What's Next?

After studying architecture, proceed to: