04. Autonomy and Loops — ReAct Loop¶
Why This Chapter?¶
In this chapter, we implement the ReAct (Reason + Act) pattern — the heart of an autonomous agent.
Without an autonomous loop, an agent works like a chatbot: one request → one response. With an autonomous loop, an agent can perform 10 actions in a row to solve one task, independently making decisions based on the results of previous actions.
Real-World Case Study¶
Situation: A user writes: "I'm out of disk space on the server. Fix it."
Without autonomous loop:
- Agent: "I will check disk usage" → calls
check_disk→ gets "95%" - Agent: [Stops, waits for next user command]
With autonomous loop:
- Agent: "I will check disk usage" → calls
check_disk→ gets "95%" - Agent: "Disk is full. I'll clean logs" → calls
clean_logs→ gets "Freed 20GB" - Agent: "Checking again" → calls
check_disk→ gets "40%" - Agent: "Done! Freed 20GB."
Difference: The agent decides what to do next based on the results of previous actions.
Theory in Simple Terms¶
ReAct Loop (Autonomy Cycle)¶
ReAct is an acronym for Reason + Act. This is a pattern where the agent:
- Reason: Analyzes the situation and decides what to do
- Act: Performs an action (calls a tool)
- Observe: Receives the result of the action
- Repeat: Reasons again based on the result
There's no magic here — it's simply a loop where the model processes the results of previous actions in context and generates the next step.
An autonomous agent works in a loop:
While (Task not solved):
1. Send history to LLM
2. Get response
3. IF it's text → Show user and wait for new input
4. IF it's a tool call →
a. Execute tool
b. Add result to history
c. GOTO 1 (without asking the user!)
Takeaway: Point 4.c provides the "magic" — the agent looks at the result and decides what to do next. But it's not real magic: the model receives the tool result in context (messages[]) and generates the next step based on that context.
Closing the Loop¶
After executing a tool, don't ask the user what to do next. Instead, send the result back to the LLM. The model receives the result of its actions and decides what to do next.
Example dialogue in memory: the sketch below shows how the disk-cleanup history from the case study might accumulate (illustrative; IDs and message texts are assumed).
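```go
// Illustrative sketch: the messages[] history after the disk-cleanup dialogue
messages := []openai.ChatCompletionMessage{
	{Role: "system", Content: "You are an autonomous DevOps agent."},
	{Role: "user", Content: "Out of disk space."},
	// Iteration 1: the model requests check_disk_usage
	{Role: "assistant", ToolCalls: []openai.ToolCall{{ID: "call_1"}}}, // check_disk_usage
	{Role: "tool", Content: "95% usage", ToolCallID: "call_1"},
	// Iteration 2: the model requests clean_logs
	{Role: "assistant", ToolCalls: []openai.ToolCall{{ID: "call_2"}}}, // clean_logs
	{Role: "tool", Content: "Freed 20GB", ToolCallID: "call_2"},
	// Iteration 3: the model answers with text; the loop exits
	{Role: "assistant", Content: "Done! Freed 20GB."},
}
```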
Magic vs Reality: How the Loop Works¶
Magic (how it's usually explained):
The agent decided to call `clean_logs()` after checking the disk.
Reality (how it actually works):
Iteration 1: First Request
// messages before first iteration
messages := []openai.ChatCompletionMessage{
{Role: "system", Content: "You are an autonomous DevOps agent."},
{Role: "user", Content: "Out of disk space."},
}
// Send to model
resp1, _ := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: "gpt-4o-mini",
Messages: messages,
Tools: tools,
})
msg1 := resp1.Choices[0].Message
// msg1.ToolCalls = [{ID: "call_1", Function: {Name: "check_disk_usage", Arguments: "{}"}}]
// Add assistant response to history
messages = append(messages, msg1)
// Now messages contains:
// [system, user, assistant(tool_call: check_disk_usage)]
Iteration 2: Tool Execution and Result Return
// Execute tool
result1 := checkDiskUsage() // "95% usage"
// Add result as a message with role "tool"
messages = append(messages, openai.ChatCompletionMessage{
Role: "tool",
Content: result1, // "95% usage"
ToolCallID: "call_1",
})
// Now messages contains:
// [system, user, assistant(tool_call), tool("95% usage")]
// Send UPDATED history to model again
resp2, _ := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: "gpt-4o-mini",
Messages: messages, // Model receives check_disk_usage result!
Tools: tools,
})
msg2 := resp2.Choices[0].Message
// msg2.ToolCalls = [{ID: "call_2", Function: {Name: "clean_logs", Arguments: "{}"}}]
messages = append(messages, msg2)
// Now messages contains:
// [system, user, assistant(tool_call_1), tool("95%"), assistant(tool_call_2)]
Iteration 3: Second Tool
// Execute second tool
result2 := cleanLogs() // "Freed 20GB"
messages = append(messages, openai.ChatCompletionMessage{
Role: "tool",
Content: result2, // "Freed 20GB"
ToolCallID: "call_2",
})
// Now messages contains:
// [system, user, assistant(tool_call_1), tool("95%"), assistant(tool_call_2), tool("Freed 20GB")]
// Send again
resp3, _ := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: "gpt-4o-mini",
Messages: messages, // Model receives both results!
Tools: tools,
})
msg3 := resp3.Choices[0].Message
// msg3.ToolCalls = [] // Empty! Model decided to respond with text
// msg3.Content = "I cleaned the logs, now there's enough space."
// This is the final response - exit loop
Why this isn't magic:
- The model receives the full history: it doesn't "remember" the past, it processes it in `messages[]`
- The model receives the tool result: the result is added as a new message with role `tool`
- The model decides based on context: seeing "95% usage", the model understands that space needs to be freed
- Runtime manages the loop: code checks `len(msg.ToolCalls)` and decides whether to continue the loop
Takeaway: The model didn't "decide on its own" — it saw the check_disk_usage result in context and generated the next tool call based on that context.
Visualization: Who Does What?¶
┌─────────────────────────────────────────────────────────┐
│ LLM (Model) │
│ │
│ 1. Receives in context: │
│ - System Prompt: "You are a DevOps agent" │
│ - User Input: "Out of disk space" │
│ - Tools Schema: [{name: "check_disk", ...}] │
│ │
│ 2. Generates tool_call: │
│ {name: "check_disk_usage", arguments: "{}"} │
│ │
│ 3. Does NOT execute code! Only generates JSON. │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ Runtime (Your Go code) │
│ │
│ 1. Receives tool_call from model response │
│ 2. Validates: does the tool exist? │
│ 3. Executes: checkDiskUsage() → "95% usage" │
│ 4. Adds result to messages[]: │
│ {role: "tool", content: "95% usage"} │
│ 5. Sends updated history back to LLM │
└─────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────┐
│ LLM (Model) - next iteration │
│ │
│ 1. Receives in context: │
│ - Previous tool_call │
│ - Result: "95% usage" ← Runtime added! │
│ │
│ 2. Generates next tool_call: │
│ {name: "clean_logs", arguments: "{}"} │
│ │
│ 3. Loop repeats... │
└─────────────────────────────────────────────────────────┘
Takeaway: The LLM doesn't "remember" the past. It processes it in messages[], which Runtime collects.
Loop Implementation¶
for i := 0; i < maxIterations; i++ {
// 1. Send request
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: "gpt-4o-mini",
Messages: messages,
Tools: tools,
})
if err != nil {
	return fmt.Errorf("iteration %d: %w", i, err) // don't silently drop API errors
}
msg := resp.Choices[0].Message
messages = append(messages, msg) // Save response
// 2. Check response type
if len(msg.ToolCalls) == 0 {
// This is the final text response
fmt.Println("Agent:", msg.Content)
break
}
// 3. Execute tools
for _, toolCall := range msg.ToolCalls {
result := executeTool(toolCall.Function.Name, toolCall.Function.Arguments)
// 4. Add result to history
messages = append(messages, openai.ChatCompletionMessage{
Role: openai.ChatMessageRoleTool,
Content: result,
ToolCallID: toolCall.ID,
})
}
// Loop continues automatically!
// But this isn't magic: send the updated history (with tool result)
// to the model again, and the model receives the result and decides what to do next
}
Error Handling in the Loop¶
Important: Don't forget to handle errors and add them to history! If a tool fails, the LLM should know and try something else.
Proper error handling:
for _, toolCall := range msg.ToolCalls {
result, err := executeTool(toolCall.Function.Name, toolCall.Function.Arguments)
if err != nil {
// Error is also a result! Add it to history
result = fmt.Sprintf("Error: %v", err)
}
// Add result (or error) to history
messages = append(messages, openai.ChatCompletionMessage{
Role: openai.ChatMessageRoleTool,
Content: result, // Model will see the error!
ToolCallID: toolCall.ID,
})
}
What happens:
- Tool returns an error: `Error: connection refused`
- Error is added to history as tool result
- Model receives the error in context
- Model can:
    - Try another tool
    - Inform the user about the problem
    - Escalate the issue
Example:
Iteration 1:
Action: check_database_status("prod")
Observation: Error: connection refused
Iteration 2 (model receives error):
Thought: "Database is unavailable. I'll check network connectivity"
Action: ping_host("db-prod.example.com")
Observation: "Host is unreachable"
Iteration 3:
Thought: "Network is unavailable. I'll inform the user about the problem"
Action: [Final response] "Database is unavailable. Check network connectivity."
Anti-pattern: Don't hide errors from the model!
// BAD: Hide error
if err != nil {
log.Printf("Error: %v", err) // Only to log
continue // Skip tool
}
// GOOD: Show error to model
if err != nil {
result := fmt.Sprintf("Error: %v", err)
messages = append(messages, ...) // Add to history
}
Parallel Tool Calls¶
The model can return multiple tool calls in a single iteration. For example, a user asks "Check the status of nginx and postgresql" — the model returns two tool_calls in one response.
What This Looks Like¶
msg := resp.Choices[0].Message
// msg.ToolCalls can contain multiple calls:
// [
// {ID: "call_1", Function: {Name: "check_status", Arguments: `{"service":"nginx"}`}},
// {ID: "call_2", Function: {Name: "check_status", Arguments: `{"service":"postgresql"}`}},
// ]
Sequential vs Parallel Execution¶
By default, the runtime executes tools sequentially, one after another in a `range msg.ToolCalls` loop. This is simple and safe.
But if the tools are independent (don't read/write shared data), you can execute them in parallel:
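// Note: a shape sketch. areIndependent, resultsCh, and the one-argument
// executeTool / makeToolMessage helpers are assumed here; the results must
// still be drained from resultsCh and appended to messages afterwards.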
for _, toolCall := range msg.ToolCalls {
if !areIndependent(toolCall, msg.ToolCalls) {
// Dependent tools — execute sequentially
result := executeTool(toolCall)
messages = append(messages, makeToolMessage(toolCall.ID, result))
continue
}
// Independent tools — run in parallel
go func(tc openai.ToolCall) {
result := executeTool(tc)
resultsCh <- toolResult{ID: tc.ID, Content: result}
}(toolCall)
}
Simple implementation with sync.WaitGroup:
type toolResult struct {
ID string
Content string
}
func executeToolsParallel(toolCalls []openai.ToolCall) []toolResult {
results := make([]toolResult, len(toolCalls))
var wg sync.WaitGroup
for i, tc := range toolCalls {
wg.Add(1)
go func(idx int, call openai.ToolCall) {
defer wg.Done()
result, err := executeTool(call.Function.Name, call.Function.Arguments)
if err != nil {
result = fmt.Sprintf("Error: %v", err)
}
results[idx] = toolResult{ID: call.ID, Content: result}
}(i, tc)
}
wg.Wait()
return results
}
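Because each goroutine writes to its own slice index, the results come back in the original tool-call order. A usage sketch for feeding them back into the history:

```go
// Usage sketch: run all tool calls from one response in parallel,
// then append every result to the history with its ToolCallID
results := executeToolsParallel(msg.ToolCalls)
for _, r := range results {
	messages = append(messages, openai.ChatCompletionMessage{
		Role:       openai.ChatMessageRoleTool,
		Content:    r.Content,
		ToolCallID: r.ID, // links each result to its tool call
	})
}
```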
When parallel, when sequential:
| Situation | Strategy |
|---|---|
| Two `check_status` for different services | Parallel (independent) |
| `read_file` → `parse_json` | Sequential (the second depends on the first) |
| Prototype / MVP | Sequential (easier to debug) |
Multi-Model Agent Loop¶
In Chapter 03 we introduced ModelSelector. Now let's apply it to the agent loop.
Why Switch Models Inside the Loop?¶
Different iterations involve different work:
- Iterations with tool calls: The model picks a tool and generates arguments. This is a classification task — a cheap model handles it fine.
- Final iteration: The model analyzes all results and formulates a response. This is a reasoning task — needs a powerful model.
- Iterations after an error: The model decides what to do about the error. This requires reasoning — a powerful model is better.
Implementation¶
type MultiModelLoop struct {
fastModel string // For tool selection: cheap and fast
smartModel string // For analysis and reasoning: powerful
}
func (l *MultiModelLoop) Run(ctx context.Context, client *openai.Client,
messages []openai.ChatCompletionMessage, tools []openai.Tool, maxIter int,
) (string, error) {
lastHadError := false
for i := 0; i < maxIter; i++ {
// Pick model for the current iteration
model := l.fastModel
if lastHadError {
model = l.smartModel // After an error, reasoning is needed
}
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: model,
Messages: messages,
Tools: tools,
})
if err != nil {
return "", fmt.Errorf("iteration %d: %w", i, err)
}
msg := resp.Choices[0].Message
messages = append(messages, msg)
// Final response — re-ask with the powerful model if the cheap one answered
if len(msg.ToolCalls) == 0 {
if model == l.fastModel && i > 0 {
// The cheap model gave a final answer.
// Optionally re-ask the powerful model for better quality.
return l.refineAnswer(ctx, client, messages)
}
return msg.Content, nil
}
// Execute tools
lastHadError = false
for _, tc := range msg.ToolCalls {
result, execErr := executeTool(tc.Function.Name, tc.Function.Arguments)
if execErr != nil {
result = fmt.Sprintf("Error: %v", execErr)
lastHadError = true
}
messages = append(messages, openai.ChatCompletionMessage{
Role: openai.ChatMessageRoleTool,
Content: result,
ToolCallID: tc.ID,
})
}
}
return "", fmt.Errorf("exceeded %d iterations", maxIter)
}
Key point: The refineAnswer method is optional. It's an optimization: if the cheap model gathered all data through tools, the powerful model can formulate a better final answer.
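This chapter doesn't show `refineAnswer` itself; here is a minimal sketch of one possible implementation, assuming the idea is to call the powerful model without tools so it must answer with text (the prompt wording is an assumption):

```go
// Possible sketch of refineAnswer: re-ask the smart model for a text-only answer
func (l *MultiModelLoop) refineAnswer(ctx context.Context, client *openai.Client,
	messages []openai.ChatCompletionMessage,
) (string, error) {
	messages = append(messages, openai.ChatCompletionMessage{
		Role:    openai.ChatMessageRoleUser,
		Content: "Summarize the results above and give the final answer.", // assumed prompt
	})
	resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model:    l.smartModel,
		Messages: messages, // no Tools field: the model can only respond with text
	})
	if err != nil {
		return "", err
	}
	return resp.Choices[0].Message.Content, nil
}
```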
End-to-End Example: Multi-Model Flow over 5 Iterations¶
The ModelSelector and MultiModelLoop abstractions may look clear in code, but it's hard to see what actually happens at runtime. Let's walk through a concrete scenario step by step.
Scenario¶
The user writes: "Server web-03 has high CPU. Investigate."
The agent has three tools:
tools := []openai.Tool{
{Function: &openai.FunctionDefinition{
Name: "check_cpu",
Description: "Check CPU usage of a server",
Parameters: json.RawMessage(`{"type":"object","properties":{"host":{"type":"string"}},"required":["host"]}`),
}},
{Function: &openai.FunctionDefinition{
Name: "list_top_processes",
Description: "List top CPU-consuming processes on a server",
Parameters: json.RawMessage(`{"type":"object","properties":{"host":{"type":"string"},"limit":{"type":"integer"}},"required":["host"]}`),
}},
{Function: &openai.FunctionDefinition{
Name: "restart_service",
Description: "Restart a systemd service on a server. CAUTION: causes brief downtime.",
Parameters: json.RawMessage(`{"type":"object","properties":{"host":{"type":"string"},"service":{"type":"string"}},"required":["host","service"]}`),
}},
}
Model selection strategy:
- gpt-4o-mini — for iterations where the previous response contained tool calls (simple task: pick a tool)
- gpt-4o — when a decision based on collected data is needed (analysis, final response)
Flow Diagram¶
sequenceDiagram
participant U as User
participant R as Runtime
participant Mini as gpt-4o-mini
participant Pro as gpt-4o
U->>R: "Server web-03 has high CPU. Investigate."
Note over R,Mini: Iteration 1: tool selection
R->>Mini: messages[system, user]
Mini-->>R: tool_call: check_cpu("web-03")
R->>R: exec check_cpu → "CPU: 94%"
Note over R,Mini: Iteration 2: tool selection
R->>Mini: messages[..., tool_call, tool_result]
Mini-->>R: tool_call: list_top_processes("web-03")
R->>R: exec → "nginx-worker: 82%"
Note over R,Pro: Iteration 3: analysis + decision
R->>Pro: messages[..., tool_call, tool_result]
Pro-->>R: tool_call: restart_service("web-03", "nginx")
R->>R: exec restart → "Service restarted"
Note over R,Pro: Iteration 4: final response
R->>Pro: messages[..., tool_call, tool_result]
Pro-->>R: text: "Issue resolved..."
R->>U: "CPU on web-03 was 94%..."
Step-by-Step Walkthrough¶
Iteration 1 — gpt-4o-mini — first tool selection
The model receives:
messages = [
{role: "system", content: "You are a DevOps agent. Diagnose and fix issues."},
{role: "user", content: "Server web-03 has high CPU. Investigate."}
]
The model returns:
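{"tool_calls": [{"function": {"name": "check_cpu", "arguments": "{\"host\":\"web-03\"}"}}]}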
Why gpt-4o-mini? This is the first call. The task is to pick from 3 tools the one matching "high CPU". That's classification, not reasoning. A cheap model handles it fine.
Runtime executes check_cpu("web-03") → "CPU usage: 94%, load average: 12.3"
Iteration 2 — gpt-4o-mini — second tool selection
The model receives (messages grew):
messages = [
{role: "system", content: "You are a DevOps agent..."},
{role: "user", content: "Server web-03 has high CPU. Investigate."},
{role: "assistant", tool_calls: [{name: "check_cpu", args: {host: "web-03"}}]},
{role: "tool", content: "CPU usage: 94%, load average: 12.3", tool_call_id: "call_1"}
]
The model returns:
{"tool_calls": [{"function": {"name": "list_top_processes", "arguments": "{\"host\":\"web-03\",\"limit\":5}"}}]}
Why gpt-4o-mini? The previous iteration returned a tool call. The model sees "CPU 94%" and needs to pick the next tool. This is still classification: "which tool helps understand what's consuming CPU?" → list_top_processes.
Runtime executes list_top_processes("web-03", 5):
PID COMMAND CPU%
18234 nginx-worker 82.1%
18235 nginx-worker 4.3%
892 postgres 3.2%
451 node 2.1%
1 systemd 0.1%
Iteration 3 — gpt-4o — making a decision
The model receives (messages contains 6 elements):
messages = [
system, user,
assistant(tool_call: check_cpu), tool("CPU: 94%"),
assistant(tool_call: list_top), tool("nginx-worker: 82.1%...")
]
The model returns:
{"tool_calls": [{"function": {"name": "restart_service", "arguments": "{\"host\":\"web-03\",\"service\":\"nginx\"}"}}]}
Why gpt-4o? This is where reasoning is needed. The model must:
- Understand that nginx-worker consumes 82% CPU (not just pick a tool)
- Decide the right action is restarting nginx (not postgres or node)
- Assess the restart risk (CAUTION in the tool description)
A cheap model might get it wrong: restart the wrong service or keep gathering information instead of acting.
Runtime executes restart_service("web-03", "nginx") → "Service nginx restarted successfully. PID: 19012"
Iteration 4 — gpt-4o — final response
The model receives (messages contains 8 elements):
messages = [
system, user,
assistant(check_cpu), tool("CPU: 94%"),
assistant(list_top), tool("nginx-worker: 82.1%"),
assistant(restart_nginx), tool("Service nginx restarted. PID: 19012")
]
The model returns text (no tool_calls):
CPU on web-03 was at 94%. Main consumer: nginx-worker process (82.1% CPU).
Restarted nginx. New PID: 19012.
Recommendations:
- Check worker_processes configuration in nginx.conf
- Set up CPU monitoring with an alert at 80%
Why gpt-4o? The final response requires synthesizing information from all previous steps and formulating recommendations. The powerful model does this better.
messages[] Evolution¶
| After iteration | Elements in messages[] | What was added |
|---|---|---|
| — | 2 | system + user |
| 1 | 4 | + assistant(tool_call: check_cpu) + tool("CPU: 94%") |
| 2 | 6 | + assistant(tool_call: list_top) + tool("nginx: 82%") |
| 3 | 8 | + assistant(tool_call: restart) + tool("restarted") |
| 4 | 9 | + assistant("CPU on web-03 was at 94%...") |
Each iteration adds 2 messages (assistant + tool), the last one adds only 1 (just assistant with text).
Cost Calculation¶
Single model (gpt-4o for all iterations):
| Iteration | Model | Input tokens | Output tokens | Cost |
|---|---|---|---|---|
| 1 | gpt-4o | ~300 | ~30 | $0.0040 |
| 2 | gpt-4o | ~450 | ~40 | $0.0060 |
| 3 | gpt-4o | ~650 | ~35 | $0.0085 |
| 4 | gpt-4o | ~800 | ~120 | $0.0120 |
| **Total** | | | | **$0.0305** |
Multi-model strategy:
| Iteration | Model | Input tokens | Output tokens | Cost |
|---|---|---|---|---|
| 1 | gpt-4o-mini | ~300 | ~30 | $0.0003 |
| 2 | gpt-4o-mini | ~450 | ~40 | $0.0004 |
| 3 | gpt-4o | ~650 | ~35 | $0.0085 |
| 4 | gpt-4o | ~800 | ~120 | $0.0120 |
| **Total** | | | | **$0.0212** |
Savings: 30% in this example. At 10,000 tasks per day: $305 vs $212, a saving of $93/day, roughly **$2,800/month**.
For tasks with more tool calls (6–8 iterations instead of 4), savings reach 50–60%.
When Does Runtime Switch the Model?¶
Model selection logic in this example:
func selectModel(iteration int, lastHadToolCalls bool, lastHadError bool) string {
// After an error — powerful model (reasoning needed)
if lastHadError {
return "gpt-4o"
}
// First 2 iterations with tool calls — cheap model
// Starting from iteration 3 — powerful (enough context accumulated for a decision)
if lastHadToolCalls && iteration < 3 { // assumes a 1-based iteration counter
return "gpt-4o-mini"
}
// Everything else — powerful model
return "gpt-4o"
}
Alternative strategy — simpler, but less precise:
func selectModelSimple(lastHadToolCalls bool) string {
if lastHadToolCalls {
return "gpt-4o-mini" // Still gathering data
}
return "gpt-4o" // Formulating response
}
Problem with the simple strategy: iteration 3 (where a restart decision is needed) will use the cheap model. It might get it wrong.
Full Working Code¶
package main
import (
"context"
"encoding/json"
"fmt"
"os"
"strings"
"github.com/sashabaranov/go-openai"
)
func main() {
config := openai.DefaultConfig(os.Getenv("OPENAI_API_KEY"))
if baseURL := os.Getenv("OPENAI_BASE_URL"); baseURL != "" {
config.BaseURL = baseURL
}
client := openai.NewClientWithConfig(config)
ctx := context.Background()
// Tools
tools := []openai.Tool{
{Type: openai.ToolTypeFunction, Function: &openai.FunctionDefinition{
Name: "check_cpu", Description: "Check CPU usage of a server",
Parameters: json.RawMessage(`{"type":"object","properties":{"host":{"type":"string"}},"required":["host"]}`),
}},
{Type: openai.ToolTypeFunction, Function: &openai.FunctionDefinition{
Name: "list_top_processes", Description: "List top CPU-consuming processes",
Parameters: json.RawMessage(`{"type":"object","properties":{"host":{"type":"string"},"limit":{"type":"integer"}},"required":["host"]}`),
}},
{Type: openai.ToolTypeFunction, Function: &openai.FunctionDefinition{
Name: "restart_service", Description: "Restart a systemd service. CAUTION: causes brief downtime.",
Parameters: json.RawMessage(`{"type":"object","properties":{"host":{"type":"string"},"service":{"type":"string"}},"required":["host","service"]}`),
}},
}
messages := []openai.ChatCompletionMessage{
{Role: "system", Content: "You are a DevOps agent. Diagnose and fix server issues. Use tools to investigate, then act."},
{Role: "user", Content: "Server web-03 has high CPU. Investigate."},
}
// Multi-model loop
const (
fastModel = "gpt-4o-mini"
smartModel = "gpt-4o"
maxIter = 10
)
lastHadToolCalls := true
lastHadError := false
for i := 0; i < maxIter; i++ {
// --- Model selection ---
model := fastModel
if lastHadError || (lastHadToolCalls && i >= 2) || !lastHadToolCalls { // i is 0-based: smart model from the 3rd iteration on
model = smartModel
}
fmt.Printf("\n[iter=%d model=%s msgs=%d]\n", i+1, model, len(messages))
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: model,
Messages: messages,
Tools: tools,
Temperature: 0,
})
if err != nil {
fmt.Printf(" ERROR: %v\n", err)
return
}
msg := resp.Choices[0].Message
messages = append(messages, msg)
fmt.Printf(" tokens: prompt=%d completion=%d\n", resp.Usage.PromptTokens, resp.Usage.CompletionTokens)
// --- Final response? ---
if len(msg.ToolCalls) == 0 {
fmt.Printf(" FINAL ANSWER:\n%s\n", msg.Content)
break
}
// --- Execute tools ---
lastHadToolCalls = true
lastHadError = false
for _, tc := range msg.ToolCalls {
fmt.Printf(" tool_call: %s(%s)\n", tc.Function.Name, tc.Function.Arguments)
result := executeToolStub(tc.Function.Name, tc.Function.Arguments)
if strings.HasPrefix(result, "Error") {
lastHadError = true
}
fmt.Printf(" result: %s\n", truncate(result, 80))
messages = append(messages, openai.ChatCompletionMessage{
Role: openai.ChatMessageRoleTool,
Content: result,
ToolCallID: tc.ID,
})
}
}
}
// Tool stubs (replace with real calls)
func executeToolStub(name, argsJSON string) string {
switch name {
case "check_cpu":
return "CPU usage: 94%, load average: 12.3, uptime: 14 days"
case "list_top_processes":
return `PID COMMAND CPU%
18234 nginx-worker 82.1%
18235 nginx-worker 4.3%
892 postgres 3.2%
451 node 2.1%
1 systemd 0.1%`
case "restart_service":
var args struct{ Service string `json:"service"` }
json.Unmarshal([]byte(argsJSON), &args)
return fmt.Sprintf("Service %s restarted successfully. New PID: 19012", args.Service)
default:
return fmt.Sprintf("Error: unknown tool %s", name)
}
}
func truncate(s string, maxLen int) string {
s = strings.ReplaceAll(s, "\n", " | ")
if len(s) > maxLen {
return s[:maxLen] + "..."
}
return s
}
Expected output:
[iter=1 model=gpt-4o-mini msgs=2]
tokens: prompt=285 completion=28
tool_call: check_cpu({"host":"web-03"})
result: CPU usage: 94%, load average: 12.3, uptime: 14 days
[iter=2 model=gpt-4o-mini msgs=4]
tokens: prompt=432 completion=38
tool_call: list_top_processes({"host":"web-03","limit":5})
result: PID COMMAND CPU% | 18234 nginx-worker 82.1% | 18235 ngi...
[iter=3 model=gpt-4o msgs=6]
tokens: prompt=648 completion=32
tool_call: restart_service({"host":"web-03","service":"nginx"})
result: Service nginx restarted successfully. New PID: 19012
[iter=4 model=gpt-4o msgs=8]
tokens: prompt=795 completion=118
FINAL ANSWER:
CPU on web-03 was at 94%. Root cause: nginx-worker process (82.1% CPU).
Restarted nginx (new PID: 19012).
Recommend checking worker_processes in nginx.conf and setting up a CPU alert at 80%.
Notice the model= column: iterations 1–2 use the cheap model, iterations 3–4 use the powerful one. Runtime makes this decision before calling the LLM, based on the iteration number and the presence of tool calls.
Loop Stopping Strategies¶
The iteration limit (maxIterations) is protection against infinite loops, not a stopping strategy. The agent should stop when the task is solved.
Strategy 1: The Model Decides (Default)¶
The model stops calling tools and returns text. This works when the System Prompt contains an instruction:
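"If the task is solved, answer the user with text. Don't call tools unnecessarily."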
Strategy 2: A "task_complete" Tool¶
Add a special tool that the agent calls to signal completion:
openai.Tool{Type: openai.ToolTypeFunction, Function: &openai.FunctionDefinition{
	Name:        "task_complete",
	Description: "Call this when the task is fully resolved. Provide a summary of what was done.",
	Parameters: json.RawMessage(`{
		"type": "object",
		"properties": {
			"summary": {"type": "string", "description": "Summary of completed work"},
			"success": {"type": "boolean", "description": "Whether the task was successful"}
		},
		"required": ["summary", "success"]
	}`),
}}
Advantage: you get a structured result (success/failure + description), not just text.
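In the loop, the runtime intercepts this call itself instead of dispatching it to a real tool. A sketch, reusing the variable names from the loop implementation above:

```go
// Sketch: treat task_complete as a stop signal, not a real tool
for _, tc := range msg.ToolCalls {
	if tc.Function.Name == "task_complete" {
		var args struct {
			Summary string `json:"summary"`
			Success bool   `json:"success"`
		}
		if err := json.Unmarshal([]byte(tc.Function.Arguments), &args); err != nil {
			return "", fmt.Errorf("bad task_complete arguments: %w", err)
		}
		return args.Summary, nil // structured exit: summary + success flag
	}
	// ... execute ordinary tools as before
}
```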
Strategy 3: Stuck Detection¶
If the last N actions are identical, the agent is stuck:
func detectStuck(messages []openai.ChatCompletionMessage, windowSize int) bool {
var recentCalls []string
for i := len(messages) - 1; i >= 0 && len(recentCalls) < windowSize; i-- {
msg := messages[i]
if msg.Role == openai.ChatMessageRoleAssistant && len(msg.ToolCalls) > 0 {
call := msg.ToolCalls[0].Function.Name + ":" + msg.ToolCalls[0].Function.Arguments
recentCalls = append(recentCalls, call)
}
}
if len(recentCalls) < windowSize {
return false
}
// Are all calls identical?
for _, c := range recentCalls[1:] {
if c != recentCalls[0] {
return false
}
}
return true
}
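Wired into the main loop, for example right before each new LLM request:

```go
// Usage sketch: stop when the agent repeats the same tool call three times
if detectStuck(messages, 3) {
	return "", fmt.Errorf("agent is stuck: the last 3 tool calls are identical")
}
```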
Strategy 4: Token Budget¶
Stop if too many tokens have been spent:
totalTokens += resp.Usage.TotalTokens
if totalTokens > maxTokenBudget {
// Budget exhausted — ask the model to give a final answer
messages = append(messages, openai.ChatCompletionMessage{
Role: openai.ChatMessageRoleUser,
Content: "Token budget exhausted. Provide your best answer based on information gathered so far.",
})
// Last call without tools — model must respond with text
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
	Model:    model,
	Messages: messages,
	// No Tools field: the model can only respond with text
})
if err != nil {
	return "", err
}
return resp.Choices[0].Message.Content, nil
}
For more on token budgets, see Chapter 20: Cost & Latency Engineering.
Common Errors¶
Error 1: Infinite Loop¶
Symptom: Agent repeats the same action infinitely.
Cause: No iteration limit and no detection of repeating actions.
Solution:
// GOOD: Iteration limit + stuck detection
for i := 0; i < maxIterations; i++ {
// ...
// Detect repeating actions
if lastNActionsAreSame(history, 3) {
break
}
}
// GOOD: Improve prompt
systemPrompt := `... If action doesn't help, try a different approach.`
Error 2: Tool Result Not Added to History¶
Symptom: Agent doesn't see tool result and continues performing the same action.
Cause: Tool execution result is not added to messages[].
Solution:
// BAD: Result not added
result := executeTool(toolCall)
// History not updated!
// GOOD: ALWAYS add result!
messages = append(messages, openai.ChatCompletionMessage{
Role: openai.ChatMessageRoleTool,
Content: result,
ToolCallID: toolCall.ID, // Important for linking!
})
Error 3: Agent Doesn't Stop¶
Symptom: Agent continues calling tools even when the task is solved.
Cause: System Prompt doesn't instruct the agent to stop when the task is solved.
Solution:
// GOOD: Add to System Prompt
systemPrompt := `... If task is solved, answer user with text. Don't call tools unnecessarily.`
Error 4: One Model for All Iterations¶
Symptom: Agent is slow and expensive. Every tool call costs as much as the final response.
Cause: A single powerful model (GPT-4o) is used for all iterations, including simple tool selection.
Solution:
// BAD: GPT-4o for every iteration
model := "gpt-4o" // $0.01 per call × 8 iterations = $0.08
// GOOD: Cheap model for tool calls, powerful for analysis
if lastHadToolCalls {
model = "gpt-4o-mini" // $0.0002 per tool call × 7 = $0.0014
} else {
model = "gpt-4o" // $0.01 for final response × 1 = $0.01
}
// Total: $0.0114 instead of $0.08 — 7x savings
For more details, see Chapter 03: Model Selection Strategy.
Mini-Exercises¶
Exercise 1: Add Loop Detection¶
Implement a check that the last 3 actions are the same:
func isStuck(history []openai.ChatCompletionMessage) bool {
// Check that last 3 actions are the same
// ...
}
Expected result:
- Function returns `true` if the last 3 actions are the same
- Function returns `false` otherwise
Exercise 2: Add Logging¶
Log each loop iteration:
fmt.Printf("[Iteration %d] Agent decided: %s\n", i, action)
fmt.Printf("[Iteration %d] Tool result: %s\n", i, result)
Expected result:
- Each iteration is logged with number and action
- Tool results are logged
Completion Criteria / Checklist¶
Completed:
- Agent executes loop autonomously
- Tool results are added to history
- Agent stops when task is solved
- Protection against loops (iteration limit + detection)
- Tool errors are handled and added to history
Not completed:
- Agent doesn't continue loop after tool execution
- Tool results are not visible to agent (not added to history)
- Agent loops infinitely (no protection)
- Agent doesn't stop when task is solved
Connection with Other Chapters¶
- Tools: How tools are called and return results, see Chapter 03: Tools
- Memory: How message history (`messages[]`) grows and is managed, see Chapter 09: Agent Anatomy
- Safety: How to stop the loop for confirmation, see Chapter 05: Safety
What's Next?¶
After studying autonomy, proceed to:
- 05. Safety and Human-in-the-Loop — how to protect the agent from dangerous actions