19. Observability and Tracing¶
Why This Chapter?¶
Without observability, you work blind. An agent performed an action, but you can't understand:
- Why did it choose this particular tool?
- How long did execution take?
- How many tokens were used?
- Where did the error occur?
Observability gives your agent "vision". Without it, you can't debug problems, optimize performance, or understand how the agent behaves in production.
Real-World Case Study¶
Situation: Agent in production performed a data deletion operation without confirmation. User complained.
Problem: You cannot understand why the agent didn't request confirmation. Logs only show "Agent executed delete_database". No information about which tools were available, what prompt was used, what arguments were passed.
Solution: Structured logging with tracing shows the full picture: input request → tool selection → execution → result. Then you can see whether the agent missed a confirmation tool, or whether the prompt was wrong.
Theory in Simple Terms¶
What Is Observability?¶
Observability is the ability to understand what's happening inside a system by observing its outputs (logs, metrics, traces).
Three pillars of observability:
- Logs — what happened (events, errors, actions)
- Metrics — how much and how often (latency, token usage, error rate)
- Traces — how it's connected (full request path through the system)
How Does Tracing Work in Agents?¶
Each agent run is a chain of events:
- User sends request →
run_idcreated - Agent analyzes request → log input data
- Agent selects tools → log
tool_calls - Tools execute → log results
- Agent formulates response → log final response
All these events are linked through run_id, so you can reconstruct the full execution flow.
How It Works (Step by Step)¶
Step 1: Agent Run Logging Structure¶
Create a structure to store agent run information:
type AgentRun struct {
RunID string `json:"run_id"`
UserInput string `json:"user_input"`
ToolCalls []ToolCall `json:"tool_calls"`
ToolResults []ToolResult `json:"tool_results"`
FinalAnswer string `json:"final_answer"`
TokensUsed int `json:"tokens_used"`
Latency time.Duration `json:"latency"`
Timestamp time.Time `json:"timestamp"`
}
type ToolCall struct {
ID string `json:"id"`
Name string `json:"name"`
Arguments string `json:"arguments"`
Timestamp time.Time `json:"timestamp"`
}
type ToolResult struct {
ToolCallID string `json:"tool_call_id"`
Result string `json:"result"`
Error string `json:"error,omitempty"`
Latency time.Duration `json:"latency"`
Timestamp time.Time `json:"timestamp"`
}
Step 2: Generate Unique run_id¶
At the start of each agent run, generate a unique ID:
import (
"crypto/rand"
"encoding/hex"
"time"
)
func generateRunID() string {
bytes := make([]byte, 8)
rand.Read(bytes)
return hex.EncodeToString(bytes) + "-" + time.Now().Format("20060102-150405")
}
Step 3: Logging in Agent Loop¶
Insert logging into the agent loop (see labs/lab04-autonomy/main.go):
func runAgent(ctx context.Context, client *openai.Client, userInput string) (string, error) {
runID := generateRunID()
startTime := time.Now()
// Create structure for logging
run := AgentRun{
RunID: runID,
UserInput: userInput,
ToolCalls: []ToolCall{},
Timestamp: time.Now(),
}
messages := []openai.ChatCompletionMessage{
{Role: openai.ChatMessageRoleSystem, Content: "You are an autonomous DevOps agent."},
{Role: openai.ChatMessageRoleUser, Content: userInput},
}
// THE LOOP with logging
for i := 0; i < 5; i++ {
req := openai.ChatCompletionRequest{
Model: "gpt-4o-mini",
Messages: messages,
Tools: tools,
}
resp, err := client.CreateChatCompletion(ctx, req)
if err != nil {
logAgentRunError(runID, err)
return "", err
}
msg := resp.Choices[0].Message
messages = append(messages, msg)
// Log token usage
run.TokensUsed += resp.Usage.TotalTokens
if len(msg.ToolCalls) == 0 {
run.FinalAnswer = msg.Content
run.Latency = time.Since(startTime)
logAgentRun(run)
return msg.Content, nil
}
// Log tool calls
for _, toolCall := range msg.ToolCalls {
toolCallLog := ToolCall{
ID: toolCall.ID,
Name: toolCall.Function.Name,
Arguments: toolCall.Function.Arguments,
Timestamp: time.Now(),
}
run.ToolCalls = append(run.ToolCalls, toolCallLog)
// Execute tool with logging
result, err := executeToolWithLogging(runID, toolCall)
if err != nil {
run.ToolResults = append(run.ToolResults, ToolResult{
ToolCallID: toolCall.ID,
Error: err.Error(),
Timestamp: time.Now(),
})
} else {
run.ToolResults = append(run.ToolResults, ToolResult{
ToolCallID: toolCall.ID,
Result: result,
Timestamp: time.Now(),
})
}
messages = append(messages, openai.ChatCompletionMessage{
Role: openai.ChatMessageRoleTool,
Content: result,
ToolCallID: toolCall.ID,
})
}
}
run.Latency = time.Since(startTime)
logAgentRun(run)
return run.FinalAnswer, nil
}
Step 4: Structured Logging¶
Implement logging function in JSON format:
import (
"encoding/json"
"log"
)
func logAgentRun(run AgentRun) {
logJSON, err := json.Marshal(run)
if err != nil {
log.Printf("Failed to marshal agent run: %v", err)
return
}
log.Printf("AGENT_RUN: %s", string(logJSON))
}
func logAgentRunError(runID string, err error) {
logJSON, _ := json.Marshal(map[string]any{
"run_id": runID,
"error": err.Error(),
"timestamp": time.Now(),
})
log.Printf("AGENT_RUN_ERROR: %s", string(logJSON))
}
Step 5: Tool Execution Logging¶
In tool execution functions (see labs/lab02-tools/main.go) add logging:
func executeToolWithLogging(runID string, toolCall openai.ToolCall) (string, error) {
startTime := time.Now()
log.Printf("TOOL_CALL_START: run_id=%s tool_id=%s tool_name=%s args=%s",
runID, toolCall.ID, toolCall.Function.Name, toolCall.Function.Arguments)
var result string
var err error
switch toolCall.Function.Name {
case "check_disk":
result = checkDisk()
case "clean_logs":
result = cleanLogs()
default:
err = fmt.Errorf("unknown tool: %s", toolCall.Function.Name)
}
latency := time.Since(startTime)
if err != nil {
log.Printf("TOOL_CALL_ERROR: run_id=%s tool_id=%s tool_name=%s error=%s latency=%v",
runID, toolCall.ID, toolCall.Function.Name, err.Error(), latency)
return "", err
}
log.Printf("TOOL_CALL_SUCCESS: run_id=%s tool_id=%s tool_name=%s latency=%v",
runID, toolCall.ID, toolCall.Function.Name, latency)
return result, nil
}
Where to Integrate This in Our Code¶
Integration Point 1: Agent Loop¶
In labs/lab04-autonomy/main.go add logging to the agent loop:
// At start of main():
runID := generateRunID()
startTime := time.Now()
// In loop before CreateChatCompletion:
log.Printf("AGENT_ITERATION: run_id=%s iteration=%d", runID, i)
// After getting response:
log.Printf("AGENT_RESPONSE: run_id=%s tokens_used=%d", runID, resp.Usage.TotalTokens)
// After executing tools:
log.Printf("TOOL_EXECUTED: run_id=%s tool=%s result=%s", runID, toolCall.Function.Name, result)
Integration Point 2: Tool Execution¶
In labs/lab02-tools/main.go add logging when executing tools:
// In tool execution function:
func executeTool(runID string, toolCall openai.ToolCall) (string, error) {
log.Printf("TOOL_START: run_id=%s tool=%s", runID, toolCall.Function.Name)
// ... execution ...
log.Printf("TOOL_END: run_id=%s tool=%s result=%s", runID, toolCall.Function.Name, result)
return result, nil
}
Mini Code Example¶
Complete example with observability based on labs/lab04-autonomy/main.go:
package main
import (
"context"
"crypto/rand"
"encoding/hex"
"encoding/json"
"fmt"
"log"
"os"
"time"
"github.com/sashabaranov/go-openai"
)
type AgentRun struct {
RunID string `json:"run_id"`
UserInput string `json:"user_input"`
ToolCalls []ToolCall `json:"tool_calls"`
ToolResults []ToolResult `json:"tool_results"`
FinalAnswer string `json:"final_answer"`
TokensUsed int `json:"tokens_used"`
Latency string `json:"latency"`
Timestamp time.Time `json:"timestamp"`
}
type ToolCall struct {
ID string `json:"id"`
Name string `json:"name"`
Arguments string `json:"arguments"`
Timestamp time.Time `json:"timestamp"`
}
type ToolResult struct {
ToolCallID string `json:"tool_call_id"`
Result string `json:"result"`
Error string `json:"error,omitempty"`
Latency string `json:"latency"`
Timestamp time.Time `json:"timestamp"`
}
func generateRunID() string {
bytes := make([]byte, 8)
rand.Read(bytes)
return hex.EncodeToString(bytes) + "-" + time.Now().Format("20060102-150405")
}
func logAgentRun(run AgentRun) {
logJSON, err := json.Marshal(run)
if err != nil {
log.Printf("Failed to marshal: %v", err)
return
}
log.Printf("AGENT_RUN: %s", string(logJSON))
}
func checkDisk() string { return "Disk Usage: 95% (CRITICAL). Large folder: /var/log" }
func cleanLogs() string { return "Logs cleaned. Freed 20GB." }
func main() {
token := os.Getenv("OPENAI_API_KEY")
baseURL := os.Getenv("OPENAI_BASE_URL")
if token == "" {
token = "dummy"
}
config := openai.DefaultConfig(token)
if baseURL != "" {
config.BaseURL = baseURL
}
client := openai.NewClientWithConfig(config)
ctx := context.Background()
userInput := "I'm out of disk space. Fix it."
// Generate run_id
runID := generateRunID()
startTime := time.Now()
run := AgentRun{
RunID: runID,
UserInput: userInput,
Timestamp: time.Now(),
}
tools := []openai.Tool{
{
Type: openai.ToolTypeFunction,
Function: &openai.FunctionDefinition{
Name: "check_disk",
Description: "Check current disk usage",
},
},
{
Type: openai.ToolTypeFunction,
Function: &openai.FunctionDefinition{
Name: "clean_logs",
Description: "Delete old logs to free space",
},
},
}
messages := []openai.ChatCompletionMessage{
{Role: openai.ChatMessageRoleSystem, Content: "You are an autonomous DevOps agent."},
{Role: openai.ChatMessageRoleUser, Content: userInput},
}
fmt.Printf("Starting Agent Loop (run_id: %s)...\n", runID)
for i := 0; i < 5; i++ {
log.Printf("AGENT_ITERATION: run_id=%s iteration=%d", runID, i)
req := openai.ChatCompletionRequest{
Model: "gpt-4o-mini",
Messages: messages,
Tools: tools,
}
resp, err := client.CreateChatCompletion(ctx, req)
if err != nil {
log.Printf("AGENT_ERROR: run_id=%s error=%v", runID, err)
panic(fmt.Sprintf("API Error: %v", err))
}
msg := resp.Choices[0].Message
messages = append(messages, msg)
run.TokensUsed += resp.Usage.TotalTokens
log.Printf("AGENT_RESPONSE: run_id=%s tokens_used=%d", runID, resp.Usage.TotalTokens)
if len(msg.ToolCalls) == 0 {
run.FinalAnswer = msg.Content
run.Latency = time.Since(startTime).String()
logAgentRun(run)
fmt.Println("AI:", msg.Content)
break
}
for _, toolCall := range msg.ToolCalls {
toolCallStart := time.Now()
toolCallLog := ToolCall{
ID: toolCall.ID,
Name: toolCall.Function.Name,
Arguments: toolCall.Function.Arguments,
Timestamp: time.Now(),
}
run.ToolCalls = append(run.ToolCalls, toolCallLog)
fmt.Printf("Executing tool: %s\n", toolCall.Function.Name)
log.Printf("TOOL_START: run_id=%s tool_id=%s tool_name=%s", runID, toolCall.ID, toolCall.Function.Name)
var result string
if toolCall.Function.Name == "check_disk" {
result = checkDisk()
} else if toolCall.Function.Name == "clean_logs" {
result = cleanLogs()
}
toolCallLatency := time.Since(toolCallStart)
log.Printf("TOOL_END: run_id=%s tool_id=%s tool_name=%s latency=%v", runID, toolCall.ID, toolCall.Function.Name, toolCallLatency)
run.ToolResults = append(run.ToolResults, ToolResult{
ToolCallID: toolCall.ID,
Result: result,
Latency: toolCallLatency.String(),
Timestamp: time.Now(),
})
fmt.Println("Tool Output:", result)
messages = append(messages, openai.ChatCompletionMessage{
Role: openai.ChatMessageRoleTool,
Content: result,
ToolCallID: toolCall.ID,
})
}
}
}
Distributed Tracing — OpenTelemetry for Agents¶
So far we've been logging events as a flat list: TOOL_START, TOOL_END, AGENT_ITERATION. This works for a single agent. But once the agent calls external services (APIs, databases, other agents), flat logs won't help. You can't see what exactly slowed down the chain.
Distributed Tracing solves this problem. Each operation becomes a span — a time segment with a start, end, and context. Spans nest inside each other, forming a tree.
Span-Based Tracing for Agents¶
For an agent, the span tree looks like this:
agent_run (root span)
├── llm_call (iteration 1)
│ └── openai_api_request
├── tool_call: check_disk
│ └── ssh_connection
├── llm_call (iteration 2)
│ └── openai_api_request
├── tool_call: clean_logs
│ └── file_system_operation
└── llm_call (iteration 3 — final answer)
Each span contains: operation name, start and end time, attributes (run_id, tool_name, tokens_used), and a reference to its parent span.
OpenTelemetry: Initialization¶
OpenTelemetry (OTEL) is the standard for tracing. It supports export to Jaeger, Zipkin, Grafana Tempo, and other backends.
Tracer initialization:
import (
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
"go.opentelemetry.io/otel/sdk/resource"
sdktrace "go.opentelemetry.io/otel/sdk/trace"
semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)
func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
exporter, err := otlptracehttp.New(ctx,
otlptracehttp.WithEndpoint("localhost:4318"),
otlptracehttp.WithInsecure(),
)
if err != nil {
return nil, fmt.Errorf("create exporter: %w", err)
}
tp := sdktrace.NewTracerProvider(
sdktrace.WithBatcher(exporter),
sdktrace.WithResource(resource.NewWithAttributes(
semconv.SchemaURL,
semconv.ServiceNameKey.String("devops-agent"),
)),
)
otel.SetTracerProvider(tp)
return tp, nil
}
Spans for Agent Iterations and Tool Calls¶
Wrap each agent step in a span:
import (
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/trace"
)
var tracer = otel.Tracer("devops-agent")
func runAgentWithTracing(ctx context.Context, client *openai.Client, userInput string) (string, error) {
// Root span — the entire agent run
ctx, rootSpan := tracer.Start(ctx, "agent_run",
trace.WithAttributes(
attribute.String("user_input", userInput),
attribute.String("run_id", generateRunID()),
),
)
defer rootSpan.End()
messages := []openai.ChatCompletionMessage{
{Role: openai.ChatMessageRoleSystem, Content: systemPrompt},
{Role: openai.ChatMessageRoleUser, Content: userInput},
}
for i := 0; i < maxIterations; i++ {
// Span for each loop iteration
ctx, iterSpan := tracer.Start(ctx, "agent_iteration",
trace.WithAttributes(attribute.Int("iteration", i)),
)
// Span for LLM call
llmCtx, llmSpan := tracer.Start(ctx, "llm_call",
trace.WithAttributes(attribute.String("model", model)),
)
resp, err := client.CreateChatCompletion(llmCtx, req)
if err != nil {
llmSpan.RecordError(err)
llmSpan.End()
iterSpan.End()
return "", err
}
llmSpan.SetAttributes(
attribute.Int("tokens.prompt", resp.Usage.PromptTokens),
attribute.Int("tokens.completion", resp.Usage.CompletionTokens),
attribute.Int("tokens.total", resp.Usage.TotalTokens),
)
llmSpan.End()
msg := resp.Choices[0].Message
if len(msg.ToolCalls) == 0 {
rootSpan.SetAttributes(attribute.String("final_answer", msg.Content))
iterSpan.End()
return msg.Content, nil
}
// Span for each tool call
for _, toolCall := range msg.ToolCalls {
_, toolSpan := tracer.Start(ctx, "tool_call",
trace.WithAttributes(
attribute.String("tool.name", toolCall.Function.Name),
attribute.String("tool.args", toolCall.Function.Arguments),
),
)
result, err := executeTool(toolCall)
if err != nil {
toolSpan.RecordError(err)
toolSpan.SetAttributes(attribute.Bool("tool.error", true))
} else {
toolSpan.SetAttributes(attribute.String("tool.result", result))
}
toolSpan.End()
}
iterSpan.End()
}
return "", nil
}
Propagation: Passing Context Between Services¶
When your agent calls an external HTTP service, the tracing context must travel with it. OTEL uses traceparent and tracestate HTTP headers:
import (
"go.opentelemetry.io/otel/propagation"
"net/http"
)
// Sender: inject context into HTTP headers
func callExternalService(ctx context.Context, url string) (*http.Response, error) {
req, _ := http.NewRequestWithContext(ctx, "GET", url, nil)
// OTEL injects traceparent into request headers
otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))
return http.DefaultClient.Do(req)
}
// Receiver: extract context from HTTP headers
func handleRequest(w http.ResponseWriter, r *http.Request) {
// Extract trace context from incoming request
ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))
// Spans in this service now belong to the original trace
ctx, span := tracer.Start(ctx, "external_service.handle")
defer span.End()
// ... handle request ...
}
After this, you see a single trace spanning all services in Jaeger or Grafana Tempo.
OTEL exporters: For local development, Jaeger is convenient (
docker run -p 16686:16686 jaegertracing/all-in-one). For production — Grafana Tempo or Zipkin. OTEL supports all of them through a single API.
Multi-Agent Tracing¶
In Chapter 07: Multi-Agent Systems we split agents into a Supervisor and Workers. Each Worker is a separate agent loop with its own context. Without tracing, you can't tell which Worker stalled, which one failed, or how the task flowed through the entire system.
Parent-Child: Supervisor Span → Worker Span¶
The key idea: the Supervisor's span becomes the parent of Worker spans. This gives you the full execution tree:
supervisor_run
├── llm_call (Supervisor decides who to delegate to)
├── worker_call: network_expert
│ ├── llm_call
│ └── tool_call: ping_host
├── worker_call: db_expert
│ ├── llm_call
│ └── tool_call: check_db_version
└── llm_call (Supervisor assembles final answer)
Code: Passing Trace Context from Supervisor to Worker¶
// Worker receives context with trace information from Supervisor
func runWorker(ctx context.Context, client *openai.Client, workerName string, task string, tools []openai.Tool) (string, error) {
// Create child span — it automatically attaches to the parent from ctx
ctx, span := tracer.Start(ctx, "worker_call",
trace.WithAttributes(
attribute.String("worker.name", workerName),
attribute.String("worker.task", task),
),
)
defer span.End()
messages := []openai.ChatCompletionMessage{
{Role: openai.ChatMessageRoleSystem, Content: fmt.Sprintf("You are %s.", workerName)},
{Role: openai.ChatMessageRoleUser, Content: task},
}
for i := 0; i < 5; i++ {
_, llmSpan := tracer.Start(ctx, "worker_llm_call",
trace.WithAttributes(attribute.Int("iteration", i)),
)
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: "gpt-4o",
Messages: messages,
Tools: tools,
})
llmSpan.SetAttributes(
attribute.Int("tokens.total", resp.Usage.TotalTokens),
)
llmSpan.End()
if err != nil {
span.RecordError(err)
return "", err
}
msg := resp.Choices[0].Message
if len(msg.ToolCalls) == 0 {
span.SetAttributes(attribute.String("worker.result", msg.Content))
return msg.Content, nil
}
for _, tc := range msg.ToolCalls {
_, toolSpan := tracer.Start(ctx, "worker_tool_call",
trace.WithAttributes(
attribute.String("tool.name", tc.Function.Name),
),
)
result, _ := executeTool(tc)
toolSpan.End()
messages = append(messages, openai.ChatCompletionMessage{
Role: openai.ChatMessageRoleTool, Content: result, ToolCallID: tc.ID,
})
}
messages = append(messages, msg)
}
return "", fmt.Errorf("worker %s exceeded max iterations", workerName)
}
The Supervisor calls Workers by passing ctx:
func runSupervisor(ctx context.Context, client *openai.Client, userInput string) (string, error) {
ctx, span := tracer.Start(ctx, "supervisor_run",
trace.WithAttributes(attribute.String("user_input", userInput)),
)
defer span.End()
// Supervisor decides who to delegate to
// ... (LLM call for routing) ...
// ctx already contains the Supervisor's trace context.
// Worker creates a child span automatically.
networkResult, err := runWorker(ctx, client, "network_expert", "Check connectivity to db-host:5432", networkTools)
if err != nil {
span.RecordError(err)
return "", err
}
dbResult, err := runWorker(ctx, client, "db_expert", "Get PostgreSQL version", dbTools)
if err != nil {
span.RecordError(err)
return "", err
}
span.SetAttributes(
attribute.String("network_result", networkResult),
attribute.String("db_result", dbResult),
)
// ... Supervisor assembles final answer ...
return finalAnswer, nil
}
Cross-Agent Correlation¶
Trace ID propagates automatically through ctx. All spans within a single trace share the same trace_id. In Jaeger you search by this ID and see the full path: from user request through Supervisor to each Worker and back.
If a Worker runs as a separate HTTP service, use propagation from the previous section — pass traceparent through headers.
Component-Level Metrics¶
Aggregate metrics (total_tokens, avg_latency) show the overall picture. But for diagnostics you need granularity: which tool is slow? Which LLM call burns the most tokens? Which search query returns irrelevant results?
Per-Component Metrics¶
Three categories of component-level metrics:
| Component | Metrics | Purpose |
|---|---|---|
| Tool | Latency, error rate, calls/min | Find slow tools |
| LLM | Tokens, latency, cost per call | Optimize token spend |
| Retrieval | Relevance score, docs found, latency | Improve RAG quality |
Code: Collecting Component Metrics¶
// ComponentMetrics collects metrics for each agent component
type ComponentMetrics struct {
mu sync.Mutex
tools map[string]*ToolMetrics
llm *LLMMetrics
retrieval *RetrievalMetrics
}
type ToolMetrics struct {
Name string
TotalCalls int64
TotalErrors int64
TotalLatency time.Duration
}
type LLMMetrics struct {
TotalCalls int64
TotalTokens int64
TotalPromptTokens int64
TotalCompletionTokens int64
TotalLatency time.Duration
TotalCost float64
}
type RetrievalMetrics struct {
TotalQueries int64
TotalDocuments int64
TotalLatency time.Duration
RelevanceScores []float64 // for computing average
}
func NewComponentMetrics() *ComponentMetrics {
return &ComponentMetrics{
tools: make(map[string]*ToolMetrics),
llm: &LLMMetrics{},
retrieval: &RetrievalMetrics{},
}
}
Recording metrics on tool call:
func (cm *ComponentMetrics) RecordToolCall(name string, latency time.Duration, err error) {
cm.mu.Lock()
defer cm.mu.Unlock()
m, ok := cm.tools[name]
if !ok {
m = &ToolMetrics{Name: name}
cm.tools[name] = m
}
m.TotalCalls++
m.TotalLatency += latency
if err != nil {
m.TotalErrors++
}
}
func (cm *ComponentMetrics) RecordLLMCall(promptTokens, completionTokens int, latency time.Duration, costPerToken float64) {
cm.mu.Lock()
defer cm.mu.Unlock()
cm.llm.TotalCalls++
cm.llm.TotalPromptTokens += int64(promptTokens)
cm.llm.TotalCompletionTokens += int64(completionTokens)
cm.llm.TotalTokens += int64(promptTokens + completionTokens)
cm.llm.TotalLatency += latency
cm.llm.TotalCost += float64(promptTokens+completionTokens) * costPerToken
}
func (cm *ComponentMetrics) RecordRetrieval(docsFound int, relevanceScore float64, latency time.Duration) {
cm.mu.Lock()
defer cm.mu.Unlock()
cm.retrieval.TotalQueries++
cm.retrieval.TotalDocuments += int64(docsFound)
cm.retrieval.TotalLatency += latency
cm.retrieval.RelevanceScores = append(cm.retrieval.RelevanceScores, relevanceScore)
}
Usage in agent loop:
metrics := NewComponentMetrics()
// On LLM call
llmStart := time.Now()
resp, err := client.CreateChatCompletion(ctx, req)
metrics.RecordLLMCall(
resp.Usage.PromptTokens,
resp.Usage.CompletionTokens,
time.Since(llmStart),
0.000003, // $3 per 1M tokens
)
// On tool call
toolStart := time.Now()
result, err := executeTool(toolCall)
metrics.RecordToolCall(toolCall.Function.Name, time.Since(toolStart), err)
Linking Traces to Evals¶
Traces show how the agent performed the task. Evals show how well it did. The trace → eval link gives you the full picture:
type TracedEvalResult struct {
TraceID string `json:"trace_id"` // OpenTelemetry trace ID
EvalLevel string `json:"eval_level"` // task, tool, trajectory, topic
Score float64 `json:"score"`
ToolsUsed []string `json:"tools_used"`
TotalTokens int `json:"total_tokens"`
TotalLatency string `json:"total_latency"`
}
// After agent run completes: record eval result linked to trace
func recordTracedEval(ctx context.Context, result TracedEvalResult) {
span := trace.SpanFromContext(ctx)
result.TraceID = span.SpanContext().TraceID().String()
// Now you can find the full trace in Jaeger by TraceID
// and understand WHY the eval failed
log.Printf("EVAL_RESULT: %+v", result)
}
When an eval in CI/CD shows a regression (see Chapter 23: Evals in CI/CD), you take the trace_id from the result, find the trace in Jaeger, and see the full chain: which tool returned an unexpected result, which LLM call burned abnormally many tokens, where the error occurred.
Common Errors¶
Error 1: No Structured Logging¶
Symptom: Logs in plain text: "Agent executed tool check_disk". Impossible to automatically parse or analyze logs.
Cause: Using fmt.Printf instead of structured JSON logging.
Solution:
// BAD
fmt.Printf("Agent executed tool %s\n", toolName)
// GOOD
logJSON, _ := json.Marshal(map[string]any{
"run_id": runID,
"tool": toolName,
"timestamp": time.Now(),
})
log.Printf("TOOL_EXECUTED: %s", string(logJSON))
Error 2: No run_id for Correlation¶
Symptom: Cannot link logs from different components (agent, tools, external systems) for one request.
Cause: Each component logs independently, without a common identifier.
Solution:
// BAD
log.Printf("Tool executed: %s", toolName)
// GOOD
runID := generateRunID() // At start of agent run
log.Printf("TOOL_EXECUTED: run_id=%s tool=%s", runID, toolName)
Error 3: Tokens and Latency Not Logged¶
Symptom: Cannot understand execution cost or performance bottlenecks.
Cause: Token usage and execution time are not tracked.
Solution:
// BAD
resp, _ := client.CreateChatCompletion(ctx, req)
// Tokens not logged
// GOOD
resp, _ := client.CreateChatCompletion(ctx, req)
log.Printf("TOKENS_USED: run_id=%s tokens=%d", runID, resp.Usage.TotalTokens)
startTime := time.Now()
// ... execution ...
log.Printf("LATENCY: run_id=%s latency=%v", runID, time.Since(startTime))
Error 4: Only Successful Cases Logged¶
Symptom: Errors are not logged, cannot understand why agent failed.
Cause: Logging only in successful code branches.
Solution:
// BAD
result, err := executeTool(toolCall)
if err != nil {
return "", err // Error not logged
}
// GOOD
result, err := executeTool(toolCall)
if err != nil {
log.Printf("TOOL_ERROR: run_id=%s tool=%s error=%v", runID, toolCall.Function.Name, err)
return "", err
}
Error 5: No Metrics (Only Logs)¶
Symptom: Cannot build graphs of latency, error rate, token usage over time.
Cause: Only events are logged, but metrics are not aggregated.
Solution: Use metrics (Prometheus, StatsD) in addition to logs:
// Log event
log.Printf("TOOL_EXECUTED: run_id=%s tool=%s", runID, toolName)
// Send metric
metrics.IncrementCounter("tool_executions_total", map[string]string{"tool": toolName})
metrics.RecordLatency("tool_execution_duration", latency, map[string]string{"tool": toolName})
Mini-Exercises¶
Exercise 1: Implement Structured Logging¶
Add structured logging to labs/lab04-autonomy/main.go:
func logAgentRun(runID string, userInput string, toolCalls []ToolCall, result string) {
// Your code here
// Log in JSON format
}
Expected result:
- Logs in JSON format
- Contains all necessary fields: run_id, user_input, tool_calls, result, timestamp
Exercise 2: Add Tool Call Tracing¶
Implement a function to log tool call with latency measurement:
func logToolCall(runID string, toolCallID string, toolName string, latency time.Duration) {
// Your code here
// Log in format: TOOL_CALL: run_id=... tool_id=... tool_name=... latency=...
}
Expected result:
- Each tool call logged with run_id
- Latency measured and logged
- Format allows easy log parsing
Exercise 3: Implement Metrics¶
Add metric counting (can use simple in-memory counter):
type Metrics struct {
TotalRuns int64
TotalTokens int64
TotalErrors int64
AvgLatency time.Duration
}
func (m *Metrics) RecordRun(tokens int, latency time.Duration, err error) {
// Your code here
// Update metrics
}
Expected result:
- Metrics updated on each agent run
- Can get average values through methods like
GetAvgLatency()
Completion Criteria / Checklist¶
Completed (production ready):
- Structured logging implemented (JSON format)
- Each agent run has unique
run_id - All stages logged: input request → tool selection → execution → result
- Tokens and latency logged for each request
- Tool calls logged with
run_idfor correlation - Errors logged with context
- Metrics tracked (latency, token usage, error rate)
Not completed:
- Logs in plain text without structure
- No
run_idfor log correlation - Tokens and latency not logged
- Errors not logged
- No metrics
Connection with Other Chapters¶
- Best Practices: General logging practices studied in Chapter 16: Best Practices
- Agent Loop: Basic agent loop studied in Chapter 04: Autonomy and Loops
- Tools: Tool execution studied in Chapter 03: Tools and Function Calling
- Cost Engineering: Token usage for cost control — Chapter 20: Cost & Latency Engineering