Designing Autonomous AI Agents

For programmers who want to build production AI agents

Table of contents

Part I: Fundamentals

00. Preface — How to use this handbook, requirements, and what an agent is
01. LLM Physics — Tokens, context, temperature, determinism, probabilistic nature
02. Prompting as Programming — ICL, Few-Shot, CoT, task structuring, SOP

Part II: Practice-first (build an agent)

03. Tools and Function Calling — JSON Schema, validation, error handling, tool↔runtime contract
04. Autonomy and Loops — ReAct loop, stopping, anti-loops, observability
05. Safety and Human-in-the-Loop — Confirmation, Clarification, Risk Scoring, Prompt Injection
06. RAG and Knowledge Base — Chunking, Retrieval, Grounding, search modes, limits
07. Multi-Agent Systems — Supervisor/Worker, context isolation, task routing
08. Evals and Reliability — Evals, prompt regressions, quality metrics, test datasets

Part III: Architecture and Runtime Core

09. Agent Anatomy — Memory, Tools, Planning, Runtime
10. Planning and Workflow Patterns — Plan→Execute, Plan-and-Revise, task decomposition, DAG/workflow, stop conditions
11. State Management — Tool idempotency, retries with exponential backoff, deadlines, state persistence, task resumption
12. Agent Memory Systems — Short/long-term memory, episodic/semantic memory, forgetting/TTL, memory verification, storage/retrieval
13. Context Engineering — Context layers, fact selection policies, summarization, token budgets, context assembly from state+memory+retrieval
14. Ecosystem and Frameworks — Choosing between custom runtime and frameworks, portability, avoiding vendor lock-in

Part IV: Practice (case studies/practices)

15. Real-World Case Studies — Examples of agents in different domains (DevOps, Support, Data, Security, Product)
16. Best Practices and Application Areas — Best practices for creating and maintaining agents, application areas

Part V: Platform Infrastructure/Security

17. Security and Governance — Threat modeling, risk scoring, prompt injection protection (canonical), tool sandboxing, allowlists, policy-as-code, RBAC, dry-run modes, audit
18. Tool Protocols and Tool Servers — Tool↔runtime contract at process/service level, schema versioning, authn/authz

Part VI: Production Readiness

19. Observability and Tracing — Structured logging, tracing agent runs and tool calls, metrics, log correlation
20. Cost & Latency Engineering — Token budgets, iteration limits, caching, fallback models, batching, timeouts
21. Workflow and State Management in Production — Queues and asynchrony, scaling, distributed state
22. Prompt and Program Management — Prompt versioning, prompt regressions via evals, configs and feature flags, A/B testing
23. Evals in CI/CD — Quality gates in CI/CD, dataset versioning, handling flaky cases, security tests
24. Data and Privacy — PII detection and masking, secret protection, log redaction, log storage and TTL
25. Production Readiness Index — Prioritization guide (1 day / 1–2 weeks) and quick links to production topics

Appendices

Appendix: Reference Guides — Glossary, checklists, SOP templates, decision tables, Capability Benchmark

Reading path

For Beginners (recommended path — practice-first)

Start with Preface — learn what an agent is and how to use this handbook
Study LLM Physics — the foundation for understanding everything else
Master Prompting — the foundation of working with agents
Build a working agent:
- Tools and Function Calling — the agent's "hands"
- Autonomy and Loops — how agents work in loops
- Safety and Human-in-the-Loop — protecting against dangerous actions
Expand capabilities:
- RAG and Knowledge Base — working with documentation
- Multi-Agent Systems — teams of specialized agents
- Evals and Reliability — testing agents
Dive deeper into architecture:
- Agent Anatomy — components and their interactions
- Planning and Workflow Patterns — planning complex tasks
- State Management — execution reliability
- Agent Memory Systems — long-term memory
- Context Engineering — context management
Practice: complete the laboratory assignments as you read the corresponding chapters

For Experienced Programmers

You can skip basic chapters and go directly to:

Tools and Function Calling
Autonomy and Loops
Case Studies — for understanding real-world applications

Quick Track: Core Concepts in 10 Minutes

If you're an experienced developer who wants to grasp the essentials quickly:

What is an agent?
- Agent = LLM + Tools + Memory + Planning
- LLM is the "brain" that makes decisions
- Tools are the "hands" that perform actions
- Memory is history and long-term storage
- Planning is the ability to break down a task into steps
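The four parts can be expressed as plain Go types. This is a minimal sketch, not a prescribed design: every type name here is hypothetical (not from any library), `Brain` is left nil because no model is wired in, and `splitPlanner` is a stand-in that splits a task on " then ", where a real agent would usually ask the LLM to produce the plan.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is one entry in the conversation history (hypothetical type).
type Message struct {
	Role    string // "system", "user", "assistant" or "tool"
	Content string
}

// LLM is the "brain": given the history, it decides the next message.
type LLM interface {
	Next(history []Message) Message
}

// Tool is a "hand": an ordinary Go function that the runtime executes.
type Tool func(args string) (string, error)

// Memory holds short-term history and a simple long-term key/value store.
type Memory struct {
	History  []Message
	LongTerm map[string]string
}

// Planner breaks a task into ordered steps.
type Planner func(task string) []string

// Agent wires the four parts together.
type Agent struct {
	Brain   LLM
	Tools   map[string]Tool
	Memory  Memory
	Planner Planner
}

// splitPlanner is a stand-in planner: it splits a task on " then ".
// A real agent would usually ask the LLM to produce the plan.
func splitPlanner(task string) []string {
	var steps []string
	for _, s := range strings.Split(task, " then ") {
		if s = strings.TrimSpace(s); s != "" {
			steps = append(steps, s)
		}
	}
	return steps
}

func main() {
	agent := Agent{
		Tools: map[string]Tool{
			"check_status": func(string) (string, error) { return "ok", nil },
		},
		Memory:  Memory{LongTerm: map[string]string{}},
		Planner: splitPlanner,
	}
	fmt.Println(agent.Planner("check status then restart service"))
	// [check status restart service]
}
```

Keeping each part behind its own type makes it possible to swap one (for example, a different model or memory store) without touching the others.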

How does the agent loop work?

While (task not solved and iteration limit not reached):
  1. Send history to LLM
  2. Get response (text or tool_call)
  3. If tool_call → execute tool → add result to history → repeat
  4. If text → show user and stop
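The loop can be sketched in Go with no network calls. Assumptions: the model is replaced by a hypothetical scripted `fakeLLM` that first requests `check_status` and then answers in text; `Message`, `ToolCall` and `runAgent` are illustrative names; and `maxSteps` is an added guard, since a real agent also needs a hard stop in case the model never produces a final answer.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolCall is the model's request to run a tool (hypothetical type).
type ToolCall struct {
	Name string
	Args string
}

// Message is one history entry; ToolCall is set when the model asks for a tool.
type Message struct {
	Role     string
	Content  string
	ToolCall *ToolCall
}

// LLM maps the full history to the next assistant message.
type LLM func(history []Message) Message

// runAgent implements the loop: call the model, execute tool calls,
// append results to history, and stop on a text answer or at maxSteps.
func runAgent(llm LLM, tools map[string]func(string) string, history []Message, maxSteps int) (string, []Message, error) {
	for step := 0; step < maxSteps; step++ {
		msg := llm(history) // 1-2. send history, get response
		history = append(history, msg)
		if msg.ToolCall == nil {
			return msg.Content, history, nil // 4. text: show user and stop
		}
		// 3. tool_call: the runtime executes the tool and records the result
		result := "error: unknown tool " + msg.ToolCall.Name
		if tool, ok := tools[msg.ToolCall.Name]; ok {
			result = tool(msg.ToolCall.Args)
		}
		history = append(history, Message{Role: "tool", Content: result})
	}
	return "", history, errors.New("iteration limit reached")
}

// fakeLLM asks for check_status once, then answers using the tool result.
func fakeLLM(history []Message) Message {
	last := history[len(history)-1]
	if last.Role == "tool" {
		return Message{Role: "assistant", Content: "Server status: " + last.Content}
	}
	return Message{Role: "assistant", ToolCall: &ToolCall{Name: "check_status", Args: "{}"}}
}

func main() {
	tools := map[string]func(string) string{
		"check_status": func(string) string { return "ONLINE" },
	}
	history := []Message{{Role: "user", Content: "Check server status"}}
	answer, history, err := runAgent(fakeLLM, tools, history, 5)
	fmt.Println(answer, len(history), err) // Server status: ONLINE 4 <nil>
}
```

The run above takes two iterations and ends with four history entries: the user request, the tool call, the tool result and the final answer. A model that keeps requesting tools would instead hit `maxSteps` and return an error.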

Key points:
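- The LLM doesn't execute code. It generates JSON describing the call: which tool to run and with which arguments.
- Runtime (your code) executes real Go functions.
- The LLM doesn't "remember" the past. On every request it sees only the messages[] history that Runtime collects and resends.
- Temperature = 0 makes agent behavior as repeatable as possible, although outputs are still not guaranteed to be identical across runs.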
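To make the first two points concrete: the model returns only a tool name and a JSON string of arguments, and the runtime decodes that string and calls an ordinary Go function. A minimal sketch using only the standard library; the `checkStatus` tool, its `host` argument and the `dispatch` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// checkStatusArgs mirrors the JSON arguments the model is expected to send.
type checkStatusArgs struct {
	Host string `json:"host"`
}

// checkStatus is a hypothetical real Go function the runtime can execute.
func checkStatus(args checkStatusArgs) string {
	return fmt.Sprintf("%s: ONLINE", args.Host)
}

// dispatch decodes the model's JSON arguments and runs the named tool.
// Any failure becomes a text result the model can read and react to.
func dispatch(name, rawArgs string) string {
	switch name {
	case "check_status":
		var args checkStatusArgs
		if err := json.Unmarshal([]byte(rawArgs), &args); err != nil {
			return "error: invalid arguments: " + err.Error()
		}
		if args.Host == "" {
			return "error: host is required"
		}
		return checkStatus(args)
	default:
		return "error: unknown tool " + name
	}
}

func main() {
	// What the model produced: a name plus a JSON string, not executed code.
	fmt.Println(dispatch("check_status", `{"host":"web-01"}`)) // web-01: ONLINE
	fmt.Println(dispatch("delete_db", `{}`))                    // error: unknown tool delete_db
}
```

Returning errors as text rather than panicking lets the model see what went wrong on the next iteration and correct its call.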

Minimal example:

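// Assumes the go-openai client (import openai "github.com/sashabaranov/go-openai"),
// plus client, ctx and checkStatus() defined elsewhere, inside a func returning error.

// 1. Define tool
tools := []openai.Tool{{
    Type: openai.ToolTypeFunction,
    Function: &openai.FunctionDefinition{
        Name:        "check_status",
        Description: "Check server status",
    },
}}

// 2. History that Runtime collects and resends on every request
messages := []openai.ChatCompletionMessage{
    {Role: openai.ChatMessageRoleSystem, Content: "You are a DevOps engineer"},
    {Role: openai.ChatMessageRoleUser, Content: "Check server status"},
}

// 3. Request to model
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
    Model:    "gpt-4o-mini",
    Messages: messages,
    Tools:    tools,
})
if err != nil {
    return err
}
msg := resp.Choices[0].Message

// 4. Check tool_call
if len(msg.ToolCalls) > 0 {
    // The assistant message with tool_calls must precede the tool results
    messages = append(messages, msg)
    for _, call := range msg.ToolCalls {
        // 5. Execute tool (Runtime); check_status is the only tool defined
        result := checkStatus()
        // 6. Add result to history, linked to the call by its ID
        messages = append(messages, openai.ChatCompletionMessage{
            Role:       openai.ChatMessageRoleTool,
            Content:    result,
            ToolCallID: call.ID,
        })
    }
    // 7. Send updated history back to model (next loop iteration)
}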

What to read next:
- Chapter 03: Tools — detailed protocol
- Chapter 04: Autonomy — agent loop
- Chapter 09: Agent Anatomy — architecture

After Completing the Main Course

After studying chapters 00-16 (Parts I-IV), proceed to:

Part V: Platform Infrastructure/Security — security, governance, tool protocols
Part VI: Production Readiness — practical guide to production readiness with step-by-step implementation recipes

Connection with laboratory assignments

Handbook Chapter	Corresponding Laboratory Assignments
01. LLM Physics	Lab 00 (Capability Check)
02. Prompting	Lab 01 (Basics)
03. Tools	Lab 02 (Tools), Lab 03 (Architecture)
04. Autonomy	Lab 04 (Autonomy)
05. Safety	Lab 05 (Human-in-the-Loop)
02. Prompting (SOP)	Lab 06 (Incident)
06. RAG	Lab 07 (RAG)
07. Multi-Agent	Lab 08 (Multi-Agent)
09. Agent Anatomy	Lab 01 (Basics), Lab 09 (Context Optimization)
10. Planning and Workflow Patterns	Lab 10 (Planning & Workflow)
11. State Management	Lab 10 (Planning & Workflow) — partially
12. Agent Memory Systems, 13. Context Engineering	Lab 11 (Memory & Context Engineering)
18. Tool Protocols and Tool Servers	Lab 12 (Tool Server Protocol)
17. Security and Governance	Lab 13 (Agent Security Hardening) — Optional
22. Prompt and Program Management	Lab 01 (Basics) — partially

How to use this handbook

Read sequentially — each chapter builds on previous ones
Practice alongside reading — complete the corresponding laboratory assignment after each chapter
Use as a reference — return to relevant sections when working on projects
Study examples — each chapter includes examples from different domains (DevOps, Support, Data, Security, Product)
Complete exercises — mini-exercises in each chapter help reinforce the material
Check your understanding — use checklists for self-assessment

Happy learning.