Designing Autonomous AI Agents

For programmers who want to build production AI agents

Table of contents

Part I: Fundamentals

00. Preface — How to use this handbook, requirements, and what an agent is
01. LLM Physics — Tokens, context, temperature, determinism, probabilistic nature
02. Prompting as Programming — ICL, Few-Shot, CoT, task structuring, SOP

Part II: Practice-first (build an agent)

03. Tools and Function Calling — JSON Schema, validation, error handling, tool↔runtime contract
04. Autonomy and Loops — ReAct loop, stopping, anti-loops, observability
05. Safety and Human-in-the-Loop — Confirmation, Clarification, Risk Scoring, Prompt Injection
06. RAG and Knowledge Base — Chunking, Retrieval, Grounding, search modes, limits
07. Multi-Agent Systems — Supervisor/Worker, context isolation, task routing
08. Evals and Reliability — Evals, prompt regressions, quality metrics, test datasets

Part III: Architecture and Runtime Core

09. Agent Anatomy — Memory, Tools, Planning, Runtime
10. Planning and Workflow Patterns — Plan→Execute, Plan-and-Revise, task decomposition, DAG/workflow, stop conditions
11. State Management — Tool idempotency, retries with exponential backoff, deadlines, persist state, task resumption
12. Agent Memory Systems — Two memory horizons (in-Run working memory vs cross-session long-term), linear history as an immutable log, prompt cache and why the system prompt stays stable, compact vs condense vs recall, optional block-based memory
13. Context Engineering — Stable system prefix, single threshold and single reaction (condense, at most once per Run), usage.PromptTokens as the primary source, common over-engineering traps (LayeredContext, message scoring, dynamic system prompt)
14. Ecosystem and Frameworks — Choosing between custom runtime and frameworks, portability, avoiding vendor lock-in

Part IV: Practice (case studies/practices)

15. Real-World Case Studies — Examples of agents in different domains (DevOps, Support, Data, Security, Product)
16. Best Practices and Application Areas — Best practices for creating and maintaining agents, application areas

Part V: Platform Infrastructure/Security

17. Security and Governance — Threat modeling, risk scoring, prompt injection protection (canonical), tool sandboxing, allowlists, policy-as-code, RBAC, dry-run modes, audit
18. Tool Protocols and Tool Servers — Tool↔runtime contract at process/service level, schema versioning, authn/authz

Part VI: Production Readiness

19. Observability and Tracing — Structured logging, tracing agent runs and tool calls, metrics, log correlation
20. Cost & Latency Engineering — Token budgets, iteration limits, caching, fallback models, batching, timeouts
21. Workflow and State Management in Production — Queues and asynchrony, scaling, distributed state
22. Prompt and Program Management — Prompt versioning, prompt regressions via evals, configs and feature flags, A/B testing
23. Evals in CI/CD — Quality gates in CI/CD, dataset versioning, handling flaky cases, security tests
24. Data and Privacy — PII detection and masking, secret protection, log redaction, log storage and TTL
25. Production Readiness Index — Prioritization guide (1 day / 1–2 weeks) and quick links to production topics

Appendices

Appendix: Reference Guides — Glossary, checklists, SOP templates, decision tables, Capability Benchmark

Reading path

For Beginners (recommended path — practice-first)

Start with the Preface: what an agent is, the Brain + Tools + Memory + Planning equation, and the section "Mental Model: an Agent Is a New Employee", which you should not skip. Without it, the security chapters read as "special LLM machinery" instead of "common sense you already know".
Study LLM Physics — the foundation for understanding everything else
Master Prompting — the foundation of working with agents
Build a working agent:
- Tools and Function Calling — the agent's "hands"
- Autonomy and Loops — how agents work in loops
- Safety and Human-in-the-Loop — protecting against dangerous actions
Expand capabilities:
- RAG and Knowledge Base — working with documentation
- Multi-Agent Systems — teams of specialized agents
- Evals and Reliability — testing agents
Dive deeper into architecture:
- Agent Anatomy — components and their interactions
- Planning and Workflow Patterns — planning complex tasks
- State Management — execution reliability
- Agent Memory Systems — linear in-Run memory, cross-session long-term memory, prompt cache
- Context Engineering — stable prefix, condense on overflow, no over-engineering
Practice: complete the laboratory assignments as you read the chapters

For Experienced Programmers

You can skip the basic chapters and go directly to:

Preface → Mental Model: an Agent Is a New Employee (a 5-minute read that explains the entire security part of the course with one idea)
Tools and Function Calling
Autonomy and Loops
Agent Memory Systems and Context Engineering — memory and context without over-engineering
Case Studies — for understanding real-world applications

Quick Track: Core Concepts in 10 Minutes

If you're an experienced developer and want the essentials quickly:

What is an agent?
- Agent = LLM + Tools + Memory + Planning
- LLM is the "brain" that makes decisions
- Tools are the "hands" that perform actions
- Memory is the conversation history plus long-term storage
- Planning is the ability to break down a task into steps
Mental model (more important than it sounds right now):
- An agent is a new employee on probation, not "new software".
- Role-based access, sign-off for dangerous actions, audit log, gradual trust expansion — same as for a person.
- Four asymmetries (where the "treat it like a person" model breaks): 1000× higher speed, parallelism, no sense of consequences, and prompt injection as social engineering.
- Full version: Preface → Mental Model.
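The mental model translates almost directly into runtime code. Below is a minimal sketch, not the handbook's implementation, built on hypothetical `Role`, `Gate`, and `AuditEntry` types. Each role gets an allowlist of tools, dangerous tools require a human sign-off, and every decision lands in an audit log, just as it would for a new hire.

```go
package main

import (
	"errors"
	"fmt"
)

// Role is a "job role" for an agent (hypothetical type): which tools it may
// use, and which of those need a human sign-off before they run.
type Role struct {
	Name         string
	AllowedTools map[string]bool // allowlist: anything not listed is denied
	NeedsSignOff map[string]bool // dangerous tools that require approval
}

// AuditEntry records one decision, whatever the outcome.
type AuditEntry struct {
	Tool     string
	Decision string
}

// Gate sits between the model's tool call and the runtime that executes it.
type Gate struct {
	Role     Role
	Approve  func(tool string) bool // human-in-the-loop hook; nil means no approver
	AuditLog []AuditEntry
}

var (
	ErrNotAllowed = errors.New("tool not allowed for this role")
	ErrRejected   = errors.New("sign-off required and not given")
)

// Check decides whether a tool call may run and always logs the decision.
func (g *Gate) Check(tool string) error {
	decision := "allowed"
	var err error
	switch {
	case !g.Role.AllowedTools[tool]:
		decision, err = "denied", ErrNotAllowed
	case g.Role.NeedsSignOff[tool] && (g.Approve == nil || !g.Approve(tool)):
		decision, err = "rejected", ErrRejected
	}
	g.AuditLog = append(g.AuditLog, AuditEntry{Tool: tool, Decision: decision})
	return err
}

func main() {
	g := &Gate{
		Role: Role{
			Name:         "junior-devops",
			AllowedTools: map[string]bool{"check_status": true, "restart_service": true},
			NeedsSignOff: map[string]bool{"restart_service": true},
		},
		Approve: func(tool string) bool { return false }, // simulate a human saying "no"
	}
	for _, tool := range []string{"check_status", "restart_service", "drop_database"} {
		fmt.Println(tool, g.Check(tool))
	}
	fmt.Println(len(g.AuditLog), "audit entries")
}
```

Gradual trust expansion then becomes a configuration change (moving a tool out of `NeedsSignOff`, or adding it to `AllowedTools`) rather than a code change.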

How does the agent loop work?

While (task not solved):
  1. Send history to LLM
  2. Get response (text or tool_call)
  3. If tool_call → execute tool → add result to history → repeat
  4. If text → show user and stop
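The loop above can be sketched in Go without calling a real provider. `Message`, `Reply`, `Model`, and `RunAgent` are hypothetical names chosen for illustration, and a fake model stands in for the LLM. The sketch makes one addition to the pseudocode: a `maxIter` cap, so that a model which keeps requesting tools cannot loop forever (stopping conditions are covered in Ch. 04).

```go
package main

import (
	"errors"
	"fmt"
)

// Message is one entry in the history sent to the model.
type Message struct {
	Role    string // "system", "user", "assistant", "tool"
	Content string
}

// Reply is what the model returns: final text, or a request to call a tool.
type Reply struct {
	Text     string
	ToolName string // non-empty means "please call this tool"
}

// Model abstracts the LLM call; a real one would wrap the provider's API.
type Model func(history []Message) Reply

var ErrMaxIterations = errors.New("agent stopped: iteration limit reached")

// RunAgent runs the loop: send history, execute tool calls, stop on text.
func RunAgent(model Model, tools map[string]func() string, history []Message, maxIter int) (string, []Message, error) {
	for i := 0; i < maxIter; i++ {
		reply := model(history) // 1-2. send history, get response
		if reply.ToolName == "" {
			// 4. plain text: record it, show it to the user, and stop
			history = append(history, Message{Role: "assistant", Content: reply.Text})
			return reply.Text, history, nil
		}
		// 3. tool call: the runtime executes it and appends the result
		history = append(history, Message{Role: "assistant", Content: "call " + reply.ToolName})
		result := "error: unknown tool " + reply.ToolName
		if tool, ok := tools[reply.ToolName]; ok {
			result = tool()
		}
		history = append(history, Message{Role: "tool", Content: result})
	}
	return "", history, ErrMaxIterations
}

func main() {
	// Fake model: asks for check_status once, then answers from the result.
	fake := func(h []Message) Reply {
		if last := h[len(h)-1]; last.Role == "tool" {
			return Reply{Text: "Server status: " + last.Content}
		}
		return Reply{ToolName: "check_status"}
	}
	tools := map[string]func() string{"check_status": func() string { return "OK" }}
	history := []Message{
		{Role: "system", Content: "You are a DevOps engineer"},
		{Role: "user", Content: "Check server status"},
	}
	answer, h, err := RunAgent(fake, tools, history, 5)
	fmt.Println(answer, len(h), err)
}
```

With this fake model the run takes two iterations: one tool round-trip, then the final answer "Server status: OK".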

Key points:
- The LLM doesn't execute code. It generates JSON that requests a tool call (tool name plus arguments).
- The runtime (your code) executes the real Go functions.
- The LLM doesn't "remember" the past. It only sees the messages[] array that the runtime assembles and sends with every request.
- Set temperature = 0 for the most deterministic agent behavior (even then, providers do not guarantee identical outputs).
- History is an immutable log; the system prompt stays stable across iterations (otherwise the prompt cache is lost; see Ch. 12 and Ch. 13).
- Take token counts from usage.PromptTokens in the provider's response, not from your own counters.
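Here is a minimal sketch of the last three points. It uses a hypothetical `Run` type that holds plain strings instead of provider message structs. The history is append-only behind a fixed system prompt, so the request prefix stays identical between iterations. The decision to condense compares the token count the provider reported (usage.PromptTokens) against a single threshold and fires at most once per Run. What condensing actually does is deliberately left out; Ch. 12 and Ch. 13 cover it.

```go
package main

import "fmt"

// Run is per-Run working state (hypothetical type for illustration).
type Run struct {
	system    string   // set once; never rewritten during the Run
	history   []string // append-only log: entries are never edited in place
	condensed bool     // condense may fire at most once per Run
}

func NewRun(system string) *Run { return &Run{system: system} }

// Append adds a turn at the end; earlier entries stay untouched, so the
// request prefix is identical from one iteration to the next.
func (r *Run) Append(turn string) { r.history = append(r.history, turn) }

// Messages builds the request: the stable system prompt first, then the log.
func (r *Run) Messages() []string {
	out := make([]string, 0, len(r.history)+1)
	out = append(out, r.system)
	return append(out, r.history...)
}

// ClaimCondense reports whether a condense should run now. It compares the
// prompt token count the provider reported for the previous call against one
// threshold and, when it returns true, marks the Run so it never fires again.
func (r *Run) ClaimCondense(reportedPromptTokens, threshold int) bool {
	if r.condensed || reportedPromptTokens < threshold {
		return false
	}
	r.condensed = true
	return true
}

func main() {
	r := NewRun("You are a DevOps engineer")
	r.Append("user: Check server status")
	before := r.Messages()
	r.Append("tool: OK")
	after := r.Messages()
	fmt.Println(before[0] == after[0], len(after))
	fmt.Println(r.ClaimCondense(90_000, 80_000), r.ClaimCondense(95_000, 80_000))
}
```

The token count passed to `ClaimCondense` is meant to come straight from the previous response's usage field, not from a local tokenizer estimate.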

Minimal example:

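// Assumes an existing go-openai client (github.com/sashabaranov/go-openai),
// a ctx, and a checkStatus() string helper, inside a func that returns error.

// 1. Define tool (Type is required; an empty-object schema means "no arguments")
tools := []openai.Tool{{
    Type: openai.ToolTypeFunction,
    Function: &openai.FunctionDefinition{
        Name:        "check_status",
        Description: "Check server status",
        Parameters:  map[string]any{"type": "object", "properties": map[string]any{}},
    },
}}

// 2. Request to model; keep the history in a variable so it can grow
messages := []openai.ChatCompletionMessage{
    {Role: openai.ChatMessageRoleSystem, Content: "You are a DevOps engineer"},
    {Role: openai.ChatMessageRoleUser, Content: "Check server status"},
}
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
    Model:    "gpt-4o-mini",
    Messages: messages,
    Tools:    tools,
})
if err != nil {
    return err
}
msg := resp.Choices[0].Message

// 3. Check tool_call
if len(msg.ToolCalls) > 0 {
    // The assistant message carrying the tool_calls goes into history first
    messages = append(messages, msg)
    for _, call := range msg.ToolCalls {
        // 4. Execute tool (Runtime); a real agent dispatches on call.Function.Name
        result := checkStatus()
        // 5. Add result to history, linked to the request by ToolCallID
        messages = append(messages, openai.ChatCompletionMessage{
            Role:       openai.ChatMessageRoleTool,
            Content:    result,
            ToolCallID: call.ID,
        })
    }
    // 6. Send updated history back to model
}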

What to read next:
- Chapter 03: Tools — detailed protocol
- Chapter 04: Autonomy — agent loop
- Chapter 09: Agent Anatomy — architecture
- Chapter 12: Agent Memory and Chapter 13: Context Engineering — without over-engineering

After Completing the Main Course

After studying chapters 00–16 (Parts I–IV), proceed to:

Part V: Platform Infrastructure/Security — security, governance, tool protocols
Part VI: Production Readiness — practical guide to production readiness with step-by-step implementation recipes

Mapping to laboratory assignments

Handbook Chapter	Corresponding Laboratory Assignments
01. LLM Physics	Lab 00 (Capability Check)
02. Prompting	Lab 01 (Basics)
03. Tools	Lab 02 (Tools), Lab 03 (Architecture)
04. Autonomy	Lab 04 (Autonomy)
05. Safety	Lab 05 (Human-in-the-Loop)
02. Prompting (SOP)	Lab 06 (Incident)
06. RAG	Lab 07 (RAG)
07. Multi-Agent	Lab 08 (Multi-Agent)
09. Agent Anatomy	Lab 01 (Basics), Lab 09 (Context Optimization)
10. Planning and Workflow Patterns	Lab 10 (Planning & Workflow)
11. State Management	Lab 10 (Planning & Workflow) — partially
12. Agent Memory Systems, 13. Context Engineering	Lab 11 (Memory & Context Engineering)
18. Tool Protocols and Tool Servers	Lab 12 (Tool Server Protocol), Lab 13 (Tool Retrieval & Pipelines) — Optional
17. Security and Governance	— (read as a theory capstone; security practice is embedded in Lab 02 / Lab 05 / Lab 12)
22. Prompt and Program Management	Lab 01 (Basics) — partially

How to use this handbook

Read sequentially — each chapter builds on previous ones
Practice alongside reading — complete the corresponding laboratory assignment after each chapter
Use as a reference — return to relevant sections when working on projects
Study examples — each chapter includes examples from different domains (DevOps, Support, Data, Security, Product)
Complete exercises — mini-exercises in each chapter help reinforce the material
Check your understanding — use checklists for self-assessment

Happy learning.