Appendix: Reference Guides

Why This Chapter?

This appendix collects reference material: a glossary of terms, checklists, SOP templates, and decision tables. Consult it while working on projects.

Glossary

Core Concepts

Agent — a system that uses an LLM as a "reasoning engine" to perceive its environment, make decisions, and perform actions. It consists of an LLM (the brain), Tools (the hands), Memory, and Planning.

See also: Chapter 00: Preface

Runtime (Execution Environment) — the agent code you write in Go. It connects the LLM with tools and manages the agent work loop. It performs LLM response parsing, validation, tool execution, and dialogue history management.

Important: Runtime is not a separate system or framework. It's your code in main.go or separate modules.

See also: Chapter 09: Agent Anatomy

Tool — a Go function, API call, or command that an agent can execute to interact with the real world. It's described in JSON Schema format and passed to the model in the tools[] field.

Synonyms: Function (in Function Calling context)

See also: Chapter 03: Tools and Function Calling

Tool Call / Function Call — a structured JSON request that the LLM generates to call a tool. It contains the tool name and arguments in JSON format. It's returned in the tool_calls field of the model response.

Note: "Tool Call" and "Function Call" are the same. "Function Calling" is the mechanism name in API, "Tool Call" is a specific tool invocation.

See also: Chapter 03: Tools and Function Calling
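A sketch of how a runtime might validate and dispatch a tool call. The ToolCall struct is a simplified stand-in for one element of tool_calls, and ToolFunc, dispatch, and echoHost are hypothetical names introduced for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ToolCall is a simplified view of one tool_calls element.
type ToolCall struct {
	Name      string `json:"name"`
	Arguments string `json:"arguments"` // arguments arrive as a JSON string
}

// ToolFunc is a hypothetical handler signature used by this sketch.
type ToolFunc func(args json.RawMessage) (string, error)

// dispatch validates the call and runs the matching handler.
func dispatch(registry map[string]ToolFunc, call ToolCall) (string, error) {
	fn, ok := registry[call.Name]
	if !ok {
		return "", fmt.Errorf("unknown tool %q", call.Name)
	}
	if !json.Valid([]byte(call.Arguments)) {
		return "", errors.New("arguments are not valid JSON")
	}
	return fn(json.RawMessage(call.Arguments))
}

// echoHost is a toy tool that reads a "host" argument.
func echoHost(args json.RawMessage) (string, error) {
	var in struct {
		Host string `json:"host"`
	}
	if err := json.Unmarshal(args, &in); err != nil {
		return "", err
	}
	return "host=" + in.Host, nil
}

func main() {
	registry := map[string]ToolFunc{"echo_host": echoHost}
	out, err := dispatch(registry, ToolCall{Name: "echo_host", Arguments: `{"host":"web-01"}`})
	fmt.Println(out, err) // prints: host=web-01 <nil>
}
```

Returning an error instead of panicking lets the runtime feed the failure back to the model as a tool result.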

MCP (Model Context Protocol) — an open protocol, introduced by Anthropic, for connecting tools and data to LLM agents. It provides a standard interface for Resources, Tools, and Prompts.

Artifact / artifact_id — a large tool result (file, logs, JSON, HTML, etc.) that the runtime stores outside messages[]. The model only receives a short excerpt plus an artifact_id. This reduces cost/latency and helps avoid context overflows.

See also: Chapter 20: Cost & Latency Engineering
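One possible shape for an artifact store, as a sketch only. ArtifactStore, Put, and Get are hypothetical names, the store lives in memory, and the ID scheme (a hash prefix) is an assumption rather than the course's format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ArtifactStore keeps large tool results outside messages[] (in memory here).
type ArtifactStore struct {
	data map[string]string
}

func NewArtifactStore() *ArtifactStore {
	return &ArtifactStore{data: make(map[string]string)}
}

// Put stores the full result and returns an artifact_id plus an excerpt of at
// most maxExcerpt bytes. Byte slicing may split a multi-byte character; a real
// implementation would cut on a rune boundary.
func (s *ArtifactStore) Put(result string, maxExcerpt int) (id, excerpt string) {
	sum := sha256.Sum256([]byte(result))
	id = "art_" + hex.EncodeToString(sum[:6])
	s.data[id] = result
	if len(result) <= maxExcerpt {
		return id, result
	}
	return id, result[:maxExcerpt] + "...[truncated]"
}

// Get returns the full artifact for a later, explicit read.
func (s *ArtifactStore) Get(id string) (string, bool) {
	v, ok := s.data[id]
	return v, ok
}

func main() {
	store := NewArtifactStore()
	id, excerpt := store.Put("line 1\nline 2\nline 3\nline 4", 13)
	// Only the excerpt and the ID would be placed into messages[].
	fmt.Printf("artifact_id=%s excerpt=%q\n", id, excerpt)
}
```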

AgentState — a canonical agent run state structure: goal, constraints (including HITL), budgets, plan, known facts, open questions, artifacts, and risk flags. It is a contract between agent loop iterations and between components (for example, an "orchestrator" and an "analyzer").

StatePatch — a structured update to AgentState (for example: append facts, replace plan, add open questions). It's useful when one component normalizes observations and another decides the next action.
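The relationship between AgentState and StatePatch can be sketched as follows. The field set is deliberately reduced, and the field names (AppendFacts, ReplacePlan, AddOpenQuestions) are assumptions modeled on the examples above.

```go
package main

import "fmt"

// AgentState is a reduced version of the structure described above.
type AgentState struct {
	Goal          string
	Plan          []string
	Facts         []string
	OpenQuestions []string
}

// StatePatch is a structured update; empty fields mean "no change".
type StatePatch struct {
	AppendFacts      []string
	ReplacePlan      []string // nil keeps the current plan
	AddOpenQuestions []string
}

// Apply returns a new state with the patch applied; the input is not mutated.
func Apply(s AgentState, p StatePatch) AgentState {
	out := s
	out.Facts = append(append([]string{}, s.Facts...), p.AppendFacts...)
	out.OpenQuestions = append(append([]string{}, s.OpenQuestions...), p.AddOpenQuestions...)
	if p.ReplacePlan != nil {
		out.Plan = append([]string{}, p.ReplacePlan...)
	}
	return out
}

func main() {
	state := AgentState{Goal: "restore service", Plan: []string{"check status"}}
	next := Apply(state, StatePatch{
		AppendFacts: []string{"HTTP status is 502"},
		ReplacePlan: []string{"read logs", "decide fix"},
	})
	fmt.Println(len(state.Facts), len(next.Facts), next.Plan) // prints: 0 1 [read logs decide fix]
}
```

Keeping Apply a pure function makes each loop iteration easy to log and replay.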

Risk Level — classification of side effects used by policy and HITL gates. A typical minimal set is: read_only, write_local, external_action.

ReAct Loop (Reasoning and Action Loop) — an autonomous agent work pattern: Reason → Act → Observe, then repeat. The agent analyzes the situation, performs an action, receives the result, and decides what to do next.

Etymology: ReAct = Reason + Act

See also: Chapter 04: Autonomy and Loops

Skills — reusable agent behavior modules. Contain instructions loaded into context depending on the task.

LLM and Context

Context Window — maximum number of tokens a model can process in one request. Limits dialogue history size. Examples: GPT-4o: 128k tokens, GPT-4o-mini: 128k tokens, Claude 3.5 Sonnet: 200k tokens.

See also: Chapter 01: LLM Physics

Token — unit of text processed by the model. In English, one token is roughly 0.75 words; Russian text needs roughly 1.5 tokens per word. Everything the agent "knows" about the current task must fit into the token limit of the context window.

See also: Chapter 01: LLM Physics

System Prompt — instructions for the model that set agent role, goal, constraints, and work process. Passed in messages[0].content with role "system". Consists of: Role (Persona), Goal, Constraints, Format, SOP.

See also: Chapter 02: Prompting

Temperature — a sampling parameter that controls how random next-token selection is. Temperature = 0 gives the most deterministic behavior (use it for agents); Temperature > 0 gives more varied output (useful for creative tasks).

Rule: For all agents, set Temperature = 0.

See also: Chapter 01: LLM Physics

Prompt Cache — an optimization on the LLM provider side: an identical messages[] prefix (system prompt + first messages) is cached and billed at a 50–90% discount. To hit the cache, the prefix must be byte-for-byte stable between requests in the same agent run. Any mutation of the system prompt (inserting the time, dynamic facts) or replacement of early messages busts the cache — iterations become several times more expensive and slower. So dynamic state (current time, facts) goes into the last user message or into tool results, not into messages[0].

Rule: messages[] is append-only, and messages[0] (the system prompt) is never mutated within a single run.

See also: Chapter 12: Agent Memory Systems, Chapter 13: Context Engineering

Prompting Techniques

Chain-of-Thought (CoT) — a prompting technique ("think step by step") that makes the model generate intermediate reasoning before the final answer. Critical for agents solving complex multi-step tasks.

See also: Chapter 02: Prompting

Few-Shot Learning / Few-Shot — prompting technique where examples of desired behavior are added to the prompt. Model adapts to format based on examples in context.

Antonym: Zero-Shot (instruction only, no examples)

See also: Chapter 02: Prompting

Zero-Shot Learning / Zero-Shot — prompting technique where only instruction is given to the model without examples. Saves tokens, but requires precise instructions.

Antonym: Few-Shot (with examples)

See also: Chapter 02: Prompting

In-Context Learning (ICL) — model's ability to adapt behavior based on examples within the prompt, without changing model weights. Works in Zero-shot and Few-shot modes.

See also: Chapter 02: Prompting

SOP (Standard Operating Procedure) — action algorithm encoded in the prompt. Sets sequence of steps to solve a task. CoT helps follow SOP step by step.

See also: Chapter 02: Prompting

Memory and Context

Memory (Agent Memory) — the set of mechanisms that let the agent retain and use information. In this course we split memory by horizon, not by "levels":

  • In-Run — a linear messages[] in the LLM context + at most one condense if needed. All in RAM, lives until the end of the Run.
  • Across Runs in one session — session-level state (plan, files read, last actions) passed between REPL iterations.
  • Across sessions — long-term memory the agent manages itself through tools (memory_save / memory_recall / memory_delete); stored in DB / files.

There is no separate "short-term memory" structure — that role is played by messages[] itself.

See also: Chapter 12: Agent Memory Systems

Working Memory — in this course the term means two different things; don't confuse them:

  1. The LLM context within a single Run — just the linear messages[] (system + dialogue + tool results) up to the next condense. No "layers" (Working/Summary/Facts) on top — that's a deprecated pattern, see Ch. 13 and Lab 11.
  2. Session-level state — plan, files read, last actions, that survive between REPL cycles. This isn't the LLM context, it's a structure serialized to disk and reloaded on restart. See Ch. 11.

See also: Chapter 11: State Management, Chapter 12: Agent Memory Systems, Chapter 13: Context Engineering

Long-term Memory — persistent storage between sessions: facts, preferences, past decisions. Principle from this course: the agent manages long-term memory itself through tools (memory_save / recall / delete), not via auto-extraction of facts from every message and not via mutation of the system prompt. Stored in DB / files; for a large corpus — vector search (RAG).

See also: Chapter 12: Agent Memory Systems, Lab 11

Episodic Memory — memory of specific events: "user asked about disk space on 2026-01-06". For most agents this is just records in memory_save with a date — no separate infrastructure needed. Useful for debugging.

See also: Chapter 12: Agent Memory Systems

Semantic Memory — general knowledge about the environment / user ("prefers JSON responses"). In practice — the same long-term memory records, just more general. No need to build a separate "semantic store".

See also: Chapter 12: Agent Memory Systems

Context Engineering — techniques for managing context effectively: counting tokens via usage.PromptTokens, a single compression threshold (~80% of the context window), one condense per run, and protecting tool_call ↔ tool_result pairs during truncation.

See also: Chapter 13: Context Engineering

Condense — an on-demand operation that summarizes the older part of messages[] through a separate LLM call to free room in the context window. Triggered by a single threshold (lastTokens > contextMax * 0.80 or reactively on ContextOverflowError). Principles of a correct condense: (1) the summary is inserted as a user message ("Context of previous work: …"), not as system — otherwise the prompt cache breaks and the model gets confused; (2) safeTail preserves tool_call ↔ tool_result pairs — you can't leave a tool_call without its tool_result or vice versa; (3) limit — one condense per run, otherwise the agent loops. See also Compact and Recall.

See also: Chapter 13: Context Engineering, Lab 09, Lab 11
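The condense principles above can be sketched in Go as follows. This is a simplified illustration, not the course's implementation: the Message type, the keep parameter, and the summarize callback (standing in for the separate LLM call) are assumptions, and enforcing "one condense per run" is left to the caller.

```go
package main

import "fmt"

// Message is a simplified chat message; Role is "system", "user",
// "assistant", or "tool" (a tool result that follows an assistant tool_call).
type Message struct {
	Role    string
	Content string
}

// shouldCondense applies the single 80% threshold using integer math.
func shouldCondense(lastTokens, contextMax int) bool {
	return lastTokens*100 > contextMax*80
}

// safeTailStart returns the index where the preserved tail begins. It moves
// the cut backwards so the tail never starts with an orphaned tool result.
func safeTailStart(msgs []Message, keep int) int {
	start := len(msgs) - keep
	if start < 1 {
		start = 1 // never cut into the system prompt at index 0
	}
	for start > 1 && start < len(msgs) && msgs[start].Role == "tool" {
		start--
	}
	return start
}

// condense replaces msgs[1:start] with one user message holding the summary.
// The system prompt stays untouched so the cached prefix survives.
func condense(msgs []Message, keep int, summarize func([]Message) string) []Message {
	start := safeTailStart(msgs, keep)
	if start <= 1 {
		return msgs // nothing old enough to summarize
	}
	summary := summarize(msgs[1:start])
	out := []Message{msgs[0], {Role: "user", Content: "Context of previous work: " + summary}}
	return append(out, msgs[start:]...)
}

func main() {
	msgs := []Message{
		{"system", "You are an SRE agent."},
		{"user", "Why is the site down?"},
		{"assistant", "tool_call: check_http"},
		{"tool", "502"},
		{"assistant", "tool_call: read_logs"},
		{"tool", "Connection refused"},
		{"assistant", "The database is unreachable."},
	}
	if shouldCondense(105000, 128000) {
		stub := func(old []Message) string { return fmt.Sprintf("%d earlier messages", len(old)) }
		msgs = condense(msgs, 2, stub)
	}
	fmt.Println(len(msgs), msgs[1].Role) // prints: 5 user
}
```

In the usage example the requested tail of 2 messages would start at a tool result, so the cut moves back one step to include the assistant tool_call that produced it.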

RAG (Retrieval Augmented Generation) — technique for augmenting agent context with relevant documents from knowledge base via vector search. Documents are split into chunks, converted to vectors (embeddings), similar vectors are searched on query.

See also: Chapter 06: RAG and Knowledge Base

Agentic RAG — RAG integrated into the Agent Loop. The agent decides when and where to search, evaluates result quality, and can search again.

Self-RAG — RAG with self-assessment. The model evaluates relevance of retrieved documents and decides whether to answer, refine the query, or search again.

Hybrid Search — combination of keyword search (BM25) and vector search. Merges results via Reciprocal Rank Fusion.
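A minimal sketch of Reciprocal Rank Fusion in Go. Each document scores the sum of 1 / (k + rank) over the lists it appears in. The function name fuseRRF and the document IDs are illustrative; k = 60 is a commonly used constant and is an assumption here, not a value from this course.

```go
package main

import (
	"fmt"
	"sort"
)

// fuseRRF merges ranked lists of document IDs with Reciprocal Rank Fusion:
// score(d) = sum over lists of 1 / (k + rank), with rank starting at 1.
func fuseRRF(rankings [][]string, k float64) []string {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, doc := range ranking {
			scores[doc] += 1.0 / (k + float64(i+1))
		}
	}
	docs := make([]string, 0, len(scores))
	for doc := range scores {
		docs = append(docs, doc)
	}
	// Sort by score descending; break ties by ID so the output is stable.
	sort.Slice(docs, func(a, b int) bool {
		if scores[docs[a]] != scores[docs[b]] {
			return scores[docs[a]] > scores[docs[b]]
		}
		return docs[a] < docs[b]
	})
	return docs
}

func main() {
	bm25 := []string{"doc_a", "doc_b", "doc_c"}   // keyword ranking
	vector := []string{"doc_b", "doc_c", "doc_a"} // embedding ranking
	fmt.Println(fuseRRF([][]string{bm25, vector}, 60)) // prints: [doc_b doc_a doc_c]
}
```

RRF uses only ranks, so BM25 scores and vector similarities never need to be normalized to a common scale.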

Planning and Architecture

Planning — the agent's ability to break down a complex task into a sequence of simple steps and execute them in the correct order. Levels: implicit (ReAct), explicit (Plan-and-Solve), hierarchical.

See also: Chapter 09: Agent Anatomy

State Management — managing task execution state: progress, what's done, what's pending, and the ability to resume. Includes tool idempotency, retries with exponential backoff, deadlines, and state persistence.

See also: Chapter 11: State Management

Checkpoint — a snapshot of agent state at a given point in time. Allows resuming work after a failure without losing progress.

Reflexion — agent self-correction technique through error analysis. Cycle: Act → Observe → Fail → REFLECT → Plan Again. Agent analyzes why action didn't work and plans again.

See also: Chapter 09: Agent Anatomy

Security and Reliability

Junior-Employee Model — the central mental model of the course: the agent is a new employee, an intern. It can do useful work, but you wouldn't hand it a production root, the ability to send emails to customers, or the right to delete files without confirmation on day one. Four asymmetries make it more dangerous than a human intern: (1) speed — thousands of actions per minute, no time for "wait, what am I doing"; (2) parallelism — N copies in N sessions in parallel; (3) no consequences — no career, no fear of being fired, no "common sense"; (4) prompt injection as social engineering — any external text (issue, web page, email) can become an "instruction from the boss". From this follow all the rules: HITL by default for irreversible actions, RBAC per tool, sandboxes, audit, dry-run, rate limits.

See also: Preface: Mental Model, Chapter 17: Security and Governance

Blast Radius — the breadth of damage from a single agent action if it turns out wrong. A read-only kubectl get pods has small radius (just CPU); kubectl delete deployment in prod — huge (downtime). The rule: the larger the radius, the higher the requirements — HITL, dry-run, separate confirmation, narrow RBAC, separate service account. Used in risk classification of tools (Ch. 03, Ch. 17).

See also: Chapter 03: Tools and Function Calling, Chapter 17: Security and Governance

Recovery State Machine — the agent's structured response to an error from the LLM or a tool. Instead of "got an error → blew up", we move into the state Retry / Backoff / Fallback / Escalate. Examples: tool returned a 5xx → exponential backoff and retry (idempotent only); LLM returned malformed JSON → ask for repair; the same tool errors out 3 times → switch to fallback / report to the user; critical action without confirmation → escalate to HITL. Without an explicit state machine, the agent gets stuck in retry loops or silently corrupts data.

See also: Chapter 04: Agent Loop and ReAct, Chapter 08: Evals and Reliability, Chapter 19: Observability and Tracing
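The transitions listed above could be encoded as a small decision function. This is a sketch: the Failure fields, the Kind labels, and the action names are hypothetical, and escalating a non-idempotent 5xx failure is an assumption of this example rather than a rule from the text.

```go
package main

import "fmt"

// Action is the next recovery state chosen for a failure.
type Action string

const (
	RetryWithBackoff Action = "retry_with_backoff"
	AskRepair        Action = "ask_repair"
	Fallback         Action = "fallback"
	Escalate         Action = "escalate"
)

// Failure describes what went wrong on the current step.
type Failure struct {
	Kind       string // "tool_5xx", "malformed_json", "needs_confirmation" (hypothetical labels)
	Attempts   int    // how many times this same step has already failed
	Idempotent bool   // whether repeating the tool call is safe
}

// nextAction maps a failure to a recovery state; earlier cases take priority.
func nextAction(f Failure) Action {
	switch {
	case f.Kind == "needs_confirmation":
		return Escalate // critical action without confirmation goes to HITL
	case f.Attempts >= 3:
		return Fallback // the same step failed three times
	case f.Kind == "malformed_json":
		return AskRepair
	case f.Kind == "tool_5xx" && f.Idempotent:
		return RetryWithBackoff
	case f.Kind == "tool_5xx":
		return Escalate // assumption: never blindly repeat a non-idempotent call
	default:
		return Fallback
	}
}

func main() {
	fmt.Println(nextAction(Failure{Kind: "tool_5xx", Attempts: 1, Idempotent: true})) // prints: retry_with_backoff
	fmt.Println(nextAction(Failure{Kind: "tool_5xx", Attempts: 3, Idempotent: true})) // prints: fallback
}
```

Putting the attempt limit ahead of the retry cases is what prevents endless retry loops.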

Grounding — anchoring agent to real data through Tools/RAG to avoid hallucinations. Agent must use tools to get facts, not invent them.

See also: Chapter 01: LLM Physics

Human-in-the-Loop (HITL) — mechanism for human confirmation of critical actions before execution. It includes Confirmation, Clarification, and Risk Scoring (risk assessment).

See also: Chapter 05: Safety and Human-in-the-Loop

Prompt Injection — attack on agent through input manipulation. Mental model: treat it as social engineering against an intern. Any external text (issue body, web page, email, file from a download) can become an "instruction from the boss" if it lands in the LLM context. Defense: (1) architectural — never give irreversible tools without HITL; (2) wrap external text in <external_data>...</external_data> and instruct the model to treat it as data, not as instructions; (3) allowlist of tools per source; (4) output filtering. Don't rely on a blacklist of "bad words" — it's bypassed in 5 minutes.

See also: Chapter 05: Safety and Human-in-the-Loop, Chapter 17: Security and Governance

Defense in Depth — multi-layered security strategy. Each layer (input validation, runtime checks, output filtering, monitoring) protects against different types of attacks.

Red Teaming — systematic testing of an agent for vulnerabilities. A team of "attackers" tries to make the agent break its rules.

Eval (Evaluation) — test to check agent work quality. Can check answer correctness, tool selection, SOP following, safety. In production systems, evals run in CI/CD.

See also: Chapter 08: Evals and Reliability

DeepEval — framework for evaluating LLM application quality. Provides metrics: faithfulness, answer relevancy, context precision.

Multi-turn Evaluation — evaluating an agent in multi-step dialogues. Checks context coherence and correctness of actions at each step.

RAGAS — metrics framework for RAG systems. Measures context precision, context recall, faithfulness, and answer relevance.

Multi-Agent Systems

Multi-Agent System (MAS) — system of multiple agents working together. Can use Supervisor/Worker patterns, context isolation, task routing between specialized agents.

See also: Chapter 07: Multi-Agent Systems

A2A (Agent-to-Agent) — inter-agent communication protocol (Google). Standardizes task exchange between agents via Agent Card, Task lifecycle, and Message/Artifact.

Handoff — transferring control from one agent to another with part of the context. Used during escalation or domain switch.

Router Agent — an agent that classifies requests and routes them to the appropriate specialist.

Subagent — a dynamically created agent for a subtask. Unlike a Worker, it's spawned on the fly for a specific task.

Checklists

Checklist: Model Setup for Agent

  • Model supports Function Calling (checked via Lab 00)
  • Temperature = 0 set
  • Context window large enough (minimum 4k tokens)
  • System Prompt prohibits hallucinations
  • Dialogue history managed (truncated on overflow)

Checklist: Creating System Prompt

  • Role (Persona) clearly defined
  • Goal specific and measurable
  • Constraints explicitly stated
  • Response format described
  • SOP (if applicable) detailed
  • CoT included for complex tasks
  • Few-Shot examples added (if needed)

Capability Benchmark (Characterization)

Before building agents, confirm by testing that the selected model has the necessary capabilities. In engineering, this is called characterization.

Why Is This Needed?

We don't trust labels ("Super-Pro-Max Model"). We trust tests.

Problem without checking: You download the "Llama-3-8B-Instruct" model and start building an agent. After an hour of work, you discover that the model doesn't call tools and only writes text. You wasted time debugging your code, although the problem was in the model.

Solution: Run the capability benchmark before starting work. It can save hours of debugging.

What Do We Check?

1. Basic Sanity

  • Model responds to requests
  • No critical API errors
  • Basic response coherence

2. Instruction Following

  • Model can strictly adhere to constraints
  • Test: "Write a poem, but don't use the letter 'a'"
  • Why: an agent must return strictly defined formats, not "thoughts"

3. JSON Generation

  • Model can generate valid syntax
  • All tool interaction is built on JSON
  • If the model forgets a closing brace }, the runtime cannot parse the response
  • Test: "Return JSON with fields name and age"
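A sketch of how the JSON generation test might be checked in Go. The function name checkPersonJSON is hypothetical; it verifies that the model output parses and that both requested fields are present.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// checkPersonJSON verifies that raw is valid JSON with "name" and "age".
// Pointer fields let us tell a missing field from a zero value.
func checkPersonJSON(raw string) error {
	var p struct {
		Name *string  `json:"name"`
		Age  *float64 `json:"age"`
	}
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return fmt.Errorf("cannot decode model output: %w", err)
	}
	if p.Name == nil || p.Age == nil {
		return errors.New("missing field: name or age")
	}
	return nil
}

func main() {
	fmt.Println(checkPersonJSON(`{"name": "Ann", "age": 30}`)) // prints: <nil>
	fmt.Println(checkPersonJSON(`{"name": "Ann", "age": 30`))  // prints a decode error
}
```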

4. Function Calling

  • Specific model skill to recognize function definitions and generate special call token
  • Without this, tools are impossible (see Chapter 03: Tools)
  • Why: This is the foundation for Lab 02 and all subsequent labs

Why Don't All Models Know Tools?

An LLM (Large Language Model) is a probabilistic text generator. It doesn't "know" about functions out of the box.

The Function Calling mechanism is the result of special training (fine-tuning). Model developers add thousands of examples like this to the training set:

User: "Check weather"
Assistant: <special_token>call_tool{"name": "weather"}<end_token>

If you downloaded a "bare" Llama 3 (base model), it hasn't seen these examples. It will simply continue the dialogue as text.

How to check: Run Lab 00 before starting work with tools.

Why Is Temperature = 0 Critical for Agents?

Temperature regulates "randomness" of next token selection:

  • High Temp (0.8+): the model picks less probable tokens more often. Good for poems and creative tasks.
  • Low Temp (0): the model always picks the most probable token (ArgMax). Maximum determinism.

For agents that must output strict JSON or function calls, maximum determinism is needed. Any "creative" error in JSON breaks the parser.

Rule: For all agents, set Temperature = 0.

How to Interpret Results?

All tests passed

The model is ready for the course. You can continue.

Warning: 3 out of 4 tests passed

You can continue, but with caution: problems are possible in edge cases.

Function calling failed

Critical: the model is not suitable for Labs 02-08. Choose a different model.

What to do:

  1. Download model with tools support:
    • Hermes-2-Pro-Llama-3-8B
    • Mistral-7B-Instruct-v0.3
    • Llama-3-8B-Instruct (some versions)
    • Gorilla OpenFunctions
  2. Rerun the tests

JSON generation failed

Model generates broken JSON (missing brackets, quotes).

What to do:

  1. Try different model
  2. Or use Temperature = 0 (but this doesn't always help)

Connection with Evals

A capability benchmark is a primitive eval. Production systems run hundreds of such tests, often with tools such as LangSmith or PromptFoo.

Further reading: see Chapter 08: Evals and Reliability for how to build comprehensive evals of agent work quality.

Practice

To perform capability benchmark, see Lab 00: Model Capability Benchmark.

SOP Templates

SOP for Incident (DevOps)

SOP for service failure:
1. Check Status: Check HTTP response code
2. Check Logs: If 500/502 — read last 20 log lines
3. Analyze: Find keywords:
   - "Syntax error" → Rollback
   - "Connection refused" → Check Database
   - "Out of memory" → Restart
4. Action: Apply fix according to analysis
5. Verify: Check HTTP status again
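The analysis step of this SOP can also be mirrored in the runtime, for example to validate what the agent proposes. The sketch below is illustrative only: the action names and the "none" and "escalate" results are assumptions, not part of the SOP.

```go
package main

import (
	"fmt"
	"strings"
)

// rule maps a log keyword to the action named in the SOP.
type rule struct {
	keyword string
	action  string
}

// sopRules follows step 3 of the SOP; order matters when several keywords match.
var sopRules = []rule{
	{"Syntax error", "rollback"},
	{"Connection refused", "check_database"},
	{"Out of memory", "restart"},
}

// chooseAction returns the SOP action for an HTTP status and the log tail.
// "none" and "escalate" are assumptions of this sketch.
func chooseAction(status int, logs string) string {
	if status != 500 && status != 502 {
		return "none" // the SOP only reads logs on 500/502
	}
	for _, r := range sopRules {
		if strings.Contains(logs, r.keyword) {
			return r.action
		}
	}
	return "escalate" // no known keyword: hand over to a human
}

func main() {
	fmt.Println(chooseAction(502, "nginx: Connection refused while connecting to upstream")) // prints: check_database
	fmt.Println(chooseAction(200, ""))                                                        // prints: none
}
```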

SOP for Ticket Processing (Support)

SOP for ticket processing:
1. Read: Read ticket completely
2. Context: Gather context (version, OS, browser)
3. Search: Search knowledge base for similar cases
4. Decide:
   - If solution found → Draft reply
   - If complex problem → Escalate
5. Respond: Send response to user

Decision Tables

Decision Table for Incident

Symptom  | Hypothesis    | Check                              | Action            | Verification
HTTP 502 | Service down  | check_http() → 502                 | -                 | -
HTTP 502 | Error in logs | read_logs() → "Syntax error"       | rollback_deploy() | check_http() → 200
HTTP 502 | Error in logs | read_logs() → "Connection refused" | restart_service() | check_http() → 200

FAQ: Why Is This Not Magic?

Q: Agent decides what to do itself. Is this magic?

A: No. Agent works by a simple algorithm:

  1. LLM receives tool descriptions in tools[]
  2. LLM generates JSON with tool name and arguments
  3. Your code (Runtime) parses JSON and executes real function
  4. Result is added to history
  5. LLM receives result in context and generates next step

There's no magic here — it's just a loop where the model receives results of previous actions.
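The five steps above can be sketched as a loop in Go. This is a toy illustration under stated assumptions: the LLM is replaced by a scripted function, and the types Message and Step and the check_disk tool are hypothetical. A real runtime would call the provider API where llm(history) appears.

```go
package main

import "fmt"

// Message is one entry in the dialogue history.
type Message struct{ Role, Content string }

// Step is what the model returns: either a tool call or a final answer.
type Step struct {
	ToolName string // empty means "final answer"
	ToolArgs string
	Final    string
}

// LLM stands in for the model call; it sees the full history every time.
type LLM func(history []Message) Step

// runAgent is the loop: ask the model, execute the tool, append the result.
func runAgent(llm LLM, tools map[string]func(args string) string, userMsg string, maxSteps int) (string, []Message) {
	history := []Message{{Role: "user", Content: userMsg}}
	for i := 0; i < maxSteps; i++ {
		step := llm(history)
		if step.ToolName == "" {
			history = append(history, Message{Role: "assistant", Content: step.Final})
			return step.Final, history
		}
		result := "error: unknown tool " + step.ToolName
		if fn, ok := tools[step.ToolName]; ok {
			result = fn(step.ToolArgs) // the runtime, not the model, runs the code
		}
		history = append(history,
			Message{Role: "assistant", Content: "tool_call: " + step.ToolName + " " + step.ToolArgs},
			Message{Role: "tool", Content: result},
		)
	}
	return "step limit reached", history
}

// scriptedLLM calls check_disk once, then answers using the tool result.
func scriptedLLM(history []Message) Step {
	last := history[len(history)-1]
	if last.Role == "tool" {
		return Step{Final: "Disk usage is " + last.Content}
	}
	return Step{ToolName: "check_disk", ToolArgs: "{}"}
}

func main() {
	tools := map[string]func(string) string{
		"check_disk": func(string) string { return "42%" },
	}
	answer, history := runAgent(scriptedLLM, tools, "How full is the disk?", 5)
	fmt.Println(answer, len(history)) // prints: Disk usage is 42% 4
}
```

The maxSteps limit is the simplest guard against an agent that never produces a final answer.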

See also: Chapter 04: Autonomy

Q: How does the model "know" which tool to call?

A: Model doesn't "know". It selects tool based on:

  1. Tool description (Description in JSON Schema)
  2. User query (semantic match)
  3. Context of previous results (if any)

The more accurate the Description, the better the selection.

See also: Chapter 03: Tools

Q: Does the model "remember" past conversations?

A: No. The model is stateless. It only sees the history in messages[] that you pass with each request. If you don't pass the history, the model remembers nothing.

See also: Chapter 01: LLM Physics

Q: Why does the agent sometimes do different actions on the same request?

A: This happens because LLM output is probabilistic. If Temperature > 0, the model samples the next token from a distribution instead of always taking the most probable one. For agents, always use Temperature = 0 for the most deterministic behavior.

See also: Chapter 01: LLM Physics

Q: Model invents facts. How to fix this?

A: Use Grounding:

  1. Prohibit inventing facts in System Prompt
  2. Give agent access to real data through Tools
  3. Use RAG for documentation access

See also: Chapter 01: LLM Physics

Q: Agent "forgets" beginning of conversation. What to do?

A: This happens when the dialogue history exceeds the context window. The right solution is one simple thing: condense.

  1. Count tokens via usage.PromptTokens from the provider response (not your own counter).
  2. When lastTokens > contextMax * 0.80 — do one condense: make a separate LLM call that summarizes the older part of the history into a single block, insert it as a user message ("Context of previous work: …"), leave the system prompt unchanged and the tail of history intact — with protection of tool_call ↔ tool_result pairs (safeTail).
  3. One condense per Run; if it doesn't help — the task should be split into sub-Runs.

What you don't need: importance scoring of messages, reordering of history, adaptive ladders prioritize → summarize → truncate, Working/Summary/Facts layers, a dynamic system prompt. These are classic forms of over-engineering — expensive, fragile, and they kill the prompt cache.

Long-term knowledge that has to survive a condense is not context anymore, it's long-term memory: the agent itself saves it via the memory_save tool and reads it back via memory_recall.

See also: Chapter 12: Agent Memory Systems, Chapter 13: Context Engineering, Lab 09, Lab 11

Q: Model doesn't call tools. Why?

A: Possible causes:

  1. Model doesn't support Function Calling (check via Lab 00)
  2. Poor tool description (Description unclear)
  3. Temperature > 0 (too random)

Solution: Use model with tools support, improve Description, set Temperature = 0.

See also: Chapter 03: Tools

Mini-Exercises

Exercise 1: Create Your SOP

Create an SOP for your domain following the "SOP Templates" section:

SOP for [your task]:
1. [Step 1]
2. [Step 2]
3. [Step 3]

Expected result:

  • SOP clearly describes action process
  • Steps are sequential and logical
  • Checks and verification included

Exercise 2: Create Decision Table

Create a decision table for your task following the "Decision Tables" section:

Symptom | Hypothesis | Check | Action | Verification
...     | ...        | ...   | ...    | ...

Expected result:

  • Table covers main scenarios
  • For each symptom there is hypothesis, check, action, and verification

Connection with Other Chapters