02. Prompting as an Engineering Discipline¶
Why Is This Needed?¶
A prompt is code for a neural network. Unlike regular code, a prompt does not execute deterministically: it works through probability management.
Without proper prompting, an agent will:
- Perform actions randomly, without logic
- Skip important analysis steps
- Perform dangerous actions without confirmation
- Respond in the wrong format
This chapter teaches you how to create effective prompts that control agent behavior.
Real-World Case Study¶
Situation: You've created an agent for incident handling. A user writes: "Service is down, fix it"
Problem: The agent immediately restarts the service without checking the logs. Or the opposite: it analyzes for a long time and never applies a fix.
Solution: A proper System Prompt with SOP (Standard Operating Procedure) sets the sequence of actions: first check status, then logs, then analysis, then fix, then verification.
TL;DR: What to Remember¶
- System Prompt — the agent behavior specification. Consists of: Role, Goal, Constraints, Format, SOP.
- In-Context Learning (ICL): Zero-shot (instruction-only) — compact, flexible, but model may misinterpret format. Few-shot (demonstration) — instruction + examples, more accurate for complex formats. In practice — combination.
- CoT (Chain-of-Thought): forces the model to think step by step. Critical for agents.
- SOP: an action algorithm encoded in the prompt. It sets the process, and CoT helps follow it.
- Task Decomposition: complex tasks are broken down into subtasks.
- Tools Schema is passed in a separate tools[] field, not inside the System Prompt.
- Few-shot examples must be consistent — one format, otherwise the model will get confused.
System Prompt Structure¶
A good System Prompt consists of five blocks:
1. Role (Persona) — who you are
2. Goal — what needs to be achieved
3. Constraints — what cannot be done
4. Format — what format to respond in
5. SOP — action algorithm
System Prompt Template¶
You are [Role] with [experience/qualification].
Your goal is [Goal].
Constraints:
- [Constraint 1]
- [Constraint 2]
Response format:
- [Format rule 1]
- [Format rule 2]
SOP for [task type]:
1. [Step 1]
2. [Step 2]
3. [Step 3]
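If you build prompts in Go, it can help to keep the five blocks as separate fields so each one stays visible and testable. A minimal sketch; the PromptSpec type and Build helper are illustrative, not part of the course labs:

```go
package prompt

import (
	"fmt"
	"strings"
)

// PromptSpec holds the five blocks of a System Prompt.
type PromptSpec struct {
	Role        string
	Goal        string
	Constraints []string
	Format      []string
	SOP         []string
}

// Build renders the five blocks into a single System Prompt string.
func (p PromptSpec) Build() string {
	var b strings.Builder
	fmt.Fprintf(&b, "You are %s.\nYour goal is %s.\n\nConstraints:\n", p.Role, p.Goal)
	for _, c := range p.Constraints {
		fmt.Fprintf(&b, "- %s\n", c)
	}
	b.WriteString("\nResponse format:\n")
	for _, f := range p.Format {
		fmt.Fprintf(&b, "- %s\n", f)
	}
	b.WriteString("\nSOP:\n")
	for i, s := range p.SOP {
		fmt.Fprintf(&b, "%d. %s\n", i+1, s)
	}
	return b.String()
}
```

Filling the fields with the DevOps example below reproduces the same prompt text, but each block can now be reviewed or unit-tested on its own.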
Example for DevOps¶
You are a Senior DevOps Engineer with 10 years of experience.
Your goal is to restore service functionality as quickly as possible.
Constraints:
- Never use commands like `rm -rf /`
- Always ask for confirmation before deleting data
- If unsure about an action — ask the user
Response format:
- If you need to call a tool — use Tool Calling
- If you need to clarify — respond with text
SOP for incidents:
1. Check service status (check_http)
2. If not 200 — read logs (read_logs)
3. Analyze errors
4. Apply fix (restart/rollback)
5. Verify (check_http again)
Example for Support¶
You are a Tier 2 Customer Support Agent.
Your goal is to solve the user's problem quickly and politely.
Constraints:
- Always be polite
- If the problem is complex — escalate
- Don't give technical details if the user is not technical
Response format:
- Use structured format: {"action": "...", "user_id": "..."}
SOP for ticket processing:
1. Read the ticket completely (get_ticket)
2. Gather context (software version, OS, browser)
3. Search knowledge base (search_kb)
4. If solution found — formulate response
5. If not — escalate (escalate_ticket)
Example for Data Analytics¶
You are a Data Analyst with experience in SQL and BI tools.
Your goal is to provide accurate data and analytics.
Constraints:
- Use ONLY read-only SQL (SELECT)
- Always check data quality before analysis
- If data is incorrect — report it
Response format:
- First show SQL query
- Then results and analysis
SOP for analysis:
1. Understand user's question
2. Formulate SQL query
3. Check table schema (describe_table)
4. Execute query (sql_select)
5. Analyze results
6. Generate report
In-Context Learning (ICL): Zero-shot and Few-shot¶
After defining the System Prompt structure, the next question is: how do we best convey the desired behavior to the model?
In-Context Learning (ICL) is the model's ability to adapt behavior based on examples within the prompt, without changing model weights.
ICL works in two modes:
Zero-Shot (instruction only)¶
We give the model only an instruction, with no examples:
System Prompt: "You are a DevOps engineer. When the user asks to check logs, use the read_logs tool."
User: "Check nginx logs"
Assistant: [Model performs task based on instruction]
When to use:
- Simple tasks where the model understands instructions well
- Limited context (need to save tokens)
- Model is already trained on similar tasks
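In Go this is a single call with the instruction in the system message. A sketch using the same client style as the few-shot example later in this chapter; the import path github.com/sashabaranov/go-openai, the environment variable, and the model name are assumptions:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	client := openai.NewClient(os.Getenv("OPENAI_API_KEY"))

	// Zero-shot: only an instruction in the system message, no examples.
	resp, err := client.CreateChatCompletion(context.Background(), openai.ChatCompletionRequest{
		Model: openai.GPT3Dot5Turbo,
		Messages: []openai.ChatCompletionMessage{
			{Role: openai.ChatMessageRoleSystem, Content: "You are a DevOps engineer. When the user asks to check logs, use the read_logs tool."},
			{Role: openai.ChatMessageRoleUser, Content: "Check nginx logs"},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Choices[0].Message.Content)
}
```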
Few-Shot (instruction + examples)¶
We give the model an instruction plus examples of the desired behavior:
System Prompt: "You are a DevOps engineer. Examples:
Example 1:
User: "Check nginx logs"
Assistant: {"tool": "read_logs", "args": {"service": "nginx"}}
Example 2:
User: "Restart server web-01"
Assistant: {"tool": "restart", "args": {"name": "web-01"}}
User: "Check server status"
Assistant: {"tool": "check_status", "args": {"hostname": "web-01"}}
Note: This is an educational demonstration of the response format in prompt text. When actually using Function Calling (see Chapter 03: Tools), the model returns a tool call in a separate tool_calls field, not as text in the response.
When to use:
- Complex response format (JSON, structured data)
- High accuracy in following format needed
- Model may misinterpret instruction
Terminology: Instruction vs Demonstration¶
The terms Instruction-only and Demonstration are alternative names for the same two ICL modes:
- Instruction-only = Zero-shot: we describe to the model what to do (compact, flexible, but the model may misinterpret the format)
- Demonstration = Few-shot: we show the model examples of the desired behavior (the model copies the format accurately, but this costs more tokens)
Default rule: Use a combination — instruction + 1-2 examples for complex formats or edge cases.
ICL Mode Selection Table¶
| Mode | What we pass to model | When to use | Example |
|---|---|---|---|
| Zero-shot (Instruction-only) | Only instruction | Simple tasks, model understands instructions well, limited context, token savings | "You are a DevOps engineer. Check logs via read_logs." |
| Few-shot (Demonstration) | Instruction + examples | Complex response format (JSON, structured data), high accuracy in following format needed | Instruction + 2-3 response format examples |
Practical Example: Few-Shot for Support¶
// System Prompt with Few-Shot examples
systemPrompt := `You are a Support agent.
Response format examples:
Example 1:
User: "My account is blocked"
Assistant: {"action": "check_account", "user_id": "extract_from_ticket"}
Example 2:
User: "Can't log into system"
Assistant: {"action": "check_login", "user_id": "extract_from_ticket"}`
messages := []openai.ChatCompletionMessage{
{Role: "system", Content: systemPrompt},
{Role: "user", Content: "Access problem"},
}
// Model processes examples in System Prompt and generates response in the same format:
// {"action": "check_account", "user_id": "..."}
Anti-Example: Inconsistent Few-Shot Examples¶
❌ Bad: Examples in different formats
Example 1:
User: "Check logs"
Assistant: {"tool": "read_logs", "service": "nginx"}
Example 2:
User: "Restart server"
Assistant: {"action": "restart", "target": "web-01"} // Different format!
Example 3:
User: "Status"
Assistant: check_status("web-01") // Another format!
Problem: The model receives three different formats and doesn't understand which to use. The result is unpredictable.
✅ Good: All examples in one format
Example 1:
User: "Check logs"
Assistant: {"tool": "read_logs", "args": {"service": "nginx"}}
Example 2:
User: "Restart server"
Assistant: {"tool": "restart", "args": {"name": "web-01"}} // Same format
Example 3:
User: "Status"
Assistant: {"tool": "check_status", "args": {"hostname": "web-01"}} // Same format
Result: Model clearly understands the pattern and follows it.
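Format drift like the anti-example above is easy to catch mechanically before the prompt is built. A hedged sketch: keep the demonstrations as data and verify that every assistant reply unmarshals into the one agreed structure. The fewShotExample and expectedReply types are illustrative:

```go
package fewshot

import (
	"encoding/json"
	"fmt"
)

// fewShotExample is one User/Assistant demonstration pair.
type fewShotExample struct {
	User      string
	Assistant string // must be JSON in the agreed {"tool": ..., "args": ...} format
}

// expectedReply is the single format every demonstration must follow.
type expectedReply struct {
	Tool string          `json:"tool"`
	Args json.RawMessage `json:"args"`
}

// checkConsistency fails if any example deviates from the agreed format.
func checkConsistency(examples []fewShotExample) error {
	for i, ex := range examples {
		var r expectedReply
		if err := json.Unmarshal([]byte(ex.Assistant), &r); err != nil || r.Tool == "" || len(r.Args) == 0 {
			return fmt.Errorf("example %d does not follow the {\"tool\": ..., \"args\": ...} format", i+1)
		}
	}
	return nil
}
```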
Few-shot sets not only the format but can also plant hidden hints
Examples in few-shot can not only show the response format but also implicitly hint at the desired conclusion. The model may "pick up" patterns from examples and shift responses toward what's shown in demonstrations, even if it doesn't match actual data or the task.
Problem: If correct answers in examples are always in position "A", or all examples lead to one type of solution, the model may adopt this positional or thematic bias.
Solution: Use diverse examples, shuffle positions of correct answers, include counterexamples. See section "How not to hint at the answer" below.
Connection of ICL with Other Techniques¶
- ICL → CoT: Demonstrations can set not only response format, but also reasoning format. For example, you can show examples with "Thought: ... Action: ... Observation: ...".
- CoT → SOP: SOP is a process fixed by instruction and/or examples. CoT helps the model follow this process step by step.
See more:
- Lab 01: Basics — working with context and memory
- Lab 02: Tools — response format via Function Calling
- Lab 06: Incident (SOP) — SOP as action algorithm
End-to-End Example: What Exactly the Agent Sends to LLM¶
For those who need to understand the protocol: this section shows the technical structure of LLM API requests. If you're just starting out, you can skip it and return later.
When developing an agent, it's important to understand exactly where different parts of the prompt are located and how they get into the LLM request.
LLM Request Structure¶
graph LR
A[SystemPrompt] --> D[ChatCompletionRequest]
B[ToolsSchema] --> D
C[UserInput] --> D
D --> E[LLM API]
E --> F{Response}
F -->|Text| G[assistant content]
F -->|Tool Call| H[tool_calls]
H --> I[Runtime executes]
I --> J[tool result]
J --> K[Adds to messages]
K --> D
Where Is What Located?¶
1. System Prompt — in field Messages[0].Role = "system":
- Instructions (Role, Goal, Constraints)
- Few-shot examples (if used)
- SOP (action algorithm)
2. Tools Schema — in separate field Tools (NOT inside prompt!):
- JSON Schema tool descriptions
- Separate from prompt, but model sees them together
3. User Input — in field Messages[N].Role = "user":
- Current user request
- Conversation history (previous user/assistant messages)
4. Tool Results — added by the runtime to Messages:
- Added after tool execution, with Role = "tool"; ToolCallID links the result to the call
Example 1: Text-only Response (without tools)¶
Request:
{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "You are a DevOps engineer.\n\nExamples:\nUser: \"How are you?\"\nAssistant: \"All good, how can I help?\""
},
{
"role": "user",
"content": "Hello!"
}
]
}
Where is what located:
- System Prompt (instructions + few-shot examples) → messages[0].content
- User Input → messages[1].content
- Tools → absent
Response:
{
"choices": [{
"message": {
"role": "assistant",
"content": "Hello! How can I help?",
"tool_calls": null
}
}]
}
Example 2: Tool-call Response (with tools, 2 moves)¶
Move 1: Request with tool call
{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "You are a DevOps engineer. Use tools to check services."
},
{
"role": "user",
"content": "Check nginx status"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "check_status",
"description": "Check service status",
"parameters": {
"type": "object",
"properties": {
"service": {
"type": "string",
"description": "Service name"
}
},
"required": ["service"]
}
}
}
]
}
Where is what located:
- System Prompt → messages[0].content
- User Input → messages[1].content
- Tools Schema → separate tools[] field (NOT inside the prompt!)
Response #1 (tool call):
{
"choices": [{
"message": {
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_abc123",
"type": "function",
"function": {
"name": "check_status",
"arguments": "{\"service\": \"nginx\"}"
}
}
]
}
}]
}
Runtime executes tool:
- Parses tool_calls[0].function.arguments → {"service": "nginx"}
- Calls check_status("nginx") → result: "nginx is ONLINE"
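In Go this step is plain JSON parsing plus a dispatch on the tool name. A sketch, where resp is the response from move 1 and checkStatus is a hypothetical wrapper around the real health check (needs the encoding/json and log imports):

```go
// Take the first tool call from the assistant message.
call := resp.Choices[0].Message.ToolCalls[0]

// Parse the JSON arguments the model generated.
var args struct {
	Service string `json:"service"`
}
if err := json.Unmarshal([]byte(call.Function.Arguments), &args); err != nil {
	log.Fatalf("bad tool arguments: %v", err)
}

// Dispatch on the tool name and execute it.
var result string
switch call.Function.Name {
case "check_status":
	result = checkStatus(args.Service) // e.g. "nginx is ONLINE"
default:
	result = "unknown tool: " + call.Function.Name
}
```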
Move 2: Request with tool result
{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "system",
"content": "You are a DevOps engineer. Use tools to check services."
},
{
"role": "user",
"content": "Check nginx status"
},
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_abc123",
"type": "function",
"function": {
"name": "check_status",
"arguments": "{\"service\": \"nginx\"}"
}
}
]
},
{
"role": "tool",
"content": "nginx is ONLINE",
"tool_call_id": "call_abc123"
}
],
"tools": [...]
}
Response #2 (final answer):
{
"choices": [{
"message": {
"role": "assistant",
"content": "nginx is working normally, service ONLINE",
"tool_calls": null
}
}]
}
What happens:
- System Prompt contains instructions (may contain few-shot examples of tool selection)
- Tools Schema is passed in a separate tools[] field (not inside the prompt!)
- User Input — the current request
- The model processes all three parts and generates a tool_call (tool name + arguments)
- Runtime (your agent code) validates the tool_call, executes the tool, and adds the result to messages with role = "tool". Runtime is the code you write in Go to manage the agent loop (see Chapter 00: Preface)
- The second request includes the tool result, and the model formulates the final answer
Key Points¶
- System Prompt — this is text in Messages[0].Content. It can contain instructions, few-shot examples, and the SOP.
- Tools Schema — this is the separate Tools field in the request. It is not located inside the prompt, but the model receives it together with the prompt.
- Few-shot examples — located inside the System Prompt (as text). They show the model the response format or tool selection.
- User Input — this is Messages[N] with Role = "user". There can be multiple messages (conversation history).
- Tool Results — added by the runtime to Messages with Role = "tool" after tool execution.
Important: Tools Schema and System Prompt are different things:
- System Prompt — text for model (instructions, examples)
- Tools Schema — structured tool descriptions (JSON Schema)
See detailed protocol: Chapter 03: Tools and Function Calling
Chain-of-Thought (CoT): Think Step by Step¶
Chain-of-Thought (CoT) is the "Think step by step" technique. For agents, CoT is critical.
When Is CoT Needed?¶
❌ Bad: "Fix server" (one step)
✅ Good: "Analyze situation, form hypothesis, check it, propose solution" (chain)
CoT is needed for:
- Complex tasks requiring analysis
- Tasks with multiple steps
- Tasks where hypothesis needs to be checked before action
CoT is not needed for:
- Simple single-step tasks
- Tasks where response format is more important than process
How to Set CoT?¶
CoT can be set via instruction (Zero-shot) or via examples (Few-shot):
Zero-shot CoT:
Think step by step:
1. Analyze the situation
2. Form hypothesis
3. Check hypothesis
4. Apply solution
Few-shot CoT:
Reasoning example (this is an example of how agent works in a loop):
Thought: User complains about slow performance. I'll start by checking metrics.
Action: get_cpu_metrics()
Observation: CPU 95%, process: ffmpeg
Thought: ffmpeg is consuming resources. I'll check what this process is.
Action: get_process_info(pid=12345)
The model processes the reasoning format example and follows it. In practice, this happens in a loop: each Observation is added by Runtime (your code) to history, and the model receives it on the next iteration.
CoT and Agent Loop¶
The "Thought-Action-Observation" format is a formalization of the Agent Loop (iterative agent process).
How it works:
- Thought — model analyzes situation and thinks what to do next
- Action — the model selects a tool and parameters (generates a tool_call)
- Observation — the Runtime (your code) executes the tool and adds the result to history (role = "tool")
- The loop repeats: the model receives the Observation in context, thinks again (Thought), and selects the next Action
Iteration example:
Iteration 1:
Thought: "Need to check CPU metrics"
Action: get_cpu_metrics()
Observation: "CPU 95%, process: ffmpeg" ← runtime added to history
Iteration 2 (model receives Observation in context):
Thought: "ffmpeg is consuming resources. I'll check process details"
Action: get_process_info(pid=12345)
Observation: "This is video conversion" ← runtime added to history
Iteration 3:
Thought: "Legitimate process, but blocking system. I'll suggest limiting priority"
Action: [final answer to user]
Important: This isn't "magic" — the model simply processes results of previous tools in context (messages[]) and generates the next step based on this context.
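A minimal version of that loop in Go, as a sketch rather than the full runtime from the labs: call the model, execute any tool_call it returns, append the result with role "tool", and stop when the model answers with plain text. executeTool is a hypothetical dispatcher like the one sketched earlier; the function needs the context, fmt, and go-openai imports:

```go
func runAgentLoop(ctx context.Context, client *openai.Client, messages []openai.ChatCompletionMessage, tools []openai.Tool) (string, error) {
	const maxIterations = 10 // guard against endless Thought/Action cycles

	for i := 0; i < maxIterations; i++ {
		resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
			Model:    openai.GPT3Dot5Turbo,
			Messages: messages,
			Tools:    tools,
		})
		if err != nil {
			return "", err
		}

		msg := resp.Choices[0].Message
		messages = append(messages, msg) // keep the assistant turn (the "Action") in history

		if len(msg.ToolCalls) == 0 {
			return msg.Content, nil // no tool requested: this is the final answer
		}

		// Observation: execute each requested tool and add its result to history.
		for _, call := range msg.ToolCalls {
			result := executeTool(call.Function.Name, call.Function.Arguments)
			messages = append(messages, openai.ChatCompletionMessage{
				Role:       openai.ChatMessageRoleTool,
				Content:    result,
				ToolCallID: call.ID,
			})
		}
	}
	return "", fmt.Errorf("agent loop did not finish in %d iterations", maxIterations)
}
```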
CoT can be post-hoc (after the fact)
The chain of reasoning (CoT) that the model generates does not necessarily reflect the actual inference process. The model may first "solve" the task (based on probabilistic mechanism) and then generate a "nice explanation" for that solution.
Problem: A nice CoT is not proof of correctness. The model may confidently rationalize an incorrect conclusion.
Solution: Don't rely on CoT as the sole source of truth. Verify answers through tools, evals, and runtime validation. For critical decisions, use Human-in-the-Loop regardless of explanation quality.
See more: Chapter 04: Autonomy and Loops — how ReAct Loop and iterative agent process work.
CoT Examples in Different Domains¶
DevOps¶
Thought: User complains about slow performance. I'll start by checking metrics.
Action: get_cpu_metrics()
Observation: CPU 95%, process: ffmpeg
Thought: ffmpeg is consuming resources. I'll check what this process is.
Action: get_process_info(pid=12345)
Observation: This is video conversion started by user
Thought: This is a legitimate process, but it's blocking the system. I'll suggest user limit priority.
Security¶
Thought: Received alert about suspicious activity. I'll start with triage.
Action: query_siem(query="host=192.168.1.10 AND time>now-1h")
Observation: Multiple failed login attempts
Thought: This looks like brute-force. I'll check the source.
Action: get_source_ip()
Observation: IP from unfamiliar country
Thought: High risk. I'll isolate host, but first request confirmation.
Data Analytics¶
Thought: User asks about sales. Need to understand what data is available.
Action: describe_table("sales")
Observation: Table contains fields: date, region, amount
Thought: Now I'll formulate SQL query to get sales for the last month.
Action: sql_select("SELECT region, SUM(amount) FROM sales WHERE date > NOW() - INTERVAL '1 month' GROUP BY region")
Observation: Results: Region A: 100k, Region B: 150k
Thought: I'll analyze results and formulate conclusions.
SOP and Task Decomposition: Agent Process¶
SOP (Standard Operating Procedure) is an action algorithm encoded in the prompt. The SOP sets the process, and CoT helps the model follow it step by step.
Connection: Goal → Constraints → SOP → CoT¶
- Goal/Constraints set boundaries (what needs to be achieved, what cannot be done)
- SOP sets the process (what steps to perform)
- CoT helps go step by step (think at each stage)
Why Is This Needed?¶
Without SOP, an agent may "wander": immediately restart, then read logs, then restart again.
Why SOP is important: directing model attention
Without SOP, the model receives: User: Fix it. Its probabilistic mechanism may output: Call: restart_service. This is the most "popular" action.
With SOP, the model is forced to generate text:
- "Step 1: I need to check HTTP status." → This increases probability of calling
check_http - "HTTP is 502. Step 2: I need to check logs." → This increases probability of calling
read_logs
We direct model attention along the needed path through explicit step indication in the prompt.
Chain-of-Thought in action:
Notice the System Prompt with SOP:
"Think step by step following this SOP: 1. Check HTTP... 2. Check Logs..."
This wording is what triggers the mechanism described above: the explicit steps pull the model's attention along the needed path instead of letting it jump straight to the most "popular" action.
Example SOP for Incident (DevOps)¶
SOP for service failure:
1. Check Status: Check HTTP response code
2. Check Logs: If 500/502 — read last 20 log lines
3. Analyze: Find keywords:
- "Syntax error" → Rollback
- "Connection refused" → Check Database
- "Out of memory" → Restart
4. Action: Apply fix according to analysis
5. Verify: Check HTTP status again
Example SOP for Support¶
SOP for ticket processing:
1. Read: Read ticket completely
2. Context: Gather context (version, OS, browser)
3. Search: Search knowledge base for similar cases
4. Decide:
- If solution found → Draft reply
- If complex problem → Escalate
5. Respond: Send response to user
Example SOP for Security¶
SOP for alert triage:
1. Severity: Determine criticality (Low/Medium/High/Critical)
2. Evidence: Gather evidence (logs, metrics, network traffic)
3. Triage:
- If False Positive → Close alert
- If True Positive → Containment (with confirmation!)
4. Report: Generate report for SOC
Task Structuring (Task Decomposition)¶
Complex tasks need to be broken down into subtasks.
Without decomposition:
User: "Figure out database problem"
Assistant: [jumps straight to one action, with no diagnostic plan]
With decomposition:
User: "Figure out database problem"
Assistant:
1. Check DB availability (ping_db)
2. Check metrics (cpu, memory, connections)
3. Read logs (read_db_logs)
4. Analyze errors
5. Propose solution
Common Prompting Mistakes¶
Mistake 1: Too Generic Prompt¶
Symptom: Model doesn't understand what to do, responds with generic phrases.
Cause: Prompt doesn't set specific role, goal, and action process.
Solution:
// BAD
You are an assistant.
// GOOD
You are a Senior DevOps Engineer with 10 years of experience.
Your goal is to restore service functionality.
Follow SOP: 1. Check Status 2. Check Logs 3. Analyze 4. Fix 5. Verify
Mistake 2: Missing CoT¶
Symptom: Model tries to do everything at once, skips analysis steps.
Cause: Prompt doesn't set reasoning format (Chain-of-Thought), model doesn't "think" step by step.
Solution:
// BAD
Fix server.
// GOOD
Think step by step:
1. Analyze the situation
2. Form hypothesis
3. Check hypothesis
4. Apply solution
Mistake 3: Missing Constraints¶
Symptom: Model may perform dangerous actions without confirmation.
Cause: Prompt doesn't set constraints on agent actions.
Solution:
// BAD
You are a DevOps engineer. Do what's needed.
// GOOD
You are a DevOps engineer.
Constraints:
- Never delete data without confirmation
- Always check logs before action
Mistake 4: Suggested Answer (Anchoring Bias)¶
Symptom: Model shifts answer toward what's explicitly suggested in the prompt or user request, even if incorrect.
Cause: The prompt or request contains explicit hints about the desired answer ("I think the correct answer is A", "most likely this is X"). Models tuned to maximize human preference (RLHF) tend to "flatter" the user and agree.
Solution:
// BAD
User: "I think the problem is in the database. Check logs."
System Prompt: "User thinks the problem is in DB. Check logs."
// GOOD
User: "I think the problem is in the database. Check logs."
System Prompt: "Check logs and analyze all possible causes.
Analyze data objectively, don't limit yourself to user assumptions."
Practice: Separate facts (data, logs, metrics) from user preferences/hypotheses. Don't include preferences in context as facts.
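One way to enforce that separation in code is to keep facts and hypotheses in different fields and label the hypotheses explicitly when the context is rendered. A sketch with an illustrative struct (needs the strings import):

```go
// ContextInput keeps verified data apart from user assumptions.
type ContextInput struct {
	Facts      []string // logs, metrics, tool outputs
	Hypotheses []string // what the user *thinks* is wrong
}

// Render builds the context block without presenting hypotheses as facts.
func (c ContextInput) Render() string {
	var b strings.Builder
	b.WriteString("Facts (verified data):\n")
	for _, f := range c.Facts {
		b.WriteString("- " + f + "\n")
	}
	b.WriteString("\nUser hypotheses (to verify, not to assume):\n")
	for _, h := range c.Hypotheses {
		b.WriteString("- " + h + "\n")
	}
	return b.String()
}
```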
Mistake 5: Bias in few-shot ("answer always A")¶
Symptom: Model adopts positional pattern from examples: if correct answers in few-shot are always in position "A", model starts choosing "A" more often than it should.
Cause: Few-shot examples aren't diverse; correct answers are always in one position or follow one pattern.
Solution:
// BAD
Example 1: Question → Answer A (correct)
Example 2: Question → Answer A (correct)
Example 3: Question → Answer A (correct)
// GOOD
Example 1: Question → Answer A (correct)
Example 2: Question → Answer B (correct)
Example 3: Question → Answer C (correct)
// Or shuffle positions, add counterexamples
Practice: Shuffle positions of correct answers, include diverse examples, add counterexamples (incorrect answers with explanation of why they're wrong).
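Part of this can be automated at prompt-build time: randomizing the order of the demonstrations removes one source of positional bias (which option is marked correct inside each example still has to be varied by hand when the examples are written). A sketch using math/rand and the illustrative fewShotExample type from earlier (needs the fmt, math/rand, and strings imports):

```go
// buildFewShotBlock writes the demonstrations in random order so the model
// cannot latch onto their position in the prompt.
func buildFewShotBlock(examples []fewShotExample) string {
	shuffled := make([]fewShotExample, len(examples))
	copy(shuffled, examples)
	rand.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})

	var b strings.Builder
	for i, ex := range shuffled {
		fmt.Fprintf(&b, "Example %d:\nUser: %q\nAssistant: %s\n\n", i+1, ex.User, ex.Assistant)
	}
	return b.String()
}
```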
Mistake 6: Blind Trust in CoT¶
Symptom: Developer relies on model's "nice explanation" as proof of answer correctness without verifying the answer itself.
Cause: CoT looks logical and convincing but may be post-hoc rationalization of incorrect conclusion.
Solution:
// BAD
Model: "Thought: Problem is in DB. Action: restart_database()"
Developer: "Great, logical!" → Executes action without verification
// GOOD
Model: "Thought: Problem is in DB. Action: restart_database()"
Developer: Verifies via tools (check_db_status, read_logs),
validates via evals, requests confirmation for critical actions
Practice: CoT is a tool for improving model reasoning, but not a source of truth. Always verify answers through tools and evals, especially for critical actions.
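A simple runtime guard makes this practice concrete: before executing an action the model proposed, check it against a list of critical tools and require verification or human confirmation first. A sketch; the tool names and the guard are illustrative (needs the fmt import):

```go
// criticalTools require verification and human confirmation no matter how
// convincing the model's chain of reasoning looks.
var criticalTools = map[string]bool{
	"restart_database": true,
	"rollback":         true,
	"delete_data":      true,
}

// guardToolCall decides whether a proposed tool call may run automatically.
func guardToolCall(name string, humanConfirmed bool) error {
	if criticalTools[name] && !humanConfirmed {
		return fmt.Errorf("tool %q is critical: run read-only checks and get human confirmation first", name)
	}
	return nil
}
```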
How Not to Hint at the Answer¶
Few-shot and CoT can inadvertently "hint" at the desired conclusion. Here's a checklist to avoid this:
Checklist: Neutral Formulations¶
✅ Good:
- Use neutral request formulations
- Separate facts from user preferences
- Don't include assumptions as facts in context
❌ Bad:
- "I think the correct answer is A, check it"
- "User thinks X, use this as fact"
- "Most likely problem is Y, check only Y"
Checklist: Diverse Few-Shot Examples¶
✅ Good:
- Shuffle positions of correct answers (A, B, C, D)
- Include diverse task types
- Add counterexamples (incorrect answers with explanation)
❌ Bad:
- All correct answers in position "A"
- All examples of one type
- Only positive examples
Checklist: Separating Preferences and Facts¶
✅ Good:
Facts (include in context):
- Logs show error 500
- CPU metrics: 95%
Preferences/hypotheses (separate, not as facts):
- User assumes problem is in DB
- Hypothesis: problem is in network
❌ Bad:
- "The problem is in the DB" is passed to the model as an established fact
- Hypotheses are mixed in with logs and metrics, with no label
Checklist: Verification Through Tools¶
✅ Good:
- Verify answers through tools (read_logs, check_status)
- Use evals to check robustness to hints
- Don't rely only on CoT as proof
❌ Bad:
- Accept model answer only because CoT looks logical
- Don't verify answers with tools
- No evals on robustness to anchoring
Example: Anti-Prompt and Fixed Version¶
❌ Bad (hints at answer):
You are a DevOps engineer. User thinks the problem is in the database.
Check logs and confirm that the problem is in the DB.
Problems:
- Includes user assumption as fact
- Hints at desired conclusion ("confirm that problem is in DB")
- Model may agree even if problem isn't in DB
✅ Good (neutral):
You are a DevOps engineer. User reported a problem.
Check logs, metrics, and all possible causes.
Analyze data objectively, don't limit yourself to assumptions.
Improvements:
- Doesn't include assumption as fact
- Doesn't hint at desired conclusion
- Requires objective analysis of all causes
Completion Criteria / Checklist¶
✅ Completed:
- Role (Persona) clearly defined
- Goal is specific and measurable
- Constraints explicitly stated
- Response format (Format) described
- SOP (if applicable) detailed
- CoT included for complex tasks
- Few-Shot examples added (if complex format needed)
- Few-Shot examples consistent (one format)
- Few-Shot examples diverse (no positional bias)
- Prompt doesn't hint at desired answer (neutral formulations)
- User preferences separated from facts
- Answers verified through tools and evals, not only through CoT
❌ Not completed:
- Prompt too generic (no specific role and goal)
- Missing CoT for complex tasks
- No constraints on dangerous actions
- Few-Shot examples in different formats
- Few-Shot examples have positional bias (all answers in position "A")
- Prompt hints at desired answer ("confirm that X")
- User preferences included as facts
- Blind trust in CoT without verification through tools
Mini-Exercises¶
Exercise 1: Create System Prompt¶
Create a System Prompt for an agent that processes support tickets:
Expected result:
- Role: Support Agent
- Goal: solve user's problem
- Constraints: politeness, escalation of complex cases
- Format: structured JSON with fields action and user_id
- SOP: Read → Context → Search → Decide → Respond
Exercise 2: Add CoT¶
Improve the prompt from exercise 1 by adding Chain-of-Thought:
Expected result:
- Added reasoning format: "Thought: ... Action: ... Observation: ..."
- Few-shot examples with CoT (if using few-shot)
- Or instruction "Think step by step" (if zero-shot)
For the Curious¶
This section explains the mechanics of few-shot and CoT at the model level. Can be skipped if you're only interested in practice.
Why Does Few-Shot Work?¶
Self-Attention Mechanism¶
The Self-Attention mechanism in the transformer allows the model to identify connections between tokens in context. When the model processes examples:
- Intuitively: Model "adjusts" to the pattern it processes in context
- Mechanically: Vector representation of tokens shifts toward the needed format, because previous tokens (examples) set this context
- Practically: Result quality strongly depends on example quality and their consistency
Why Does CoT Work?¶
Imagine a task: "What is 23 * 41 + 12?"
Without CoT:
- Model must output answer "955" immediately
- This requires enormous computational power in one step
- Error probability is high (model may "guess" incorrectly)
With CoT:
- Model generates: "23 * 40 = 920... 23 * 1 = 23... sum 943... plus 12... answer 955"
- By generating intermediate tokens ("920", "943"), the model offloads computations to context
- Next token is predicted based on expanded context containing intermediate results
- This turns a complex task into a series of simple tasks
For agents, CoT is critical, because agents solve complex multi-step tasks where each step depends on the previous one.
Connection to Other Chapters¶
- LLM Physics: Understanding probabilistic nature of LLM helps create effective prompts, see Chapter 01: LLM Physics
- Tools: How prompt affects tool selection, see Chapter 03: Tools
- Autonomy: How SOP and CoT work in agent loop, see Chapter 04: Autonomy
What's Next?¶
After studying prompting, proceed to:
- 03. Tools and Function Calling — how agent interacts with the real world
Navigation: ← LLM Physics | Table of Contents | Tools →