10. Planning and Workflow Patterns¶
Why This Chapter?¶
Simple ReAct loops work well for straightforward tasks. Once the task becomes multi-step, you usually need planning: break the work into steps, respect dependencies, handle failures, and keep track of progress.
This chapter covers planning patterns that help agents handle complex, long-running work without getting lost.
Real-World Case Study¶
Situation: User asks: "Deploy new microservice: create VM, install dependencies, configure network, deploy application, configure monitoring."
Problem: A simple ReAct loop may:
- Jump between steps randomly
- Skip dependencies (try to deploy before creating VM)
- Not track which steps are completed
- Fail and start from scratch
Solution: Use a planning pattern: first create a plan (steps + dependencies), then execute it while tracking state and handling failures.
Theory in Simple Terms¶
What Is Planning?¶
Planning is the process of breaking down a complex task into smaller, manageable steps with clear dependencies and execution order.
Key components:
- Task decomposition — Break down large tasks into steps
- Dependency graph — Understand which steps depend on others
- Execution order — Determine the sequence (or parallel execution)
- State tracking — Know what's done, what's in progress, and what failed
- Failure handling — Retry, skip, or abort on errors
Planning Patterns¶
Pattern 1: Plan→Execute
- Agent creates complete plan upfront
- Executes steps sequentially
- Simple, but inflexible
Pattern 2: Plan-and-Revise
- Agent creates initial plan
- Revises plan as it learns (e.g., step failed, new information discovered)
- More adaptive, but more complex
Pattern 3: DAG/Workflow
- Steps form a directed acyclic graph
- Some steps can execute in parallel
- Handles complex dependencies
How It Works (Step by Step)¶
Step 1: Task Decomposition¶
Agent receives a high-level task and breaks it into steps:
type Plan struct {
Steps []Step
}
type Step struct {
ID string
Description string
Dependencies []string // IDs of steps that must complete first
Status StepStatus
Result any
Error error
}
type StepStatus string
const (
StepStatusPending StepStatus = "pending"
StepStatusRunning StepStatus = "running"
StepStatusCompleted StepStatus = "completed"
StepStatusFailed StepStatus = "failed"
StepStatusSkipped StepStatus = "skipped"
)
Example: "Deploy microservice" is broken into:
- Create VM (no dependencies)
- Install dependencies (depends on: Create VM)
- Configure network (depends on: Create VM)
- Deploy application (depends on: Install dependencies, Configure network)
- Configure monitoring (depends on: Deploy application)
Step 2: Create Plan¶
Agent uses LLM for task decomposition:
func createPlan(ctx context.Context, client *openai.Client, task string) (*Plan, error) {
prompt := fmt.Sprintf(`Break this task into steps with dependencies:
Task: %s
Return JSON with array of steps. Each step has: id, description, dependencies (array of step IDs).
Example:
{
"steps": [
{"id": "step1", "description": "Create VM", "dependencies": []},
{"id": "step2", "description": "Install dependencies", "dependencies": ["step1"]}
]
}`, task)
messages := []openai.ChatCompletionMessage{
{Role: "system", Content: "You are a planning agent. Break tasks into steps."},
{Role: "user", Content: prompt},
}
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: "gpt-4o-mini",
Messages: messages,
Temperature: 0, // Deterministic planning
})
if err != nil {
return nil, err
}
// Parse the JSON response into a Plan; fail loudly on malformed output
var plan Plan
if err := json.Unmarshal([]byte(resp.Choices[0].Message.Content), &plan); err != nil {
return nil, fmt.Errorf("parse plan: %w", err)
}
return &plan, nil
}
Step 3: Execute Plan¶
Execute steps considering dependencies:
func executePlan(ctx context.Context, plan *Plan, executor StepExecutor) error {
for {
// Find steps ready to execute (all dependencies completed)
readySteps := findReadySteps(plan)
if len(readySteps) == 0 {
// Check if all completed or stuck
if allStepsCompleted(plan) {
return nil
}
if allRemainingStepsBlocked(plan) {
return fmt.Errorf("plan blocked: some steps failed")
}
// Wait briefly before re-checking so in-flight async steps can
// finish, instead of spinning in a hot loop
time.Sleep(100 * time.Millisecond)
continue
}
// Execute ready steps (can be parallel)
for _, step := range readySteps {
step.Status = StepStatusRunning
result, err := executor.Execute(ctx, step)
if err != nil {
step.Status = StepStatusFailed
step.Error = err
// Decide: retry, skip, or abort
if shouldRetry(step) {
step.Status = StepStatusPending
continue
}
} else {
step.Status = StepStatusCompleted
step.Result = result
}
}
}
}
func findReadySteps(plan *Plan) []*Step {
ready := make([]*Step, 0, len(plan.Steps))
for i := range plan.Steps {
step := &plan.Steps[i]
if step.Status != StepStatusPending {
continue
}
// Check if all dependencies are completed
allDepsDone := true
for _, depID := range step.Dependencies {
dep := findStep(plan, depID)
if dep == nil || dep.Status != StepStatusCompleted {
allDepsDone = false
break
}
}
if allDepsDone {
ready = append(ready, step)
}
}
return ready
}
Step 4: Failure Handling¶
Implement retry logic with exponential backoff:
type StepExecutor interface {
Execute(ctx context.Context, step *Step) (any, error)
}
func executeWithRetry(ctx context.Context, executor StepExecutor, step *Step, maxRetries int) (any, error) {
var lastErr error
backoff := time.Second
for attempt := 0; attempt <= maxRetries; attempt++ {
if attempt > 0 {
// Exponential backoff
time.Sleep(backoff)
backoff *= 2
}
result, err := executor.Execute(ctx, step)
if err == nil {
return result, nil
}
lastErr = err
// Check if error is retryable
if !isRetryableError(err) {
return nil, err
}
}
return nil, fmt.Errorf("failed after %d attempts: %w", maxRetries+1, lastErr)
}
Step 5: Plan State Persistence¶
IMPORTANT: State persistence for resuming execution is described in State Management. Here, only the plan state structure is described.
// Plan state is used to track progress
// Persistence and resumption described in State Management
type PlanState struct {
PlanID string
Steps []Step
UpdatedAt time.Time
}
LivePlan: Plan as Program State¶
In a naive implementation the plan lives inside the messages, as text generated by the model. The problem: when the context is condensed (summarized to save tokens), the plan is lost along with the other old messages.
A more robust approach keeps the plan in a Go struct, not in messages:
type LivePlan struct {
Goal string
Steps []PlanStep
Notes string
}
type PlanStep struct {
ID int
Description string
Status StepStatus // pending, in_progress, completed, cancelled
Result string
}
type StepStatus string
const (
StepPending StepStatus = "pending"
StepInProgress StepStatus = "in_progress"
StepCompleted StepStatus = "completed"
StepCancelled StepStatus = "cancelled"
)
LivePlan is injected into the system prompt on every iteration:
func (p *LivePlan) Render() string {
var sb strings.Builder
sb.WriteString("[PLAN]\n")
sb.WriteString("Goal: " + p.Goal + "\n")
for _, step := range p.Steps {
sb.WriteString(fmt.Sprintf("%d. [%s] %s", step.ID, step.Status, step.Description))
if step.Result != "" {
sb.WriteString(" → " + step.Result)
}
sb.WriteString("\n")
}
return sb.String()
}
The model updates the plan via an update_plan tool call. Step status changes automatically when a subagent runs:
- Before execution → in_progress
- After success → completed
- After error → pending (rollback)
Rule: only one step may be in_progress at a time. When all steps are completed or cancelled, the plan is automatically cleared, freeing space in the system prompt.
Task Routing: Choosing a Planning Strategy¶
Not every task needs a plan. A simple question like "What Go version?" does not require decomposition. Task Routing determines the strategy before work begins:
3-level routing¶
func routeTask(msg string, llmRouter Router) RoutingResult {
// Level 1: Pre-filter (no LLM)
if isSimpleRequest(msg) {
return RoutingResult{Strategy: "direct"}
}
// Level 2: LLM classification
result, err := llmRouter.Classify(msg)
if err == nil {
return result
}
// Level 3: Keyword fallback
return keywordFallback(msg)
}
func isSimpleRequest(msg string) bool {
words := strings.Fields(msg)
return len(words) <= 15 &&
len(msg) < 150 &&
containsFilePath(msg) &&
!containsBroadWords(msg) // "all files", "entire project", "refactoring"
}
| Routing result | What the agent does |
|---|---|
| direct | Executes the task immediately, without planning |
| plan | Calls the plan tool first, then executes |
| plan_and_subagent | Creates a plan and launches a subagent for each step |
Two Planning Modes¶
| Mode | Who controls | When to use |
|---|---|---|
| Flexible (guided loop) | LLM follows the plan in the system prompt | Medium tasks, adaptability needed |
| Strict (orchestrator) | Go code iterates over steps | Large tasks, reliability needed |
Flexible mode: the plan is injected into a [PLAN] section of the system prompt. The LLM decides which step to execute and updates statuses via update_plan.
Strict mode: Go code picks the next pending step, forms the task, launches a subagent, receives the result, and updates the status. The LLM does not control the order — the runtime does.
Task Routing selects the mode: plan → flexible, plan_and_subagent → strict.
Common Errors¶
Error 1: No Dependency Tracking¶
Symptom: Agent tries to execute steps out of order, causing failures.
Cause: Dependencies between steps are not tracked.
Solution:
// BAD: Execute steps in order without checking dependencies
for _, step := range plan.Steps {
executor.Execute(ctx, step)
}
// GOOD: Check dependencies first
readySteps := findReadySteps(plan)
for _, step := range readySteps {
executor.Execute(ctx, step)
}
Error 2: No State Persistence¶
Symptom: Agent starts from scratch after failure, losing progress.
Cause: Plan state is not persisted.
Solution: Use techniques from State Management to persist and resume plan execution.
Error 3: Infinite Retries¶
Symptom: Agent retries failed step forever, wasting resources.
Cause: No retry limits or backoff.
Solution: Implement maximum retry count and exponential backoff.
Error 4: No Parallel Execution¶
Symptom: Agent executes independent steps sequentially, wasting time.
Cause: Steps that can execute in parallel are not identified.
Solution: Use findReadySteps to get all ready steps, execute them concurrently:
// Execute ready steps in parallel
var wg sync.WaitGroup
for _, step := range readySteps {
wg.Add(1)
go func(s *Step) {
defer wg.Done()
executor.Execute(ctx, s)
}(step)
}
wg.Wait()
Error 5: Plan as Free Text¶
Symptom: The model skips steps — starts Step 3 before finishing Step 2. There is no objective criterion for "step completed".
Cause: The plan is generated as free text with no machine-readable structure. Step statuses are not tracked.
Solution: Use a structured plan with explicit statuses:
// BAD: plan as text in messages
plan := "1. Check logs\n2. Find error\n3. Fix\n4. Test"
// GOOD: LivePlan with tracking
plan := &LivePlan{
Goal: "Fix authorization error",
Steps: []PlanStep{
{ID: 1, Description: "Check nginx logs", Status: StepCompleted, Result: "401 on /api/auth"},
{ID: 2, Description: "Find root cause in code", Status: StepInProgress},
{ID: 3, Description: "Fix middleware", Status: StepPending},
{ID: 4, Description: "Test", Status: StepPending},
},
}
Pattern: Controller + Processor (orchestrator + normalizer)¶
When a workflow grows, it's useful to separate two concerns:
- Controller (orchestrator) selects the next step: call a tool or respond to the user.
- Processor (analyzer/normalizer) turns tool results and user answers into a structured state update (for example: "append facts", "replace plan", "add open questions").
This reduces noise in the agent loop. The controller does not get buried in large outputs. The processor does not decide on side effects.
Mini-trace (read-only search + file read):
1) Controller calls search.
2) ToolRunner stores the raw output as an artifact and returns a short payload (top-k matches).
3) Processor returns a state_patch:
{
"replace_plan": [
"Read the file with the best match",
"Write a short explanation for the user"
],
"append_known_facts": [
{
"key": "client_error_candidate",
"value": "pkg/errors/client_error.go:12",
"source": "tool",
"artifact_id": "srch_123",
"confidence": 0.9
}
]
}
4) Controller reads the file and produces the final answer.
Mini-Exercises¶
Exercise 1: Task Decomposition¶
Implement a function that breaks a task into steps:
func decomposeTask(task string) (*Plan, error) {
// Use LLM to create plan
// Return Plan with steps and dependencies
}
Expected result:
- Plan contains logical steps
- Dependencies correctly defined
- Steps can execute in valid order
Exercise 2: Dependency Resolution¶
Implement findReadySteps so that it returns only the steps whose dependencies have all completed:
Expected result:
- Returns only steps with all satisfied dependencies
- Detects cyclic dependencies and returns an error
Exercise 3: Plan Execution with Retries¶
Implement plan execution with retry logic:
func executePlanWithRetries(ctx context.Context, plan *Plan, executor StepExecutor, maxRetries int) error {
// Execute plan with retry logic
// Handle failures correctly
}
Expected result:
- Steps execute considering dependencies
- Failed steps retry up to maxRetries
- Plan completes or fails correctly
Completion Criteria / Checklist¶
Completed:
- Can break complex tasks into steps
- Understand dependency graphs
- Can execute plans considering dependencies
- Handle failures with retries
- Persist plan state for resumption
Not completed:
- Step execution without dependency checks
- No state persistence
- Infinite retries without limits
- Sequential execution when parallel is possible
Connection with Other Chapters¶
- Chapter 04: Autonomy and Loops — Planning extends ReAct loop for complex tasks
- Chapter 07: Multi-Agent Systems — Planning can coordinate multiple agents
- Chapter 11: State Management — Reliable plan execution (idempotency, retries, persist)
- Chapter 21: Workflow and State Management in Production — Production workflow patterns
IMPORTANT: Planning focuses on task decomposition and dependency graphs. Execution reliability (persist, retries, deadlines) is described in State Management.
What's Next?¶
After mastering planning patterns, proceed to:
- 11. State Management — Learn how to guarantee reliable plan execution