12. Agent Memory Systems¶
Why This Chapter?¶
Agents need memory to keep context across conversations, learn from past interactions, and avoid repeating mistakes. Without it, they forget important details and waste tokens re-explaining the same things.
This chapter covers memory systems that help agents remember, retrieve, and forget information efficiently.
Real-World Case Study¶
Situation: User asks agent: "What was the database problem we fixed last week?" Agent responds: "I don't have information about that."
Problem: Agent has no memory of past conversations. Each interaction starts from scratch, wasting context and user time.
Solution: A memory system stores important facts, retrieves them when needed, and forgets outdated information so the agent stays within context limits.
Theory in Simple Terms¶
Memory Types¶
Short-term memory:
- Current conversation history (stored in runtime, not in long-term storage)
- Limited by the LLM context window
- Lost when the conversation ends
- Note: Short-term memory management (summarization, selection) is described in Context Engineering. The term "working memory" is used in Context Engineering to denote recent conversation turns in context.
Long-term memory (persistent storage):
- Facts, preferences, past decisions
- Stored in database/files
- Persists between conversations
Episodic memory:
- Specific events: "User asked about disk space on 2026-01-06"
- Useful for debugging and learning
Semantic memory:
- General knowledge: "User prefers JSON responses"
- Extracted from episodes
Memory Operations¶
- Store — Save information for future
- Retrieve — Find relevant information
- Forget — Delete outdated information
- Update — Change existing information
How It Works (Step by Step)¶
Step 1: Memory Interface¶
type Memory interface {
Store(key string, value any, metadata map[string]any) error
Retrieve(query string, limit int) ([]MemoryItem, error)
Forget(key string) error
Update(key string, value any) error
}
type MemoryItem struct {
Key string
Value any
Metadata map[string]any
Created time.Time
Accessed time.Time
TTL time.Duration // Time to live
}
Step 2: Storing Information¶
type SimpleMemory struct {
store map[string]MemoryItem
mu sync.RWMutex
}
func (m *SimpleMemory) Store(key string, value any, metadata map[string]any) error {
m.mu.Lock()
defer m.mu.Unlock()
m.store[key] = MemoryItem{
Key: key,
Value: value,
Metadata: metadata,
Created: time.Now(),
Accessed: time.Now(),
TTL: 24 * time.Hour, // Default TTL
}
return nil
}
Step 3: Retrieval with Search¶
func (m *SimpleMemory) Retrieve(query string, limit int) ([]MemoryItem, error) {
m.mu.RLock()
defer m.mu.RUnlock()
// Simple keyword search (in production use embeddings)
results := make([]MemoryItem, 0, len(m.store))
queryLower := strings.ToLower(query)
for _, item := range m.store {
// Check if expired
if item.TTL > 0 && time.Since(item.Created) > item.TTL {
continue
}
// Simple keyword matching
valueStr := fmt.Sprintf("%v", item.Value)
if strings.Contains(strings.ToLower(valueStr), queryLower) {
item.Accessed = time.Now() // Update access time
results = append(results, item)
}
}
// Sort by access time (most recent first)
sort.Slice(results, func(i, j int) bool {
return results[i].Accessed.After(results[j].Accessed)
})
if len(results) > limit {
results = results[:limit]
}
return results, nil
}
Step 4: Forgetting Expired Items¶
func (m *SimpleMemory) Cleanup() error {
m.mu.Lock()
defer m.mu.Unlock()
now := time.Now()
for key, item := range m.store {
if item.TTL > 0 && now.Sub(item.Created) > item.TTL {
delete(m.store, key)
}
}
return nil
}
Step 5: Integration with Agent¶
func runAgentWithMemory(ctx context.Context, client *openai.Client, memory Memory, userInput string) (string, error) {
// Retrieve relevant memories
memories, _ := memory.Retrieve(userInput, 5)
// Build context with memories
messages := []openai.ChatCompletionMessage{
{Role: "system", Content: "You are a helpful assistant with access to memory."},
}
// Add relevant memories as context
if len(memories) > 0 {
memoryContext := "Relevant past information:\n"
for _, mem := range memories {
memoryContext += fmt.Sprintf("- %s: %v\n", mem.Key, mem.Value)
}
messages = append(messages, openai.ChatCompletionMessage{
Role: "system",
Content: memoryContext,
})
}
messages = append(messages, openai.ChatCompletionMessage{
Role: "user",
Content: userInput,
})
resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
Model: "gpt-4o-mini",
Messages: messages,
})
if err != nil {
return "", err
}
answer := resp.Choices[0].Message.Content
// Store important information from conversation
if shouldStore(userInput, answer) {
key := generateKey(userInput)
memory.Store(key, answer, map[string]any{
"user_input": userInput,
"timestamp": time.Now(),
})
}
return answer, nil
}
Checkpoint and Resume¶
An agent can work for hours on a complex task. If the process crashes mid-way, all progress is lost. Checkpoint saves the conversation state periodically. On failure, the agent resumes from the last saved point.
The basic Checkpoint implementation (structure, save/load, integration with agent loop) is described in Chapter 09: Agent Anatomy. Advanced strategies (granularity, validation, rotation) are covered in Chapter 11: State Management.
Here we look at how Checkpoint relates to agent memory:
- What to save: message history (
messages[]), memory contents, tool state, current execution step. - When to save: after each significant step (tool call, user response). For short tasks (2-3 iterations) Checkpoint is overkill. For long tasks (10+ iterations) it's mandatory.
- TTL: set a TTL on checkpoints (e.g. 24 hours) so stale snapshots don't accumulate.
Shared Memory Between Agents¶
In multi-agent systems, agents exchange information through a shared memory store. Each agent reads and writes to a common store, separating data by namespace:
type SharedMemoryStore struct {
store CheckpointStore
}
// Write with agent namespace
func (s *SharedMemoryStore) Put(ctx context.Context, agentID, key string, value any) error {
fullKey := fmt.Sprintf("shared:%s:%s", agentID, key)
data, _ := json.Marshal(value)
return s.store.Set(ctx, fullKey, data, 0)
}
// Read another agent's data
func (s *SharedMemoryStore) Get(ctx context.Context, agentID, key string) (any, error) {
fullKey := fmt.Sprintf("shared:%s:%s", agentID, key)
data, err := s.store.Get(ctx, fullKey)
if err != nil {
return nil, err
}
var result any
return result, json.Unmarshal(data, &result)
}
// List all entries from all agents (for supervisor)
func (s *SharedMemoryStore) ListAll(ctx context.Context) (map[string]any, error) {
keys, _ := s.store.Keys(ctx, "shared:*")
result := make(map[string]any)
for _, key := range keys {
val, _ := s.store.Get(ctx, key)
var parsed any
json.Unmarshal(val, &parsed)
result[key] = parsed
}
return result, nil
}
Connection: For more on agent state management, see Chapter 11: State Management. Checkpoint is a special case of state persistence.
Block Memory¶
A simple map-based SimpleMemory works for learning purposes. In a production agent, memory is more complex. The key idea: block architecture.
A Block is a unit of interaction. A user request, a chain of tool calls, the final response — all of that is one block. A block stores both original messages and their compact version.
Block Lifecycle¶
flowchart LR
A[Prepare] --> B[Append]
B --> C[BuildContext]
C --> D[Close]
D --> E["Compact + Summary"]
- Prepare — create a new block, bind it to the store
- Append — add messages (user, assistant, tool results)
- BuildContext — assemble history from current and past blocks
- Close — finalize: create compact version, count tokens, generate summary
Compact: 91% Tool Chain Compression¶
When a block is closed, tool call chains are collapsed into a compact form:
type Block struct {
ID int
Query string
Messages []Message // original
Compact []Message // compressed version
Summary string
Tokens int
}
func (b *Block) Close() {
b.Query = extractQuery(b.Messages) // first 120 chars of user message
b.Compact = compactMessages(b.Messages)
b.Tokens = estimateTokens(b.Compact) // chars / 3
b.Summary = buildSummary(b.Query, b.Messages, b.Compact)
}
The compactMessages function collapses assistant(tool_calls) → tool_result chains into a single message with <prior_tool_use> tags:
func compactMessages(msgs []Message) []Message {
var result []Message
var toolBuf strings.Builder
for _, msg := range msgs {
if msg.HasToolCalls() {
for _, tc := range msg.ToolCalls {
args := truncate(tc.Arguments, 80)
res := truncate(findToolResult(msgs, tc.ID), 200)
line := fmt.Sprintf("%s %s -- %s", tc.Name, args, res)
if !isDuplicate(toolBuf.String(), line) {
toolBuf.WriteString(line + "\n")
}
}
continue
}
if msg.Role == "tool" {
continue // already processed above
}
if toolBuf.Len() > 0 {
result = append(result, Message{
Role: "user", // not assistant — the model must not imitate its own content
Content: "<prior_tool_use>\n" + toolBuf.String() + "</prior_tool_use>",
})
toolBuf.Reset()
}
result = append(result, msg)
}
return result
}
Compact messages are assigned the user role, not assistant. This prevents the model from perceiving compressed content as its own previous words and starting to imitate them.
BuildContext: Block Budgeting¶
When assembling context, we take compact versions of past blocks (from newest to oldest) until the budget runs out:
func (m *Memory) BuildContext(currentBlock *Block, contextSize int) []Message {
budget := contextSize - currentBlock.Tokens - 4096 // reserve
var history []Message
// from newest blocks to oldest
for i := len(m.blocks) - 1; i >= 0 && budget > 0; i-- {
block := m.blocks[i]
if block.Tokens > budget {
break
}
history = append(block.Compact, history...)
budget -= block.Tokens
}
return append(history, currentBlock.Messages...)
}
Block Catalog and Recall¶
The model does not see past block contents directly. Instead, a catalog — a list of blocks with brief descriptions — is rendered in the system prompt:
[CONTEXT BLOCKS]
Blocks from prior interactions (use recall tool to get full details):
#0: check disk usage — exec x3, read x1 → "Disk /data at 92%, cleaned logs" (~5K tokens)
#1: deploy service — edit x2, exec x4 → "Deployed v2.3.1 to staging" (~8K tokens)
#2: fix nginx config — read x2, edit x1 → "Updated proxy_pass for /api" (~3K tokens)
When the model needs details, it calls the recall tool:
type RecallTool struct {
memory *Memory
}
func (t *RecallTool) Execute(ctx context.Context, args RecallArgs) (string, error) {
msgs := t.memory.BlockMessages(args.BlockID)
if msgs == nil {
return "Block not found", nil
}
return formatMessages(msgs), nil // full original messages
}
Recall returns original block messages, not the compact version. This is critical: compact loses details, and deep analysis requires full information.
Working Memory¶
Working Memory is a dynamic section of the system prompt that holds the context of the current task. Unlike block memory (past interactions), Working Memory represents "what the agent knows right now."
Working Memory Components¶
| Component | Purpose |
|---|---|
| TaskContext | Current task (first 50 words), read/modified files, recent actions |
| LivePlan | Goal, steps with statuses (pending/in_progress/completed/cancelled) |
| Budget | Warnings about context window filling up |
| ContextBlocks | Catalog of past blocks for recall |
Rendering in System Prompt¶
type WorkingMemory struct {
Task string
FilesRead *Ring[string] // ring buffer, MRU semantics
FilesModified *Ring[string]
LastActions *Ring[string]
Plan *LivePlan
Budget *BudgetTracker
Blocks []BlockSummary
maxChars int // ~6000 chars
}
func (wm *WorkingMemory) Render() string {
var sb strings.Builder
sb.WriteString("[TASK CONTEXT]\n")
sb.WriteString("Task: " + wm.Task + "\n")
sb.WriteString("Files read: " + wm.FilesRead.Join(", ") + "\n")
sb.WriteString("Files modified: " + wm.FilesModified.Join(", ") + "\n")
sb.WriteString("Recent: " + wm.LastActions.Join(", ") + "\n")
if wm.Plan != nil {
sb.WriteString("\n" + wm.Plan.Render() + "\n")
}
if wm.Budget.ShouldWarn() {
sb.WriteString("\n[BUDGET]\n" + wm.Budget.Warning() + "\n")
}
if len(wm.Blocks) > 0 {
sb.WriteString("\n[CONTEXT BLOCKS]\n")
for _, b := range wm.Blocks {
sb.WriteString(fmt.Sprintf("#%d: %s (~%dK tokens)\n", b.ID, b.Summary, b.Tokens/1000))
}
}
// Trim when over budget
if sb.Len() > wm.maxChars {
wm.LastActions.Trim(3)
wm.FilesRead.Trim(5)
}
return sb.String()
}
The Problem: Working Memory Is Not Persistent¶
Working Memory lives in RAM. Between sessions it is lost: the agent forgets the plan, re-reads files, and starts from scratch.
Solution: pass Working Memory as a parameter to Run() so it survives across REPL cycles. For persistence between sessions, implement Export/Import:
func (wm *WorkingMemory) Export() WorkingMemorySnapshot {
return WorkingMemorySnapshot{
Task: wm.Task,
FilesRead: wm.FilesRead.Items(),
FilesModified: wm.FilesModified.Items(),
Plan: wm.Plan.Export(),
}
}
func (wm *WorkingMemory) Restore(snap WorkingMemorySnapshot) {
wm.Task = snap.Task
for _, f := range snap.FilesRead {
wm.FilesRead.Push(f)
}
// ...
}
4 levels of memory in a production agent:
- Working Memory — task, plan, budget, files (in system prompt)
- Block Memory — completed interactions (original + compact)
- Recall — model-driven retrieval of full block data
- Condense — emergency LLM compression on overflow (see Context Engineering)
Common Errors¶
Error 1: No TTL (Time To Live)¶
Symptom: Memory grows infinitely, consuming storage and context.
Cause: Outdated information is not forgotten.
Solution: Implement TTL and periodic cleanup.
Error 2: Storing Everything¶
Symptom: Memory fills with irrelevant information, making retrieval noisy.
Cause: No filtering of what to store.
Solution: Store only important facts, not every conversation turn.
Error 3: No Retrieval Optimization¶
Symptom: Retrieval returns irrelevant results or misses important information.
Cause: Simple keyword matching instead of semantic search.
Solution: Use embeddings for semantic similarity search.
Error 4: Compact Destroys Originals¶
Symptom: After compaction, it is impossible to recover tool call details. Recall returns the compressed version.
Cause: Original messages were deleted when creating the compact version.
Solution:
// BAD: overwriting the original
block.Messages = compactMessages(block.Messages)
// GOOD: store both versions
block.Compact = compactMessages(block.Messages)
// block.Messages remains unchanged
The Never destroy originals principle: condensation is a view, not a mutation. Originals are needed for recall over the full history.
Error 5: Programmatic Compaction Mid-Loop¶
Symptom: Agent loses context in the middle of a task, restarts or repeats actions.
Cause: Compaction is triggered inside the agent loop while the task is still running.
Solution: Compact only on Block.Close(), when the interaction is complete. Inside the loop, use LLM condensation (condense), which creates a meaningful summary instead of mechanical compression.
Mini-Exercises¶
Exercise 1: Implement Memory Store¶
Create a memory store that persists to disk:
type FileMemory struct {
filepath string
// Your implementation
}
func (m *FileMemory) Store(key string, value any, metadata map[string]any) error {
// Save to file
}
Expected result:
- Memory persists between restarts
- Can load from file on startup
Exercise 2: Semantic Search¶
Implement retrieval using embeddings:
func (m *Memory) RetrieveSemantic(query string, limit int) ([]MemoryItem, error) {
// Use embeddings to find semantically similar items
}
Expected result:
- Finds relevant items even without exact keyword match
- Returns most similar items first
Exercise 3: Implement Block Memory with Compact¶
Implement block memory with tool chain compaction:
type BlockMemory struct {
blocks []*Block
current *Block
}
func (m *BlockMemory) NewBlock() *Block {
// Create a new block
}
func (m *BlockMemory) CloseBlock() {
// Close current block: compact + summary
}
func (m *BlockMemory) BuildContext(contextSize int) []Message {
// Assemble history from compact versions of past blocks + current block
}
Expected result:
- Compact compresses a chain of 10 tool calls into 1-2 messages
- BuildContext fits within the token budget
- Original messages are preserved for recall
Completion Criteria / Checklist¶
Completed:
- Understand different memory types
- Can store and retrieve information
- Implement TTL and cleanup
- Integrate memory with agent
- Understand block memory architecture and compaction
- Can implement block catalog and recall
- Understand Working Memory and its components
Not completed:
- No TTL, memory grows infinitely
- Storing everything without filtering
- Only simple keyword search
- Compact destroys original messages
- Compaction triggered mid-loop instead of on block close
Connection with Other Chapters¶
- Chapter 09: Agent Anatomy — Memory is a key agent component
- Chapter 13: Context Engineering — Memory feeds context management (facts from memory are used when assembling context)
- Chapter 08: Evals and Reliability — Memory affects agent consistency
IMPORTANT: Memory (this chapter) is responsible for storing and retrieving information. Managing how this information is included in LLM context is described in Context Engineering.
What's Next?¶
After understanding memory systems, proceed to:
- 13. Context Engineering — Learn how to efficiently manage context from memory, state, and retrieval