Context Engineering Architecture
Author: Danial Hasan, CTO @ Squad

The Convergence

Three independent teams just validated the same architectural insight:
  • Google ADK (3 days ago): “Context is a compiled view over a richer stateful system.”
  • Stanford/SambaNova ACE paper (October 2025): “Treat contexts as evolving playbooks that accumulate, refine, and organize strategies.”
  • Squad (months ago, in production): “We learned this a few months ago when building our active context management systems.”
When Google’s agent framework team, Stanford researchers, and a startup building multi-agent systems all independently arrive at the same architecture, that architecture is probably correct.

The Problem We Hit

Month 1 of Squad:
Agent A gathers context (5,000 tokens)
Agent A passes everything to Agent B
Agent B receives 5,000 tokens of "history"
Agent B starts saying "As I mentioned earlier..."
Agent B never mentioned anything. Agent A did.
Identity confusion. Agent B hallucinated that it had Agent A’s conversation.

This wasn’t a prompt problem. Our prompts were clear: “You are Agent B, the Engineer.” It was a context problem: we flooded Agent B with Agent A’s history, and the model couldn’t distinguish “context I’m receiving” from “conversation I’m having.” The failure rate: 39% of multi-agent handoffs showed identity confusion artifacts.

The Wrong Mental Model

Most agent frameworks handle context like this:
context = ""
context += system_prompt
context += user_message
context += tool_result_1
context += tool_result_2
context += agent_response
# ... keep appending forever
This is prompt engineering thinking applied to context. It treats context as a string to optimize, not a system to architect.

The Compiler Mental Model

Source code (what you store):
  • Sessions
  • Memory
  • Artifacts (files)
  • Full structured state
Compiler pipeline (how you transform):
  • Named processors
  • Sequence of passes
  • Observable transformations
Compiled output (what the model sees):
  • Working context
  • Minimal, relevant, scoped to this call
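The compiler model above can be sketched in code. This is an illustrative sketch, not Squad’s actual implementation: the `ContextProcessor` shape, the `compileContext` function, and the example passes are all assumptions chosen to show the idea of named, observable passes over stored state.

```typescript
// Hypothetical types: "source code" (full stored state) vs "compiled output"
// (what the model sees). Names are illustrative, not a real framework API.
type State = {
  session: string[];              // full conversation history
  memory: Record<string, string>; // long-lived facts
  artifacts: string[];            // files and other stored objects
};

type WorkingContext = { parts: string[] };

// A named processor: one observable pass in the compilation pipeline.
type ContextProcessor = {
  name: string;
  run: (state: State, ctx: WorkingContext) => WorkingContext;
};

// Compile: run each pass in sequence; the model only ever sees the output.
function compileContext(state: State, passes: ContextProcessor[]): WorkingContext {
  return passes.reduce((ctx, p) => p.run(state, ctx), { parts: [] as string[] });
}

// Example passes (assumptions, for illustration only).
const recentTurns: ContextProcessor = {
  name: "recent-turns",
  run: (s, ctx) => ({ parts: [...ctx.parts, ...s.session.slice(-3)] }),
};
const pinnedMemory: ContextProcessor = {
  name: "pinned-memory",
  run: (s, ctx) => ({ parts: [...ctx.parts, ...Object.values(s.memory)] }),
};
```

Because each pass is named, you can log, test, and reorder transformations individually instead of debugging one opaque string concatenation.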

Squad’s Three-Tier Architecture

┌─────────────────────────────────────────────┐
│  TIER 3: IMMUTABLE (Audit Log)              │
│  - All receipts ever generated              │
│  - Storage: S3 / long-term                  │
├─────────────────────────────────────────────┤
│  TIER 2: PERSISTENT (Shared Database)       │
│  - Current task context                     │
│  - Frozen contracts                         │
│  - Storage: Vector DB + Relational DB       │
├─────────────────────────────────────────────┤
│  TIER 1: EPHEMERAL (Working Context)        │
│  - What THIS agent sees for THIS call       │
│  - Compiled from Tier 2 + Tier 3            │
│  - Storage: LLM context window              │
└─────────────────────────────────────────────┘
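The three tiers map naturally onto code. The sketch below is a minimal illustration under assumed names (`AuditLog`, `SharedState`, `compileWorkingContext`); the real storage backends are S3, a vector DB, and a relational DB as shown in the diagram.

```typescript
// Tier 3: immutable audit log. Append-only; entries are never edited.
class AuditLog {
  private receipts: string[] = [];
  append(receipt: string): void { this.receipts.push(receipt); }
  all(): readonly string[] { return this.receipts; }
}

// Tier 2: persistent shared state: current task context and frozen contracts.
class SharedState {
  constructor(
    public taskContext: string,
    public frozenContracts: string[],
  ) {}
}

// Tier 1: ephemeral working context, compiled fresh for each call.
// `budget` is a hypothetical cap on how many parts this call may include.
function compileWorkingContext(
  shared: SharedState,
  audit: AuditLog,
  budget: number,
): string[] {
  const parts = [shared.taskContext, ...shared.frozenContracts];
  // Pull only the most recent receipts that fit within the budget.
  const recent = audit.all().slice(-Math.max(0, budget - parts.length));
  return [...parts, ...recent];
}
```

The key property: Tier 1 is derived, never stored. Every call recompiles from Tiers 2 and 3, so stale context can’t accumulate in the window.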

The Multi-Agent Identity Fix

Wrong: Pass Agent A’s conversation to Agent B as history. Right: Transform Agent A’s outputs into context FOR Agent B.
// Wrong: Copy conversation
const engineerContext = scoutConversation

// Right: Transform to third-person context
const engineerContext = {
  role: "system",
  content: `
    Context from Scout Agent (separate agent):

    - Files identified: ${scout.outputs.files}
    - Patterns detected: ${scout.outputs.patterns}

    You are the Engineer Agent. Use this context to implement.
  `
}
The difference:
  • Scout’s outputs become Engineer’s context, not history
  • Clear attribution: “Scout found…” not “I found…”
  • No identity confusion
Results:
  • Before: 39% identity confusion
  • After: 2% identity confusion
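The handoff transform fits the compiler model as one more named pass. This sketch assumes a `ScoutOutputs` shape and attribution wording for illustration; it is not Squad’s production code.

```typescript
// Assumed shape of the Scout agent's structured outputs.
type ScoutOutputs = { files: string[]; patterns: string[] };

// Transform Scout's outputs into third-person context for the Engineer.
// Note: Scout's raw conversation turns are never passed along.
function compileHandoff(scout: ScoutOutputs): { role: "system"; content: string } {
  return {
    role: "system",
    content: [
      "Context from Scout Agent (separate agent):",
      "",
      `- Files identified: ${scout.files.join(", ")}`,
      `- Patterns detected: ${scout.patterns.join(", ")}`,
      "",
      "You are the Engineer Agent. Use this context to implement.",
    ].join("\n"),
  };
}
```

Keeping the transform in a single function makes the attribution rule (“Scout found…”, never “I found…”) testable rather than hoped-for.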

Evidence: Before vs After

Before Context Engineering (Month 1-2)

Metric                | Value
----------------------|------------
Average context size  | 180K tokens
Context relevance     | 34%
Identity confusion    | 39%
Task success rate     | 61%
Cost per task         | $2.40

After Context Engineering (Month 4+)

Metric                | Value       | Change
----------------------|-------------|-------
Average context size  | 48K tokens  | -73%
Context relevance     | 91%         | +168%
Identity confusion    | 2%          | -95%
Task success rate     | 94%         | +54%
Cost per task         | $0.77       | -68%

The Reach, Don’t Flood Principle

Google’s ADK: “Agents should reach for information via tools, not get flooded with everything upfront.”
// Bad: Flood agent with all possible context
const context = {
  allFiles: await readAllFiles(),           // 50,000 tokens
  allTests: await getAllTestResults(),       // 10,000 tokens
  allDocs: await getAllDocumentation(),      // 30,000 tokens
  // Total: 90,000+ tokens (most irrelevant)
}

// Good: Minimal default + tools to reach for more
const context = {
  task: currentTask,                          // 500 tokens
  contracts: frozenContracts,                 // 300 tokens
  recentContext: last3Turns,                  // 800 tokens
  // Total: 1,600 tokens
}

const tools = [
  readFile,      // Agent reaches for specific files
  runTests,      // Agent reaches for test results
  searchDocs,    // Agent reaches for relevant docs
]
Results:
  • 75% token reduction
  • +3.2 tool calls per task (agents reaching for what they need)
  • 94% task success rate
  • 68% cost reduction
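The loop that makes “reaching” work can be sketched as follows. Everything here is a hedged illustration: `callModel` is a stub standing in for an LLM deciding which tool to call, and the `Tool`/`ToolCall` shapes are assumptions, not ADK’s API.

```typescript
type Tool = { name: string; run: (arg: string) => string };

// The model either reaches for a tool or declares itself done.
type ToolCall = { tool: string; arg: string } | { done: string };

// Stub model: in production this is an LLM choosing what to reach for.
function callModel(context: string[], step: number): ToolCall {
  return step === 0 ? { tool: "readFile", arg: "src/index.ts" } : { done: "ok" };
}

function runAgent(
  minimalContext: string[],
  tools: Tool[],
): { answer: string; toolCalls: number } {
  const context = [...minimalContext];
  let toolCalls = 0;
  for (let step = 0; step < 10; step++) {
    const decision = callModel(context, step);
    if ("done" in decision) return { answer: decision.done, toolCalls };
    const tool = tools.find((t) => t.name === decision.tool);
    if (!tool) break;
    context.push(tool.run(decision.arg)); // only reached-for context is added
    toolCalls++;
  }
  return { answer: "", toolCalls };
}
```

The design choice: context grows only when the agent asks for something, which is exactly why average tool calls went up (+3.2 per task) while total tokens went down.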

This is a scaffold post. Full content will include:
  • Complete compilation pipeline code
  • Processor architecture details
  • Google ADK comparison table
  • Stanford ACE paper insights
  • Meta MSL validation
  • Practical implementation guide

The Meta-Point

Four independent teams arrived at the same architecture:
Principle                | Google ADK                  | ACE Paper                        | Meta MSL                     | Squad
-------------------------|-----------------------------|----------------------------------|------------------------------|-----------------------
Storage ≠ Presentation   | Sessions vs Working Context | Playbooks vs Delta Updates       | Environments vs Evaluations  | Tiers vs Compiled View
Explicit Transformations | LLM Flows + Processors      | Generator → Reflector → Curator  | Verifier Pipeline            | Named Processor Chain
Scope by Default         | Tools reach for more        | Incremental updates              | Constraint-scoped            | Protocol-based access
This isn’t coincidence. This is convergent evolution toward correct architecture.