# Treating AI Like LEGO Bricks

In the Claude Code codebase, the same `query()` function is the "universal engine" of the entire system: wherever an AI model needs to be invoked, whether for the main conversation, a sub-agent, or background memory compaction, the call goes through `query()`. It is used in at least seven different scenarios.

> 🌍 **Industry Context**: The architectural pattern of "one universal execution engine + parameterized configuration" has a long history in software engineering—the Strategy Pattern and parameterized factory patterns in the GoF design patterns are its classic expressions. In the AI framework space, different products have made very different architectural choices for the same problem of "multiple AI invocation scenarios":
>
> | Framework/Product | Architectural Approach | Core Mechanism | Advantages | Trade-offs |
> |-----------|---------|---------|------|------|
> | **Claude Code** | Unified engine + parameterization | The same `query()` function distinguishes 7 roles via `querySource`, permissions, toolsets, etc. | All roles automatically inherit engine optimizations (streaming, caching, retries); change one place, benefit everywhere | Single-path complexity rises; one bug can affect all roles |
> | **Cursor Agent** | Independent paths + shared infrastructure | Tab completion, Inline Edit, Chat, and Agent modes each have independent invocation paths, sharing the underlying LLM communication layer | Each mode can be optimized independently; good fault isolation | Cross-cutting improvements (those needing to be applied uniformly to all paths, like caching strategies) must be implemented on each path separately |
> | **Aider** | Explicit multi-model collaboration | Configurable architect model + editor model, different models assume different roles, each with independent system prompts and output parsing | Can select the best model for each role (e.g., GPT-4o for architecture, Claude for editing) | Inter-model interaction protocols require extra design; model switching adds latency |
> | **LangGraph** | State machine orchestration | Each Agent is a node in a state graph, connected by conditional edges, with state passed explicitly between nodes | Workflow is visualizable and traceable; complex orchestration logic is expressed clearly | Boilerplate is heavy for simple scenarios; the state graph itself becomes an abstraction layer to maintain |
> | **OpenAI Agents SDK** | Object instantiation + Handoff | Each `Agent` is an independent instance, and control is transferred to another Agent via the `handoff` mechanism | Object-oriented intuition is clear; Agent boundaries are explicit | Common logic (error retries, streaming) must be maintained in the base class; handoff state transfer requires explicit design |
>
> None of these choices are absolutely better or worse—they depend on team size, how similar the AI roles are, and the priority of iteration speed. Claude Code chose the extreme unification route because its engineering context is: all roles call the same model (Claude), use the same message format and tool protocol, and the differences are mainly at the parameter level rather than the execution logic level. Under these conditions, the unified engine maximizes benefits and minimizes costs. But if your system needs to collaborate across model providers (e.g., main Agent uses Claude, sub-agent uses GPT), Aider's multi-model route or LangGraph's state machine orchestration may be more suitable.

---

## Seven AI Instances

When someone says "Claude Code calls Claude," they could mean any of the following:

1. **Main Loop AI** — the one responding to user input (`querySource: 'repl_main_thread'`)
2. **Sub-agent** — the AI executing an `Agent` tool call
3. **Speculative Execution AI** — running ahead while the user is typing (`querySource: 'speculation'`)
4. **SessionMemory AI** — the AI taking notes in the background (`querySource: 'session_memory'`)
5. **Prompt Suggestion AI** — the AI predicting the user's next message
6. **Hook Agent** — the AI used for Stop condition validation (`querySource: 'hook_agent'`)
7. **Context Compression AI** — the AI summarizing when the conversation gets too long

These seven AI instances share the same tool system, the same permission system, and the same message format—but each has a different `querySource`, a different `ToolUseContext`, and different permission constraints.

> 💡 **Plain English**: This is like a **rider dispatch system for a food delivery platform**—the main loop AI is the central dispatcher, sub-agents are riders in different zones, speculative execution AI is the rider who sets out early (betting you'll place an order), SessionMemory AI is the clerk who records daily deliveries, and Prompt Suggestion AI is the analyst predicting what you'll want to eat next. They all ride the same electric scooter model (`query()` function); they just receive different orders.
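The seven `querySource` values can be modeled as a discriminated union. A minimal sketch follows; the string values not quoted in this chapter (and the type name itself) are assumptions, not Claude Code's actual definitions:

```typescript
// Hypothetical sketch of the seven caller identities as a union type.
// 'repl_main_thread', 'speculation', 'session_memory', and 'hook_agent'
// appear in this chapter; the other values are illustrative guesses.
type QuerySource =
  | 'repl_main_thread'   // 1. main loop AI
  | 'agent'              // 2. sub-agent (value assumed)
  | 'speculation'        // 3. speculative execution AI
  | 'session_memory'     // 4. SessionMemory AI
  | 'prompt_suggestion'  // 5. prompt suggestion AI (value assumed)
  | 'hook_agent'         // 6. Stop-hook validation AI
  | 'compaction'         // 7. context compression AI (value assumed)

// One engine, many callers: the source tag distinguishes them in logs.
function describeCaller(source: QuerySource): string {
  return `query() invoked by ${source}`
}

console.log(describeCaller('repl_main_thread'))
```

Because the identity is just a string parameter, adding an eighth role does not require a new execution path, only a new union member and a new parameter bundle.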

---

### Insight: Meta-Prompting—Teaching AI How to Use AI

"Seven AI instances share one engine" is horizontal reuse. But Claude Code also has a vertical recursion: **using prompts to produce prompts**—the meta-prompting architecture.

**Layer 1: AgentTool's "Prompt Writing Class"**

The `getPrompt()` function in `src/tools/AgentTool/prompt.ts` doesn't just describe what AgentTool can do; it contains an entire section `## Writing the prompt` that directly teaches Claude how to write good prompts for sub-agents:

```
Brief the agent like a smart colleague who just walked into the room —
it hasn't seen this conversation, doesn't know what you've tried,
doesn't understand why this task matters.
- Explain what you're trying to accomplish and why.
- Describe what you've already learned or ruled out.
- Give enough context about the surrounding problem that the agent
  can make judgment calls rather than just following a narrow instruction.
```

This isn't a tool description; it's a **prompt writing specification**. When Claude invokes AgentTool, it is simultaneously being trained by the system prompt on "what makes a good sub-agent prompt." The prompt is teaching the AI how to produce prompts.

Note the line "Terse command-style prompts produce shallow, generic work"—this is a pattern Anthropic engineers observed through experimentation, encoded into the tool description's teaching content, and applied to every Claude instance that uses AgentTool.

**Layer 2: AGENT_CREATION_SYSTEM_PROMPT—Turning Claude into an "AI Architect"**

`src/components/agents/generateAgent.ts` goes even further. When the user wants to create a custom Agent, the system calls Claude with `AGENT_CREATION_SYSTEM_PROMPT`, which begins:

```
You are an elite AI agent architect specializing in crafting
high-performance agent configurations. Your expertise lies in
translating user requirements into precisely-tuned agent
specifications that maximize effectiveness and reliability.
```

Claude is given a new role—not "assistant," but "AI agent architect." It must complete five tasks: extract core intent, design an expert persona, build comprehensive instructions, optimize performance, and generate an identifier. The output is a `GeneratedAgent` object (containing `identifier`, `whenToUse`, and `systemPrompt` fields), which becomes the configuration for the next AI instance—**Claude produces prompts meant to be consumed by another Claude instance**.

This system prompt also demonstrates behavioral examples of "when to call AgentTool" (with `<commentary>` tags), making it a meta-prompt recursive to a third layer: using examples to teach Claude how to write examples that teach other Claudes how to act.
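The `GeneratedAgent` shape makes the recursion concrete. The field names below come from this chapter; everything else is an illustrative sketch, not the actual `generateAgent.ts` code:

```typescript
// Sketch of the GeneratedAgent object described above. The three field
// names are from the chapter; the example values are invented.
type GeneratedAgent = {
  identifier: string    // machine-readable name for the new agent
  whenToUse: string     // teaches the parent Claude when to delegate
  systemPrompt: string  // becomes the next Claude instance's configuration
}

// The meta-prompting loop in miniature: output of one Claude call (the
// "architect") becomes the input configuration of the next Claude call.
function toSystemPrompt(agent: GeneratedAgent): string {
  return agent.systemPrompt
}

const reviewer: GeneratedAgent = {
  identifier: 'code-reviewer',
  whenToUse: 'After writing or modifying code',
  systemPrompt: 'You are an expert code reviewer. Focus on correctness first.',
}
```

The key property is that `systemPrompt` is data produced by one model invocation and consumed by another, with no human in the loop between them.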

**The Logical Extreme of "LEGO Bricks"**

If "seven AI instances share one engine" is treating AI as LEGO bricks (the same brick, different colors and sizes), then the meta-prompting architecture treats AI as a **brick design tool**—not just reusing the same engine, but teaching the engine to design its own brick specifications.

User describes a need → Claude (as architect) designs Agent configuration → The generated configuration drives another Claude instance → That instance calls a sub-agent while using the prompt writing skills learned from AgentTool → The sub-agent may trigger a new generation loop.

This isn't nested invocation; it's **self-extension of capability**: through the meta-prompting mechanism, the system passes Anthropic engineers' judgmental knowledge of "what makes a good Agent" to every running Claude instance, enabling them to produce high-quality Agent configurations too.

> 💡 **Plain English**: This is like a **culinary school hiring a master chef (Claude) to teach a chef training course (AgentTool prompt writing class)**, while that same master chef can also design new cuisines based on customer requests (`AGENT_CREATION_SYSTEM_PROMPT`), and the designed cuisines are then learned and refined by other chefs—knowledge flows between AI instances rather than remaining in human engineers' hands.

---

## How They Compose

**The key to modularity isn't inheritance, but parameters**.

```typescript
// query() function parameters (simplified)
type QueryParams = {
  messages: Message[]
  systemPrompt: SystemPrompt
  querySource: QuerySource    // distinguishes who is calling
  toolUseContext: ToolUseContext  // includes AppState and permissions
  canUseTool: CanUseToolFn    // each caller can override permission checks
  tools?: Tools               // can restrict the available toolset
}
```

Each AI instance achieves different behavior through different parameter configurations:

| Instance | Special Configuration |
|------|---------|
| Main loop | Full permissions, full tools, full system prompt |
| Sub-agent | `setAppState` is a no-op (cannot modify parent state) |
| Speculative execution | `CacheSafeParams` (must match main loop parameters to share cache) |
| SessionMemory | `isNonInteractiveSession: true`, can only use Edit tool |
| Hook Agent | `mode: 'dontAsk'`, extra SyntheticOutputTool added |
| Compression AI | Smaller model (Haiku), outputs only summaries |
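The table above can be sketched in code: the same parameter shape, with one role built from another by overriding a few fields. All concrete values here are illustrative assumptions, not Claude Code's actual configuration:

```typescript
// Illustrative sketch: two roles built from the same (simplified) shape.
type Message = { role: 'user' | 'assistant'; content: string }
type QueryParams = {
  messages: Message[]
  systemPrompt: string
  querySource: string
  setAppState: (update: unknown) => void
  tools?: string[]
}

const parentState: unknown[] = []

// Main loop: full power, real state mutation.
const mainLoop: QueryParams = {
  messages: [{ role: 'user', content: 'fix the bug' }],
  systemPrompt: 'full system prompt',
  querySource: 'repl_main_thread',
  setAppState: (u) => { parentState.push(u) },
}

// Sub-agent: same shape, but setAppState is a no-op and tools are restricted.
const subAgent: QueryParams = {
  ...mainLoop,
  querySource: 'agent',
  setAppState: () => {},    // memory-state isolation
  tools: ['Read', 'Grep'],  // restricted toolset (illustrative)
}

mainLoop.setAppState({ ui: 'updated' })  // mutates parent state
subAgent.setAppState({ ui: 'ignored' })  // silently dropped
```

The spread-and-override pattern is the whole trick: a new role is a diff against an existing parameter bundle, not a new function.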

---

## Why Choose a Unified Engine

The traditional approach is to write each AI invocation as an independent function: `callSessionMemoryAI()`, `callSpeculationAI()`, `callHookAgentAI()`... The result is massive code duplication, with each function having its own error handling, streaming, and tool invocation loop.

Of course, there's also a **middle path**: extract common streaming, caching, and error retry logic into a base class or mixin, then have each AI role inherit or compose these shared capabilities. This achieves DRY too, but with clearer separation of concerns for each role. Cursor's architecture is closer to this path—different modes (Tab completion, Chat, Agent) have independent invocation paths but share the underlying LLM communication infrastructure.

Claude Code chose the more radical unified path: making `query()` a general-purpose **AI execution engine** that any component needing AI capability uses through parameter configuration. This choice works in Claude Code's context because the execution logic across its 7 AI roles is highly similar—all are loops of "send message, call tool, process response," with differences concentrated at the parameter level rather than the flow level.
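The "send message, call tool, process response" loop that all seven roles share can be sketched in a few lines. This is a toy model under assumed names, not Claude Code's actual engine; a real implementation would stream from the API and dispatch real tools:

```typescript
// Minimal sketch of the shared agentic loop. All names are illustrative.
type Turn = { toolCall?: { name: string; input: string }; text?: string }

function runLoop(
  callModel: (history: string[]) => Turn,        // injected model call
  runTool: (name: string, input: string) => string,  // injected tool dispatch
  prompt: string,
  maxTurns = 10,
): string {
  const history = [prompt]
  for (let i = 0; i < maxTurns; i++) {
    const turn = callModel(history)
    if (turn.toolCall) {
      // Tool result is fed back into the same loop as new context.
      history.push(runTool(turn.toolCall.name, turn.toolCall.input))
    } else {
      return turn.text ?? ''
    }
  }
  return '(turn limit reached)'
}

// Fake model: requests one tool call, then answers.
let step = 0
const answer = runLoop(
  () => (step++ === 0 ? { toolCall: { name: 'Read', input: 'a.ts' } } : { text: 'done' }),
  (name, input) => `result of ${name}(${input})`,
  'fix the bug',
)
```

Note that the loop body never mentions a role: the model call, tool dispatch, and prompt all arrive as arguments, which is exactly why one loop can serve seven roles.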

> 📚 **Course Connection**: This architectural choice directly corresponds to the **DRY (Don't Repeat Yourself)** principle in software engineering courses. It also **partially** aligns with the **Open-Closed Principle**—when a new role only needs a different parameter combination, you can indeed add it without modifying `query()` (e.g., adding a read-only analysis AI by passing in a restricted toolset and corresponding system prompt). But this "openness" has boundaries: if the new role needs entirely new execution logic (such as a completely different streaming approach or error retry strategy), you still need to modify `query()` internally—and then the "unified engine's" open-closedness no longer holds. This isn't a flaw in Claude Code's design; it's an inherent boundary of the parameterization pattern: **the dimensions of variation expressible through parameters are finite**.


---

## Two Kinds of Isolation: Memory State vs. File System

When a sub-agent is created, its `ToolUseContext` contains:

```typescript
setAppState: () => {}  // no-op
```

This means the sub-agent **cannot modify the parent agent's in-memory state** (such as UI state, session metadata, etc.). But it's important to be clear: **this is only memory-level isolation, not I/O isolation**. The sub-agent can still read files, write files, and execute commands—and these file system and external environment side effects are **real and cannot be rolled back via `setAppState`**.

Claude Code applies isolation strategies of different granularity in different scenarios:

| Isolation Level | Mechanism | Applicable Scenario | Isolation Scope |
|----------|------|---------|---------|
| **Memory state isolation** | `setAppState: () => {}` | All sub-agents (AgentTool calls) | Parent agent's app state is not modified by the sub-agent, but file writes are real |
| **File system isolation** | Overlay file system | Only speculative execution scenarios | File operations happen in a virtual layer, and can be fully rolled back if the prediction is wrong |
| **Permission isolation** | `canUseTool` + toolset restrictions | SessionMemory (can only use Edit), Hook Agent (`dontAsk` mode) | Restricts the types of actions that can be executed at the source |

In other words, when a regular sub-agent is invoked via AgentTool, if it writes incorrect file content, the main agent cannot automatically roll it back—the file system changes are real. Only the speculative execution scenario has the full isolation provided by an overlay file system. This is an important engineering trade-off: providing file-system-level isolation (such as sandboxes or overlays) for all sub-agents would be too expensive, so `setAppState`'s memory isolation is a "good enough and cheap" compromise.
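The asymmetry between the two levels is easy to demonstrate. In this illustrative sketch (a plain `Map` stands in for the file system; nothing here is Claude Code's actual code), the no-op setter protects in-memory state while file writes land for real:

```typescript
// Illustrative: memory-state isolation vs. real file-system side effects.
const appState = { toolCallCount: 0 }
const files = new Map<string, string>()  // stand-in for the real file system

const subAgentCtx = {
  setAppState: (_: Partial<typeof appState>) => {},  // no-op: memory isolated
  writeFile: (path: string, content: string) => { files.set(path, content) },
}

subAgentCtx.setAppState({ toolCallCount: 99 })  // silently dropped
subAgentCtx.writeFile('src/a.ts', 'oops')       // real, not rolled back
```

After this runs, `appState` is untouched but the "file" exists, which is precisely the gap the overlay file system closes for speculative execution.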

> 💡 **Plain English**: `setAppState: () => {}` is like sending an assistant to run errands for you but not giving them your bank card PIN (memory state)—they can't touch your savings. But you did give them the key to your house (file system permissions), so they can enter, move furniture around, and make changes. Speculative execution's overlay file system is more like "practicing in a model house first, then moving to the real house only after confirming everything is fine."

---

## Applying This to Your Own AI System

If you're building a multi-AI collaborative system, you need to choose among three paths:

| Path | Applicable Conditions | Representative |
|------|---------|------|
| **Unified engine + parameterization** | Few roles (<10), similar execution logic, single model, small team | Claude Code |
| **Independent paths + shared infrastructure** | Roles have significantly different execution logic, need independent optimization and fault isolation | Cursor |
| **State machine / object orchestration** | Complex collaboration workflows between roles, need visualization and traceability | LangGraph, OpenAI Agents SDK |

If your scenario fits the unified engine path, Claude Code's model is worth learning from:

**Define a universal execution function** whose parameters include:
- `querySource` — for distinguishing in logs, permissions, and billing
- `canUseTool` — each caller defines its own permission boundary
- `setAppState` — whether this AI is allowed to modify global state (sub-AIs usually shouldn't), but remember this only isolates memory state, not I/O side effects

**Put the AI's "personality" in parameters**, not in function implementation. System prompt, toolset, permissions—these are configuration, not code.

**Watch for the tipping point of the unified engine**:
- When you find `query()` accumulating lots of `if (querySource === 'xxx')` branches, parameterization is no longer sufficient; consider splitting
- When a new role needs a different model provider or a different streaming approach, the unified engine abstraction becomes a constraint
- When the team is developing multiple AI roles in parallel, merge conflicts on the shared code path will slow everyone down

---

## Analogy

We need to distinguish between two different design patterns:

**Unix pipes** are **composition**—`cat | grep | sort` chains multiple **different programs** through a standard interface, each doing something different. Claude Code's `query()` is not this pattern—it isn't composing multiple different programs, but rather **the same program serving multiple scenarios through different parameters**.

A more accurate Unix analogy is the `curl` command: `curl -X GET` sends a GET request, `curl -X POST -d '...'` sends a POST request, `curl -H 'Authorization: ...'` sends a request with an auth header. Same program, different flag combinations, serving various HTTP scenarios. Claude Code's `query()` is exactly this **parameterization** pattern: the same execution engine transforms into the main loop AI, sub-agent, speculative execution AI, and other roles through different combinations of `querySource`, `tools`, `canUseTool`, and other parameters.

> 💡 **Plain English**: This is like a **multifunction food processor**—you don't need to buy a separate juicer, blender, and grinder; instead you have one motor base with different blades and containers. Attach the juicer blade and it's a juicer; attach the blending cup and it's a blender. `query()` is that motor base, and the parameters are the different blades and containers.

> 📚 **Course Connection**: From the **software engineering** design pattern perspective, `query()` maps most closely to the **Strategy Pattern**—the core flow of the execution engine (streaming, tool invocation loop, error retry) is the fixed "skeleton," while system prompts, toolsets, permission functions, and other parameters act as interchangeable "strategies." Another perspective is **Dependency Injection**—each AI role doesn't construct its own execution environment; instead, the caller injects the required configuration. Both analyses are more accurate than a compiler theory analogy: different `querySource`s do not correspond to "different source languages"—they all use the same message format and tool protocol, with differences only in parameter configuration, not in fundamentally different syntax or semantics.

---

## Code Landmarks

- `src/services/api/claude.ts`: The unified `query()` function—the shared execution entry point for all 7 AI invocations
- `src/tools/AgentTool.ts`: The sub-agent tool definition—achieving Agent cloning through parameterized `query()`
- `src/services/sideQuery.ts`: Side queries—for non-main-dialogue scenarios such as memory retrieval and Yolo classification

## Trade-offs and Reflection

The benefits of sharing `query()` are real, but so are the risks:

**Expanded blast radius**. When 7 invocation scenarios share the same execution path, optimizing `CacheSafeParams` parameter handling logic for speculative execution may accidentally affect SessionMemory AI's behavior. The first question in debugging is always "which `querySource` is causing this?" If `query()` accumulates a lot of `if (querySource === 'xxx')` branches, then "unified engine" may have already degenerated at the code level into "multiple functions in the same file"—physically co-located but logically not unified.
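The degeneration smell described above looks like this in miniature (illustrative code, not Claude Code's): once the engine's body branches on `querySource`, the unification is only physical, not logical.

```typescript
// The anti-pattern: role-specific forks hiding inside the "unified" engine.
function queryDegenerate(querySource: string): string {
  if (querySource === 'speculation') {
    return 'special cache-safe path'       // fork #1
  }
  if (querySource === 'session_memory') {
    return 'special non-interactive path'  // fork #2
  }
  return 'shared path'
}

// Healthier: keep the body branch-free and push variation into parameters.
function queryUnified(opts: { cacheSafe?: boolean; interactive?: boolean }): string {
  return `shared path (cacheSafe=${opts.cacheSafe ?? false})`
}
```

Each `if` in the first version is an execution-flow difference masquerading as a parameter difference, and counting those branches is a practical way to decide when to split.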

**When should you split?** If AI roles grow to 15 or 20, how bloated does `query()`'s parameter list become? The decision framework is: **when a new role needs not just parameter differences but execution flow differences** (such as a completely different streaming approach, a different model provider, or a different error retry strategy), the maintenance cost of the unified engine will rise sharply. At that point, Cursor's "independent paths + shared infrastructure" route or LangGraph's state machine orchestration may be more appropriate.

**Scenarios where you shouldn't copy this**:
- When the team grows beyond a certain size, everyone sharing the same code path means merge conflicts become a bottleneck
- When your AI roles need different model providers (e.g., main Agent uses Claude, sub-agent uses GPT-4o), the unified engine abstraction becomes strained—Aider's multi-model route is more natural here
- When the execution logic differences between AI roles exceed what parameters can express, cramming them into the same function reduces readability

Claude Code's unified `query()` is the right choice for a specific engineering context (single model, small team, highly similar execution logic across roles). Understanding the **conditions** for this choice is more valuable than imitating its **form**.
