# Tool Runtime: Registration, Scheduling, and Execution

Tools are the only interface between the AI and the real world. The 40 built-in tool directories—together with registration, permission checks, and concurrent scheduling—form the "syscall table" of Claude Code. This chapter dissects the complete lifecycle of a tool, from the type definitions in `Tool.ts` to the streaming parallel execution engine.

> **📏 On the counting rule for "number of tools"**
>
> Throughout this book, the authoritative figure for the number of tools is **"40 built-in tool directories"**. Here is why:
>
> - **43 = raw directory entries**: `ls src/tools/` returns 43 entries, but 3 of them are **not tools**—`shared/` (shared code), `testing/` (testing infrastructure), and `utils.ts` (utility functions). Counting them as tools is a categorical error.
> - **40 = actual tool directories**: 43 − 3 non-tools = 40 real tool directories (AgentTool, BashTool, FileReadTool, GlobTool, etc.)
> - **24 = top-level `require` count in `tools.ts`**: The top-level scope of `tools.ts` only explicitly `require`s 24 tools; the rest are brought in via feature gates (e.g., `SuggestBackgroundPRTool`, `MonitorTool`) or dynamic assembly (MCP tools, ToolSearch lazy loading). The number of tools actually loaded at runtime **varies dynamically with configuration**.
>
> **Why 40 instead of 43 or 24?** 40 is a **stable, verifiable fact** (directory count from `ls`) and excludes categorical errors; 43 inflates the number with non-tool entries; 24 only reflects static `require`s and undercounts feature-gated tools. Therefore, **40 is the figure that best represents "how many built-in tools the Claude Code product provides"**.
>
> The actual number of tools visible at runtime fluctuates with feature gates and MCP servers—in a specific session Claude might see only 30 tools (because MonitorTool/WorkflowTool/CronTool are disabled), or 50+ (because MCP servers are connected). But as a **product description**, "40 built-in tool directories" is the safest statement.

> **Source locations**: `src/tools/` (40 built-in tool directory implementations), `src/Tool.ts` (tool base interface), `src/tools/tools.ts` (tool registry and assembly chain entrypoint), `src/services/tools/` (execution core: `toolOrchestration.ts` scheduling shell, `toolExecution.ts` execution core, `StreamingToolExecutor.ts` streaming scheduler, `toolHooks.ts` hooks policy layer), `src/utils/toolResultStorage.ts` (large-result persistence)

> **🌍 Industry Context**: Tool Use / Function Calling is the **standard paradigm** in the AI Agent field, not an invention unique to Claude Code. OpenAI introduced Function Calling in June 2023, Google Gemini has a Tool Use API, and frameworks like LangChain / LlamaIndex all ship with tool-calling abstractions. The real differentiation does not lie in the concept of "AI can call tools" itself, but in Claude Code's engineering depth across three dimensions: **(1)** the fine-grained categorization and behavioral design of 40 built-in tool directories (Edit/Write separation, concurrency-safety declarations); **(2)** a ten-step permission-check chain whose granularity far exceeds comparable products (contrast with Cursor's binary allow/deny); **(3)** the MCP protocol enabling dynamic extension of the toolset rather than a compile-time-fixed set. This chapter focuses on the design logic behind these three engineering decisions.

> 🌍 Community Perspective | @wquguru — "The same nouns in tools do not mean the same skeleton in the system."

This sentence precisely captures the key to understanding Claude Code's tool system: LangChain has "Tool," OpenAI has "Function," Cursor has "Action"—the nouns look similar, but the underlying registration mechanisms, permission models, concurrent scheduling, and lifecycle management are completely different. Equating Claude Code's `Tool<Input, Output>` with the tool concept in other frameworks is like equating an operating-system syscall with an ordinary function call—superficially similar, but structurally worlds apart.

---

## Prologue: System Calls—the Only Way User-Space Programs Access Hardware

In an operating system, a user-space program cannot read from or write to the disk directly, cannot send network packets directly, and cannot control the GPU directly. It must issue a **system call** (syscall) to the kernel: "Please read the file `/etc/hosts` for me." The kernel verifies permissions, performs the operation, and returns the result to the program.

The same is true for the AI in Claude Code. Claude (the model) cannot read or write your files directly—it runs on a remote server and has no access to your computer. The only thing it can do is declare in its response: "I want to invoke such-and-such tool." Then the **local system** executes that tool and stuffs the result back into the conversation history.

Tools are the AI's system calls.

> **🔑 OS Analogy:** Tool = service window. The AI is a citizen coming to run errands, `queryLoop` is the government service hall, and tools are the individual service windows (file-read window, file-write window, command-execution window). `Tool.ts` is the hall's window directory.
>
> 💡 **Plain English**: Tools are like **employee skill certifications**—a Read certificate, a Write certificate, a Search certificate, a Bash certificate. Anything Claude wants to do, it must first "show its certificate," then pass an exam (permission check), and only then get to work. For anything without a certificate, Claude simply cannot do it.

---

## 1. The Tool "Household Register": The Tool Interface

Every tool must implement the generic `Tool<Input, Output>` interface (`Tool.ts`). This is the **most-depended-upon abstraction** in Claude Code—40 built-in tool directories, MCP dynamic tools, and Plugin tools all implement this interface.

Key fields of the interface:

```typescript
interface Tool<Input, Output> {
  // ── Identity ──
  name: string                    // Tool name (the model invokes by this name)
  aliases?: string[]              // Aliases (for backward-compatible renames)

  // ── Capability description ──
  description(input, opts): Promise<string>  // Dynamic description for the model
  inputSchema: z.ZodType          // Zod schema for input parameters

  // ── Execution ──
  call(args, ctx, canUseTool, parent, onProgress): AsyncGenerator<Progress, Result>

  // ── Permissions ──
  checkPermissions(input, ctx): Promise<PermissionResult>
  isReadOnly(input): boolean      // Read-only tools do not need write permission

  // ── Concurrency ──
  isConcurrencySafe(input): boolean  // Whether parallel execution is safe

  // ── Result handling ──
  renderResultForAssistant(data): string  // Convert result into AI-readable text
  maxResultSizeChars: number      // Result size ceiling

  // ── UI ──
  renderToolUseMessage(input, output): ReactNode  // Terminal rendering of the tool call

  // ── Metadata ──
  isDeferred?: boolean            // Lazy-loaded (schema fetched on demand)
  group?: string                  // Tool grouping
}
```

**20+ methods and properties**—this is not a simple "function interface," but a complete **tool lifecycle contract**. From "telling the AI what I can do" to "checking whether I am allowed to do it" to "how to display the result when done," everything is defined in the interface.

> 📚 **Course Connection**: The `inputSchema` (Zod schema) of `Tool<Input, Output>` is essentially an **Interface Description Language (IDL)**, playing the same role as gRPC's `.proto` files or OpenAPI's JSON Schema—telling the caller "what parameter format you should pass." The difference is that `.proto` targets human developers, whereas here the schema targets the AI model. This design is an industry-standard practice—OpenAI's Function Calling also uses JSON Schema to describe parameters; Claude Code simply swaps the validation layer for Zod (stronger typing, friendlier error messages).

### Why is `description` dynamic?

`description()` is not a static string but an **asynchronous function**, because the same tool may need a different description in different contexts. For example:
- The description of `BashTool` includes the current OS information (commands differ between macOS and Linux)
- Tool descriptions in sandbox mode append security-limitation notes
- Some tools have different capability descriptions under different permission modes

Every tool description consumes token budget from the system prompt—this is why `ToolSearchTool` (the tool-search meta-tool) exists: not all tool descriptions need to live in the system prompt; the AI can search first and then invoke. Dynamic `description` itself is a reasonable engineering choice (OpenAI's function calling also supports dynamically modifying tool descriptions), and Claude Code's highlight is that it **manages token budget at the tool level**—the combination of lazy loading + ToolSearch is rare among comparable products.
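As a hedged illustration (the type and field names below are hypothetical, not the actual `ToolUseContext` shape), a context-dependent description might look like:

```typescript
// Hypothetical sketch: a tool description that varies with runtime context.
// The field names (platform, sandbox) are illustrative only.
type DescribeContext = { platform: "darwin" | "linux" | "win32"; sandbox: boolean };

function describeBash(ctx: DescribeContext): string {
  const parts = [`Executes shell commands (${ctx.platform} conventions apply).`];
  if (ctx.sandbox) {
    // sandbox mode appends security-limitation notes, as described above
    parts.push("Sandbox mode: network and filesystem access are restricted.");
  }
  return parts.join(" ");
}
```

The real interface returns `Promise<string>`; wrapping a synchronous builder like this in an async function is trivial.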

---

## 2. Four Major Tool Categories

The 40 built-in tool directories can be grouped into four categories by function:

### 2.1 File Operations

| Tool | Function | Read-only? | Concurrency-safe? |
|------|----------|------------|-------------------|
| `Read` | Read file contents | ✅ | ✅ |
| `Edit` | Diff-replacement editing | ❌ | ❌ |
| `Write` | Full overwrite writing | ❌ | ❌ |
| `Glob` | Search by filename pattern | ✅ | ✅ |
| `Grep` | Search by content | ✅ | ✅ |
| `NotebookEdit` | Jupyter notebook editing | ❌ | ❌ |

**Design rationale**: The separation of `Edit` and `Write` is deliberate. `Edit` only changes a portion of a file (diff replacement), so the blast radius on failure is small; `Write` overwrites the entire file, used for creating new files or complete rewrites. The system prompt explicitly tells the AI to prefer `Edit`.
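A diff-replacement edit can be sketched in a few lines. The uniqueness guard below is an assumed semantic (an ambiguous `old_string` must fail rather than silently pick an occurrence), not a verbatim copy of the real `Edit` implementation:

```typescript
// Sketch of a diff-style edit (assumed semantics, not the actual Edit tool):
// old_string must occur exactly once so the replacement site is unambiguous.
function applyEdit(content: string, oldString: string, newString: string): string {
  const first = content.indexOf(oldString);
  if (first === -1) {
    throw new Error("old_string not found in file");
  }
  if (content.indexOf(oldString, first + 1) !== -1) {
    throw new Error("old_string is not unique; provide more surrounding context");
  }
  return content.slice(0, first) + newString + content.slice(first + oldString.length);
}
```

This is why an `Edit` failure has a small blast radius: a mismatch throws before any byte of the file changes.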

### 2.2 Execution Engine

| Tool | Function | Sandbox? |
|------|----------|----------|
| `Bash` | Execute shell commands | Config-dependent |
| `PowerShell` | Windows commands | Config-dependent |

`Bash` is the **most frequently used** tool. It is also the biggest security risk—a single `rm -rf /` can be catastrophic. Therefore, Bash has the most complex permission-check logic, the strictest sandbox restrictions, and the finest-grained command pattern matching (`Bash(git *)` allows all git commands but nothing else).
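The `Bash(git *)` rule format from the text can be sketched as prefix-wildcard matching. The wildcard semantics here are an assumption for illustration; the real matcher is considerably more elaborate:

```typescript
// Hedged sketch of command-pattern matching for rules like "Bash(git *)".
function matchesBashRule(rule: string, command: string): boolean {
  const m = rule.match(/^Bash\((.+)\)$/);
  if (!m) return false;
  const pattern = m[1];
  if (pattern.endsWith(" *")) {
    // "git *" matches "git" itself and any command starting with "git "
    const prefix = pattern.slice(0, -2);
    return command === prefix || command.startsWith(prefix + " ");
  }
  return command === pattern; // exact-match rule with no wildcard
}
```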

### 2.3 Agent Tools

| Tool | Function |
|------|----------|
| `Agent` | Spawn a subagent (new AI instance) |
| `SendMessage` | Send a message to an existing agent |
| `TeamCreate` | Create a Teammate (Swarm mode) |
| `TeamDelete` | Delete a Teammate |
| `TaskCreate/Get/List/Update/Stop` | Task management suite |

Agent tools are "tools within tools"—invoking one creates a **complete new `queryLoop` instance**, with its own message history, permission context, and toolset. This is like the `fork()` system call in an operating system—creating a child process.

### 2.4 Extensions and Helpers

| Tool | Function |
|------|----------|
| `WebFetch` | Fetch web page content |
| `WebSearch` | Web search |
| `MCPTool` | Invoke an MCP server tool |
| `SkillTool` | Invoke a Skill |
| `ToolSearch` | Search available tools (meta-tool) |
| `AskUserQuestion` | Ask the user a question |
| `Sleep` | Wait for a specified time |
| `Brief` | Short output |
| `EnterPlanMode/ExitPlanMode` | Enter/exit plan mode |
| `EnterWorktree/ExitWorktree` | Enter/exit Git worktree |
| `ReadMcpResource` | Read an MCP resource |
| `ListMcpResources` | List MCP resources |
| `RemoteTrigger` | Remote trigger |
| `ScheduleCron` | Scheduled tasks (see Part 3 "Cron Scheduling System Deep Dive") |

---

## 3. Tool Registry: From Scattered Files to a Unified Manifest

`tools.ts` (note the plural) is the tool registry. Its `getTools()` function returns all currently available tools.

Registration is not a simple "list in an array"—it involves conditional filtering:

```
getTools() logic:
  1. Collect all built-in tools (40)
  2. Collect MCP tools (from connected MCP servers)
  3. Apply feature-gate filtering (experimental tools only when enabled)
  4. Apply permission filtering (tools disabled by policy do not appear)
  5. Apply platform filtering (PowerShell only on Windows)
  6. Deduplicate (alias conflict handling)
  7. Return final list
```
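The seven steps above can be sketched as a single filter pass. All names and fields here are hypothetical simplifications of the real registry:

```typescript
// Minimal sketch of the getTools() filtering steps (all names hypothetical).
interface ToolEntry {
  name: string;
  platforms?: string[];   // e.g. ["win32"] for PowerShell
  featureGate?: string;   // gated tools appear only when the gate is enabled
}

function getToolsSketch(
  builtIn: ToolEntry[],
  mcpTools: ToolEntry[],
  opts: { platform: string; enabledGates: Set<string>; deniedNames: Set<string> },
): ToolEntry[] {
  const all = [...builtIn, ...mcpTools];                                      // steps 1-2: collect
  const seen = new Set<string>();
  return all.filter((t) => {
    if (t.featureGate && !opts.enabledGates.has(t.featureGate)) return false; // step 3: gates
    if (opts.deniedNames.has(t.name)) return false;                           // step 4: policy
    if (t.platforms && !t.platforms.includes(opts.platform)) return false;    // step 5: platform
    if (seen.has(t.name)) return false;                                       // step 6: dedupe
    seen.add(t.name);
    return true;                                                              // step 7: final list
  });
}
```

Because built-ins are listed first, deduplication naturally gives them priority over same-named MCP tools in this sketch.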

**Key design**: The tool list is **dynamic**. Within the same session, if you connect a new MCP server, the tool list grows. If an enterprise policy disables a tool, it disappears from the list. What the AI sees is a **capability manifest that may change at any time**.

### Four-Layer Tool Assembly Chain

In reality, a tool travels through **four assembly layers** from "scattered file" to "model-visible manifest" (source located in `Tool.ts`, `tools.ts`, and `screens/REPL.tsx`):

```
buildTool() + TOOL_DEFAULTS          ← Layer 1: unified contract baseline
   Fills a half-baked tool definition into a complete Tool object (Tool.ts:783)
       ↓
getAllBaseTools()                     ← Layer 2: built-in capability truth table
   Full registry with conditional branches (feature gates, env vars,
   user types, worktree/swarm/LSP toggles, etc.) (tools.ts:193)
       ↓
getTools(permissionContext)           ← Layer 3: current-context filtering
   deny rules, REPL vs simple mode, platform restrictions (tools.ts:271)
       ↓
assembleToolPool()                    ← Layer 4: final assembly
   Merges built-in + MCP tools, sorts (built-ins in a contiguous prefix to keep
   prompt cache stable), deduplicates (uniqBy, built-ins take priority) (tools.ts:345)
```

> 💡 **Plain English**: This is like assembling a sports team—Layer 1 is "completing every player's ID photo, medical report, and skill rating" (`buildTool`); Layer 2 is "listing all candidate players" (`getAllBaseTools`); Layer 3 is "cutting players who don't fit today's tactics or opponent" (`getTools`); Layer 4 is "lining up starters and subs and handing out jerseys" (`assembleToolPool`).
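Layer 1's defaults-merging can be sketched as follows. The default values are illustrative only, not the actual `TOOL_DEFAULTS` from `Tool.ts`:

```typescript
// Sketch of Layer 1: buildTool() fills a partial definition with contract defaults.
interface ToolSketch {
  name: string;
  isReadOnly: () => boolean;
  isConcurrencySafe: () => boolean;
  maxResultSizeChars: number;
}

const TOOL_DEFAULTS = {
  isReadOnly: () => false,          // safest assumption: the tool may write
  isConcurrencySafe: () => false,   // safest assumption: run serially
  maxResultSizeChars: 100_000,      // illustrative ceiling, not the real value
};

function buildToolSketch(partial: Partial<ToolSketch> & { name: string }): ToolSketch {
  return { ...TOOL_DEFAULTS, ...partial };
}
```

Note that the defaults err on the conservative side: a tool that declares nothing is treated as a serial, writing tool, which is the safe failure mode.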

`getToolUseContext()` in `REPL.tsx` is the bridge where this assembly chain enters the main `queryLoop`—it packages the final tool pool together with the permission context, MCP clients, etc., into a `ToolUseContext`, which is passed to `queryLoop()`.

> 📚 **Course Connection**: `getTools()` is a classic **Registry Pattern**—all tools register to a central manifest, and consumers need not know where each tool comes from, only query the registry. This is the same pattern as the operating system's **syscall table**: the Linux kernel maintains a `sys_call_table[]`, and user-space programs invoke by number without knowing where each syscall is implemented inside the kernel. The Registry Pattern itself is a mature engineering practice; Claude Code's unique twist is that the registry is **dynamically mutable at runtime** (MCP tools hot-pluggable), rather than a fixed static table at compile time.

---

## 4. The Complete Tool Execution Pipeline

The full path of a tool call from the AI's "request" to the final "result returned":

```
AI outputs tool_use block (JSON)
  │
  ├── 1. Input validation
  │   └── Zod schema validates input parameters
  │       → Failure: return error message to AI
  │
  ├── 2. Permission check (canUseTool)
  │   ├── bypass-immune rule check
  │   ├── PreToolUse Hooks execution
  │   ├── auto-approval rule matching
  │   ├── sandbox rule check
  │   └── ... ten steps total
  │   → Deny: return denial message to AI
  │   → Ask: pop up UI confirmation dialog
  │
  ├── 3. File history tracking (Edit/Write only)
  │   └── fileHistoryTrackEdit()
  │       → Snapshots original file content before modification
  │
  ├── 4. Execution
  │   └── tool.call(args, context, ...)
  │       → AsyncGenerator<Progress, Result>
  │       → Intermediate yield Progress events (update UI)
  │       → Final return Result
  │
  ├── 5. PostToolUse Hooks
  │   └── Custom logic after tool execution
  │
  ├── 6. Result serialization and persistence
  │   └── renderResultForAssistant(result)
  │       → Convert result into plain text the AI can read
  │       → When exceeding maxResultSizeChars:
  │         ① maybePersistLargeToolResult() writes full result to disk
  │           (destination: ~/.claude/tool-results/)
  │         ② Conversation body replaced with a reference-style preview wrapped in <persisted-output> tags
  │           (not simple truncation, but persistence + reference substitution)
  │       → When result is empty: replaced with "(toolName completed with no output)"
  │         (prevents the model from misjudging turn boundaries; source: toolResultStorage.ts:293)
  │
  └── 7. Result injection
      └── Wrapped into UserMessage { tool_result: ... }
          → Appended to message history
          → Wait to be sent to the API in the next heartbeat
```
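The empty-result and oversized-result branches of step 6 can be sketched as below. The disk persistence itself is elided; only the substitution logic is shown, and the preview format is a simplification:

```typescript
// Sketch of the step-6 serialization branches (persistence itself elided).
function serializeToolResult(toolName: string, text: string, maxChars: number): string {
  if (text.trim() === "") {
    // empty results get an explicit marker so the model sees a clean turn boundary
    return `(${toolName} completed with no output)`;
  }
  if (text.length > maxChars) {
    // in the real flow the full result is written to disk first;
    // the conversation keeps only a reference-style preview
    return `<persisted-output>${text.slice(0, maxChars)}</persisted-output>`;
  }
  return text;
}
```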

> 📚 **Course Connection**: The ten-step permission check in Step 2 is a classic **Chain of Responsibility**—the request travels along a "check chain" node by node, and each node may Approve, Deny, or "pass to the next node." This is the same pattern as middleware pipelines in web frameworks (Express.js's `app.use()`, Django's middleware stack). The pattern itself is not new, but Claude Code applies it to AI permission control and achieves ten layers of depth—a relatively fine-grained permission-control granularity among terminal-native AI coding tools.

### Key Node Analysis

**Step 3 (File history tracking)** only runs for Edit/Write tools—this is the foundation of the `/rewind` feature. Before modifying a file, the system automatically saves a snapshot. If the AI breaks the code, the user can revert with one click to the file state as of any prior message.

> ⚠️ **Precision note**: Although the pipeline diagram above depicts "file history snapshot" as a standalone Step 3 for clarity, in the source implementation the `fileHistoryTrackEdit()` call actually happens **inside** the logic of `tool.call()` (Step 4), not as a separate scheduler-level stage. It is listed separately to emphasize the mechanism's existence, but readers should understand it is part of the tool execution logic, not an independent step of the scheduler pipeline.

**Step 6 (Result serialization)**'s `renderResultForAssistant()` is a frequently overlooked yet critically important function. It determines what the AI "sees." For example:
- The result of the `Read` tool contains line-number prefixes (`1\tcontent`)—so the AI can precisely specify locations in subsequent `Edit` calls
- The result of the `Bash` tool contains the exit code—the AI can judge whether the command succeeded
- The result of the `Glob` tool is sorted by modification time—most recently modified files appear first, helping the AI focus on what matters
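The `Read` line-number convention (`1\tcontent`) can be reproduced in a few lines—a minimal sketch of the rendering, not the actual implementation:

```typescript
// Sketch: prefix each line with its 1-based number and a tab, matching the
// Read result format described above so the AI can target Edit precisely.
function renderReadResult(content: string): string {
  return content
    .split("\n")
    .map((line, i) => `${i + 1}\t${line}`)
    .join("\n");
}
```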

---

## 5. ToolUseContext: The "World" During Tool Execution

Every tool receives a `ToolUseContext` object when it executes—this is the entire world the tool can "see":

| Field | Contents | Analogy |
|-------|----------|---------|
| `messages` | Full message history of the current session | Process memory space |
| `appState` | Global application state | Kernel data structures |
| `permissionContext` | Permission configuration | Process UID/GID |
| `mcpClients` | List of MCP clients | Loaded device drivers |
| `model` | Name of the currently used model | CPU model |
| `readFileState` | File state cache | File descriptor table |
| `abortController` | Cancellation signal | SIGTERM |

**Design decision**: ToolUseContext is **passed read-only**—tools should not directly mutate state in the context. State changes are realized through returning Results and side effects (file modifications, etc.). This is consistent with the OS design where "syscalls pass results through return values."

---

## 6. Concurrency Model: Which Tools Can Run Simultaneously

Claude Code's tool concurrency strategy:

The scheduler's unit of work is **one batch**—all `tool_use` blocks contained in a single model response. Within the same batch:

- Tools whose `isConcurrencySafe` returns `true` (e.g., Read, Glob, Grep) **all execute in parallel**;
- Tools whose `isConcurrencySafe` returns `false` (e.g., Edit, Write) **execute serially one by one**.

| Scenario | Behavior | Reason |
|----------|----------|--------|
| Multiple Read + Glob + Grep in same batch | All parallel | All declare `isConcurrencySafe=true` |
| Multiple Edit in same batch | Serial | All declare `isConcurrencySafe=false` |
| Mixed Read + Edit in same batch | Read runs in parallel, Edit runs serially, **no dependency wait between the two** | Scheduler only splits by static flag, does no dynamic dependency analysis |
| Agent creation | Parallel | Declares `isConcurrencySafe=true` |
| Bash commands | Depends on sandbox config | — |

**Important clarification**: `isConcurrencySafe` is a **static boolean flag self-declared by each tool**, not something the scheduler dynamically computes by analyzing data dependencies between tools. There is **no DAG dependency graph** in the system, nor any dynamic dependency tracking like "Edit waits for Read to finish." The scheduler logic is simple: within the same batch, tools flagged true run in parallel, tools flagged false run serially, full stop. The system **trusts** the tool's declaration—if a tool incorrectly declares itself concurrency-safe, race conditions may occur.
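The scheduler logic described above—split the batch by a static flag, nothing more—fits in a few lines. Names here are hypothetical; this mirrors the batch path's behavior, not its source:

```typescript
// Sketch of the batch split: within one batch, concurrency-safe tools run in
// one Promise.all, unsafe ones run one by one. No dependency analysis occurs.
interface ToolCall {
  name: string;
  concurrencySafe: boolean;
  run: () => Promise<string>;
}

async function runBatchSketch(batch: ToolCall[]): Promise<string[]> {
  const safe = batch.filter((c) => c.concurrencySafe);
  const unsafe = batch.filter((c) => !c.concurrencySafe);
  const results = await Promise.all(safe.map((c) => c.run())); // parallel group
  for (const call of unsafe) {
    results.push(await call.run());                            // serial group
  }
  return results;
}
```

Note how little machinery there is: no DAG, no topological sort—just two buckets, exactly as the clarification above states.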

> 📚 **Course Connection**: The design philosophy of `isConcurrencySafe` is similar to database **transaction isolation levels**—different operations have different concurrency-safety requirements. Read (read-only) is like `READ COMMITTED`: multiple reads do not interfere with each other; Edit (read-write) is like `SERIALIZABLE`: must execute serially to avoid races. The difference is: a database enforces isolation levels by the DBMS, whereas Claude Code **delegates the concurrency-safety declaration to the tool author**, and the system performs no secondary validation—this is an explicit "trust the developer" engineering trade-off.

### Dual-Path Scheduler: Batch vs. Streaming

Tool scheduling actually has **two parallel paths**, sharing the same underlying execution entrypoint `runToolUse()` (in `toolExecution.ts`), but with different scheduling shells:

**Batch path** — `runTools()` (`toolOrchestration.ts`):
- Used in non-streaming API calls
- Splits a batch of tool calls into concurrent groups and serial groups by `isConcurrencySafe`
- Concurrent groups execute in one `Promise.all()`, serial groups one by one
- **contextModifier handling**: contextModifiers from concurrent tools are collected into a `queuedContextModifiers` dictionary and applied sequentially after the entire concurrent batch finishes

**Streaming path** — `StreamingToolExecutor` (`StreamingToolExecutor.ts`):
- Used in streaming API calls (the vast majority of scenarios)
- Tools start executing while the model is still speaking—execute as received
- **Known limitation**: `StreamingToolExecutor`'s semantics for concurrent-tool `contextModifier` support are **incomplete** (source `StreamingToolExecutor.ts:388-395` comment admits: "we currently don't support context modifiers for concurrent tools"). Only non-concurrent (serial) tools' contextModifiers are applied; concurrent tools' contextModifiers are silently ignored. Currently no built-in tool uses contextModifier in concurrent mode, so this gap does not affect actual behavior, but if in the future an MCP tool needs to modify context after concurrent execution, this will be a problem.
- **Sibling abort propagation**: When one Bash tool fails, the `siblingAbortController` terminates sibling processes of other tools in the same batch (but does not abort the parent query)—this is a fine-grained error-isolation mechanism (source `StreamingToolExecutor.ts:48,301-318`)

> 💡 **Plain English**: The batch path is like a restaurant's "order once, serve in sequence" model; the streaming path is like a buffet's "order and serve as you go" model—you're still looking at the menu (the model is still generating), but the kitchen has already started cooking the dishes you've picked (tools are already executing).

### Full Hook Capabilities: More Than "Custom Logic"

The hooks in Step 2 (PreToolUse) and Step 5 (PostToolUse) are not simple "custom logic"—they have four precise capabilities:

**PreToolUse Hook four capabilities**:
1. **Modify input**: Adjust parameters before tool execution
2. **Provide extra context**: Inject attachment messages
3. **Block execution**: Return a block signal, tool does not execute at all
4. **Influence permission decision**: Via `resolveHookPermissionDecision()`, provide allow/block/ask suggestions to the permission system—but note, hook allow **does not equal unconditional clearance**; the permission system still runs subsequent `checkRuleBasedPermissions()` (source `toolHooks.ts:373`)

**PostToolUse Hook four capabilities**:
1. **Append messages**: Return an `AttachmentMessage`, injecting extra information into the next round of context
2. **Modify result**: Rewrite tool output (especially MCP tools' `updatedMCPToolOutput`)
3. **Block continuation**: Return `hook_stopped_continuation` control signal—the heartbeat loop stops advancing upon seeing this signal, even if the model wants to call more tools
4. **Failure-path governance**: `PostToolUseFailure` hooks let tool execution failures be caught and handled by hooks rather than thrown as exceptions directly

> **🔑 Key design**: Hooks **do not short-circuit the permission system**. Even if a PreToolUse hook returns `allow`, permission checks continue. This is a deliberate security design—hooks provide "advice," not "rulings."
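The "advice, not ruling" composition can be sketched as follows. The exact composition order is assumed for illustration (only the deny/block veto and the always-run rule check are taken from the text above):

```typescript
// Hedged sketch of "hooks advise, rules decide" (assumed composition):
// a hook's block is honored, but a hook's allow does not skip rule checks.
type Decision = "allow" | "deny" | "ask";

function combineHookAndRules(
  hookAdvice: Decision | undefined,
  checkRules: () => Decision,
): Decision {
  if (hookAdvice === "deny") return "deny"; // hooks can veto execution outright
  return checkRules();                      // allow/ask is only advice; the
                                            // rule-based checks still run
}
```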

---

## 7. MCP Tools: New Capabilities Brought by Diplomatic Envoys

MCP (Model Context Protocol) servers can register new tools. These tools appear in the tool list under the naming convention `mcp__serverName__toolName`.

From the system's perspective, MCP tools go through the **exact same pipeline** as built-in tools—input validation, permission check, execution, result serialization. The only difference is in the execution stage: built-in tools execute inside the local process, while MCP tools are sent via JSON-RPC to an external server. "Local and remote tools share a unified interface" is standard practice in distributed systems (e.g., gRPC's location transparency), but in the AI Agent field this is still a relatively new practice—MCP itself is an open standard led by Anthropic, and Claude Code's first-party deep integration is unsurprising. What is more noteworthy is MCP's **open-ecosystem significance**—it defines a vendor-neutral tool interface protocol, allowing third-party developers to write a tool server once and plug it into any AI client that supports MCP, rather than being locked into a single product's proprietary extension API.
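The `mcp__serverName__toolName` convention is easy to construct and parse—a small sketch, using only the naming rule stated above:

```typescript
// Sketch: build and parse the mcp__serverName__toolName naming convention.
function mcpToolName(server: string, tool: string): string {
  return `mcp__${server}__${tool}`;
}

function parseMcpToolName(name: string): { server: string; tool: string } | null {
  if (!name.startsWith("mcp__")) return null; // built-in tools fall through
  const rest = name.slice("mcp__".length);
  const sep = rest.indexOf("__");
  if (sep === -1) return null;
  return { server: rest.slice(0, sep), tool: rest.slice(sep + 2) };
}
```

The prefix also gives permission rules an easy handle: a rule can target a whole server (`mcp__srv__*`) without enumerating its tools.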

> **🔑 OS Analogy:** MCP tools are like **food-delivery services**—you order through the same app (same interface), but the food is prepared at a distant restaurant and delivered to you. Same UI, different execution location.

**Security implication**: MCP tools are stricter by default than built-in tools—because external servers are outside local control. Enterprise admins can control which MCP servers may connect via `allowedDomains` (domain allowlist) and `strictPluginOnlyCustomization` (strict plugin restrictions).

---

## 8. Deferred Tools

Not every tool's schema needs to be loaded at startup. Tools marked with `isDeferred` only fetch their full schema when the AI actually needs them.

**Why**: Every tool's schema goes into the system prompt, consuming tokens. Among the 40 built-in tool directories, if half are niche tools (e.g., `ScheduleCron`, `RemoteTrigger`), their schemas waste tokens for nothing. Deferred loading only loads them after the AI explicitly searches for them via `ToolSearch`.

**Analogy**: A library cannot put every book in the lobby—obscure books stay in the warehouse and are fetched on demand. `ToolSearch` is the library's catalog system.

The rigor of this savings strategy is evident in the Skill system: Skill lists are strictly capped at 1% of the context window, each description limited to 250 characters, and large tool outputs are written to disk rather than kept in context. These numbers reflect Anthropic's meticulous token-budget management—every character is treated as a scarce resource.

**Token-economics significance**: The benefit of this design is not simply "don't send a few schemas." If the full schemas of all 40 built-in tool directories were stuffed into the system prompt, two problems would arise: (1) every API request would cost thousands more input tokens; (2) a bloated system prompt would **reduce prompt cache hit rate**—because Claude API's prompt cache matches by prefix, the more stable the system prompt, the higher the hit rate, and a dynamic tool list (e.g., MCP tools hot-plugging) would frequently change the system prompt and invalidate the cache. The combination of deferred loading + ToolSearch essentially transforms tool discovery from "static full broadcast" (all schemas dumped into the prompt at once) to "on-demand pull" (loaded only when the AI searches for them), a key optimization in Claude Code's token cost and cache efficiency.

---

## 9. Competitor Tool System Comparison

To understand Claude Code's tool design choices, we need to place it on a coordinate system of competitors to see where the real differentiation lies.

### Claude Code vs Cursor

| Dimension | Claude Code | Cursor |
|-----------|-------------|--------|
| Built-in tools | 40 built-in tool directories, four categories | ~15, mostly file operations |
| Permission granularity | Ten-step check chain, supports pattern matching (e.g., `Bash(git *)`) | Binary model: allow/deny, no fine-grained pattern matching |
| Extension mechanism | MCP protocol (open standard, can plug into any tool server) | Primarily built-in extensions, relatively closed third-party tool ecosystem |
| Tool concurrency | Tools self-declare concurrency safety, scheduler acts accordingly | Mostly serial execution |
| Runtime environment | Terminal-native, Bash directly calls system shell | Embedded in IDE, operates indirectly via editor APIs |

**Key point**: Cursor's advantage lies in IDE integration depth (code completion, inline diff); Claude Code's advantage lies in tool openness and permission-control granularity. Their architectural philosophies differ: Cursor is "an IDE with AI added"; Claude Code is "an AI equipped with an OS-level toolset."

### Claude Code vs Aider

Aider is another mainstream AI coding terminal tool, but the two have chosen radically different paths for file-editing tools:

| Dimension | Claude Code | Aider |
|-----------|-------------|-------|
| Editing method | `Edit` (diff replacement) + `Write` (full overwrite), two separate tools | Three switchable edit formats: `search/replace`, `whole file`, `diff` |
| Edit granularity | `Edit` specifies precise old_string → new_string replacement | `search/replace` is similar, but `whole file` mode makes the AI output the entire file |
| Strategy | System prompt forces AI to prefer `Edit` (local changes), `Write` only for new files | Automatically chooses format by model capability (small models use whole file, large models use diff) |
| Safety net | `fileHistoryTrackEdit()` auto-snapshots before modification, supports `/rewind` | Relies on git auto-commit rollback |

**Key point**: Aider's adaptive edit-format strategy is more flexible (adapts to different model capabilities), while Claude Code's Edit/Write separation is cleaner and more explicit (reduces choice fatigue for the model). Both are mature engineering solutions with different trade-offs.

### Claude Code vs GitHub Copilot Workspace

GitHub Copilot Workspace adopts a **plan-driven** tool invocation style: first have the AI generate a complete editing plan (which files to change, what each file should change), then execute the batch all at once. This contrasts sharply with Claude Code's **loop-driven** approach (each loop the AI decides the next step):

- **Copilot Workspace**: Planning and execution phases are separate; users can review the complete plan before execution. The downside is that if the plan is wrong, the entire batch must be redone.
- **Claude Code**: After each step the AI decides the next step based on results (`queryLoop` in Chapter 4), more flexible but the intermediate process is less predictable. The tool's `renderResultForAssistant()` is crucial here—the format of each step's result directly affects the quality of the AI's next decision.
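
The contrast can be reduced to a control-flow sketch. This is illustrative TypeScript only—`runPlanned`, `runLoop`, and `Step` are hypothetical names, not Claude Code's actual `queryLoop` API:

```typescript
// Minimal sketch of the two invocation styles; all names are illustrative.

type Step = { tool: string; input: string };

// Plan-driven: produce the whole plan up front, then execute the batch.
// A wrong plan means redoing the entire batch.
function runPlanned(plan: Step[], exec: (s: Step) => string): string[] {
  return plan.map(exec);
}

// Loop-driven: after each step, decide the next one from the last result.
function runLoop(
  decide: (lastResult: string | null) => Step | null,
  exec: (s: Step) => string,
): string[] {
  const results: string[] = [];
  let last: string | null = null;
  while (true) {
    const next = decide(last); // each result shapes the next decision
    if (!next) break;
    last = exec(next);
    results.push(last);
  }
  return results;
}
```

In the loop-driven form, the quality of `lastResult` directly bounds the quality of `decide`—which is exactly why `renderResultForAssistant()` matters.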

---

## 9.5 Tool Descriptions as Implicit Instructions: Three Representative Cases

A tool's `description()` does not merely "tell the AI what this tool does"—it is an **implicit behavioral instruction** embedded in the tool metadata. Several core tools in Claude Code encode extensive operational procedures directly into their descriptions, so that the model learns behavioral norms while "discovering tool capabilities." Below are the three most representative cases.
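
The mechanism rests on `description()` being a function rather than a static string. A minimal sketch of the idea (the interface shape and all names here are assumptions for illustration; the real interface in `Tool.ts` is far richer):

```typescript
// Illustrative shape only — not the actual Tool interface from Tool.ts.

interface ToolSketch {
  name: string;
  // description() is a function, not a constant: it can branch on
  // environment, feature flags, or user type at assembly time.
  description(): string;
}

const bashToolSketch: ToolSketch = {
  name: "Bash",
  description() {
    const base = "Executes shell commands.";
    // Behavioral norms ride along inside the capability description.
    const gitProtocol = "Git Safety Protocol: NEVER update the git config. ...";
    return `${base}\n\n${gitProtocol}`;
  },
};
```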

---

### Case 1: BashTool's Git Safety Protocol

**Source location**: `src/tools/BashTool/prompt.ts`, `getCommitAndPRInstructions()` function (lines 42–161)

The BashTool description embeds a complete, **120-line** Git operating procedure (`prompt.ts` totals 369 lines; `getCommitAndPRInstructions()` occupies lines 42–161, about a third of the file): step-by-step commit and PR workflows, seven safety prohibitions (the "Git Safety Protocol"), a parallel tool-calling strategy, and HEREDOC formatting requirements. This is the longest and most complex single tool description in Claude Code.

The following is the full external-user-facing text (the external-user branch of `getCommitAndPRInstructions()`, forming the bulk of the tool description):

```
# Committing changes with git

Only create commits when requested by the user. If unclear, ask first. When the user asks you to create a new git commit, follow these steps carefully:

You can call multiple tools in a single response. When multiple independent pieces of information are requested and all commands are likely to succeed, run multiple tool calls in parallel for optimal performance. The numbered steps below indicate which commands should be batched in parallel.

Git Safety Protocol:
- NEVER update the git config
- NEVER run destructive git commands (push --force, reset --hard, checkout ., restore ., clean -f, branch -D) unless the user explicitly requests these actions. Taking unauthorized destructive actions is unhelpful and can result in lost work, so it's best to ONLY run these commands when given direct instructions 
- NEVER skip hooks (--no-verify, --no-gpg-sign, etc) unless the user explicitly requests it
- NEVER run force push to main/master, warn the user if they request it
- CRITICAL: Always create NEW commits rather than amending, unless the user explicitly requests a git amend. When a pre-commit hook fails, the commit did NOT happen — so --amend would modify the PREVIOUS commit, which may result in destroying work or losing previous changes. Instead, after hook failure, fix the issue, re-stage, and create a NEW commit
- When staging files, prefer adding specific files by name rather than using "git add -A" or "git add .", which can accidentally include sensitive files (.env, credentials) or large binaries
- NEVER commit changes unless the user explicitly asks you to. It is VERY IMPORTANT to only commit when explicitly asked, otherwise the user will feel that you are being too proactive

1. Run the following bash commands in parallel, each using the Bash tool:
  - Run a git status command to see all untracked files. IMPORTANT: Never use the -uall flag as it can cause memory issues on large repos.
  - Run a git diff command to see both staged and unstaged changes that will be committed.
  - Run a git log command to see recent commit messages, so that you can follow this repository's commit message style.
2. Analyze all staged changes (both previously staged and newly added) and draft a commit message:
  - Summarize the nature of the changes (eg. new feature, enhancement to an existing feature, bug fix, refactoring, test, docs, etc.). Ensure the message accurately reflects the changes and their purpose (i.e. "add" means a wholly new feature, "update" means an enhancement to an existing feature, "fix" means a bug fix, etc.).
  - Do not commit files that likely contain secrets (.env, credentials.json, etc). Warn the user if they specifically request to commit those files
  - Draft a concise (1-2 sentences) commit message that focuses on the "why" rather than the "what"
  - Ensure it accurately reflects the changes and their purpose
3. Run the following commands in parallel:
   - Add relevant untracked files to the staging area.
   - Create the commit with a message ending with:
   Co-Authored-By: Claude <noreply@anthropic.com>
   - Run git status after the commit completes to verify success.
   Note: git status depends on the commit completing, so run it sequentially after the commit.
4. If the commit fails due to pre-commit hook: fix the issue and create a NEW commit

Important notes:
- NEVER run additional commands to read or explore code, besides git bash commands
- NEVER use the TodoWrite or Agent tools
- DO NOT push to the remote repository unless the user explicitly asks you to do so
- IMPORTANT: Never use git commands with the -i flag (like git rebase -i or git add -i) since they require interactive input which is not supported.
- IMPORTANT: Do not use --no-edit with git rebase commands, as the --no-edit flag is not a valid option for git rebase.
- If there are no changes to commit (i.e., no untracked files and no modifications), do not create an empty commit
- In order to ensure good formatting, ALWAYS pass the commit message via a HEREDOC, a la this example:
<example>
git commit -m "$(cat <<'EOF'
   Commit message here.

   Co-Authored-By: Claude <noreply@anthropic.com>
   EOF
   )"
</example>

# Creating pull requests
Use the gh command via the Bash tool for ALL GitHub-related tasks including working with issues, pull requests, checks, and releases. If given a Github URL use the gh command to get the information needed.

IMPORTANT: When the user asks you to create a pull request, follow these steps carefully:

1. Run the following bash commands in parallel using the Bash tool, in order to understand the current state of the branch since it diverged from the main branch:
   - Run a git status command to see all untracked files (never use -uall flag)
   - Run a git diff command to see both staged and unstaged changes that will be committed
   - Check if the current branch tracks a remote branch and is up to date with the remote, so you know if you need to push to the remote
   - Run a git log command and `git diff [base-branch]...HEAD` to understand the full commit history for the current branch (from the time it diverged from the base branch)
2. Analyze all changes that will be included in the pull request, making sure to look at all relevant commits (NOT just the latest commit, but ALL commits that will be included in the pull request!!!), and draft a pull request title and summary:
   - Keep the PR title short (under 70 characters)
   - Use the description/body for details, not the title
3. Run the following commands in parallel:
   - Create new branch if needed
   - Push to remote with -u flag if needed
   - Create PR using gh pr create with the format below. Use a HEREDOC to pass the body to ensure correct formatting.
<example>
gh pr create --title "the pr title" --body "$(cat <<'EOF'
## Summary
<1-3 bullet points>

## Test plan
[Bulleted markdown checklist of TODOs for testing the pull request...]

🤖 Generated with Claude Code
EOF
)"
</example>

Important:
- DO NOT use the TodoWrite or Agent tools
- Return the PR URL when you're done, so the user can see it

# Other common operations
- View comments on a Github PR: gh api repos/foo/bar/pulls/123/comments
```

**Analysis**: This text reveals a core engineering philosophy at Anthropic: **write operational norms into tool descriptions rather than the system prompt.** This yields several benefits:

1. **Context binding**: The norm is bound to the tool, not floating in the system prompt. When the AI decides whether to use Bash for git operations, these constraints naturally appear within the tool's capability description.
2. **Modularity**: Each tool's norm is self-contained, easing maintenance and versioning. `BashTool/prompt.ts` is independent of `systemPrompt.ts` and can be updated on its own.
3. **Differentiation**: Note the `process.env.USER_TYPE === 'ant'` branch in the code—internal users ('ant') see a short version pointing to `/commit` and `/commit-push-pr` Skills, while external users see the full embedded version above. **The same tool shows different descriptions to different user types**—a sophisticated use of dynamic `description()`.
4. **The `--amend` trap warning**: Note the CRITICAL clause—"When a pre-commit hook fails, the commit did NOT happen — so --amend would modify the PREVIOUS commit." This is a precise warning born from real production scars, preventing the AI from misusing `--amend` after a hook failure.

**Anthropic internal vs. external version design difference** (`getCommitAndPRInstructions()` lines 56–76): Internal users are directed to dedicated `/commit` and `/commit-push-pr` Skills, while external users get the full embedded workflow. This shows that internally Anthropic has encapsulated these operations into more refined Skills, while externally it maintains an out-of-the-box experience that requires no extra configuration.
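
The branching pattern can be sketched as follows. This is a hypothetical reduction—the real `getCommitAndPRInstructions()` reads `process.env.USER_TYPE` and returns far longer strings:

```typescript
// Hypothetical sketch of the internal/external branch described above.

function commitInstructions(userType: string | undefined): string {
  if (userType === "ant") {
    // Internal users are pointed at dedicated Skills.
    return "Use the /commit or /commit-push-pr Skill for git workflows.";
  }
  // External users get the full embedded workflow, out of the box.
  return "# Committing changes with git\nOnly create commits when requested by the user. ...";
}
```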

---

### Case 2: The Full AgentTool Description

**Source location**: `src/tools/AgentTool/prompt.ts`, `getPrompt()` function (lines 66–287)

The AgentTool description teaches the model **how to spawn a subagent, when to spawn one, and how to write prompts for subagents**. This description is itself an agent-orchestration manual.

Core structure (abridged):

```
Launch a new agent to handle complex, multi-step tasks autonomously.

The Agent tool launches specialized agents (subprocesses) that autonomously handle complex tasks. Each agent type has specific capabilities and tools available to it.

[Available agent types listed here]

When using the Agent tool, specify a subagent_type parameter to select which agent type to use. If omitted, the general-purpose agent is used.

When NOT to use the Agent tool:
- If you want to read a specific file path, use the Read tool or Glob tool instead of the Agent tool, to find the match more quickly
- If you are searching for a specific class definition like "class Foo", use the Glob tool instead, to find the match more quickly
- If you are searching for code within a specific file or set of 2-3 files, use the Read tool instead of the Agent tool, to find the match more quickly
- Other tasks that are not related to the agent descriptions above

Usage notes:
- Always include a short description (3-5 words) summarizing what the agent will do
- Launch multiple agents concurrently whenever possible, to maximize performance; to do that, use a single message with multiple tool uses
- When the agent is done, it will return a single message back to you. The result returned by the agent is not visible to the user. To show the user the result, you should send a text message back to the user with a concise summary of the result.
- You can optionally run agents in the background using the run_in_background parameter. When an agent runs in the background, you will be automatically notified when it completes — do NOT sleep, poll, or proactively check on its progress. Continue with other work or respond to the user instead.
- **Foreground vs background**: Use foreground (default) when you need the agent's results before you can proceed — e.g., research agents whose findings inform your next steps. Use background when you have genuinely independent work to do in parallel.
- To continue a previously spawned agent, use SendMessage with the agent's ID or name as the `to` field. The agent resumes with its full context preserved. Each Agent invocation starts fresh — provide a complete task description.
- The agent's outputs should generally be trusted
- Clearly tell the agent whether you expect it to write code or just to do research (search, file reads, web fetches, etc.), since it is not aware of the user's intent
- If the agent description mentions that it should be used proactively, then you should try your best to use it without the user having to ask for it first. Use your judgement.
- If the user specifies that they want you to run agents "in parallel", you MUST send a single message with multiple Agent tool use content blocks.
- You can optionally set `isolation: "worktree"` to run the agent in a temporary git worktree, giving it an isolated copy of the repository.

## Writing the prompt

Brief the agent like a smart colleague who just walked into the room — it hasn't seen this conversation, doesn't know what you've tried, doesn't understand why this task matters.
- Explain what you're trying to accomplish and why.
- Describe what you've already learned or ruled out.
- Give enough context about the surrounding problem that the agent can make judgment calls rather than just following a narrow instruction.
- If you need a short response, say so ("report in under 200 words").
- Lookups: hand over the exact command. Investigations: hand over the question — prescribed steps become dead weight when the premise is wrong.

Terse command-style prompts produce shallow, generic work.

**Never delegate understanding.** Don't write "based on your findings, fix the bug" or "based on the research, implement it." Those phrases push synthesis onto the agent instead of doing it yourself. Write prompts that prove you understood: include file paths, line numbers, what specifically to change.
```

**Analysis**: This description reveals how Claude Code **teaches agent orchestration through tool descriptions**:

1. **Explicitly listing "do NOT use" scenarios**: This is a counter-intuitive but crucial design choice—the tool description explicitly says "when you want to read a file, do not use Agent, use Read." This prevents the model from overusing a heavyweight tool (spawning a subagent) for a lightweight task (reading one file). It is like printing "this tool is not suitable for small jobs" on the tool's instruction manual.
2. **"Writing the prompt" subchapter**: The tool description embeds a subagent prompt-writing guide, including the metaphor "brief the agent like a smart colleague" and precise warnings like "Never delegate understanding." This is **meta-prompting**—using a prompt to teach the AI how to write prompts.
3. **Fork-mode special handling** (lines 80–97): When `isForkSubagentEnabled()` is true, the description adds a "When to fork" chapter, instructing the AI when to fork itself (inherit context) rather than spawn a brand-new subagent. This is a classic case of dynamically injecting behavioral norms via a feature flag.
4. **Concurrency instruction**: The description explicitly requires "If the user specifies that they want you to run agents 'in parallel', you MUST send a single message with multiple Agent tool use content blocks"—this encodes a UI interaction norm (how to express parallelism at the API level) directly into the tool description.
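
At the Messages API level, "a single message with multiple Agent tool use content blocks" has the following shape (the block structure follows Anthropic's tool-use content format; the ids and inputs are illustrative):

```typescript
// One assistant message carrying two tool_use blocks — this is how
// "run agents in parallel" is expressed at the API level.

type ToolUseBlock = {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
};

const parallelAgents: ToolUseBlock[] = [
  { type: "tool_use", id: "toolu_01", name: "Agent",
    input: { description: "audit auth module", prompt: "Investigate ..." } },
  { type: "tool_use", id: "toolu_02", name: "Agent",
    input: { description: "audit routing code", prompt: "Investigate ..." } },
];

// Both agents run concurrently because both blocks share one message.
const assistantMessage = { role: "assistant", content: parallelAgents };
```

Two separate messages with one block each would serialize the agents; the description's "MUST send a single message" clause exists precisely to rule that out.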

---

### Case 3: EnterPlanMode's Planning Decision Tree

**Source location**: `src/tools/EnterPlanModeTool/prompt.ts`, `getEnterPlanModeToolPromptExternal()` function (lines 16–99)

The EnterPlanMode description is a complete **planning-mode activation decision manual**, teaching the AI when to enter plan mode (rather than starting to code immediately).

```
Use this tool proactively when you're about to start a non-trivial implementation task. Getting user sign-off on your approach before writing code prevents wasted effort and ensures alignment. This tool transitions you into plan mode where you can explore the codebase and design an implementation approach for user approval.

## When to Use This Tool

**Prefer using EnterPlanMode** for implementation tasks unless they're simple. Use it when ANY of these conditions apply:

1. **New Feature Implementation**: Adding meaningful new functionality
   - Example: "Add a logout button" - where should it go? What should happen on click?
   - Example: "Add form validation" - what rules? What error messages?

2. **Multiple Valid Approaches**: The task can be solved in several different ways
   - Example: "Add caching to the API" - could use Redis, in-memory, file-based, etc.
   - Example: "Improve performance" - many optimization strategies possible

3. **Code Modifications**: Changes that affect existing behavior or structure
   - Example: "Update the login flow" - what exactly should change?
   - Example: "Refactor this component" - what's the target architecture?

4. **Architectural Decisions**: The task requires choosing between patterns or technologies
   - Example: "Add real-time updates" - WebSockets vs SSE vs polling
   - Example: "Implement state management" - Redux vs Context vs custom solution

5. **Multi-File Changes**: The task will likely touch more than 2-3 files
   - Example: "Refactor the authentication system"
   - Example: "Add a new API endpoint with tests"

6. **Unclear Requirements**: You need to explore before understanding the full scope
   - Example: "Make the app faster" - need to profile and identify bottlenecks
   - Example: "Fix the bug in checkout" - need to investigate root cause

7. **User Preferences Matter**: The implementation could reasonably go multiple ways
   - If you would use AskUserQuestion to clarify the approach, use EnterPlanMode instead
   - Plan mode lets you explore first, then present options with context

## When NOT to Use This Tool

Only skip EnterPlanMode for simple tasks:
- Single-line or few-line fixes (typos, obvious bugs, small tweaks)
- Adding a single function with clear requirements
- Tasks where the user has given very specific, detailed instructions
- Pure research/exploration tasks (use the Agent tool with explore agent instead)

## What Happens in Plan Mode

In plan mode, you'll:
1. Thoroughly explore the codebase using Glob, Grep, and Read tools
2. Understand existing patterns and architecture
3. Design an implementation approach
4. Present your plan to the user for approval
5. Use AskUserQuestion if you need to clarify approaches
6. Exit plan mode with ExitPlanMode when ready to implement

## Examples

### GOOD - Use EnterPlanMode:
User: "Add user authentication to the app"
- Requires architectural decisions (session vs JWT, where to store tokens, middleware structure)

User: "Optimize the database queries"
- Multiple approaches possible, need to profile first, significant impact

User: "Implement dark mode"
- Architectural decision on theme system, affects many components

User: "Add a delete button to the user profile"
- Seems simple but involves: where to place it, confirmation dialog, API call, error handling, state updates

User: "Update the error handling in the API"
- Affects multiple files, user should approve the approach

### BAD - Don't use EnterPlanMode:
User: "Fix the typo in the README"
- Straightforward, no planning needed

User: "Add a console.log to debug this function"
- Simple, obvious implementation

User: "What files handle routing?"
- Research task, not implementation planning

## Important Notes

- This tool REQUIRES user approval - they must consent to entering plan mode
- If unsure whether to use it, err on the side of planning - it's better to get alignment upfront than to redo work
- Users appreciate being consulted before significant changes are made to their codebase
```

**Analysis**: The EnterPlanMode description reveals an important design choice at Anthropic: **instead of relying on the AI's "natural judgment" to decide when to plan, hard-code decision rules into the tool description.**

1. **The imperative tone of "Prefer using EnterPlanMode"**: Note the first sentence of the description, "Use this tool **proactively**"—this is an active instruction to the AI, not a passive capability description. The tool description is teaching the AI "plan by default, rather than act by default."
2. **Seven trigger rules + four exclusion rules**: This rule set is essentially a decision tree, transforming the "whether to enter plan mode" judgment from the model's "discretion" into **rule matching**. This reduces behavioral variance across AI instances—different AI instances facing the same task should make the same plan/do-not-plan decision.
3. **Different prompts for internal vs. external users** (`getEnterPlanModeToolPromptAnt()` lines 101–163): Internal users ('ant') see a **more concise, more permissive** version—"When in doubt, prefer starting work and using AskUserQuestion for specific questions over entering a full planning phase." The internal version trusts the AI's judgment more; the external version leans toward mandatory planning. This reflects Anthropic's trade-off on planning overhead across user scenarios: external user tasks are more diverse, so planning is more valuable; internal expert tasks are usually better defined, so planning may be wasteful.
4. **The deep meaning of "Use this tool proactively"**: Writing "proactively" in a tool description means the AI should trigger this tool even without an explicit user request. This is the most direct manifestation of the "tool description = behavioral instruction" pattern in Claude Code's architecture—not the system prompt saying "you should plan," but the tool itself saying "in these situations, you should use me."
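
If one were to compress the seven trigger rules into code, the result would be a simple disjunction. This is a hypothetical sketch only—the real "decision tree" is executed by the model reading prose, not by a predicate:

```typescript
// Hypothetical predicate form of the trigger rules; field names invented.

interface TaskSignals {
  newFeature: boolean;               // Rule 1: new functionality
  multipleApproaches: boolean;       // Rule 2: several valid solutions
  modifiesExistingBehavior: boolean; // Rule 3: changes existing code paths
  architecturalChoice: boolean;      // Rule 4: pattern/technology choice
  estimatedFilesTouched: number;     // Rule 5: multi-file changes
  requirementsUnclear: boolean;      // Rule 6: scope needs exploration
  userPreferenceSensitive: boolean;  // Rule 7: implementation could go many ways
}

function shouldEnterPlanMode(t: TaskSignals): boolean {
  // "Use it when ANY of these conditions apply" — a plain OR of the rules.
  return (
    t.newFeature || t.multipleApproaches || t.modifiesExistingBehavior ||
    t.architecturalChoice || t.estimatedFilesTouched > 3 ||
    t.requirementsUnclear || t.userPreferenceSensitive
  );
}
```

The point of the exercise: because the rules form an ANY-of disjunction, the description biases the model toward planning—exactly the "err on the side of planning" stance stated in the Important Notes.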

---

**Common Pattern Across the Three Cases**:

| Tool | Description Type | Core Design Intent |
|------|------------------|--------------------|
| BashTool Git instructions | Operating procedure (steps + prohibitions) | Prevent high-risk data-destructive operations from being executed incorrectly |
| AgentTool | Meta-prompt (how to write prompts for subagents) | Teach the AI how to effectively delegate work to subagents |
| EnterPlanMode | Decision tree (when to trigger planning) | Standardize the "plan vs. execute directly" judgment criteria |

These three cases show that Claude Code's "tools" are not merely capability containers but **carriers of behavioral norms**. Anthropic has taken a large body of instructions that could have lived in the system prompt and bound each piece to the description of the most relevant tool. The cost is that tool descriptions become extremely long (BashTool's exceeds 500 tokens); the benefit is context-bound norms and modular maintainability.

---

## 10. Design Trade-offs

### Strengths

1. **The unified Tool interface** lets 40 built-in tool directories and an unlimited number of MCP tools share the same pipeline—registration, permissions, execution, and result handling are all reused
2. **AsyncGenerator return values** let tools continuously report progress during execution (UI updates in real time), rather than returning only at completion
3. **Edit/Write separation** is precise AI behavioral engineering—most scenarios only need Edit (local changes), while Write (full overwrite) is reserved for creating new files
4. **Deferred tools** deliver real token-economy benefits—every schema saved is hundreds of tokens saved from the system prompt
5. **The existence of `renderResultForAssistant()`** shows the team has seriously thought about "what result format the AI needs to see in order to make the best decisions"
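
Point 2 can be illustrated with a minimal AsyncGenerator-style tool. The event shapes below are assumptions for the example, not the runtime's actual types:

```typescript
// Sketch of the AsyncGenerator calling convention: a tool yields
// progress events while running, then a final result at the end.

type ToolEvent =
  | { type: "progress"; message: string }
  | { type: "result"; data: string };

async function* slowTool(): AsyncGenerator<ToolEvent> {
  yield { type: "progress", message: "scanning files..." };
  yield { type: "progress", message: "50% done" };
  yield { type: "result", data: "12 matches" };
}

async function drain(): Promise<ToolEvent[]> {
  const events: ToolEvent[] = [];
  // The consumer (e.g. a UI) can render each yield as it arrives,
  // instead of waiting for a single return value at completion.
  for await (const e of slowTool()) events.push(e);
  return events;
}
```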

### Costs

1. **The 20+ interface methods** raise the barrier to implementing new tools—a classic flexibility vs. ease-of-use trade-off
2. **Concurrency-safety declarations are self-reported**—if a tool author declares incorrectly, the system will not detect it (and it is hard to detect)
3. **Permission semantics for MCP tools are not fully identical to built-in tools**—built-in tools support finer-grained permission pattern matching (e.g., `Bash(git *)`), while MCP tools can only be controlled at the server level
4. **Large-result persistence strategy is "disk dump + reference" rather than "intelligent summarization"**—results exceeding the threshold are fully written to `~/.claude/tool-results/`, and the conversation body keeps only a reference-style preview (`<persisted-output>` tags). This guarantees no information loss on disk, but in subsequent turns the model sees only the preview rather than the full text, which may affect decision accuracy. GrowthBook can remotely adjust each tool's persistence threshold (`toolResultStorage.ts`)
5. **Dynamic tool lists** increase prompt cache miss probability—every time an MCP server changes, the tool schema portion of the cache invalidates
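
Point 4's "disk dump + reference" strategy, reduced to a sketch. The threshold, filename, and function names here are illustrative; only the `~/.claude/tool-results/` directory and the `<persisted-output>` tag come from the text above:

```typescript
// Hypothetical sketch of large-result persistence: over-threshold results
// are written to disk in full and replaced by a tagged preview.

function persistIfLarge(
  result: string,
  thresholdChars: number,
  write: (path: string, data: string) => void,
): string {
  if (result.length <= thresholdChars) return result;
  const path = "~/.claude/tool-results/result-abc.txt"; // illustrative filename
  write(path, result); // full text preserved on disk — nothing is lost
  const preview = result.slice(0, 200);
  // Only a reference-style preview stays in the conversation body.
  return `<persisted-output path="${path}">${preview}...</persisted-output>`;
}
```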

---

> **[Chart placeholder 2.5-A]**: Panoramic tool execution pipeline—from AI request to result return in 7 steps
> **[Chart placeholder 2.5-B]**: Distribution of 40 built-in tool directories across four categories—File Operations / Execution Engine / Agent Family / Extensions & Helpers
> **[Chart placeholder 2.5-C]**: Concurrency safety matrix—which tool combinations can run in parallel
