# Telemetry and Analytics System: A Complete Analysis

This chapter analyzes one of the most sensitive subsystems in Claude Code: how the telemetry system collects user behavior data, performs PII desensitization, and balances product analytics needs with user privacy protection through multi-level sampling and dynamic configuration.

---

> **🌍 Industry Context**: Telemetry systems are standard in all commercial developer tools, but they vary significantly in transparency and privacy protection. **VS Code** is the industry benchmark — its comprehensive [telemetry documentation](https://code.visualstudio.com/docs/getstarted/telemetry) details every category of collected data, provides a GUI toggle (`telemetry.telemetryLevel`), and the source code is fully open-source and auditable. **GitHub Copilot** faced scrutiny over telemetry issues in 2023 and subsequently added organization-level policy controls (`copilot_telemetry_disabled`). **Cursor** uses PostHog for product analytics and Sentry for error logging, but lacks a public telemetry data catalog. **Aider** sends anonymous usage statistics to MixPanel by default, which can be completely disabled via `--no-analytics`. **CodeX (OpenAI)** takes a simpler approach, collecting telemetry data primarily through API call logs. Claude Code's telemetry design is industry-leading at the technical level (type-level PII protection, dual-pipeline isolation, remote sampling control), but lags behind VS Code in user transparency (lacking a user-facing telemetry data catalog, defaulting to Opt-Out rather than Opt-In).

---

## Chapter Guide

The telemetry and analytics system is one of the most sensitive subsystems in Claude Code 2.1.88. It is responsible for collecting user behavior data, tool usage metrics, error logs, and other information, desensitizing it, and sending it to Anthropic's servers for analysis. This system is directly related to user privacy, so it is important to understand both its technical implementation and its privacy protection measures.

**Technical Analogy (OS Perspective)**: The telemetry system is like the **audit subsystem (auditd / ETW)** in an operating system — silently recording events during application runtime, filtering them through a pipeline (PII desensitization + sampling), and finally outputting them to multiple sinks (Datadog + first-party logs). GrowthBook acts as a remote policy server (Group Policy), controlling which features are enabled and which events are collected.

> 💡 **Plain English**: The telemetry system is like a **dashcam in a car** — silently recording usage data (events), desensitizing it through privacy filtering (like blurring license plates), and sending it back to headquarters for analysis (reporting to Datadog and Anthropic). The dashcam has an on/off switch (privacy level settings) and can receive remote updates (GrowthBook dynamic configuration).

## File Structure

The `src/services/analytics/` directory contains 9 files with approximately 4040 lines of code:

| File | Lines | Responsibility |
|------|------|------|
| `index.ts` | 174 | Public API — logEvent/logEventAsync + pre-startup event queue |
| `sink.ts` | 115 | Event routing — distributing events to Datadog and 1P pipelines |
| `metadata.ts` | ~800 | Metadata enrichment — attaching device/user/environment info to every event |
| `firstPartyEventLogger.ts` | ~400 | First-party event logging — OpenTelemetry Logger + sampling |
| `firstPartyEventLoggingExporter.ts` | ~800 | First-party event exporter — batch sending + retry + offline caching |
| `datadog.ts` | ~280 | Datadog RUM logs — event allowlist + batch sending |
| `growthbook.ts` | ~1000 | GrowthBook integration — feature gates + A/B experiments + remote config |
| `config.ts` | 38 | Global disable check — test / third-party provider / privacy level |
| `sinkKillswitch.ts` | 26 | Emergency kill-switch — remote kill-switch independent of config |

## 1. Event Lifecycle

### 1.1 Three-Channel Architecture

Claude Code's data reporting is not limited to a single `logEvent()` path — there are actually **three independent channels** that operate in parallel and complement one another:

```
┌───────────────────────────────────────────────────────────────┐
│                    Three-Channel Telemetry Architecture        │
├──────────────┬──────────────────┬────────────────────────────┤
│ ① API Header │ ② 1P Event Log   │ ③ Datadog RUM              │
│ Embedded in  │ OpenTelemetry →  │ Client token hardcoded      │
│ system prompt│ BigQuery         │ Allowlist filters 30+ event │
│ Cannot be    │ (protobuf format)│ types                       │
│ disabled     │                  │ (pubbbf48e6d...)            │
│ (attribution)│                  │                             │
└──────────────┴──────────────────┴────────────────────────────┘
```

- **Channel ① (API Header)**: Every API request carries an attribution header embedded within the system prompt context. This cannot be individually disabled without affecting functionality. This is the most basic usage tracking.
- **Channels ② + ③**: Can be disabled through privacy level settings (`telemetry.disabled`), but Channel ① always remains active.

Events in channels ② and ③ pass through a three-stage pipeline:

```
┌──────────────┐    ┌──────────────┐    ┌──────────────────────────┐
│ Event        │ →  │ Sink Routing │ →  │ Two Pipelines             │
│ Generation   │    │ Sampling +   │    │ ② 1P Exporter (OpenTel)  │
│ logEvent()   │    │ Distribution │    │ ③ Datadog (allowlist     │
│ logEventAsync│    │              │    │    filtered)              │
└──────────────┘    └──────────────┘    └──────────────────────────┘
       ↑                   ↑                        ↑
   index.ts           sink.ts           datadog.ts + exporter.ts
```

> **Identifier Tracking Chain**: Even if a user changes their IP or clears cookies, the following combination of identifiers can still correlate the same user: `deviceId` (lifelong, stored in the local keychain) + `accountUuid` (account-level) + repo fingerprint (SHA256 extracted from the 4th/7th/20th characters of the message) + environment fingerprint (detection of 20+ terminal types, 30+ cloud platforms). Understanding this is important for making informed privacy choices.

### 1.2 Pre-Startup Event Queue

The most critical design in `index.ts` is the **pre-startup event queue** — events generated before the analytics sink is initialized are not lost:

```typescript
const eventQueue: QueuedEvent[] = []
let sink: AnalyticsSink | null = null

export function logEvent(eventName: string, metadata: LogEventMetadata): void {
  if (sink === null) {
    eventQueue.push({ eventName, metadata, async: false })
    return
  }
  sink.logEvent(eventName, metadata)
}

export function attachAnalyticsSink(newSink: AnalyticsSink): void {
  if (sink !== null) return  // idempotent
  sink = newSink

  if (eventQueue.length > 0) {
    const queuedEvents = [...eventQueue]
    eventQueue.length = 0

    // Asynchronous draining to avoid blocking the startup path
    queueMicrotask(() => {
      for (const event of queuedEvents) {
        sink!.logEvent(event.eventName, event.metadata)
      }
    })
  }
}
```

Using `queueMicrotask` instead of directly draining synchronously ensures that the draining operation does not block the return of `attachAnalyticsSink()`, allowing the caller to immediately continue with subsequent startup steps.

**⚠️ Precise Semantics of `queueMicrotask`**: A task scheduled by `queueMicrotask` executes during the **microtask phase of the current event loop tick**, not the next tick. This means it will not block the return of the current call stack (`attachAnalyticsSink` returns immediately), but the draining operation will still complete within the current tick, delaying subsequent macrotasks (such as I/O callbacks, `setTimeout` callbacks). If a large number of events have accumulated before startup, microtask draining may still introduce noticeable latency — it simply transforms the blocking from "synchronous blocking of the caller" to "microtask blocking of subsequent macrotasks." To truly defer the draining operation to the next event loop tick (completely avoiding impact on the remaining tasks in the current tick), one would need `setTimeout(fn, 0)` or Node.js's `setImmediate`. In Claude Code's actual scenario, the pre-startup queue typically contains only a few events, so the choice of `queueMicrotask` is practically reasonable — but in theory, it is not "zero-latency."
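
The ordering described above can be verified directly in Node.js. A minimal, self-contained demonstration (not from the Claude Code source): the microtask runs after the synchronous code finishes but before the `setTimeout` macrotask scheduled in the same tick.

```typescript
// Demonstrates queueMicrotask scheduling: synchronous code first, then the
// microtask, and only then the macrotask — all within the ordering guarantees
// discussed above.
export function observeOrdering(): Promise<string[]> {
  const order: string[] = []
  return new Promise(resolve => {
    setTimeout(() => {
      order.push('macrotask')   // runs in a later phase of the event loop
      resolve(order)
    }, 0)
    queueMicrotask(() => order.push('microtask'))  // runs right after the sync code
    order.push('synchronous')   // executes immediately, before either callback
  })
}
```

Running this yields the order `synchronous → microtask → macrotask`, which is exactly why `attachAnalyticsSink()` can return immediately while the drain still completes before any I/O callback fires.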

### 1.3 Type-Level PII Protection

`index.ts` defines two branded types to prevent accidentally logging sensitive data:

```typescript
// General desensitization marker — used to confirm a string contains no code or file paths
export type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = never

// PII tag marker — used for data that needs to enter privileged columns
export type AnalyticsMetadata_I_VERIFIED_THIS_IS_PII_TAGGED = never
```

Both types are `never`, so no ordinary value is assignable to them — the only way to produce one is an explicit `as` cast:

```typescript
// Incorrect usage (compile error)
logEvent('test', { path: someFilePath })

// Correct usage (requires developer confirmation)
logEvent('test', { 
  path: someFilePath as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS 
})
```

The verbose type names are **intentional** — every time a developer writes `as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS`, they are forced to read the reminder in the name, creating a form of "type-level code review."

> **📚 Course Connection**: This technique of using the type system to enforce security review is an application of **Type Safety** from programming language and compiler theory courses. The `never` type is the **Bottom Type** in type theory — no value can be assigned to it, so the only way around it is through an `as` type assertion, and every `as` is an explicit "I know what I'm doing" declaration.

**⚠️ Important Limitation: This is process-level protection, not technical-level protection.** It must be clearly understood that `as` assertions in TypeScript are the lightest form of type escape — they **completely disappear** at JavaScript runtime and provide **zero runtime protection**. Compared to Rust's `unsafe` blocks, the gap is fundamental:

| Dimension | Rust `unsafe` | TypeScript `as` |
|------|---------------|-----------------|
| Runtime effect | Compiler turns off borrow checker protection, truly changing runtime behavior | Completely disappears after compilation, zero runtime semantics |
| Audit visibility | The `unsafe` keyword is rare in a codebase, making all usage points easily locatable via grep | `as` is ubiquitous in TypeScript codebases, and reviewers are prone to "assertion fatigue" |
| Toolchain support | clippy can count and track all `unsafe` usage points | No standard tool automatically audits whether all `as AnalyticsMetadata_*` uses have actually undergone human verification |
| Bypass difficulty | Requires explicitly writing `unsafe {}` in code, which is highly conspicuous in PR review | A single `as` can bypass it, which is less noticeable than `// eslint-disable-next-line` |

Therefore, the actual effectiveness of this mechanism depends entirely on the team's code review discipline — if reviewers do not check each `as AnalyticsMetadata_*` usage individually, the type system's "reminder" is meaningless. Furthermore, if someone calls `logEvent` from pure JavaScript (rather than TypeScript), the type protection completely fails.

The industrial-grade implementations that truly leverage the type system for security follow the **branded types + factory function** pattern (e.g., Google's Trusted Types, Meta's Opaque Types) — only a factory that actually runs the validation logic can produce a branded value, so bypassing it requires a deliberate, conspicuous cast rather than a routine annotation. Claude Code's `never` + `as` scheme is more like a **lightweight developer reminder mechanism** — effective in teams with a strict review culture, but without hard guarantees at the type-system level.
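
A sketch of that branded-type + factory pattern, with hypothetical names (none of this appears in the Claude Code source) and a deliberately crude path heuristic:

```typescript
// Brand via a unique symbol: the symbol is never defined at runtime, so the
// only sanctioned way to obtain a SafeAnalyticsString is the factory below,
// which co-locates the cast with the actual validation.
declare const SAFE: unique symbol
export type SafeAnalyticsString = string & { readonly [SAFE]: true }

// Illustrative heuristic: reject strings containing path separators or
// file-extension-like suffixes.
const PATH_LIKE = /[\\/]|\.[a-z]{1,4}$/i

export function toSafeAnalyticsString(value: string): SafeAnalyticsString {
  if (PATH_LIKE.test(value)) {
    throw new Error(`possible file path in analytics metadata: ${value}`)
  }
  return value as SafeAnalyticsString  // the single sanctioned cast
}
```

With this shape, `logEvent` would accept only `SafeAnalyticsString`, and a grep for `as SafeAnalyticsString` outside the factory file becomes a meaningful audit signal — unlike `as AnalyticsMetadata_*`, which can legitimately appear at every call site.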

### 1.4 _PROTO_ Field Isolation

```typescript
export function stripProtoFields<V>(
  metadata: Record<string, V>,
): Record<string, V> {
  let result: Record<string, V> | undefined
  for (const key in metadata) {
    if (key.startsWith('_PROTO_')) {
      if (result === undefined) {
        result = { ...metadata }
      }
      delete result[key]
    }
  }
  return result ?? metadata
}
```

Fields prefixed with `_PROTO_` are only allowed into first-party privileged columns (PII-tagged proto columns). All other destinations (Datadog) have them stripped before routing. This is a "single-point stripping" design — calling `stripProtoFields` just once in `sink.ts` ensures that PII data does not leak into non-privileged storage.
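
A quick usage demonstration (reproducing `stripProtoFields` from above so the snippet is self-contained): `_PROTO_` fields are removed from the copy, the input object is never mutated, and metadata without such fields is returned by reference (no allocation on the hot path).

```typescript
// Copy of stripProtoFields from the section above, for a runnable example.
function stripProtoFields<V>(metadata: Record<string, V>): Record<string, V> {
  let result: Record<string, V> | undefined
  for (const key in metadata) {
    if (key.startsWith('_PROTO_')) {
      if (result === undefined) result = { ...metadata }  // copy-on-first-strip
      delete result[key]
    }
  }
  return result ?? metadata  // no _PROTO_ fields: return the original object
}

// Hypothetical event shape for illustration only.
const event = { toolName: 'Bash', _PROTO_prompt: 'sensitive text' }
const forDatadog = stripProtoFields(event)
// forDatadog has only toolName; event itself still carries _PROTO_prompt
```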

## 2. Event Routing (Sink)

### 2.1 Routing Logic

`sink.ts` is the core of event routing:

```typescript
function logEventImpl(eventName: string, metadata: LogEventMetadata): void {
  // 1. Sampling decision
  const sampleResult = shouldSampleEvent(eventName)
  if (sampleResult === 0) return  // dropped by sampling

  const metadataWithSampleRate = sampleResult !== null
    ? { ...metadata, sample_rate: sampleResult }
    : metadata

  // 2. Datadog pipeline (after desensitization)
  if (shouldTrackDatadog()) {
    void trackDatadogEvent(eventName, stripProtoFields(metadataWithSampleRate))
  }

  // 3. First-party pipeline (full data, including _PROTO_ fields)
  logEventTo1P(eventName, metadataWithSampleRate)
}
```

Note the difference between step 2 and step 3:
- Datadog receives data processed by `stripProtoFields()` — **without** PII-tagged fields
- First-party logs receive the complete data — **with** `_PROTO_*` fields, which the exporter is responsible for routing to privileged columns

### 2.2 Datadog Toggle

```typescript
function shouldTrackDatadog(): boolean {
  if (isSinkKilled('datadog')) return false  // emergency kill
  if (isDatadogGateEnabled !== undefined) return isDatadogGateEnabled
  try {
    return checkStatsigFeatureGate_CACHED_MAY_BE_STALE(DATADOG_GATE_NAME)
  } catch {
    return false  // default to off on error
  }
}
```

Three-layer toggle: kill-switch → value initialized at startup → GrowthBook cached value. On failure, Datadog tracking defaults to **off** — this is the correct secure-by-default behavior.

## 3. Datadog Logging System

### 3.1 Event Allowlist

Lines 19-64 of `datadog.ts` define a strict event allowlist:

```typescript
const DATADOG_ALLOWED_EVENTS = new Set([
  'tengu_api_error',
  'tengu_api_success',
  'tengu_cancel',
  'tengu_exit',
  'tengu_init',
  'tengu_started',
  'tengu_tool_use_error',
  'tengu_tool_use_success',
  'tengu_voice_recording_started',
  'tengu_voice_toggled',
  // ... approximately 40 events total
])
```

Only events in the allowlist are sent to Datadog — this is an allowlist policy, defaulting to deny.

### 3.2 Tag Field Extraction

```typescript
const TAG_FIELDS = [
  'arch', 'clientType', 'errorType', 'http_status_range',
  'http_status', 'model', 'platform', 'provider', 'skillMode',
  'subscriptionType', 'toolName', 'userBucket', 'userType', 'version',
]
```

These fields are extracted as Datadog tags (for filtering and aggregation); the remaining fields stay in the log body.
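
A minimal sketch of how such an extraction might work — this is an assumed shape, not the actual `datadog.ts` implementation, and it uses a shortened field list:

```typescript
// Split event metadata into Datadog tags (key:value strings, used for
// filtering/aggregation) and the remaining log body.
const TAG_FIELDS = new Set(['model', 'platform', 'toolName', 'version'])

export function splitTagsAndBody(
  metadata: Record<string, unknown>,
): { tags: string[]; body: Record<string, unknown> } {
  const tags: string[] = []
  const body: Record<string, unknown> = {}
  for (const [key, value] of Object.entries(metadata)) {
    if (TAG_FIELDS.has(key) && value !== undefined) {
      tags.push(`${key}:${String(value)}`)  // Datadog tag syntax is "key:value"
    } else {
      body[key] = value
    }
  }
  return { tags, body }
}
```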

### 3.3 Batch Sending

```typescript
const DEFAULT_FLUSH_INTERVAL_MS = 15000  // flush every 15 seconds
const MAX_BATCH_SIZE = 100               // max batch size of 100
const NETWORK_TIMEOUT_MS = 5000          // 5-second timeout

const DATADOG_LOGS_ENDPOINT = 'https://http-intake.logs.us5.datadoghq.com/api/v2/logs'
const DATADOG_CLIENT_TOKEN = 'pubbbf48e6d78dae54bceaa4acf463299bf'
```

Note: The Datadog client token is **intended to be public** (the `pub...` prefix marks a write-only client token), but because it is hardcoded in the source, anyone can use it to write spoofed logs into this Datadog account.

## 4. First-Party Event Logging System

### 4.1 OpenTelemetry Integration

`firstPartyEventLogger.ts` uses OpenTelemetry's LoggerProvider + BatchLogRecordProcessor:

```typescript
import { BatchLogRecordProcessor, LoggerProvider } from '@opentelemetry/sdk-logs'

const BATCH_CONFIG_NAME = 'tengu_1p_event_batch_config'
type BatchConfig = {
  scheduledDelayMillis?: number   // batch interval
  maxExportBatchSize?: number     // max events per batch
  maxQueueSize?: number           // max queue capacity
  skipAuth?: boolean              // skip auth (for testing)
  maxAttempts?: number            // max retry attempts
  path?: string                   // API path
  baseUrl?: string                // API base URL
}
```

All batch parameters are controlled remotely through GrowthBook (`tengu_1p_event_batch_config`), allowing sending strategy adjustments without a new release.

### 4.2 Event Sampling

```typescript
export function shouldSampleEvent(eventName: string): number | null {
  const config = getEventSamplingConfig()
  const eventConfig = config[eventName]

  if (!eventConfig) return null       // not configured = 100% record
  const sampleRate = eventConfig.sample_rate

  if (sampleRate >= 1) return null    // 100% record
  if (sampleRate <= 0) return 0       // 0% record (drop)

  // Random sampling
  return Math.random() < sampleRate ? sampleRate : 0
}
```

Sampling configuration is also remote (`tengu_event_sampling_config`), allowing high-frequency events (like `tengu_tool_use_success`) to be downsampled to reduce data volume and cost. Events retained by sampling have a `sample_rate` field attached; during analysis, true event volume can be restored by weighting (`count / sample_rate`, the Horvitz-Thompson estimator principle).
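
As a worked example of that weighting (an illustrative helper, not from the source): each retained event contributes `1 / sample_rate` to the estimated true count, and unsampled events (no `sample_rate` field) count as 1.

```typescript
// Horvitz-Thompson-style estimate: sum the inverse sampling probabilities
// of the retained events to recover the approximate true event count.
type SampledEvent = { sample_rate?: number }

export function estimateTrueCount(events: SampledEvent[]): number {
  return events.reduce((sum, e) => sum + 1 / (e.sample_rate ?? 1), 0)
}
```

Two events kept at a 10% rate plus one unsampled event therefore estimate roughly 21 original events.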

**⚠️ Limitations of Naive Bernoulli Sampling**: `Math.random() < sampleRate` is **per-event independent sampling** (Bernoulli sampling) — each event is coin-flipped independently for retention or dropping. This means that events from the same user in the same session may be **partially retained and partially dropped**, making it impossible to reconstruct complete user behavior sequences during analysis. For example, an action chain of three events — `tool_use_start → tool_use_success → api_call` — might retain only the first and third under a 50% sample rate, losing the causal link in the middle.

A more mature industry practice is **deterministic hash-based sampling** (like Datadog APM's trace sampling) — making sampling decisions based on the hash of a session ID or trace ID, ensuring that all events in the same session/trace are either fully retained or fully dropped. This allows complete reconstruction of any retained user behavior sequence for analysis. Claude Code's choice of Bernoulli sampling is likely due to its simplicity and sufficiency for aggregate statistics ("how many tool_use_success events yesterday?"), but it sacrifices single-user behavior analysis capability — which is a practical deficiency for debugging issues reported by specific users.
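
The deterministic alternative can be sketched in a few lines (illustrative, not Claude Code's implementation): hash the session ID to a number in [0, 1) and compare against the sample rate, so every event in a session receives the same keep/drop decision.

```typescript
import { createHash } from 'node:crypto'

// Deterministic session-level sampling: the same sessionId always maps to the
// same bucket, so a session's events are kept or dropped as a unit.
export function shouldKeepSession(sessionId: string, sampleRate: number): boolean {
  const digest = createHash('sha256').update(sessionId).digest()
  // First 4 bytes as an unsigned 32-bit integer, normalized to [0, 1).
  const bucket = digest.readUInt32BE(0) / 0x1_0000_0000
  return bucket < sampleRate
}
```

Because SHA-256 output is effectively uniform, the retained fraction still converges to `sampleRate` across many sessions, preserving the aggregate-statistics property while keeping per-session sequences intact.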

### 4.3 Exporter Resilience Design

`firstPartyEventLoggingExporter.ts` implements multiple resilience guarantees:

```typescript
export class FirstPartyEventLoggingExporter implements LogRecordExporter {
  // Append-only logging — concurrency safe
  // Quadratic backoff retry
  // Drop after maxAttempts exceeded
  // Immediately retry queued failed events upon any successful export (endpoint has recovered)
  // Chunk large event sets for sending
  // Unauthenticated fallback on 401
}
```

The offline cache uses JSONL files stored in the `~/.claude/telemetry/` directory:

```typescript
const BATCH_UUID = randomUUID()       // process-unique ID to isolate failure files across runs
const FILE_PREFIX = '1p_failed_events.'

function getStorageDir(): string {
  return path.join(getClaudeConfigHomeDir(), 'telemetry')
}
```

## 5. GrowthBook Feature Gate System

`growthbook.ts` is the largest single file in the analytics directory (~1000 lines, 25% of the directory's code). It is not just a feature toggle — it is the core infrastructure through which Anthropic performs **remote behavior control** over Claude Code. Through GrowthBook, Anthropic can: enable/disable features, run A/B experiments, dynamically adjust runtime parameters, and emergency-shut down malfunctioning subsystems — all without a new release. Understanding this module is understanding "the boundary of Anthropic's operational control over Claude Code."

### 5.1 Architecture Overview: Remote Eval Mode

Claude Code uses GrowthBook's `remoteEval: true` mode — all feature values are pre-computed by the server; the client does not evaluate rules locally:

```typescript
const thisClient = new GrowthBook({
  apiHost: baseUrl,
  clientKey,
  attributes,
  remoteEval: true,
  cacheKeyAttributes: ['id', 'organizationUUID'],
  // Request with auth headers (needed for enterprise proxy scenarios)
  ...(authHeaders.error ? {} : { apiHostRequestHeaders: authHeaders.headers }),
})
```

This architectural choice has clear trade-offs:
- **Advantage**: The server can use arbitrarily complex targeting rules (including confidential bucketing logic), and the client doesn't need to know the rule details
- **Cost**: Every initialization and refresh requires a network request; extensive local caching and fallback logic are needed to handle offline scenarios

### 5.2 User Attributes and Targeting Dimensions

`growthbook.ts` sends the following user attributes to GrowthBook for targeting:

```typescript
export type GrowthBookUserAttributes = {
  id: string                    // user ID (actually deviceId)
  sessionId: string             // session ID
  deviceID: string              // device ID
  platform: 'win32' | 'darwin' | 'linux'
  apiBaseUrlHost?: string       // API host (enterprise proxy scenario)
  organizationUUID?: string     // organization UUID
  accountUUID?: string          // account UUID
  userType?: string             // user type (ant/external)
  subscriptionType?: string     // subscription type
  rateLimitTier?: string        // rate limit tier
  firstTokenTime?: number       // first usage time
  email?: string                // email
  appVersion?: string           // app version
  github?: GitHubActionsMetadata // GitHub Actions info
}
```

These attributes support multi-dimensional targeting strategies:
- **Internal dogfooding**: New features are first released to Anthropic internal employees based on `userType === 'ant'`
- **A/B experiment bucketing**: Deterministic bucketing based on `id` (device ID), so the same device always sees the same variant
- **Subscription tiering**: Differentiated features for different paid tiers based on `subscriptionType` and `rateLimitTier`
- **Enterprise proxy adaptation**: `apiBaseUrlHost` allows non-Anthropic direct-connect enterprise proxy deployments to be precisely targeted
- **Platform-specific configuration**: Platform-differentiated behavior based on `platform`

Note that the `id` field actually uses `deviceId` rather than user account ID — this means bucketing is **device-level**, and the same user may see different experiment variants on different devices.

### 5.3 Complete List of Feature Gates

Searching the source reveals the feature gates Claude Code controls remotely through GrowthBook. The main gate/config names extracted from the source are:

| Gate/Config Name | Purpose | Call Pattern |
|------------|------|---------|
| `tengu_log_datadog_events` | Datadog telemetry toggle | `CACHED_MAY_BE_STALE` |
| `tengu_frond_boric` | Sink emergency kill-switch | `CACHED_MAY_BE_STALE` |
| `tengu_event_sampling_config` | Event sampling rate config | `CACHED_MAY_BE_STALE` |
| `tengu_1p_event_batch_config` | 1P event batch sending config | `CACHED_MAY_BE_STALE` |
| `tengu_amber_quartz_disabled` | Voice mode disable toggle | `CACHED_MAY_BE_STALE` |
| `tengu_scratch` | Coordinator mode toggle | `CACHED_MAY_BE_STALE` |
| `tengu_session_memory` | Session memory feature toggle | `CACHED_MAY_BE_STALE` |
| `tengu_lodestone_enabled` | Deep Link registration toggle | `CACHED_MAY_BE_STALE` |
| `tengu_surreal_dali` | Scheduled remote agent feature | `CACHED_MAY_BE_STALE` |
| `tengu_birch_trellis` | Bash permission control | `CACHED_MAY_BE_STALE` |
| `tengu_slate_prism` | CLI printing behavior | `CACHED_MAY_BE_STALE` |
| `tengu_otk_slot_v1` | OTK Slot feature | `CACHED_MAY_BE_STALE` |
| `tengu_max_version_config` | Max version control (forced update) | `BLOCKS_ON_INIT` |

All obfuscated names (like `tengu_amber_quartz`, `tengu_frond_boric`) use a naming convention of "project codename + two random words," which prevents external observers from inferring feature purposes from the config names — a security measure, but one that also increases audit difficulty.

### 5.4 Four-Level API and Consistency Guarantees

The GrowthBook module exposes four read APIs with different consistency guarantees, for different scenarios to choose from:

```
┌───────────────────────────────────┐  Consistency: Strong ←────────→ Weak
│ getDynamicConfig_BLOCKS_ON_INIT   │  Blocks until initialization completes, guarantees latest value
│ checkGate_CACHED_OR_BLOCKING      │  Returns quickly if cache is true, otherwise blocks
│ checkSecurityRestrictionGate      │  Waits for re-initialization, then reads cache
│ *_CACHED_MAY_BE_STALE             │  Reads cache only, may return stale values
└───────────────────────────────────┘  Performance: Poor ←──────────→ Good
```

This design reveals a classic **availability vs consistency** trade-off:

- **`_CACHED_MAY_BE_STALE`**: Used by all startup critical paths and synchronous contexts. It returns immediately from disk cache (`cachedGrowthBookFeatures` in `~/.claude.json`) or the in-memory `remoteEvalFeatureValues` Map, making no network requests. The value may be an old value written by a previous process.
- **`_BLOCKS_ON_INIT`**: Waits for the GrowthBook client to finish initialization (up to a 5-second timeout), ensuring the latest value is retrieved. Used for scenarios where "the latest config must be obtained to run correctly" (like max version control).
- **`checkGate_CACHED_OR_BLOCKING`**: A clever hybrid strategy — if the disk cache is already `true`, return immediately (trust the cached positive result); if `false` or missing, take the blocking path to get the latest value. Suitable for permission-style gates where "better to wait than to wrongly deny."
- **`checkSecurityRestrictionGate`**: Dedicated for security gates — if re-initialization is in progress (such as after a login switch), wait for initialization to complete before reading, avoiding the use of old user permission values.

**Race conditions are worth noting**: `_CACHED_MAY_BE_STALE` does not wait for `reinitializingPromise` — this means during the brief window of a login switch, values read through this API may belong to the **old user**. The suffix `_MAY_BE_STALE` is an honest label of this risk. For non-security-critical functionality (like UI configuration), this trade-off is acceptable; but for permission-related decisions, `checkSecurityRestrictionGate` must be used.
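
The cached-or-blocking hybrid can be sketched as follows (an assumed shape, not the actual `growthbook.ts` code): trust a cached `true` immediately, but block for a fresh value on `false` or missing, so a stale negative can never wrongly deny a gate.

```typescript
type Cache = Map<string, boolean>

export async function checkGateCachedOrBlocking(
  gate: string,
  cache: Cache,
  fetchFresh: (gate: string) => Promise<boolean>,
): Promise<boolean> {
  const cached = cache.get(gate)
  if (cached === true) return true    // fast path: trust the positive cache
  const fresh = await fetchFresh(gate) // slow path: block on the network
  cache.set(gate, fresh)               // refresh the cache for future reads
  return fresh
}
```

The asymmetry is the point: a stale `true` at worst grants something slightly early, while a stale `false` would silently deny — so only the negative case pays the latency cost.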

### 5.5 Experiment Exposure Logging and Deduplication

GrowthBook records an "exposure" when checking feature values — which experiment variant the user saw:

```typescript
type StoredExperimentData = {
  experimentId: string
  variationId: number
  inExperiment?: boolean
  hashAttribute?: string
  hashValue?: string
}
const experimentDataByFeature = new Map<string, StoredExperimentData>()
```

Exposure logging has two key design aspects:

1. **Session-level deduplication**: The `loggedExposures` Set ensures that only one exposure is logged per feature per session, preventing high-frequency call paths (like `isAutoMemoryEnabled` in a render loop) from generating large numbers of duplicate exposure events
2. **Deferred exposure**: If a feature is accessed before GrowthBook initialization completes (via `_CACHED_MAY_BE_STALE`), its feature key is added to the `pendingExposures` Set, and exposure logs are backfilled after initialization completes

```typescript
// Backfill all pending exposures after initialization
if (hadFeatures) {
  for (const feature of pendingExposures) {
    logExposureForFeature(feature)
  }
  pendingExposures.clear()
}
```
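
The session-level deduplication from point 1 reduces to a small guard around a `Set` (illustrative names, mirroring the `loggedExposures` idea described above):

```typescript
// Each feature key logs at most one exposure per session, no matter how often
// a hot path (e.g. a render loop) checks the feature.
const loggedExposures = new Set<string>()
const exposureLog: string[] = []  // stand-in for the real exposure sink

export function logExposureOnce(featureKey: string): void {
  if (loggedExposures.has(featureKey)) return  // already logged this session
  loggedExposures.add(featureKey)
  exposureLog.push(featureKey)
}

export function getExposureLog(): readonly string[] {
  return exposureLog
}
```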

### 5.6 Three-Level Cache and Fallback Strategy

GrowthBook value resolution follows a strict fallback chain:

```
Environment variable override (CLAUDE_INTERNAL_FC_OVERRIDES, ant only)
    ↓ miss
Local config override (~/.claude.json growthBookOverrides, ant only)
    ↓ miss
In-memory cache (remoteEvalFeatureValues Map, filled after init in this process)
    ↓ miss
Disk cache (~/.claude.json cachedGrowthBookFeatures, cross-process persistent)
    ↓ miss
Default value (provided by caller)
```

The disk cache is maintained by `syncRemoteEvalToDisk()` — after each successful payload processing, the disk cache is completely overwritten (not merged), ensuring that features deleted on the server also disappear from disk.

**Environment variable overrides** are designed for internal testing and eval frameworks:
```bash
# Force-enable/disable specific features during testing
CLAUDE_INTERNAL_FC_OVERRIDES='{"tengu_session_memory": true}' claude
```

### 5.7 Re-initialization and Auth Changes

When a user logs in or out, GrowthBook needs to be completely destroyed and rebuilt (because `apiHostRequestHeaders` cannot be updated after client creation):

```typescript
export function refreshGrowthBookAfterAuthChange(): void {
  resetGrowthBook()       // destroy old client, clear all caches
  refreshed.emit()        // immediately notify subscribers to re-read (falling back to disk cache)
  reinitializingPromise = initializeGrowthBook()
    .catch(error => { logError(toError(error)); return null })
    .finally(() => { reinitializingPromise = null })
}
```

`reinitializingPromise` is the key to protecting security gates — `checkSecurityRestrictionGate` will `await` this Promise, ensuring that old user permission values are not returned before the auth switch completes.

### 5.8 Periodic Refresh

For long-running sessions, GrowthBook sets up periodic refresh:

```typescript
const GROWTHBOOK_REFRESH_INTERVAL_MS =
  process.env.USER_TYPE !== 'ant'
    ? 6 * 60 * 60 * 1000   // external users: 6 hours
    : 20 * 60 * 1000        // internal employees: 20 minutes
```

Internal employees refresh 18x more frequently than external users — allowing new features to become active for internal users within 20 minutes without restarting Claude Code. Refresh uses a "lightweight refresh" (`refreshGrowthBookFeatures`) rather than destroy-and-rebuild — preserving client state while only re-fetching feature values.

`refreshInterval.unref?.()` ensures this timer does not prevent the Node.js process from exiting naturally.

### 5.9 Architectural Issues with the GrowthBook Integration

**Vendor lock-in risk**: Much of the 1000 lines of adapter code deals with working around GrowthBook SDK issues:
- API returns `value` instead of `defaultValue` format mismatch (requires transform workaround in `processRemoteEvalPayload`)
- SDK's `evalFeature()` cannot correctly use pre-evaluated values in `remoteEval` mode (requires building a custom `remoteEvalFeatureValues` cache)
- SDK's `setForcedFeatures` is unreliable in `remoteEval` mode

These workarounds suggest that GrowthBook SDK's support for `remoteEval` mode is not mature, while Claude Code has deep dependency on this mode. If future GrowthBook SDK updates change these behaviors, these workarounds may break.

**Why GrowthBook instead of LaunchDarkly/Unleash?** The source code does not directly answer this, but it can be inferred: GrowthBook is open-source (MIT license), which for Anthropic means auditability and not being locked into a proprietary vendor. LaunchDarkly's SDK is more mature but proprietary and expensive; Unleash is another open-source option but with weaker remoteEval support. The price of choosing an open-source solution is having to fill the SDK gaps yourself — that is the essence of these 1000 lines of code.

**The operational cost of fail-closed**: If the GrowthBook service itself is unreachable (e.g., CDN outage), all gates using `_BLOCKS_ON_INIT` will fall back to disk cache or default values after a 5-second timeout. For the Datadog toggle (`tengu_log_datadog_events`), the default value is `false` — meaning precisely when telemetry data is most needed to diagnose the problem, the telemetry pipeline shuts itself off first. This is the classic tension between fail-closed (security-first) and fail-open (observability-first).
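The fail-closed behavior described above can be sketched as a timeout race. This is a reconstruction of the pattern, not the actual `_BLOCKS_ON_INIT` code: `fetchRemoteGateValue`, the function names, and the error handling are assumptions; only the 5-second timeout and the fallback-to-default semantics come from the chapter.

```typescript
// Hypothetical stub standing in for the GrowthBook fetch / disk-cache lookup.
async function fetchRemoteGateValue(name: string): Promise<boolean> {
  return false; // real code would consult the remote config service
}

// Sketch: wait at most 5s for remote init, then fall back to the safe
// default (false for tengu_log_datadog_events, hence "fail closed").
async function gateBlocksOnInit(
  name: string,
  defaultValue: boolean,
  timeoutMs = 5_000,
): Promise<boolean> {
  const timeout = new Promise<boolean>((resolve) => {
    const t = setTimeout(() => resolve(defaultValue), timeoutMs);
    t.unref?.();
  });
  try {
    return await Promise.race([fetchRemoteGateValue(name), timeout]);
  } catch {
    return defaultValue; // any error also fails closed
  }
}
```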

## 6. Metadata Enrichment

### 6.1 MCP Tool Name Desensitization

Lines 70-77 of `metadata.ts` desensitize MCP tool names:

```typescript
export function sanitizeToolNameForAnalytics(toolName: string) {
  if (toolName.startsWith('mcp__')) {
    return 'mcp_tool'  // all MCP tools unified as 'mcp_tool'
  }
  return toolName      // built-in tools keep original names
}
```

MCP tool names follow the format `mcp__<server>__<tool>`, where the server name may expose user configuration (e.g. `mcp__my-company-api__query`) and thus counts as medium-sensitivity PII. After desensitization, every MCP tool is reported under the single name `mcp_tool`.

But there are exceptions — certain trusted sources can retain full names:

```typescript
export function isAnalyticsToolDetailsLoggingEnabled(
  mcpServerType: string | undefined,
  mcpServerBaseUrl: string | undefined,
): boolean {
  if (process.env.CLAUDE_CODE_ENTRYPOINT === 'local-agent') return true  // Cowork
  if (mcpServerType === 'claudeai-proxy') return true   // claude.ai official proxy
  if (mcpServerBaseUrl && isOfficialMcpUrl(mcpServerBaseUrl)) return true  // official MCP
  return false
}
```
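Putting the two functions together, the effective decision looks roughly like this. The combining helper `analyticsToolName` and the `isOfficialMcpUrl` stub are illustrative assumptions; the two checks themselves are restated from the source above.

```typescript
// isOfficialMcpUrl is stubbed; the real allowlist lives elsewhere in the codebase.
function isOfficialMcpUrl(url: string): boolean {
  return url.startsWith('https://mcp.anthropic.com'); // assumed host, for illustration
}

function sanitizeToolNameForAnalytics(toolName: string): string {
  return toolName.startsWith('mcp__') ? 'mcp_tool' : toolName;
}

function isAnalyticsToolDetailsLoggingEnabled(
  mcpServerType: string | undefined,
  mcpServerBaseUrl: string | undefined,
): boolean {
  if (process.env.CLAUDE_CODE_ENTRYPOINT === 'local-agent') return true;
  if (mcpServerType === 'claudeai-proxy') return true;
  if (mcpServerBaseUrl && isOfficialMcpUrl(mcpServerBaseUrl)) return true;
  return false;
}

// Hypothetical combinator: the name that actually reaches analytics.
function analyticsToolName(
  toolName: string,
  mcpServerType?: string,
  mcpServerBaseUrl?: string,
): string {
  return isAnalyticsToolDetailsLoggingEnabled(mcpServerType, mcpServerBaseUrl)
    ? toolName
    : sanitizeToolNameForAnalytics(toolName);
}
```

So a third-party `mcp__my-company-api__query` is reported as `mcp_tool`, while the same call routed through the `claudeai-proxy` server type keeps its full name.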

## 7. Global Disable Conditions

Lines 19-27 of `config.ts` define the global disable conditions:

```typescript
export function isAnalyticsDisabled(): boolean {
  return (
    process.env.NODE_ENV === 'test' ||
    isEnvTruthy(process.env.CLAUDE_CODE_USE_BEDROCK) ||
    isEnvTruthy(process.env.CLAUDE_CODE_USE_VERTEX) ||
    isEnvTruthy(process.env.CLAUDE_CODE_USE_FOUNDRY) ||
    isTelemetryDisabled()
  )
}
```

Disable scenarios:
1. **Test environment**: `NODE_ENV === 'test'`
2. **Third-party cloud providers**: Bedrock / Vertex / Foundry users do not send telemetry to Anthropic
3. **Privacy level**: `no-telemetry` or `essential-traffic` settings
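The `isEnvTruthy` helper is not shown in this chapter. A plausible sketch follows; the exact set of accepted values is an assumption, not confirmed by the source.

```typescript
// Plausible sketch of isEnvTruthy: treats common "on" spellings as true,
// everything else (including unset) as false. Accepted values are assumed.
function isEnvTruthy(value: string | undefined): boolean {
  if (!value) return false;
  return ['1', 'true', 'yes', 'on'].includes(value.trim().toLowerCase());
}
```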

### 7.1 Sink Kill-Switch

```typescript
// Obfuscated name: tengu_frond_boric = analytics pipeline emergency switch
const SINK_KILLSWITCH_CONFIG_NAME = 'tengu_frond_boric'

export function isSinkKilled(sink: SinkName): boolean {
  const config = getDynamicConfig_CACHED_MAY_BE_STALE<
    Partial<Record<SinkName, boolean>>
  >(SINK_KILLSWITCH_CONFIG_NAME, {})
  return config?.[sink] === true
}
```

This is a remote emergency kill-switch independent of `isAnalyticsDisabled()`. If Datadog has an issue (like anomalous billing), the Datadog sink can be remotely shut down via GrowthBook without affecting first-party logs — or vice versa.
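How the sinks consume this switch is not shown in the chapter; a minimal sketch of the likely flush-time check, with the dynamic-config lookup stubbed, might look like this:

```typescript
type SinkName = 'datadog' | 'firstParty';

// Stub for getDynamicConfig_CACHED_MAY_BE_STALE: here, Datadog has been
// remotely disabled while the first-party pipeline stays up.
function getKillswitchConfig(): Partial<Record<SinkName, boolean>> {
  return { datadog: true };
}

function isSinkKilled(sink: SinkName): boolean {
  return getKillswitchConfig()[sink] === true;
}

// Hypothetical helper: each flush cycle skips any remotely killed sink.
function sinksToFlush(all: SinkName[]): SinkName[] {
  return all.filter((s) => !isSinkKilled(s));
}
```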

## 8. Deep Privacy Impact Analysis

### 8.1 What Is Collected

Based on source code analysis, the following data is collected:

- **Device info**: platform, architecture, WSL version, Linux distribution, VCS type
- **Account info**: user ID (UUID), device ID, organization UUID, account UUID, subscription type, email
- **Usage behavior**: tool invocations (name + result), API calls (model + status code), session duration, command usage
- **Repository info**: `getRepoRemoteHash()` for the repository remote address hash (note: a hash, not plaintext)
- **Experiment data**: GrowthBook feature values, experiment variants, exposure events
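The implementation of `getRepoRemoteHash()` is not shown in this chapter. A plausible sketch (SHA-256 over a normalized remote URL; both the normalization and the hash algorithm are assumptions) illustrates why the plaintext address never leaves the machine:

```typescript
import { createHash } from 'node:crypto';

// Plausible sketch of getRepoRemoteHash. The chapter only states that a
// hash is sent; the normalization and algorithm here are assumed.
function getRepoRemoteHash(remoteUrl: string): string {
  const normalized = remoteUrl.trim().toLowerCase();
  return createHash('sha256').update(normalized).digest('hex');
}
```

A one-way hash still allows Anthropic to count distinct repositories and correlate sessions on the same repository, without learning the remote URL itself.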

### 8.2 Desensitization Measures

1. **Type-level protection**: `AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS` forces developers to confirm that every string field contains no sensitive data
2. **MCP tool name desensitization**: Default replacement of `mcp__<server>__<tool>` with `mcp_tool`
3. **Repository address hashing**: Uses `getRepoRemoteHash()` instead of plaintext URL
4. **`_PROTO_` field isolation**: PII data only enters privileged storage columns and does not leak to Datadog
5. **Third-party provider exclusion**: Bedrock/Vertex/Foundry users send no telemetry whatsoever
6. **Global off switch**: `isTelemetryDisabled()` allows users to completely disable telemetry

### 8.3 Privacy Risk Points

**Hardcoded Datadog Token**: `pubbbf48e6d78dae54bceaa4acf463299bf` is hardcoded in the source. Although this is a write-only token (it cannot read historical data), anyone can use it to inject fake logs into Anthropic's Datadog account, potentially skewing internal analysis results.

**Email field**: The `email` field in GrowthBookUserAttributes is PII. Although it is only sent to GrowthBook (not into Datadog/1P events), if GrowthBook's data is leaked, user emails would be exposed.

**Obfuscated names as a double-edged sword**: Names like `tengu_amber_quartz` (voice) and `tengu_frond_boric` (kill-switch) prevent outsiders from guessing feature meanings, but they also impede security auditing: an auditor cannot infer a config's purpose from its name.

**"Essential Traffic Only" ambiguity**: The exact meaning of `isEssentialTrafficOnly()` is not sufficiently clear in the source — what traffic counts as "essential"? Does it include GrowthBook feature checks? If so, then when users set the highest privacy level, feature gates (like the voice kill-switch) may not function correctly.

## Critical Analysis

### Strengths

1. **Type-level PII protection**: The `never` type marker is a practical developer-reminder mechanism: the verbose type name forces developers to pause and think whenever they write an `as` cast. Its actual protective effect, however, rests on the team's code-review discipline rather than on any guarantee enforced by the type system (`as` assertions vanish entirely at runtime, and the cost of bypassing them is near zero)
2. **Dual-pipeline isolation**: The separation of Datadog (desensitized) and 1P (privileged columns) ensures the "principle of least privilege" — most analysts can only see desensitized data
3. **Remote controllability**: Sampling rates, batch configuration, and kill-switches are all controlled remotely through GrowthBook, allowing response to emergencies without a new release
4. **Pre-startup queue**: Avoids the classic "initialization timing" problem in telemetry systems
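The pattern in point 1 can be sketched in general TypeScript. Only the type's name is taken from the source; its internal structure (the chapter mentions a `never` marker, not reproduced here) and the `logEvent` signature are assumptions:

```typescript
// Sketch of the verbose-type-name pattern: the type itself is a plain
// record, but its name makes every cast site read like an attestation.
type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = Record<
  string,
  string | number | boolean
>;

// Hypothetical logging entry point for illustration.
function logEvent(
  name: string,
  meta: AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
) {
  return { name, meta }; // real code would enqueue to the sinks
}

// The verbose name appears at every call site, forcing a moment of review
// (and making casts trivially greppable during code review):
const event = logEvent('tengu_tool_use', {
  tool_name: 'mcp_tool', // already sanitized
  duration_ms: 340,
} as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS);
```

As the weakness list below notes, this is a social mechanism, not a technical one: nothing stops a developer from casting a payload that does contain file paths.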

### Weaknesses

1. **Hardcoded credentials**: The Datadog Client Token and API endpoints are hardcoded in the source. While the token is write-only, it is still a leaked credential
2. **Opt-Out rather than Opt-In**: Telemetry is enabled by default (unless the user actively sets `no-telemetry`), which is controversial from a GDPR compliance perspective — while CLI tools are not as strictly regulated as websites, the "collect by default" approach may cause backlash in the privacy-sensitive developer community
3. **GrowthBook's own privacy issues**: Sending user attributes to GrowthBook (including email, organization UUID) is itself a data externalization — the security of GrowthBook's data depends on GrowthBook's own security practices
4. **Offline cache potential risk**: Failed events are cached in JSONL files in `~/.claude/telemetry/`. If these files are read by other programs (like malware or file sync tools), user behavior data may be leaked
5. **Missing data retention policy**: No logic is found in the source regarding how long collected data is retained or when it is deleted — this should be clearly stated in the privacy policy
6. **Ubiquitous `tengu_` prefix**: Almost all telemetry events are prefixed with `tengu_` (Claude Code's internal codename). Once the source circulates publicly, this naming loses whatever obscurity it provided and merely makes the code harder to understand

### Comparison with Industry Practice

Claude Code's telemetry design is technically mature — type-level PII protection, dual-pipeline isolation, and remote configuration are all common best-practice combinations in the industry. The OpenTelemetry + Datadog dual-pipeline architecture is widely used in SaaS products. However, in terms of transparency, it lags behind some competitors: VS Code's telemetry system has complete documentation explaining what is collected, why it is collected, and how to turn it off, and automatically generates a telemetry data catalog through the `vscode-telemetry-extractor` tool; GitHub Copilot allows telemetry to be completely disabled at the organization policy level; Aider provides a simple `--no-analytics` one-click off switch. Although Claude Code's telemetry can be controlled via environment variables and privacy levels, it lacks a user-facing, human-readable telemetry data catalog — this is an area for improvement in the increasingly open-source and transparency-focused developer tools market.
