[Bug] CLI crashes on session resume: synchronous JS pass blocks event loop, accumulates 7–11 GB RSS, then crashes with tengu_uncaught_exception cascade #31961

# Claude CLI Crash Report — Submission to Anthropic Engineering

> **AI Disclosure**: Claude Sonnet 4.6 co-authored this document.
> **Last Review**: Unreviewed
> **[Version History](#version-history)**

---

## How This Was Discovered

I have been using Claude CLI in long investigation sessions with heavy MCP tool use. The CLI began crashing repeatedly in a pattern I could not explain. I started instrumenting the crashes: collecting the telemetry files the CLI writes at crash time, running macOS `sample` during sessions, and capturing Activity Monitor spindumps. Over two days I captured six crashes with progressively more diagnostic data.

The most important finding came from crash #5: the silent window that precedes the crash began about one second after session startup, before I typed anything or called any tool. This overturned my initial hypothesis (that MCP tool calls were triggering the crash) and pointed to the session load/resume process itself.

All artifacts referenced in this report are local files listed in [Section 7](#7-files-to-send-anthropic).

---

## 1. The Bug

After resuming a session with substantial accumulated context, the Claude CLI process enters a 26–39 second period of full CPU utilization with no output, then dies in a cascade of `tengu_uncaught_exception` events. The crash is 100% reproducible on resumed sessions from this investigation.

The silent window is not a hang — profiling shows the main event loop thread is fully active the entire time, executing synchronous JavaScript that opens and closes files in a tight loop. All 11 Bun worker pool threads are idle. The event loop is blocked and cannot process user input, I/O callbacks, or timers during this window.

In every crash record that includes heap fields (crashes #1–4), memory statistics at crash time show `heapUsed > heapTotal`. That state is impossible under normal JavaScriptCore operation and indicates that the runtime's internal accounting had already broken down before the exception cascade fired.

---

## 2. Environment

| Field | Value |
|-------|-------|
| Platform | macOS 15.7.4 (24G517), darwin arm64 |
| Hardware | 64 GB RAM; 0 bytes swap at crash time; ~37 GB free+reclaimable |
| Claude CLI versions | 2.1.69, 2.1.70, 2.1.71 — crash occurs across all three |
| Binary | Compiled arm64 bun 1.2.19 executable; JS engine is JavaScriptCore |
| Active betas | `claude-code-20250219`, `adaptive-thinking-2026-01-28`, **`context-management-2025-06-27`**, `prompt-caching-scope-2026-01-05` |
| Session type | Long-running investigation sessions with many MCP tool calls; resumed via `--resume` |

**Note on heap metrics**: `heapUsed`/`heapTotal` in Claude's telemetry come from bun's JSC-to-V8 compatibility shim, not from V8 itself. The `heapUsed > heapTotal` invariant violation is present in every crash record where heap fields are available, and indicates runtime accounting breakdown regardless of the shim's accuracy.

---

## 3. Evidence

### 3.1 All Six Crashes

All telemetry sourced from `~/.claude/telemetry/1p_failed_events.*.json`.

| # | Session | Date (UTC) | CLI | Silent gap | RSS at crash | heapUsed > heapTotal | Exception events |
|---|---------|-----------|-----|-----------|-------------|---------------------|--------|
| 1 | `b696a6a8` | 2026-03-06T20:36 | 2.1.70 | 28.7s | 1,598 MB | 710 > 257 MB | 399 |
| 2 | `b696a6a8` | 2026-03-06T20:45 | 2.1.70 | 35.9s | 1,495 MB | 442 > 181 MB | 389 |
| 3 | `92064525` | 2026-03-06T00:48 | 2.1.69 | 25.6s | 1,474 MB | 300 > 176 MB | 350 |
| 4 | `62f6efbc` | 2026-03-07T20:44 | 2.1.70 | 29.5s | 4,040 MB | 577 > 175 MB | 400 |
| 5 | `1594ad87` | 2026-03-07T21:40 | 2.1.71 | **38.8s** | ~10,700 MB ¹ | — ² | 386 |
| 6 | PID 47007  | 2026-03-07T21:56 | 2.1.71 | — ³ | ~7,700 MB ¹ | — ³ | — ³ |

¹ Physical footprint from `sample` output captured during the crash window, not from telemetry.
² Crash #5 telemetry event sequence (`tengu_native_*` startup events) does not include heap fields.
³ No telemetry file was produced for crash #6 — the process may have been OOM-killed before cleanup ran.
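As a quick sanity check on the heap column, the sketch below recomputes the `heapUsed`/`heapTotal` ratio for crashes #1–4 from the values in the table above. The dictionary layout and the function name `heap_overshoot` are illustrative only; they are not part of the CLI's telemetry schema.

```python
# heapUsed and heapTotal in MB, copied from the table above (crashes #1-4).
HEAP_MB = {
    1: (710, 257),
    2: (442, 181),
    3: (300, 176),
    4: (577, 175),
}


def heap_overshoot(heap_mb: dict[int, tuple[int, int]]) -> dict[int, float]:
    """Return heapUsed / heapTotal per crash; a value above 1.0 violates the invariant."""
    return {crash: used / total for crash, (used, total) in heap_mb.items()}


for crash, ratio in heap_overshoot(HEAP_MB).items():
    status = "violated" if ratio > 1.0 else "ok"
    print(f"crash #{crash}: heapUsed is {ratio:.2f}x heapTotal ({status})")
```

On these figures the reported used heap is roughly 1.7 to 3.3 times the reported total in all four records.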

### 3.2 Crash #5 Eliminates MCP Tool Calls as the Trigger

The complete telemetry event sequence for crash #5 (`1594ad87-…d281fd04….json`):

| Time (UTC) | Event |
|-----------|-------|
| 21:40:20.048 | `tengu_status_line_mount` |
| 21:40:20.061 | `tengu_native_auto_updater_start` |
| 21:40:20.108 | `tengu_version_check_success` |
| 21:40:20.155 | `tengu_native_update_complete` / `tengu_native_auto_updater_success` |
| 21:40:21.004 | `tengu_native_version_cleanup` ← **last event before silence** |
| *(38.8 second gap — no events)* | |
| 21:40:59.755 | `tengu_uncaught_exception` × 386 |

No user message was sent. No tool was called. The session had just started via `--resume`. The background pass began 1 second after process launch, immediately after the auto-updater finished its startup sequence. This rules out any hypothesis that requires a tool call or model response to trigger the pass.
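The 38.8-second figure can be re-derived from the timestamps in the table above. The sketch below is a minimal helper, assuming the events are already extracted as `(time, name)` pairs from a single day; the helper name `largest_gap` and the hard-coded list are illustrative, and nothing here reads the telemetry file itself.

```python
from datetime import datetime

# (time, event) pairs copied from the crash #5 table above.
EVENTS = [
    ("21:40:20.048", "tengu_status_line_mount"),
    ("21:40:20.061", "tengu_native_auto_updater_start"),
    ("21:40:20.108", "tengu_version_check_success"),
    ("21:40:20.155", "tengu_native_update_complete"),
    ("21:40:21.004", "tengu_native_version_cleanup"),
    ("21:40:59.755", "tengu_uncaught_exception"),
]


def largest_gap(events):
    """Return (gap_seconds, event_before, event_after) for the longest silence."""
    times = [datetime.strptime(stamp, "%H:%M:%S.%f") for stamp, _ in events]
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
    widest = max(range(len(gaps)), key=lambda i: gaps[i])
    return gaps[widest], events[widest][1], events[widest + 1][1]


gap, before, after = largest_gap(EVENTS)
print(f"{gap:.3f}s of silence between {before} and {after}")
```

The longest silence falls between `tengu_native_version_cleanup` and the first `tengu_uncaught_exception`, which matches the gap row in the table.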

### 3.3 What Runs During the Silent Window

**Source**: `claude-sample-164025.txt` — macOS `sample` of PID 33425 captured at 21:40:40 UTC, midway through crash #5's 38.8s window.

- **Physical footprint at sample time**: 10.7 GB (peak 11.2 GB)
- **Main thread samples**: 124,422 — continuous, never entering `kevent64` (the idle/event-wait syscall)

Dominant main thread call path (collapsed):

```
start (dyld)
  → bun event loop  (2.1.71 +0x2c9a4c)
    → JS dispatch   (2.1.71 +0x2cd940, +0x1514bd0)
      → ...17 levels stripped bun binary...
        → JIT-compiled JS  (0x11dxxxxxxx — per-session JIT addresses)
          → open  (libsystem_kernel)   3,254 samples
          → close (libsystem_kernel)   1,023 samples
```

The JIT addresses (`0x11dxxxxxxx`) are compiled JavaScript, not native bun code. The operation is a JavaScript function that opens and closes files in a tight synchronous loop.

**All other threads during the same window**:

| Thread | Crash window | Baseline (pre-crash session) |
|--------|-------------|------------------------------|
| Main thread | 124,422 samples — 100% active | 94% in `kevent64` (idle) |
| Bun Pool 0–10 (11 threads) | 100% in `__ulock_wait2` (idle) | same |
| libpas scavenger | 97.9% in `__psynch_cvwait` (idle); 10 samples `madvise` | same |
| Heap Helper Threads (3) | 97.2% idle; 64 samples GC work | same |

The background pass is single-threaded and synchronous. The bun worker pool is uninvolved.
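To illustrate why a synchronous open/close loop starves everything else on the loop, here is a small analogy in Python. It uses `asyncio` purely as a stand-in for Bun's event loop; the function names are hypothetical and nothing here is taken from the CLI's code. A heartbeat task plays the role of timers and I/O callbacks, and the pass either never yields (as the profile suggests) or yields after each file.

```python
import asyncio
import os
import tempfile


async def heartbeat(counter: dict) -> None:
    """Stand-in for timers and I/O callbacks: ticks whenever the loop gets a turn."""
    while True:
        counter["ticks"] += 1
        await asyncio.sleep(0)  # yield control back to the event loop


async def open_close_pass(path: str, iterations: int, cooperative: bool) -> None:
    """Open and close a file repeatedly, optionally yielding between iterations."""
    for _ in range(iterations):
        fd = os.open(path, os.O_RDONLY)  # synchronous open
        os.close(fd)                     # synchronous close
        if cooperative:
            await asyncio.sleep(0)       # let other tasks run


async def ticks_during_pass(cooperative: bool, iterations: int = 1000) -> int:
    """Count heartbeat ticks that occur while the open/close pass is running."""
    counter = {"ticks": 0}
    with tempfile.NamedTemporaryFile() as tmp:
        beat = asyncio.create_task(heartbeat(counter))
        await asyncio.sleep(0)  # give the heartbeat its first turn
        before = counter["ticks"]
        await open_close_pass(tmp.name, iterations, cooperative)
        ticks = counter["ticks"] - before
        beat.cancel()
    return ticks
```

`asyncio.run(ticks_during_pass(cooperative=False))` returns 0 because the pass never hands control back, which mirrors a main thread that never reaches `kevent64`. The cooperative variant lets the heartbeat tick throughout the pass.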

**Crash #6 sample** (`sample-of-2.1.71-1.txt`, PID 47007, 7.7 GB footprint, 9m51s into session):

Same synchronous main thread pattern, plus: `posix_spawn` (20 samples) + `__socketpair` (48 samples) + `setsockopt` / `fcntl` / `__close_nocancel` — consistent with spawning and configuring child processes (MCP server startup) as part of the same pass.

### 3.4 RSS Accumulates With Each Resume

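Each `--resume` reloads the context accumulated before the previous crash, so a resumed process starts with more context than its predecessor did. Resumed processes reached in minutes a memory level that the original processes took hours to build up. Values for crashes #1–4 are RSS from telemetry; values for crashes #5–6 are physical footprint from `sample`:

| Crash | Process uptime | RSS or physical footprint |
|-------|---------------|--------------------|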
| #1 | 2.3 h | 1,598 MB at crash |
| #2 | 3.5 min (resumed) | 1,495 MB at crash |
| #3 | 7.4 h | 1,474 MB at crash |
| #4 | 3.86 h | 4,040 MB at crash |
| #5 | 5.3 min (resumed) | **10,700 MB** at sample |
| #6 | 9.8 min (resumed) | **7,700 MB** at sample |

A session that took 3.86 hours to accumulate 4,040 MB (crash #4) reached about 10,700 MB in 5.3 minutes when resumed as crash #5. The context loaded at resume is the relevant driver, not wall-clock uptime.
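To put the comparison in rate terms, the sketch below divides each memory figure by the process uptime from the table. It assumes, as a deliberate simplification, that memory grew from near zero over the process lifetime, and it mixes telemetry RSS (crash #4) with `sample` physical footprint (crashes #5 and #6), so treat the output as an order-of-magnitude comparison only. The function name is illustrative.

```python
# (uptime in minutes, memory in MB) taken from the table above.
CRASHES = {
    4: (3.86 * 60, 4040),  # long-running process, RSS at crash
    5: (5.3, 10700),       # resumed process, footprint at sample time
    6: (9.8, 7700),        # resumed process, footprint at sample time
}


def average_growth_mb_per_min(uptime_min: float, memory_mb: float) -> float:
    """Average growth rate, assuming memory started near zero (a simplification)."""
    return memory_mb / uptime_min


for crash, (uptime_min, memory_mb) in CRASHES.items():
    rate = average_growth_mb_per_min(uptime_min, memory_mb)
    print(f"crash #{crash}: about {rate:,.0f} MB/min")
```

On these numbers crash #4 averages roughly 17 MB/min while crash #5 averages roughly 2,000 MB/min, about two orders of magnitude faster.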

---

## 4. Chain of Reasoning

**The background pass triggers on session load.**
Crash #5 telemetry: last event is `tengu_native_version_cleanup` at T+1s; no user message or tool call precedes the 38.8s gap. The pass is triggered by session initialization, not by user interaction.

**The pass runs synchronously on the main event loop thread.**
`claude-sample-164025.txt`: 124,422 consecutive main thread samples with no `kevent64` calls. All 11 Bun pool threads are idle throughout. The bun event loop cannot dispatch work while this runs.

**The pass performs intensive filesystem I/O via JIT-compiled JavaScript.**
Leaf syscalls in the main thread sample are `open` (3,254 samples) and `close` (1,023 samples), reached through JIT-compiled code at `0x11dxxxxxxx` addresses. The operation is JavaScript, not native C++.

**Memory grows in proportion to session context size.**
Physical footprint reaches about 10,700 MB in 5.3 minutes (crash #5) vs. 4,040 MB RSS after 3.86 hours (crash #4). The main variable between those runs was the amount of context loaded at resume, although the CLI version also changed from 2.1.70 to 2.1.71.

**The JS runtime's accounting breaks down before the crash.**
`heapUsed > heapTotal` in every telemetry record with heap fields (crashes #1–4). This invariant cannot be violated under normal JSC operation. The runtime is in an internally inconsistent state when the exception cascade fires.

**`context-management-2025-06-27` is the most likely candidate.**
Present in all six crash sessions. Not in the default beta list. Name suggests context processing. Timing is consistent with a session-load context pass. Confidence: probable. Claude cannot rule out other betas without symbols or source.

---

## 5. Reproduction

**Trigger**: Resume any sufficiently large session via `--resume`.
**Observed**: The process goes silent and crashes 26–39 seconds later, with no user action required. In crash #5 the silent window began about 1 second after startup.

Confirmed conditions:
- Claude CLI 2.1.69–2.1.71 (persists across auto-update)
- Beta `context-management-2025-06-27` active
- Session has accumulated substantial context from extended MCP tool use

The minimum context threshold is not precisely determined. Crash #2 (the second crash of session `b696a6a8`) occurred 3.5 minutes after resuming with 67K cached tokens. A fresh (not resumed) v2.1.71 session at 31 minutes / 95K cached tokens did not crash, so cached-token count alone does not predict the crash; the resume path itself appears to be required.
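For anyone trying to reproduce the timing, a small wrapper like the one below can record how long the process survives after launch. This is a sketch under assumptions: `run_and_time` is a hypothetical helper, the command list is supplied by the caller, and the child inherits the terminal so the CLI can still run interactively. It measures wall-clock time to exit only; it does not capture memory.

```python
import subprocess
import time


def run_and_time(cmd: list[str], timeout_s: float = 120.0) -> dict:
    """Run cmd and return wall-clock seconds until it exits, plus its exit code.

    exit_code is None if the process was still alive after timeout_s and was killed.
    A negative exit_code means the process was terminated by a signal.
    """
    start = time.monotonic()
    try:
        exit_code = subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        exit_code = None
    return {"seconds": time.monotonic() - start, "exit_code": exit_code}


# Example (hypothetical invocation; use whatever arguments you normally resume with):
#   print(run_and_time(["claude", "--resume"]))
```

If the pattern in this report holds, a resumed large session should exit on its own well inside the default 120-second timeout.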

**Workaround**: Do not `--resume` sessions that have accumulated large context. Use `/compact` before context grows large. Start fresh sessions (`/new`) at task boundaries.

---

## 6. Prior Art Search

Claude searched `anthropics/claude-code` GitHub issues before filing. The three closest matches found, and why each is distinct:

| Issue | Title | Similarity | Key distinction |
|-------|-------|------------|-----------------|
| [#24644](https://github.com/anthropics/claude-code/issues/24644) (closed duplicate) | Memory leak: CLI grows to 44 GB+ RAM with GC thrashing | macOS, `--resume` trigger, high RSS | Root cause is large `toolUseResult.stdout` data in a 67 MB JSONL file (670× memory amplification); GC thrashing over minutes, not a <40s synchronous crash at load; no beta flag identified |
| [#1421](https://github.com/anthropics/claude-code/issues/1421) | Recurring crashes: JavaScript Heap Out of Memory while 'thinking' | macOS, JS heap OOM crash | Node.js/V8 runtime (not Bun/JSC); crashes during active tool execution, not at session load; no synchronous blocking pass identified |
| [#18880](https://github.com/anthropics/claude-code/issues/18880) | `claude --resume` crashes on killed sessions | `--resume` trigger, startup crash | Root cause is a corrupt JSONL from a hard-killed session ("No messages returned" error); Linux only; no memory involvement |

Searches performed: "crash session resume", "tengu_uncaught_exception", "heapUsed heapTotal memory OOM macOS", "context-management beta flag crash", "RSS memory accumulation resume macOS", "synchronous event loop session load macOS crash".

---

## 7. Files to Send Anthropic

### Essential

| File | Signal |
|------|--------|
| `~/.claude/telemetry/1p_failed_events.1594ad87-…d281fd04….json` | Crash #5: proves pass triggers at session start with no tool call |
| `~/Desktop/claude-crash-20260307-161848/claude-sample-164025.txt` | 6.1 MB `sample` captured during crash #5 window: open/close loop |
| `~/Desktop/sample-of-2.1.71-1.txt` | Sample during crash #6: adds posix_spawn/socketpair finding |
| `~/Desktop/Spindump.txt` | Activity Monitor spindump during crash #5 |

### Supporting

| File | Signal |
|------|--------|
| `~/.claude/telemetry/1p_failed_events.62f6efbc-…87848655….json` | Crash #4: full heap fields at 3.94 GB RSS |
| `~/.claude/telemetry/1p_failed_events.b696a6a8-…2d6d60f3….json` | Crash #1: earliest event; heapUsed 710 > 257 MB heapTotal |
| `~/.claude/telemetry/1p_failed_events.b696a6a8-…eed481f9….json` | Crash #2: resumed session, 3.5 min uptime, same pattern |
| `~/.claude/telemetry/1p_failed_events.92064525-…74061b8b….json` | Crash #3: claude-opus-4-6 model, confirms model is not the variable |
| `~/Desktop/claude-crash-20260307-161848/claude-sample-163429.txt` | Pre-crash baseline sample: main thread 94% idle for comparison |

### Omit

- Session JSONL files (contain conversation content; available on request if useful for context size data)
- `crash-timeline.txt` (RSS monitor; data summarized in Section 3.4)
- `sample-of-2.1.71-2.txt` (identical to sample-1; Activity Monitor exported the same sample twice)

---

## Version History

| Date | Description | Changes | Review |
|------|-------------|---------|--------|
| 2026-Mar-07 | Initial submission draft — condensed from working investigation report | +185 lines | Unreviewed |
| 2026-Mar-07 | Added prior art search (§6); renumbered files section to §7 | +18 / -2 lines | Unreviewed |

