[Bug] CLI crashes on session resume: synchronous JS pass blocks event loop, accumulates 7–11 GB RSS, then crashes with tengu_uncaught_exception cascade #31961

@christianromney


Claude CLI Crash Report — Submission to Anthropic Engineering

AI Disclosure: Claude Sonnet 4.6 co-authored this document.
Last Review: Unreviewed
Version History


How This Was Discovered

I have been using Claude CLI in long investigation sessions with heavy MCP tool use. The CLI began crashing repeatedly in a pattern I could not explain. I started instrumenting the crashes: collecting the telemetry files the CLI writes at crash time, running the macOS sample utility during sessions, and capturing Activity Monitor spindumps. Over two days I captured six crashes with progressively more diagnostic data.

The most important finding came from crash #5: the silent pass began within one second of session startup, before I typed anything or called any tool, and the crash followed about 39 seconds later. This overturned my initial hypothesis (that MCP tool calls were triggering the crash) and pointed to the session load/resume process itself.

All artifacts referenced in this report are local files listed in Section 7.


1. The Bug

After resuming a session with substantial accumulated context, the Claude CLI process enters a 26–39 second period of full CPU utilization with no output, then dies in a cascade of tengu_uncaught_exception events. The crash is 100% reproducible on resumed sessions from this investigation.

The silent window is not a hang — profiling shows the main event loop thread is fully active the entire time, executing synchronous JavaScript that opens and closes files in a tight loop. All 11 Bun worker pool threads are idle. The event loop is blocked and cannot process user input, I/O callbacks, or timers during this window.

Every crash record that includes heap fields (crashes #1–4) shows heapUsed > heapTotal at crash time, an impossible state under normal JavaScriptCore operation. This indicates the runtime's internal accounting has already broken down before the exception cascade fires.


2. Environment

| Field | Value |
|---|---|
| Platform | macOS 15.7.4 (24G517), darwin arm64 |
| Hardware | 64 GB RAM; 0 bytes swap at crash time; ~37 GB free+reclaimable |
| Claude CLI versions | 2.1.69, 2.1.70, 2.1.71 (crash occurs across all three) |
| Binary | Compiled arm64 bun 1.2.19 executable; JS engine is JavaScriptCore |
| Active betas | claude-code-20250219, adaptive-thinking-2026-01-28, context-management-2025-06-27, prompt-caching-scope-2026-01-05 |
| Session type | Long-running investigation sessions with many MCP tool calls; resumed via --resume |

Note on heap metrics: heapUsed/heapTotal in Claude's telemetry come from bun's JSC-to-V8 compatibility shim, not from V8 itself. The heapUsed > heapTotal invariant violation is present in every crash record where heap fields are available, and indicates runtime accounting breakdown regardless of the shim's accuracy.
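The invariant in question is simple: a heap should never report more bytes in use than it has allocated in total. A minimal sketch of the check, assuming each crash record exposes heapUsed and heapTotal as byte counts (the dataclass and function names are hypothetical, not part of the CLI), applied to the four crashes in Section 3.1 that carry heap fields:

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: telemetry stores bytes; the report's tables show MB


@dataclass
class MemorySnapshot:
    rss: int         # resident set size, bytes
    heap_used: int   # telemetry field heapUsed, bytes
    heap_total: int  # telemetry field heapTotal, bytes


def accounting_violated(s: MemorySnapshot) -> bool:
    """True when heapUsed exceeds heapTotal, which a consistent heap never reports."""
    return s.heap_used > s.heap_total


def overshoot_ratio(s: MemorySnapshot) -> float:
    """How many times larger heapUsed is than heapTotal."""
    return s.heap_used / s.heap_total


# Crash records #1-#4 from the Section 3.1 table (values in MB).
crashes = {
    1: MemorySnapshot(rss=1598 * MB, heap_used=710 * MB, heap_total=257 * MB),
    2: MemorySnapshot(rss=1495 * MB, heap_used=442 * MB, heap_total=181 * MB),
    3: MemorySnapshot(rss=1474 * MB, heap_used=300 * MB, heap_total=176 * MB),
    4: MemorySnapshot(rss=4040 * MB, heap_used=577 * MB, heap_total=175 * MB),
}

for n, snap in crashes.items():
    print(f"crash #{n}: violated={accounting_violated(snap)} "
          f"heapUsed/heapTotal={overshoot_ratio(snap):.2f}")
```

Every record with heap fields trips the check, with heapUsed between roughly 1.7 and 3.3 times heapTotal.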


3. Evidence

3.1 All Six Crashes

All telemetry sourced from ~/.claude/telemetry/1p_failed_events.*.json.

| # | Session | Date (UTC) | CLI | Silent gap | RSS at crash | heapUsed > heapTotal | Exception events |
|---|---|---|---|---|---|---|---|
| 1 | b696a6a8 | 2026-03-06T20:36 | 2.1.70 | 28.7s | 1,598 MB | 710 > 257 MB | 399 |
| 2 | b696a6a8 | 2026-03-06T20:45 | 2.1.70 | 35.9s | 1,495 MB | 442 > 181 MB | 389 |
| 3 | 92064525 | 2026-03-06T00:48 | 2.1.69 | 25.6s | 1,474 MB | 300 > 176 MB | 350 |
| 4 | 62f6efbc | 2026-03-07T20:44 | 2.1.70 | 29.5s | 4,040 MB | 577 > 175 MB | 400 |
| 5 | 1594ad87 | 2026-03-07T21:40 | 2.1.71 | 38.8s | ~10,700 MB ¹ | n/a ² | 386 |
| 6 | PID 47007 | 2026-03-07T21:56 | 2.1.71 | n/a ³ | ~7,700 MB ¹ | n/a ³ | n/a ³ |

¹ Physical footprint from sample output captured during the crash window, not from telemetry.
² Crash #5 telemetry event sequence (tengu_native_* startup events) does not include heap fields.
³ No telemetry file was produced for crash #6 — the process may have been OOM-killed before cleanup ran.
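To re-derive the Exception events column from the raw files, one approach is to count event names per file. This is a sketch only: the file layout (a JSON array of event objects) and the event_name key are assumptions, since this report does not document the telemetry schema; adjust both after inspecting a real file. The function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path

# Assumption: each file is a JSON array of event objects and the event name
# lives under this key. Both are hypothetical; check a real file and adjust.
EVENT_NAME_KEY = "event_name"


def count_event_names(raw_json: str) -> Counter:
    """Count occurrences of each event name in one failed-events payload."""
    events = json.loads(raw_json)
    return Counter(e.get(EVENT_NAME_KEY, "<missing>") for e in events)


def exception_counts(telemetry_dir: Path) -> dict:
    """Map each 1p_failed_events file name to its tengu_uncaught_exception count."""
    return {
        p.name: count_event_names(p.read_text())["tengu_uncaught_exception"]
        for p in sorted(telemetry_dir.glob("1p_failed_events.*.json"))
    }


if __name__ == "__main__":
    for name, n in exception_counts(Path.home() / ".claude" / "telemetry").items():
        print(f"{n:5d}  {name}")
```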

3.2 Crash #5 Eliminates MCP Tool Calls as the Trigger

The complete telemetry event sequence for crash #5 (1594ad87-…d281fd04….json):

| Time (UTC) | Event |
|---|---|
| 21:40:20.048 | tengu_status_line_mount |
| 21:40:20.061 | tengu_native_auto_updater_start |
| 21:40:20.108 | tengu_version_check_success |
| 21:40:20.155 | tengu_native_update_complete / tengu_native_auto_updater_success |
| 21:40:21.004 | tengu_native_version_cleanup (last event before silence) |
| | (38.8-second gap: no events) |
| 21:40:59.755 | tengu_uncaught_exception × 386 |

No user message was sent. No tool was called. The session had just started via --resume. The silence began about one second after startup (0.96 s after tengu_status_line_mount), immediately after the auto-updater finished its startup sequence. This rules out any hypothesis that requires a tool call or model response to trigger the pass.
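Both figures quoted for crash #5 (the 38.8 s gap and the roughly one-second delay before it) can be recomputed from the timestamps in the event table. A small sketch; the helper names are illustrative, and the two events that share the 21:40:20.155 timestamp are represented by one entry:

```python
from datetime import datetime

# Crash #5 timeline copied from the table above (UTC, all on the same day).
timeline = [
    ("21:40:20.048", "tengu_status_line_mount"),
    ("21:40:20.061", "tengu_native_auto_updater_start"),
    ("21:40:20.108", "tengu_version_check_success"),
    ("21:40:20.155", "tengu_native_update_complete"),  # same time as ..._auto_updater_success
    ("21:40:21.004", "tengu_native_version_cleanup"),
    ("21:40:59.755", "tengu_uncaught_exception"),
]


def parse(ts: str) -> datetime:
    return datetime.strptime(ts, "%H:%M:%S.%f")


def largest_gap(events):
    """Return (seconds, event_before, event_after) for the longest silence."""
    best = (0.0, None, None)
    for (t0, e0), (t1, e1) in zip(events, events[1:]):
        gap = (parse(t1) - parse(t0)).total_seconds()
        if gap > best[0]:
            best = (gap, e0, e1)
    return best


gap, before, after = largest_gap(timeline)
startup_offset = (parse(timeline[4][0]) - parse(timeline[0][0])).total_seconds()
print(f"{gap:.3f}s of silence after {before}; "
      f"silence began {startup_offset:.3f}s after the first startup event")
```

The longest silence is the one between tengu_native_version_cleanup and the first tengu_uncaught_exception (38.751 s); every earlier gap is under one second.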

3.3 What Runs During the Silent Window

Source: claude-sample-164025.txt — macOS sample of PID 33425 captured at 21:40:40 UTC, midway through crash #5's 38.8s window.

  • Physical footprint at sample time: 10.7 GB (peak 11.2 GB)
  • Main thread samples: 124,422 — continuous, never entering kevent64 (the idle/event-wait syscall)

Dominant main thread call path (collapsed):

start (dyld)
  → bun event loop  (2.1.71 +0x2c9a4c)
    → JS dispatch   (2.1.71 +0x2cd940, +0x1514bd0)
      → ...17 levels stripped bun binary...
        → JIT-compiled JS  (0x11dxxxxxxx — per-session JIT addresses)
          → open  (libsystem_kernel)   3,254 samples
          → close (libsystem_kernel)   1,023 samples

The JIT addresses (0x11dxxxxxxx) are compiled JavaScript, not native bun code. The operation is a JavaScript function that opens and closes files in a tight synchronous loop.
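The open and close counts above come from the sample report. A sketch for pulling leaf-frame counts out of the report's top-of-stack summary, assuming that section begins with a line starting "Sort by top of stack" and lists one "symbol  (in image)  count" entry per line. That layout is an assumption from memory of macOS sample output and should be checked against claude-sample-164025.txt before relying on the numbers; the function name is hypothetical.

```python
import re

# Assumed entry layout: "        open  (in libsystem_kernel.dylib)        3254"
TOP_OF_STACK_LINE = re.compile(r"^\s+(\S.*?)\s+\(in ([^)]+)\)\s+(\d+)\s*$")


def top_of_stack(report: str) -> dict:
    """Return {(symbol, image): samples} from the top-of-stack summary section."""
    counts = {}
    in_section = False
    for line in report.splitlines():
        if line.lstrip().startswith("Sort by top of stack"):
            in_section = True
            continue
        if not in_section:
            continue
        if not line.strip():
            if counts:  # a blank line after the entries ends the section
                break
            continue
        m = TOP_OF_STACK_LINE.match(line)
        if m:
            counts[(m.group(1), m.group(2))] = int(m.group(3))
    return counts
```

Usage: top_of_stack(open("claude-sample-164025.txt").read()) would return the per-symbol leaf counts, from which the open and close entries can be compared against the totals quoted above.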

All other threads during the same window:

| Thread | Crash window | Baseline (pre-crash session) |
|---|---|---|
| Main thread | 124,422 samples, 100% active | 94% in kevent64 (idle) |
| Bun Pool 0–10 (11 threads) | 100% in __ulock_wait2 (idle) | same |
| libpas scavenger | 97.9% in __psynch_cvwait (idle); 10 samples madvise | same |
| Heap Helper Threads (3) | 97.2% idle; 64 samples GC work | same |

The background pass is single-threaded and synchronous. The bun worker pool is uninvolved.
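Why a synchronous pass starves the event loop can be shown outside the CLI. The sketch below uses Python's asyncio as a stand-in for Bun's event loop (an assumption: this is an illustration of the mechanism, not the CLI's code). A timer is scheduled, then a synchronous open/close loop holds the thread, and the timer can only fire once that loop returns control.

```python
import asyncio
import os
import tempfile
import time


def sync_open_close_loop(path: str, duration: float) -> int:
    """Open and close one file repeatedly for `duration` seconds, without yielding."""
    n = 0
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        fd = os.open(path, os.O_RDONLY)
        os.close(fd)
        n += 1
    return n


async def measure_timer_lateness(block_for: float, timer_delay: float = 0.05) -> float:
    """Schedule a timer, block the loop synchronously, return how late the timer fired (s)."""
    loop = asyncio.get_running_loop()
    fired = loop.create_future()
    scheduled_at = time.perf_counter()
    loop.call_later(timer_delay, lambda: fired.set_result(time.perf_counter()))
    with tempfile.NamedTemporaryFile() as f:
        sync_open_close_loop(f.name, block_for)  # no callbacks or timers run during this
    fired_at = await fired  # the loop regains control only here
    return (fired_at - scheduled_at) - timer_delay


if __name__ == "__main__":
    late = asyncio.run(measure_timer_lateness(block_for=0.5))
    print(f"timer fired {late:.3f}s late")
```

The timer's lateness tracks the length of the synchronous loop, which matches the report's observation that user input, I/O callbacks, and timers all stall for the whole silent window.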

Crash #6 sample (sample-of-2.1.71-1.txt, PID 47007, 7.7 GB footprint, 9m51s into session):

Same synchronous main thread pattern, plus: posix_spawn (20 samples) + __socketpair (48 samples) + setsockopt / fcntl / __close_nocancel — consistent with spawning and configuring child processes (MCP server startup) as part of the same pass.

3.4 RSS Accumulates With Each Resume

Each --resume loads the context from the previous crash, which is larger than the one before. Relative to uptime, footprint grows much faster in resumed sessions:

| Crash | Process uptime | Physical footprint |
|---|---|---|
| #1 | 2.3 h | 1,598 MB at crash |
| #2 | 3.5 min (resumed) | 1,495 MB at crash |
| #3 | 7.4 h | 1,474 MB at crash |
| #4 | 3.86 h | 4,040 MB at crash |
| #5 | 5.3 min (resumed) | 10,700 MB at sample |
| #6 | 9.8 min (resumed) | 7,700 MB at sample |

A session that took 3.86 hours to accumulate 3.94 GB (crash #4) reached 10.7 GB in 5 minutes when resumed as crash #5. The context loaded at resume is the relevant driver, not wall-clock uptime.
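The contrast can be made explicit by dividing footprint by uptime. This assumes linear growth from near zero at launch, which is a simplification, and it mixes at-crash and at-sample measurements exactly as the table does; the helper name is illustrative.

```python
# Uptime (minutes) and physical footprint (MB) per crash, from the table above.
# Crashes #5 and #6 were measured at sample time rather than at crash time.
rows = {
    "#1": (2.3 * 60, 1598),
    "#2": (3.5, 1495),
    "#3": (7.4 * 60, 1474),
    "#4": (3.86 * 60, 4040),
    "#5": (5.3, 10700),
    "#6": (9.8, 7700),
}


def mb_per_minute(uptime_min: float, footprint_mb: float) -> float:
    """Average growth rate, assuming footprint rose linearly from ~0 at launch."""
    return footprint_mb / uptime_min


rates = {crash: mb_per_minute(*row) for crash, row in rows.items()}
for crash, rate in rates.items():
    print(f"crash {crash}: {rate:8.1f} MB/min")
print(f"#5 vs #4: {rates['#5'] / rates['#4']:.0f}x faster")
```

By this measure crash #5 grew more than 100 times faster per minute than crash #4 (roughly 2,000 MB/min versus roughly 17 MB/min).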


4. Chain of Reasoning

The background pass triggers on session load.
Crash #5 telemetry: last event is tengu_native_version_cleanup at T+1s; no user message or tool call precedes the 38.8s gap. The pass is triggered by session initialization, not by user interaction.

The pass runs synchronously on the main event loop thread.
claude-sample-164025.txt: 124,422 consecutive main thread samples with no kevent64 calls. All 11 Bun pool threads are idle throughout. The bun event loop cannot dispatch work while this runs.

The pass performs intensive filesystem I/O via JIT-compiled JavaScript.
Leaf syscalls in the main thread sample are open (3,254 samples) and close (1,023 samples), reached through JIT-compiled code at 0x11dxxxxxxx addresses. The operation is JavaScript, not native C++.

Memory grows in proportion to session context size.
Physical footprint reaches 10.7 GB in 5 minutes (crash #5) vs. 4.0 GB after 3.86 hours (crash #4). Apart from the CLI version (2.1.70 vs. 2.1.71), the main difference between those runs was the amount of context loaded at resume.

The JS runtime's accounting breaks down before the crash.
heapUsed > heapTotal in every telemetry record with heap fields (crashes #1–4). This invariant cannot be violated under normal JSC operation. The runtime is in an internally inconsistent state when the exception cascade fires.

context-management-2025-06-27 is the most likely candidate.
Present in all six crash sessions. Not in the default beta list. Name suggests context processing. Timing is consistent with a session-load context pass. Confidence: probable. Claude cannot rule out other betas without symbols or source.


5. Reproduction

Trigger: Resume any sufficiently large session via --resume.
Observed: The process crashes 26–39 seconds after startup, with no user action required.

Confirmed conditions:

  • Claude CLI 2.1.69–2.1.71 (persists across auto-update)
  • Beta context-management-2025-06-27 active
  • Session has accumulated substantial context from extended MCP tool use

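The minimum context threshold is not precisely determined. Crash #2 (b696a6a8 B) occurred 3.5 minutes after resuming with 67K cached tokens, while a fresh (not resumed) v2.1.71 session at 31 minutes / 95K cached tokens did not crash. These two points do not bracket a single threshold: the resumed session crashed with fewer cached tokens than the fresh session survived, so cached-token count alone does not predict the crash.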

Workaround: Do not --resume sessions that have accumulated large context. Use /compact before context grows large. Start fresh sessions (/new) at task boundaries.
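Reproduction runs are easier to compare across CLI versions if memory is logged while the resumed session loads. A minimal harness, assuming the CLI is launched as claude --resume with a session ID argument (the argument form is an assumption; adjust to however you normally resume) and that ps is available. Note that ps reports RSS, which is not the same metric as the physical footprint shown by sample, so numbers will not match Section 3.4 exactly. The script and function names are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys
import time
from typing import Callable, Optional


def ps_rss_mb(pid: int) -> Optional[float]:
    """Resident set size of `pid` in MB, read with `ps -o rss=` (ps reports KB)."""
    out = subprocess.run(["ps", "-o", "rss=", "-p", str(pid)],
                         capture_output=True, text=True)
    text = out.stdout.strip()
    return int(text) / 1024 if text else None


def watch(cmd: list[str],
          sample_rss: Callable[[int], Optional[float]] = ps_rss_mb,
          interval: float = 1.0,
          timeout: float = 120.0) -> tuple[Optional[int], list[tuple[float, float]]]:
    """Run `cmd`, sampling its RSS every `interval` s until it exits or `timeout` s pass.

    Returns (exit code, or None if killed at the timeout; [(elapsed_s, rss_mb), ...]).
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd)
    samples: list[tuple[float, float]] = []
    while proc.poll() is None and time.monotonic() - start < timeout:
        rss = sample_rss(proc.pid)
        if rss is not None:
            samples.append((time.monotonic() - start, rss))
        time.sleep(interval)
    code = proc.poll()
    if code is None:  # still running at the timeout: stop it
        proc.kill()
        proc.wait()
    return code, samples


if __name__ == "__main__":
    # Hypothetical usage: python watch_resume.py <session-id>
    if len(sys.argv) == 2:
        code, samples = watch(["claude", "--resume", sys.argv[1]])
        peak = max((mb for _, mb in samples), default=0.0)
        print(f"exit={code} samples={len(samples)} peak_rss={peak:.0f} MB")
```

A negative exit code from Popen means the process died from a signal (for example -9 for SIGKILL), which may help distinguish an OOM kill like the one suspected for crash #6 from the exception cascade seen in the other crashes.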


6. Prior Art Search

Claude searched anthropics/claude-code GitHub issues before filing. The three closest matches found, and why each is distinct:

| Issue | Title | Similarity | Key distinction |
|---|---|---|---|
| #24644 (closed duplicate) | Memory leak: CLI grows to 44 GB+ RAM with GC thrashing | macOS, --resume trigger, high RSS | Root cause is large toolUseResult.stdout data in a 67 MB JSONL file (670× memory amplification); GC thrashing over minutes, not a <40s synchronous crash at load; no beta flag identified |
| #1421 | Recurring crashes: JavaScript Heap Out of Memory while 'thinking' | macOS, JS heap OOM crash | Node.js/V8 runtime (not Bun/JSC); crashes during active tool execution, not at session load; no synchronous blocking pass identified |
| #18880 | claude --resume crashes on killed sessions | --resume trigger, startup crash | Root cause is a corrupt JSONL from a hard-killed session ("No messages returned" error); Linux only; no memory involvement |

Searches performed: "crash session resume", "tengu_uncaught_exception", "heapUsed heapTotal memory OOM macOS", "context-management beta flag crash", "RSS memory accumulation resume macOS", "synchronous event loop session load macOS crash".


7. Files to Send Anthropic

Essential

| File | Signal |
|---|---|
| ~/.claude/telemetry/1p_failed_events.1594ad87-…d281fd04….json | Crash #5: proves the pass triggers at session start with no tool call |
| ~/Desktop/claude-crash-20260307-161848/claude-sample-164025.txt | 6.1 MB sample captured during the crash #5 window: open/close loop |
| ~/Desktop/sample-of-2.1.71-1.txt | Sample during crash #6: adds the posix_spawn/socketpair finding |
| ~/Desktop/Spindump.txt | Activity Monitor spindump during crash #5 |

Supporting

| File | Signal |
|---|---|
| ~/.claude/telemetry/1p_failed_events.62f6efbc-…87848655….json | Crash #4: full heap fields at 3.94 GB RSS |
| ~/.claude/telemetry/1p_failed_events.b696a6a8-…2d6d60f3….json | Crash #1: earliest event; heapUsed 710 > 257 MB heapTotal |
| ~/.claude/telemetry/1p_failed_events.b696a6a8-…eed481f9….json | Crash #2: resumed session, 3.5 min uptime, same pattern |
| ~/.claude/telemetry/1p_failed_events.92064525-…74061b8b….json | Crash #3: claude-opus-4-6 model, confirms model is not the variable |
| ~/Desktop/claude-crash-20260307-161848/claude-sample-163429.txt | Pre-crash baseline sample: main thread 94% idle, for comparison |

Omit

  • Session JSONL files (contain conversation content; available on request if useful for context size data)
  • crash-timeline.txt (RSS monitor; data summarized in Section 3.4)
  • sample-of-2.1.71-2.txt (identical to sample-1; Activity Monitor exported the same sample twice)

Version History

| Date | Description | Changes | Review |
|---|---|---|---|
| 2026-Mar-07 | Initial submission draft, condensed from working investigation report | +185 lines | Unreviewed |
| 2026-Mar-07 | Added prior art search (§6); renumbered files section to §7 | +18 / -2 lines | Unreviewed |
