fix(hooks): persist session-sync cache to disk — Mac CPU fix E#1481
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a disk-backed cache for session synchronization to optimize performance by reducing database calls during hook execution. The implementation includes logic for loading, persisting, and bounding the cache size, supported by new unit tests. Review feedback identifies a critical race condition in the cache persistence logic that could result in data loss across concurrent processes and recommends adding structural validation when parsing the cache file to avoid potential runtime exceptions.
| function persistDiskCache(): void { | ||
| try { | ||
| const cacheFile = effectiveCacheFile(); | ||
| mkdirSync(join(cacheFile, '..'), { recursive: true }); | ||
| trimCache(); | ||
| const obj = Object.fromEntries(syncedSessions); | ||
| const tmp = `${cacheFile}.tmp.${process.pid}`; | ||
| writeFileSync(tmp, JSON.stringify(obj)); | ||
| renameSync(tmp, cacheFile); | ||
| } catch { | ||
| // Best-effort — in-memory cache still works for the current process. | ||
| } | ||
| } |
There was a problem hiding this comment.
The current implementation of persistDiskCache suffers from a significant race condition that leads to data loss in multi-agent environments.
Because each genie hook dispatch is a fresh process, it loads the cache once at startup. When persistDiskCache is called, it converts the current process's in-memory Map back to an object and overwrites the entire file. If two different agents (Executors A and B) run hooks simultaneously:
- Fork A loads
{}. - Fork B loads
{}. - Fork A reconciles A and writes
{'A': 'sessA'}to disk. - Fork B reconciles B and writes
{'B': 'sessB'}to disk. - Result: Executor A's reconciliation is lost from the disk cache.
To fix this, persistDiskCache should read the file again, merge its in-memory updates with the current disk content, and then write. While a race still exists between the read and the rename, the window for data loss is reduced from the entire process lifetime to a few milliseconds.
| const parsed = JSON.parse(readFileSync(cacheFile, 'utf-8')) as Record<string, string>; | ||
| for (const [executorId, sessionId] of Object.entries(parsed)) { | ||
| if (typeof sessionId === 'string' && sessionId.length > 0) { | ||
| syncedSessions.set(executorId, sessionId); | ||
| } | ||
| } |
There was a problem hiding this comment.
The loadDiskCache function lacks validation for the parsed JSON structure. If the cache file contains null (valid JSON), Object.entries(parsed) will throw a TypeError, which is caught by the generic catch block but could be avoided with a simple type check.
const parsed = JSON.parse(readFileSync(cacheFile, 'utf-8'));
if (parsed && typeof parsed === 'object' && !Array.isArray(parsed)) {
for (const [executorId, sessionId] of Object.entries(parsed)) {
if (typeof sessionId === 'string' && sessionId.length > 0) {
syncedSessions.set(executorId, sessionId);
}
}
}c81e967 to
bbef525
Compare
Per-process Map<executorId, sessionId> in session-sync.ts caches NOTHING across `genie hook dispatch` bun forks (each fork starts with empty Map). Result: every cold-start hook fork did 3 DB round-trips (getAgentByName + getExecutor + audit) even when the (executorId, sessionId) pair was already-reconciled by a previous fork. Fix: - Disk-backed cache at ~/.genie/cache/session-sync.json (overridable via __GENIE_SESSION_SYNC_CACHE_FILE for tests) - loadDiskCache() runs once per fork BEFORE any DB call, replacing empty Map with previously-persisted (executorId, sessionId) pairs - persistDiskCache() writes after every cache mutation (atomic via write-temp + rename) - MAX_CACHE_ENTRIES=1000 with insertion-order trim — bounded growth - Concurrency: rename races can lose one fork's writes for OTHER executor entries (benign: at most occasional redundant DB syncs from later forks) - Corrupt cache file is tolerated — falls through to DB After this PR, hook forks have ZERO DB calls in the steady state for already-reconciled sessions (only on actual rotation or first capture). Validation: 18/18 src/hooks/__tests__/session-sync.test.ts pass (14 prior + 4 new: persist-after-reconcile, cold-start-loads-from-disk, cache-miss- falls-through, corrupt-file-tolerated). tsc --noEmit clean. biome clean. Fix E of the 5-step .19 Mac-CPU root-cause plan (A→E): - A: shipped #1475 — drop runRetention from getConnection - filewatch: shipped #1474 — chokidar replacement - C: shipped #1476 — GENIE_SKIP_DB_BOOT for hook dispatch - D: open #1479 — narrow inject matchers - E: this PR - B: in flight (twin) — needsSeed mtime cache (lower priority post-C) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After rebasing fix E onto current dev (post #1479/#1482/#1483/#1484 merges), the pre-existing session-sync (Gap 2) tests started failing because loadDiskCache() on first call would read the production ~/.genie/cache/session-sync.json — populated by earlier real-mode runs of the handler. Stale (executorId, sessionId) entries caused the in-memory cache to short-circuit BEFORE the test fixtures' mocked DB calls were made, so audit-event emissions never fired. Fix: loadDiskCache() and persistDiskCache() now skip ALL disk I/O when: - ANY _deps field is non-null (test installed mocks), OR - NODE_ENV=test or BUN_ENV=test UNLESS the test explicitly set __GENIE_SESSION_SYNC_CACHE_FILE via _setCacheFileForTest (the new fix-E tests opt in this way). Mirrors the existing test-mode skip in shouldSkipSync. Validation: 18/18 src/hooks/__tests__/session-sync.test.ts pass. Both the 14 pre-existing tests and the 4 new fix-E tests work — pre-existing tests get clean cache state; fix-E tests use their isolated tmp-dir cache file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
43f6e45 to
e18e053
Compare
…ok entry (#1489) * docs(brainstorm): hookify-third-party-absorption design + wish (delivery #2) Crystallized brainstorm + scaffolded wish for delivery #2 of the genie hookify umbrella (delivery #1 shipped as PR #1485). Delivery #2 makes genie the only Claude Code hook entry: absorbs existing foreign hooks (Token Optimizer, ultratoken, future plugins) via a one-time settings.json rewrite + subprocess-passthrough handlers, and lets operators deploy custom hooks (e.g. rlmx run on every Bash) as plain TS code in .genie/hooks/. Three-tier scoping (per-team > per-repo > global), trust- allowlisted, with reload + test for inner-loop iteration. Council reviewed (sentinel + architect + ergonomist + operator) — all four push-backs folded into the design + wish: - Sentinel: trust allowlist required (filesystem presence is not consent), versioned absorb snapshots, two-mode env capture (probe + offline) with denylist, threat model documented (single-operator machine). - Architect: Handler interface gains version/source/manifest_path discriminated union; registry migrates const handlers to let registryRef ReadonlyArray Handler; loud shadowing instead of silent. - Ergonomist: defineHook() config-object scaffold; genie hook reload + test ship with this delivery; rename absorb to import; broken hooks loud in list. - Operator: versioned snapshots last 10; per-team archive lifecycle; 5 ms passthrough is a measured SLO with bench + alert template. Plan-review fix-loop 1 closed all 6 reviewer gaps (trust threat model, probe offline fallback, registry mutation contract, per-team archive lifecycle, Handler vNext strategy, broken-hook recovery). WISH.md plan-reviewed SHIP. 4 execution groups, 3 waves: - Wave 1: G1 foundation (registry + Handler v1 + loader + trust gate) - Wave 2 parallel: G2 operator inner loop + G3 foreign-hook absorption - Wave 3: G4 lifecycle wiring + telemetry/microbench + docs + delivery report Files: - .genie/brainstorms/hookify-third-party-absorption/DRAFT.md - .genie/brainstorms/hookify-third-party-absorption/DESIGN.md - .genie/wishes/hookify-third-party-absorption/WISH.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(brainstorm): patch hookify-third-party-absorption wish for Mac CPU follow-ups Three Mac CPU follow-up PRs landed on dev after delivery #1 merged (PR #1475 retention, #1476 GENIE_SKIP_DB_BOOT, #1481 narrowed matchers). Wish plan shape unchanged but two implementation realities now apply: 1. DISPATCHED_EVENTS renamed to DISPATCHED_EVENT_MATCHERS in src/hooks/types.ts and shrunk from 6 events to 2 (PreToolUse plus PostToolUse:SendMessage). Group 3's import logic must consult the new constant; foreign hooks on un-wired events (PreCompact, SessionStart, etc.) cannot be absorbed and must be left in place with explicit reporting in --dry-run. 2. Tools that invoke hook code outside genie serve set GENIE_SKIP_DB_BOOT=1 to mirror the bun-fallback path's behavior (genie hook test, scaffold validation). Added [left-in-place] reporting requirement to import --dry-run plus a new acceptance criterion covering the three-status output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fix E of the 5-step .19 Mac-CPU root-cause plan. Last big remaining hook-fork DB call.
The per-process
Map<executorId, sessionId>insession-sync.tscaches NOTHING acrossgenie hook dispatchbun forks (each fork starts with empty Map). Every cold-start hook fork did 3 DB round-trips (getAgentByName+getExecutor+ audit) even when the (executorId, sessionId) pair was already-reconciled by a previous fork.Fix
~/.genie/cache/session-sync.json(overridable via__GENIE_SESSION_SYNC_CACHE_FILEfor tests)loadDiskCache()runs once per fork before any DB call, replacing the empty Map with previously-persisted (executorId, sessionId) pairspersistDiskCache()writes after every cache mutation (atomic via write-temp + rename)MAX_CACHE_ENTRIES=1000with insertion-order trim — bounded growth across weeks of useEffect
After this PR, hook forks have ZERO DB calls in the steady state for already-reconciled sessions. DB is only hit on:
Validation
bun test src/hooks/__tests__/session-sync.test.ts— 18/18 pass (14 prior + 4 new):writes cache file after a successful session.reconciledcold-start fork loads cache from disk and skips DB callsdisk cache miss falls through to DB and persists resultcorrupt cache file is tolerated — falls back to DBbun x tsc --noEmitcleanbun x biome checkcleanMac-CPU sprint complete after this merges
fix/mac-cpu-needsseed-mtime-markerCombined effect on Mac CPU
A typical hook fork on a busy Mac dev machine before .19:
runMigrations(~50ms)needsSeed()loops 92~/.claude/teamsentries (~10ms) →runSeed(~20ms)runRetention4 DELETEs against unbounded tables (~30ms)After A+C+D+E: hook fork connects to PG only when a handler actually needs DB, AND session-sync skips even THAT for already-reconciled pairs. PostToolUse only fires for SendMessage. Unused events don't fire bun at all.
Ready for the .19 cut + dogfood validation pass + PR #1446 dev→main flip after the remaining release-blockers (#1465–#1471) are fixed.