fix(hooks): persist session-sync cache to disk — Mac CPU fix E#1481

Merged
namastex888 merged 2 commits into dev from fix/mac-cpu-session-sync-cache-file
Apr 29, 2026

@namastex888

Summary

Fix E of the 5-step .19 Mac-CPU root-cause plan. Last big remaining hook-fork DB call.

The per-process Map<executorId, sessionId> in session-sync.ts caches nothing across genie hook dispatch bun forks, because each fork starts with an empty Map. As a result, every cold-start hook fork made 3 DB round-trips (getAgentByName + getExecutor + audit) even when the (executorId, sessionId) pair had already been reconciled by a previous fork.

Fix

  • Disk-backed cache at ~/.genie/cache/session-sync.json (overridable via __GENIE_SESSION_SYNC_CACHE_FILE for tests)
  • loadDiskCache() runs once per fork before any DB call, replacing the empty Map with previously-persisted (executorId, sessionId) pairs
  • persistDiskCache() writes after every cache mutation (atomic via write-temp + rename)
  • MAX_CACHE_ENTRIES=1000 with insertion-order trim — bounded growth across weeks of use
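  • Concurrency: each persist rewrites the whole file, so when two forks rename at nearly the same time the last rename wins and the other fork's writes for OTHER executor entries can be lost (benign: at most occasional redundant DB syncs from later forks)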
  • Corrupt cache is tolerated — falls through to DB (handler must never throw)
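
The bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names `loadCache`, `trimCache`, and `persistCache`, and the explicit `file` parameter, are assumptions made for the example. Only `MAX_CACHE_ENTRIES`, the write-temp + rename pattern, and the tolerate-corruption behavior come from the description.

```typescript
import { mkdirSync, readFileSync, renameSync, writeFileSync } from 'node:fs';
import { dirname } from 'node:path';

const MAX_CACHE_ENTRIES = 1000;

/** Load previously persisted pairs; any read or parse failure yields an empty Map. */
function loadCache(file: string): Map<string, string> {
  const cache = new Map<string, string>();
  try {
    const parsed: unknown = JSON.parse(readFileSync(file, 'utf-8'));
    if (parsed && typeof parsed === 'object' && !Array.isArray(parsed)) {
      for (const [executorId, sessionId] of Object.entries(parsed)) {
        if (typeof sessionId === 'string' && sessionId.length > 0) {
          cache.set(executorId, sessionId);
        }
      }
    }
  } catch {
    // Missing or corrupt file: start empty and let the caller fall through to the DB.
  }
  return cache;
}

/** Drop the oldest entries (a Map iterates in insertion order) until within the bound. */
function trimCache(cache: Map<string, string>, max: number = MAX_CACHE_ENTRIES): void {
  for (const key of cache.keys()) {
    if (cache.size <= max) break;
    cache.delete(key);
  }
}

/** Write to a per-process temp file, then rename it over the target in one step. */
function persistCache(file: string, cache: Map<string, string>): void {
  try {
    mkdirSync(dirname(file), { recursive: true });
    trimCache(cache);
    const tmp = `${file}.tmp.${process.pid}`;
    writeFileSync(tmp, JSON.stringify(Object.fromEntries(cache)));
    renameSync(tmp, file);
  } catch {
    // Best-effort: the in-memory Map still serves the current process.
  }
}
```

The rename step is what keeps a concurrent reader from seeing a half-written file; it does not prevent two writers from overwriting each other's entries, which is the trade-off the Concurrency bullet accepts.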

Effect

After this PR, hook forks have ZERO DB calls in the steady state for already-reconciled sessions. DB is only hit on:

  • First capture (executor never reconciled before)
  • Session-id rotation (Claude Code resume / compaction)
  • Cache miss (new executor)
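
A hedged sketch of the steady-state decision described above: compare the incoming session id with the cached one and only go to the DB when they differ or nothing is cached. The `SyncDecision` type and `decideSync` function are hypothetical names introduced for illustration and are not taken from session-sync.ts.

```typescript
type SyncDecision = 'skip' | 'first-capture' | 'rotation';

/**
 * Decide whether a hook fork needs to touch the DB for this pair.
 * Hypothetical helper: the real handler may structure this differently.
 */
function decideSync(
  cache: ReadonlyMap<string, string>,
  executorId: string,
  sessionId: string,
): SyncDecision {
  const cached = cache.get(executorId);
  if (cached === undefined) return 'first-capture'; // never reconciled, or cache miss
  return cached === sessionId ? 'skip' : 'rotation'; // same pair: no DB call needed
}
```

Only the `'skip'` branch avoids the DB; the other two branches correspond to the cases listed above and would be followed by a reconcile plus a cache write.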

Validation

  • bun test src/hooks/__tests__/session-sync.test.ts: 18/18 pass (14 prior + 4 new):
    • writes cache file after a successful session.reconciled
    • cold-start fork loads cache from disk and skips DB calls
    • disk cache miss falls through to DB and persists result
    • corrupt cache file is tolerated — falls back to DB
  • bun x tsc --noEmit clean
  • bun x biome check clean

Mac-CPU sprint complete after this merges

Combined effect on Mac CPU

A typical hook fork on a busy Mac dev machine before .19:

  1. PG connect → runMigrations (~50ms)
  2. needsSeed() loops 92 ~/.claude/teams entries (~10ms) → runSeed (~20ms)
  3. runRetention 4 DELETEs against unbounded tables (~30ms)
  4. session-sync 3 DB round-trips on every PreToolUse + UserPromptSubmit (~15ms each)
  5. PostToolUse, SessionStart, SessionEnd, TeammateIdle, TaskCompleted bun forks for nothing (~30ms each)

After A+C+D+E: hook fork connects to PG only when a handler actually needs DB, AND session-sync skips even THAT for already-reconciled pairs. PostToolUse only fires for SendMessage. Unused events don't fire bun at all.

Ready for the .19 cut + dogfood validation pass + PR #1446 dev→main flip after the remaining release-blockers (#1465, #1471) are fixed.

@coderabbitai

coderabbitai Bot commented Apr 28, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 476c4072-4e5d-4ed4-bd42-cacd7f0797e2



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a disk-backed cache for session synchronization to optimize performance by reducing database calls during hook execution. The implementation includes logic for loading, persisting, and bounding the cache size, supported by new unit tests. Review feedback identifies a critical race condition in the cache persistence logic that could result in data loss across concurrent processes and recommends adding structural validation when parsing the cache file to avoid potential runtime exceptions.

Comment on lines +87 to +99
    function persistDiskCache(): void {
      try {
        const cacheFile = effectiveCacheFile();
        mkdirSync(join(cacheFile, '..'), { recursive: true });
        trimCache();
        const obj = Object.fromEntries(syncedSessions);
        const tmp = `${cacheFile}.tmp.${process.pid}`;
        writeFileSync(tmp, JSON.stringify(obj));
        renameSync(tmp, cacheFile);
      } catch {
        // Best-effort — in-memory cache still works for the current process.
      }
    }

high

The current implementation of persistDiskCache suffers from a significant race condition that leads to data loss in multi-agent environments.

Because each genie hook dispatch is a fresh process, it loads the cache once at startup. When persistDiskCache is called, it converts the current process's in-memory Map back to an object and overwrites the entire file. If two different agents (Executors A and B) run hooks simultaneously:

  1. Fork A loads {}.
  2. Fork B loads {}.
  3. Fork A reconciles A and writes {'A': 'sessA'} to disk.
  4. Fork B reconciles B and writes {'B': 'sessB'} to disk.
  5. Result: Executor A's reconciliation is lost from the disk cache.

To fix this, persistDiskCache should read the file again, merge its in-memory updates with the current disk content, and then write. While a race still exists between the read and the rename, the window for data loss is reduced from the entire process lifetime to a few milliseconds.
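
One way to read the suggestion above is the read-merge-write sketch below. It is an illustration under assumptions, not a proposed patch: `readDiskEntries` and `persistMerged` are hypothetical names, the cache file path is passed explicitly, and trimming to the entry bound is omitted for brevity.

```typescript
import { mkdirSync, readFileSync, renameSync, writeFileSync } from 'node:fs';
import { dirname } from 'node:path';

/** Read the current on-disk entries, tolerating a missing or corrupt file. */
function readDiskEntries(file: string): Record<string, string> {
  const entries: Record<string, string> = {};
  try {
    const parsed: unknown = JSON.parse(readFileSync(file, 'utf-8'));
    if (parsed && typeof parsed === 'object' && !Array.isArray(parsed)) {
      for (const [executorId, sessionId] of Object.entries(parsed)) {
        if (typeof sessionId === 'string' && sessionId.length > 0) {
          entries[executorId] = sessionId;
        }
      }
    }
  } catch {
    // Nothing usable on disk: merge against an empty object.
  }
  return entries;
}

/** Re-read the file right before writing so entries from other forks survive. */
function persistMerged(file: string, updates: ReadonlyMap<string, string>): void {
  try {
    mkdirSync(dirname(file), { recursive: true });
    // In-memory values win on conflict; untouched disk entries are preserved.
    const merged = { ...readDiskEntries(file), ...Object.fromEntries(updates) };
    const tmp = `${file}.tmp.${process.pid}`;
    writeFileSync(tmp, JSON.stringify(merged));
    renameSync(tmp, file);
  } catch {
    // Best-effort, as in the original function.
  }
}
```

With this shape, the five-step scenario ends with both executors on disk as long as Fork B's re-read happens after Fork A's rename. If both forks re-read before either renames, one entry can still be lost, which is the residual window the comment mentions.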

Comment on lines +62 to +67
    const parsed = JSON.parse(readFileSync(cacheFile, 'utf-8')) as Record<string, string>;
    for (const [executorId, sessionId] of Object.entries(parsed)) {
      if (typeof sessionId === 'string' && sessionId.length > 0) {
        syncedSessions.set(executorId, sessionId);
      }
    }

medium

The loadDiskCache function lacks validation for the parsed JSON structure. If the cache file contains null (valid JSON), Object.entries(parsed) will throw a TypeError, which is caught by the generic catch block but could be avoided with a simple type check.

    const parsed = JSON.parse(readFileSync(cacheFile, 'utf-8'));
    if (parsed && typeof parsed === 'object' && !Array.isArray(parsed)) {
      for (const [executorId, sessionId] of Object.entries(parsed)) {
        if (typeof sessionId === 'string' && sessionId.length > 0) {
          syncedSessions.set(executorId, sessionId);
        }
      }
    }

@namastex888 namastex888 force-pushed the fix/mac-cpu-session-sync-cache-file branch from c81e967 to bbef525 Compare April 29, 2026 00:26
namastex888 and others added 2 commits April 28, 2026 21:42
Per-process Map<executorId, sessionId> in session-sync.ts caches NOTHING
across `genie hook dispatch` bun forks (each fork starts with empty Map).
Result: every cold-start hook fork did 3 DB round-trips
(getAgentByName + getExecutor + audit) even when the (executorId, sessionId)
pair was already-reconciled by a previous fork.

Fix:
- Disk-backed cache at ~/.genie/cache/session-sync.json (overridable via
  __GENIE_SESSION_SYNC_CACHE_FILE for tests)
- loadDiskCache() runs once per fork BEFORE any DB call, replacing empty
  Map with previously-persisted (executorId, sessionId) pairs
- persistDiskCache() writes after every cache mutation (atomic via write-temp
  + rename)
- MAX_CACHE_ENTRIES=1000 with insertion-order trim — bounded growth
- Concurrency: rename races can lose one fork's writes for OTHER executor
  entries (benign: at most occasional redundant DB syncs from later forks)
- Corrupt cache file is tolerated — falls through to DB

After this PR, hook forks have ZERO DB calls in the steady state for
already-reconciled sessions (only on actual rotation or first capture).

Validation: 18/18 src/hooks/__tests__/session-sync.test.ts pass (14 prior
+ 4 new: persist-after-reconcile, cold-start-loads-from-disk, cache-miss-
falls-through, corrupt-file-tolerated). tsc --noEmit clean. biome clean.

Fix E of the 5-step .19 Mac-CPU root-cause plan (A→E):
- A: shipped #1475 — drop runRetention from getConnection
- filewatch: shipped #1474 — chokidar replacement
- C: shipped #1476 — GENIE_SKIP_DB_BOOT for hook dispatch
- D: open #1479 — narrow inject matchers
- E: this PR
- B: in flight (twin) — needsSeed mtime cache (lower priority post-C)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After rebasing fix E onto current dev (post #1479/#1482/#1483/#1484 merges),
the pre-existing session-sync (Gap 2) tests started failing because
loadDiskCache() on first call would read the production
~/.genie/cache/session-sync.json — populated by earlier real-mode runs of
the handler. Stale (executorId, sessionId) entries caused the in-memory
cache to short-circuit BEFORE the test fixtures' mocked DB calls were
made, so audit-event emissions never fired.

Fix: loadDiskCache() and persistDiskCache() now skip ALL disk I/O when:
- ANY _deps field is non-null (test installed mocks), OR
- NODE_ENV=test or BUN_ENV=test
UNLESS the test explicitly set __GENIE_SESSION_SYNC_CACHE_FILE via
_setCacheFileForTest (the new fix-E tests opt in this way).

Mirrors the existing test-mode skip in shouldSkipSync.
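
A rough sketch of the gating rule described above, written as a pure predicate so it is easy to test. The `DiskCacheGate` shape and the `shouldUseDiskCache` name are assumptions for illustration; the real code reads `_deps` and the environment directly.

```typescript
interface DiskCacheGate {
  /** Injected dependencies; any non-null field means a test installed mocks. */
  deps: Record<string, unknown>;
  /** Environment variables, e.g. process.env. */
  env: Record<string, string | undefined>;
  /** Set when a test opted in to an isolated cache file. */
  explicitCacheFile: string | undefined;
}

/** Return true when loadDiskCache/persistDiskCache may touch the disk. */
function shouldUseDiskCache({ deps, env, explicitCacheFile }: DiskCacheGate): boolean {
  // Explicit opt-in overrides every test-mode skip.
  if (explicitCacheFile) return true;
  const mocksInstalled = Object.values(deps).some((dep) => dep !== null && dep !== undefined);
  const testEnv = env.NODE_ENV === 'test' || env.BUN_ENV === 'test';
  return !(mocksInstalled || testEnv);
}
```

Keeping the opt-in check first is what lets the new tests exercise real disk I/O against their tmp-dir file while the older tests stay isolated from the production cache.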

Validation: 18/18 src/hooks/__tests__/session-sync.test.ts pass. Both the
14 pre-existing tests and the 4 new fix-E tests work — pre-existing tests
get clean cache state; fix-E tests use their isolated tmp-dir cache file.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@namastex888 namastex888 force-pushed the fix/mac-cpu-session-sync-cache-file branch from 43f6e45 to e18e053 Compare April 29, 2026 00:43
@namastex888 namastex888 merged commit 68d6c02 into dev Apr 29, 2026
11 checks passed
namastex888 added a commit that referenced this pull request Apr 29, 2026
…ok entry (#1489)

* docs(brainstorm): hookify-third-party-absorption design + wish (delivery #2)

Crystallized brainstorm + scaffolded wish for delivery #2 of the genie hookify
umbrella (delivery #1 shipped as PR #1485).

Delivery #2 makes genie the only Claude Code hook entry: absorbs existing
foreign hooks (Token Optimizer, ultratoken, future plugins) via a one-time
settings.json rewrite + subprocess-passthrough handlers, and lets operators
deploy custom hooks (e.g. rlmx run on every Bash) as plain TS code in
.genie/hooks/. Three-tier scoping (per-team > per-repo > global), trust-
allowlisted, with reload + test for inner-loop iteration.

Council reviewed (sentinel + architect + ergonomist + operator) — all four
push-backs folded into the design + wish:
- Sentinel: trust allowlist required (filesystem presence is not consent),
  versioned absorb snapshots, two-mode env capture (probe + offline) with
  denylist, threat model documented (single-operator machine).
- Architect: Handler interface gains version/source/manifest_path
  discriminated union; registry migrates const handlers to
  let registryRef ReadonlyArray Handler; loud shadowing instead of silent.
- Ergonomist: defineHook() config-object scaffold; genie hook reload + test
  ship with this delivery; rename absorb to import; broken hooks loud in list.
- Operator: versioned snapshots last 10; per-team archive lifecycle; 5 ms
  passthrough is a measured SLO with bench + alert template.

Plan-review fix-loop 1 closed all 6 reviewer gaps (trust threat model, probe
offline fallback, registry mutation contract, per-team archive lifecycle,
Handler vNext strategy, broken-hook recovery). WISH.md plan-reviewed SHIP.

4 execution groups, 3 waves:
- Wave 1: G1 foundation (registry + Handler v1 + loader + trust gate)
- Wave 2 parallel: G2 operator inner loop + G3 foreign-hook absorption
- Wave 3: G4 lifecycle wiring + telemetry/microbench + docs + delivery report

Files:
- .genie/brainstorms/hookify-third-party-absorption/DRAFT.md
- .genie/brainstorms/hookify-third-party-absorption/DESIGN.md
- .genie/wishes/hookify-third-party-absorption/WISH.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(brainstorm): patch hookify-third-party-absorption wish for Mac CPU follow-ups

Three Mac CPU follow-up PRs landed on dev after delivery #1 merged
(PR #1475 retention, #1476 GENIE_SKIP_DB_BOOT, #1481 narrowed matchers).
Wish plan shape unchanged but two implementation realities now apply:

1. DISPATCHED_EVENTS renamed to DISPATCHED_EVENT_MATCHERS in
   src/hooks/types.ts and shrunk from 6 events to 2 (PreToolUse plus
   PostToolUse:SendMessage). Group 3's import logic must consult the new
   constant; foreign hooks on un-wired events (PreCompact, SessionStart,
   etc.) cannot be absorbed and must be left in place with explicit
   reporting in --dry-run.
2. Tools that invoke hook code outside genie serve set GENIE_SKIP_DB_BOOT=1
   to mirror the bun-fallback path's behavior (genie hook test, scaffold
   validation).

Added [left-in-place] reporting requirement to import --dry-run plus a new
acceptance criterion covering the three-status output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>