fix(filewatch): replace fs.watch recursive with chokidar (Mac CPU partial mitigation)#1474
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f1f21f6a2f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| watcher = createJsonlWatcher(claudeDir, (fullPath) => scheduleFileChange(fullPath, sql)); | ||
|
|
||
| watcher.on('error', (err) => { | ||
| console.error('[filewatch] watcher error:', err.message); | ||
| // Could fall back to polling here in the future | ||
| const message = err instanceof Error ? err.message : String(err); | ||
| console.error('[filewatch] watcher error:', message); | ||
| }); | ||
|
|
||
| console.log(`[filewatch] watching ${claudeDir} (${offsetCache.size} sessions cached)`); |
There was a problem hiding this comment.
Wait for chokidar readiness before reporting startup success
startFilewatch now returns true immediately after createJsonlWatcher(...), but chokidar surfaces missing-path/permission failures asynchronously via error events rather than a synchronous throw. In environments where ~/.claude/projects does not exist yet (fresh installs) or is inaccessible, this path will report success, skip the scheduler's polling fallback, and silently disable session capture. Gate success on watcher ready (and treat early error as startup failure) so fallback logic still activates.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request migrates the session file watcher from node:fs to chokidar to improve reliability and adds support for new session path layouts. It also introduces a benchmark script for monitoring file system watcher performance. The review feedback highlights opportunities to improve cross-platform compatibility by using path.join for path construction and suggests removing redundant debouncing logic that overlaps with chokidar's built-in stability features.
| if (sessionsIdx > 0) { | ||
| // Main session | ||
| // Legacy main session | ||
| const projectPath = parts.slice(0, sessionsIdx).join('/'); |
There was a problem hiding this comment.
| const projectIdx = parts.lastIndexOf('projects'); | ||
| if (projectIdx >= 0 && parts.length === projectIdx + 3) { | ||
| // Main session | ||
| const projectPath = parts.slice(0, projectIdx + 2).join('/'); |
There was a problem hiding this comment.
| function scheduleFileChange(filePath: string, sql: SqlClient): void { | ||
| if (!filePath.endsWith('.jsonl')) return; | ||
|
|
||
| const existing = debounceTimers.get(filePath); | ||
| if (existing) clearTimeout(existing); | ||
|
|
||
| debounceTimers.set( | ||
| filePath, | ||
| setTimeout(() => { | ||
| debounceTimers.delete(filePath); | ||
| handleFileChange(filePath, sql).catch((err) => { | ||
| const message = err instanceof Error ? err.message : String(err); | ||
| console.error(`[filewatch] unhandled error for ${filePath}: ${message}`); | ||
| }); | ||
| }, DEBOUNCE_MS), | ||
| ); | ||
| } |
There was a problem hiding this comment.
The manual debouncing logic in scheduleFileChange appears redundant because createJsonlWatcher is already configured with awaitWriteFinish and a stabilityThreshold of DEBOUNCE_MS (500ms). This results in a cumulative delay of 1 second (500ms for file stability + 500ms in setTimeout) before processing a change. Consider removing the manual setTimeout and calling handleFileChange directly from the watcher events, while retaining the error handling.
Per-process Map<executorId, sessionId> in session-sync.ts caches NOTHING across `genie hook dispatch` bun forks (each fork starts with empty Map). Result: every cold-start hook fork did 3 DB round-trips (getAgentByName + getExecutor + audit) even when the (executorId, sessionId) pair was already-reconciled by a previous fork. Fix: - Disk-backed cache at ~/.genie/cache/session-sync.json (overridable via __GENIE_SESSION_SYNC_CACHE_FILE for tests) - loadDiskCache() runs once per fork BEFORE any DB call, replacing empty Map with previously-persisted (executorId, sessionId) pairs - persistDiskCache() writes after every cache mutation (atomic via write-temp + rename) - MAX_CACHE_ENTRIES=1000 with insertion-order trim — bounded growth - Concurrency: rename races can lose one fork's writes for OTHER executor entries (benign: at most occasional redundant DB syncs from later forks) - Corrupt cache file is tolerated — falls through to DB After this PR, hook forks have ZERO DB calls in the steady state for already-reconciled sessions (only on actual rotation or first capture). Validation: 18/18 src/hooks/__tests__/session-sync.test.ts pass (14 prior + 4 new: persist-after-reconcile, cold-start-loads-from-disk, cache-miss- falls-through, corrupt-file-tolerated). tsc --noEmit clean. biome clean. Fix E of the 5-step .19 Mac-CPU root-cause plan (A→E): - A: shipped #1475 — drop runRetention from getConnection - filewatch: shipped #1474 — chokidar replacement - C: shipped #1476 — GENIE_SKIP_DB_BOOT for hook dispatch - D: open #1479 — narrow inject matchers - E: this PR - B: in flight (twin) — needsSeed mtime cache (lower priority post-C) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per-process Map<executorId, sessionId> in session-sync.ts caches NOTHING across `genie hook dispatch` bun forks (each fork starts with empty Map). Result: every cold-start hook fork did 3 DB round-trips (getAgentByName + getExecutor + audit) even when the (executorId, sessionId) pair was already-reconciled by a previous fork. Fix: - Disk-backed cache at ~/.genie/cache/session-sync.json (overridable via __GENIE_SESSION_SYNC_CACHE_FILE for tests) - loadDiskCache() runs once per fork BEFORE any DB call, replacing empty Map with previously-persisted (executorId, sessionId) pairs - persistDiskCache() writes after every cache mutation (atomic via write-temp + rename) - MAX_CACHE_ENTRIES=1000 with insertion-order trim — bounded growth - Concurrency: rename races can lose one fork's writes for OTHER executor entries (benign: at most occasional redundant DB syncs from later forks) - Corrupt cache file is tolerated — falls through to DB After this PR, hook forks have ZERO DB calls in the steady state for already-reconciled sessions (only on actual rotation or first capture). Validation: 18/18 src/hooks/__tests__/session-sync.test.ts pass (14 prior + 4 new: persist-after-reconcile, cold-start-loads-from-disk, cache-miss- falls-through, corrupt-file-tolerated). tsc --noEmit clean. biome clean. Fix E of the 5-step .19 Mac-CPU root-cause plan (A→E): - A: shipped #1475 — drop runRetention from getConnection - filewatch: shipped #1474 — chokidar replacement - C: shipped #1476 — GENIE_SKIP_DB_BOOT for hook dispatch - D: open #1479 — narrow inject matchers - E: this PR - B: in flight (twin) — needsSeed mtime cache (lower priority post-C) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(hooks): persist session-sync cache to disk — Mac CPU fix E Per-process Map<executorId, sessionId> in session-sync.ts caches NOTHING across `genie hook dispatch` bun forks (each fork starts with empty Map). Result: every cold-start hook fork did 3 DB round-trips (getAgentByName + getExecutor + audit) even when the (executorId, sessionId) pair was already-reconciled by a previous fork. Fix: - Disk-backed cache at ~/.genie/cache/session-sync.json (overridable via __GENIE_SESSION_SYNC_CACHE_FILE for tests) - loadDiskCache() runs once per fork BEFORE any DB call, replacing empty Map with previously-persisted (executorId, sessionId) pairs - persistDiskCache() writes after every cache mutation (atomic via write-temp + rename) - MAX_CACHE_ENTRIES=1000 with insertion-order trim — bounded growth - Concurrency: rename races can lose one fork's writes for OTHER executor entries (benign: at most occasional redundant DB syncs from later forks) - Corrupt cache file is tolerated — falls through to DB After this PR, hook forks have ZERO DB calls in the steady state for already-reconciled sessions (only on actual rotation or first capture). Validation: 18/18 src/hooks/__tests__/session-sync.test.ts pass (14 prior + 4 new: persist-after-reconcile, cold-start-loads-from-disk, cache-miss- falls-through, corrupt-file-tolerated). tsc --noEmit clean. biome clean. Fix E of the 5-step .19 Mac-CPU root-cause plan (A→E): - A: shipped #1475 — drop runRetention from getConnection - filewatch: shipped #1474 — chokidar replacement - C: shipped #1476 — GENIE_SKIP_DB_BOOT for hook dispatch - D: open #1479 — narrow inject matchers - E: this PR - B: in flight (twin) — needsSeed mtime cache (lower priority post-C) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(hooks): skip disk cache I/O in test mode (post-rebase fix-up) After rebasing fix E onto current dev (post #1479/#1482/#1483/#1484 merges), the pre-existing session-sync (Gap 2) tests started failing because loadDiskCache() on first call would read the production ~/.genie/cache/session-sync.json — populated by earlier real-mode runs of the handler. Stale (executorId, sessionId) entries caused the in-memory cache to short-circuit BEFORE the test fixtures' mocked DB calls were made, so audit-event emissions never fired. Fix: loadDiskCache() and persistDiskCache() now skip ALL disk I/O when: - ANY _deps field is non-null (test installed mocks), OR - NODE_ENV=test or BUN_ENV=test UNLESS the test explicitly set __GENIE_SESSION_SYNC_CACHE_FILE via _setCacheFileForTest (the new fix-E tests opt in this way). Mirrors the existing test-mode skip in shouldSkipSync. Validation: 18/18 src/hooks/__tests__/session-sync.test.ts pass. Both the 14 pre-existing tests and the 4 new fix-E tests work — pre-existing tests get clean cache state; fix-E tests use their isolated tmp-dir cache file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Replaces
fs.watch(claudeDir, { recursive: true }, ...)insrc/lib/session-filewatch.tswith chokidar, which is FSEvents-aware on macOS, debounces internally, and supportsdepth/ignoredknobs.Why this is a partial mitigation, not THE Mac CPU fix
Original symptom:
@automagik/genie@4.260428.18Mac users reporting 100% CPU + machine freeze. Initial trace pointed at this watcher.Deeper analysis (verified against code paths) found the dominant Mac CPU consumer is hook-dispatch cold-start fanout, NOT this watcher:
bunforgenie hook dispatchdb.ts:665) →needsSeed()-triggered seed of all 92~/.claude/teamsentries (pg-seed.ts:88) → 3 PreToolUse handlers all hitting DBscheduler.logat line 95912+ shows"sorry, too many clients already"errorsbuncold-starts per minute on a busy Mac dev machineThat fix is a separate PR (5-step plan A→E, in flight).
What this PR DOES fix
The recursive
fs.watchon macOS uses FSEvents which fires per-directory across the entire~/.claude/projects/tree (~126 project hashes / 1659 JSONLs on a typical Mac dev box). Every JSONL append triggers a debounce timer + a freshbuildWorkerMap(sql)PG query +ingestFileFull. This is a real amplifier of the underlying CPU/PG-pool pressure, even if not the dominant consumer.Linux uses inotify per-file watches → benign. The same code shipped since
bdaa741f feat: session capture v2(inv3.260402.1andv4.260327.4), so this bug exists on@latest(4.260423.10) as well as@next(4.260428.18).Files changed
package.json,bun.lock— addedchokidar ^5.0.0src/lib/session-filewatch.ts(+99 / -? lines) — chokidar replacementsrc/lib/session-filewatch.test.ts(+45 lines) — testsValidation
bun test src/lib/session-filewatch.test.ts— 9/9 pass, 23 expect() callsMitigation guidance for Mac users until merge + .19 cut
genie serve stop— kills the daemon that runs the watcher. Loses session capture / scheduler / brain-vault auto-start, but stops the freeze.Co-authors
Original commit by Felipe; pushed and PR-opened by Genie orchestrator after twin handoff.