fix(filewatch): replace fs.watch recursive with chokidar (Mac CPU partial mitigation) #1474

Merged: namastex888 merged 1 commit into dev from fix/mac-cpu-recursive-fs-watch on Apr 28, 2026

Conversation

@namastex888
Contributor

Summary

Replaces fs.watch(claudeDir, { recursive: true }, ...) in src/lib/session-filewatch.ts with chokidar, which is FSEvents-aware on macOS, debounces internally, and supports depth/ignored knobs.
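A minimal sketch of the kind of watcher options the summary refers to. The option names (`ignored`, `depth`, `ignoreInitial`, `awaitWriteFinish`) are chokidar's documented options. The helper name `ignoreNonJsonl`, the `jsonlWatchOptions` object, and the `depth: 3` value are assumptions made for illustration, and the real `createJsonlWatcher` in this PR may be configured differently.

```typescript
// Hypothetical sketch: options one might pass to chokidar's watch(claudeDir, options).
// The values below are illustrative assumptions, not taken from the PR diff.

/** Minimal shape of the fs.Stats argument chokidar passes to `ignored`. */
interface StatsLike {
  isFile(): boolean;
}

/** Ignore regular files that are not .jsonl; never ignore directories. */
function ignoreNonJsonl(path: string, stats?: StatsLike): boolean {
  return Boolean(stats?.isFile()) && !path.endsWith('.jsonl');
}

const DEBOUNCE_MS = 500;

const jsonlWatchOptions = {
  ignored: ignoreNonJsonl,
  ignoreInitial: true, // do not emit "add" for files that already exist
  depth: 3, // assumed bound on how deep the watcher recurses
  awaitWriteFinish: { stabilityThreshold: DEBOUNCE_MS, pollInterval: 100 },
};

// Usage (requires the chokidar package):
//   import { watch } from 'chokidar';
//   const watcher = watch(claudeDir, jsonlWatchOptions);
//   watcher.on('add', onChange).on('change', onChange);
```

Passing a predicate to `ignored` keeps non-JSONL files out of the event stream before any debounce or database work happens, which is the amplification this PR targets.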

Why this is a partial mitigation, not THE Mac CPU fix

Original symptom: @automagik/genie@4.260428.18 Mac users reporting 100% CPU + machine freeze. Initial trace pointed at this watcher.

Deeper analysis (verified against code paths) found the dominant Mac CPU consumer is hook-dispatch cold-start fanout, NOT this watcher:

  • Every Bash/Tool call from every Claude Code session forks a fresh bun for genie hook dispatch
  • That fork runs PG connect → retention DELETEs (db.ts:665) → needsSeed()-triggered seed of all 92 ~/.claude/teams entries (pg-seed.ts:88) → 3 PreToolUse handlers all hitting DB
  • Concrete evidence: scheduler.log at line 95912+ shows "sorry, too many clients already" errors
  • Hundreds of bun cold-starts per minute on a busy Mac dev machine

That fix is a separate PR (5-step plan A→E, in flight).

What this PR DOES fix

The recursive fs.watch on macOS uses FSEvents, which fires per-directory across the entire ~/.claude/projects/ tree (~126 project hashes / 1659 JSONLs on a typical Mac dev box). Every JSONL append triggers a debounce timer, a fresh buildWorkerMap(sql) PG query, and ingestFileFull. This is a real amplifier of the underlying CPU/PG-pool pressure, even if it is not the dominant consumer.

Linux uses inotify per-file watches, which is benign. The same code has shipped since bdaa741f (feat: session capture v2, in v3.260402.1 and v4.260327.4), so this bug exists on @latest (4.260423.10) as well as @next (4.260428.18).

Files changed

  • package.json, bun.lock — added chokidar ^5.0.0
  • src/lib/session-filewatch.ts (+99 / -? lines) — chokidar replacement
  • src/lib/session-filewatch.test.ts (+45 lines) — tests

Validation

  • bun test src/lib/session-filewatch.test.ts — 9/9 pass, 23 expect() calls

Mitigation guidance for Mac users until merge + .19 cut

genie serve stop — kills the daemon that runs the watcher. Loses session capture / scheduler / brain-vault auto-start, but stops the freeze.

Co-authors

Original commit by Felipe; pushed and PR-opened by Genie orchestrator after twin handoff.

@coderabbitai

coderabbitai Bot commented Apr 28, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 338ce771-6152-40b6-8c4f-bd7dce3bd8b9

@socket-security

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff  | Package           | Supply Chain Security | Vulnerability | Quality | Maintenance | License
Added | npm/chokidar@5.0.0 | 100                   | 100           | 100     | 81          | 100

View full report

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f1f21f6a2f

Comment on lines +237 to 244
  watcher = createJsonlWatcher(claudeDir, (fullPath) => scheduleFileChange(fullPath, sql));

  watcher.on('error', (err) => {
-   console.error('[filewatch] watcher error:', err.message);
-   // Could fall back to polling here in the future
+   const message = err instanceof Error ? err.message : String(err);
+   console.error('[filewatch] watcher error:', message);
  });

  console.log(`[filewatch] watching ${claudeDir} (${offsetCache.size} sessions cached)`);

P1 Badge Wait for chokidar readiness before reporting startup success

startFilewatch now returns true immediately after createJsonlWatcher(...), but chokidar surfaces missing-path/permission failures asynchronously via error events rather than a synchronous throw. In environments where ~/.claude/projects does not exist yet (fresh installs) or is inaccessible, this path will report success, skip the scheduler's polling fallback, and silently disable session capture. Gate success on watcher ready (and treat early error as startup failure) so fallback logic still activates.
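One way the suggested gating could look, as a sketch only. It assumes the watcher is an `EventEmitter` that emits `ready` and `error` (chokidar's watcher does emit both). The helper name `awaitWatcherReady` and the timeout parameter are hypothetical and do not appear in this PR.

```typescript
import { EventEmitter } from 'node:events';

/**
 * Resolve true once the watcher emits "ready", and false if it emits "error"
 * first or if timeoutMs elapses. Hypothetical helper, not part of this PR.
 */
function awaitWatcherReady(watcher: EventEmitter, timeoutMs: number): Promise<boolean> {
  return new Promise((resolve) => {
    const settle = (ok: boolean): void => {
      // Clean up so a late event cannot settle twice or keep the process alive.
      clearTimeout(timer);
      watcher.off('ready', onReady);
      watcher.off('error', onError);
      resolve(ok);
    };
    const onReady = (): void => settle(true);
    const onError = (): void => settle(false);
    const timer = setTimeout(() => settle(false), timeoutMs);
    watcher.once('ready', onReady);
    watcher.once('error', onError);
  });
}
```

Under this sketch, `startFilewatch` would return the awaited result instead of an immediate `true`, so the scheduler's polling fallback can still activate when startup fails. The long-lived `error` handler from the diff above would still be attached separately for errors that occur after startup.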

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request migrates the session file watcher from node:fs to chokidar to improve reliability and adds support for new session path layouts. It also introduces a benchmark script for monitoring file system watcher performance. The review feedback highlights opportunities to improve cross-platform compatibility by using path.join for path construction and suggests removing redundant debouncing logic that overlaps with chokidar's built-in stability features.

  if (sessionsIdx > 0) {
-   // Main session
+   // Legacy main session
    const projectPath = parts.slice(0, sessionsIdx).join('/');

medium

Hardcoding .join('/') for path construction is not cross-platform friendly. Use path.join to ensure the correct platform-specific separator is used.

Suggested change
- const projectPath = parts.slice(0, sessionsIdx).join('/');
+ const projectPath = join(...parts.slice(0, sessionsIdx));
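A point worth checking before applying this suggestion, shown as a small sketch. It assumes `parts` is produced by splitting an absolute POSIX path on `/`, which the excerpt does not show, and the sample path is made up. `posix` is used explicitly so the result is the same on every platform.

```typescript
import { posix } from 'node:path';

// Assumption: `parts` comes from splitting an absolute POSIX path on "/",
// so its first element is the empty string that encodes the leading slash.
const parts = '/Users/me/.claude/projects/abc/sessions/s1.jsonl'.split('/');
const sessionsIdx = parts.indexOf('sessions');

// Original form: the empty first element keeps the leading "/".
const viaArrayJoin = parts.slice(0, sessionsIdx).join('/');

// Suggested form: path.join skips zero-length segments, so the root is lost.
const viaPathJoin = posix.join(...parts.slice(0, sessionsIdx));

// One way to keep the root while still using path.join.
const rooted = posix.join('/', ...parts.slice(0, sessionsIdx));

console.log(viaArrayJoin); // /Users/me/.claude/projects/abc
console.log(viaPathJoin); // Users/me/.claude/projects/abc
console.log(rooted); // /Users/me/.claude/projects/abc
```

If `parts` is instead relative, or is split with `path.sep`, the two forms may agree; the sketch only covers the absolute-path case.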

const projectIdx = parts.lastIndexOf('projects');
if (projectIdx >= 0 && parts.length === projectIdx + 3) {
  // Main session
  const projectPath = parts.slice(0, projectIdx + 2).join('/');

medium

Hardcoding .join('/') for path construction is not cross-platform friendly. Use path.join to ensure the correct platform-specific separator is used.

Suggested change
- const projectPath = parts.slice(0, projectIdx + 2).join('/');
+ const projectPath = join(...parts.slice(0, projectIdx + 2));

Comment on lines +183 to +199
function scheduleFileChange(filePath: string, sql: SqlClient): void {
  if (!filePath.endsWith('.jsonl')) return;

  const existing = debounceTimers.get(filePath);
  if (existing) clearTimeout(existing);

  debounceTimers.set(
    filePath,
    setTimeout(() => {
      debounceTimers.delete(filePath);
      handleFileChange(filePath, sql).catch((err) => {
        const message = err instanceof Error ? err.message : String(err);
        console.error(`[filewatch] unhandled error for ${filePath}: ${message}`);
      });
    }, DEBOUNCE_MS),
  );
}

medium

The manual debouncing logic in scheduleFileChange appears redundant because createJsonlWatcher is already configured with awaitWriteFinish and a stabilityThreshold of DEBOUNCE_MS (500ms). This results in a cumulative delay of 1 second (500ms for file stability + 500ms in setTimeout) before processing a change. Consider removing the manual setTimeout and calling handleFileChange directly from the watcher events, while retaining the error handling.
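A sketch of what calling the handler directly might look like, assuming the watcher really is configured with `awaitWriteFinish` as the comment states. `makeDirectChangeCallback` is a hypothetical name, and `handle` stands in for `handleFileChange` already bound to `sql`.

```typescript
type ChangeHandler = (filePath: string) => Promise<void>;

/**
 * Build a watcher callback that forwards .jsonl changes straight to `handle`,
 * relying on the watcher's own write-stability wait instead of a second timer.
 * Hypothetical helper; the error handling mirrors scheduleFileChange.
 */
function makeDirectChangeCallback(
  handle: ChangeHandler,
  logError: (message: string) => void = console.error,
): (filePath: string) => void {
  return (filePath) => {
    if (!filePath.endsWith('.jsonl')) return;
    handle(filePath).catch((err: unknown) => {
      const message = err instanceof Error ? err.message : String(err);
      logError(`[filewatch] unhandled error for ${filePath}: ${message}`);
    });
  };
}

// Usage with a stand-in handler:
//   const onChange = makeDirectChangeCallback((p) => handleFileChange(p, sql));
//   watcher.on('add', onChange).on('change', onChange);
```

The trade-off is that the per-path timer also coalesces bursts of events for the same file, so dropping it is only safe if the watcher's stability wait already provides that coalescing.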

@namastex888 namastex888 merged commit d3976b9 into dev Apr 28, 2026
11 checks passed
namastex888 added a commit that referenced this pull request Apr 29, 2026
Per-process Map<executorId, sessionId> in session-sync.ts caches NOTHING
across `genie hook dispatch` bun forks (each fork starts with empty Map).
Result: every cold-start hook fork did 3 DB round-trips
(getAgentByName + getExecutor + audit) even when the (executorId, sessionId)
pair was already-reconciled by a previous fork.

Fix:
- Disk-backed cache at ~/.genie/cache/session-sync.json (overridable via
  __GENIE_SESSION_SYNC_CACHE_FILE for tests)
- loadDiskCache() runs once per fork BEFORE any DB call, replacing empty
  Map with previously-persisted (executorId, sessionId) pairs
- persistDiskCache() writes after every cache mutation (atomic via write-temp
  + rename)
- MAX_CACHE_ENTRIES=1000 with insertion-order trim — bounded growth
- Concurrency: rename races can lose one fork's writes for OTHER executor
  entries (benign: at most occasional redundant DB syncs from later forks)
- Corrupt cache file is tolerated — falls through to DB

After this PR, hook forks have ZERO DB calls in the steady state for
already-reconciled sessions (only on actual rotation or first capture).

Validation: 18/18 src/hooks/__tests__/session-sync.test.ts pass (14 prior
+ 4 new: persist-after-reconcile, cold-start-loads-from-disk, cache-miss-
falls-through, corrupt-file-tolerated). tsc --noEmit clean. biome clean.

Fix E of the 5-step .19 Mac-CPU root-cause plan (A→E):
- A: shipped #1475 — drop runRetention from getConnection
- filewatch: shipped #1474 — chokidar replacement
- C: shipped #1476 — GENIE_SKIP_DB_BOOT for hook dispatch
- D: open #1479 — narrow inject matchers
- E: this PR
- B: in flight (twin) — needsSeed mtime cache (lower priority post-C)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
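The cache behaviour listed in the commit message above (load before any DB call, atomic write via temp file plus rename, insertion-order trim, tolerance of a corrupt file) could be sketched roughly as follows. The function names and `MAX_CACHE_ENTRIES` come from the message; the signatures, the JSON object layout, and the temp-file naming are assumptions, and the actual code in session-sync.ts may differ.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join } from 'node:path';

const MAX_CACHE_ENTRIES = 1000;

/** Load (executorId -> sessionId) pairs; a missing or corrupt file yields an empty Map. */
function loadDiskCache(file: string): Map<string, string> {
  try {
    const parsed: unknown = JSON.parse(readFileSync(file, 'utf8'));
    if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) return new Map();
    const entries = Object.entries(parsed as Record<string, unknown>).filter(
      (entry): entry is [string, string] => typeof entry[1] === 'string',
    );
    return new Map(entries);
  } catch {
    return new Map(); // unreadable or corrupt: the caller falls through to the DB
  }
}

/** Persist the cache atomically: write a temp file, then rename it over the target. */
function persistDiskCache(file: string, cache: Map<string, string>): void {
  // Maps iterate in insertion order, so deleting from the front drops the oldest entries.
  while (cache.size > MAX_CACHE_ENTRIES) {
    const oldest = cache.keys().next().value;
    if (oldest === undefined) break;
    cache.delete(oldest);
  }
  mkdirSync(dirname(file), { recursive: true });
  const tmp = `${file}.${process.pid}.tmp`; // assumed naming; unique per fork
  writeFileSync(tmp, JSON.stringify(Object.fromEntries(cache)));
  renameSync(tmp, file);
}

// Usage: round-trip through a throwaway directory.
const cacheFile = join(mkdtempSync(join(tmpdir(), 'session-sync-')), 'cache', 'session-sync.json');
persistDiskCache(cacheFile, new Map([['executor-1', 'session-a']]));
const reloaded = loadDiskCache(cacheFile);
console.log(reloaded.get('executor-1')); // session-a
```

Because each fork renames a complete file into place, a reader never sees a half-written cache. Two forks racing can still overwrite each other's entries, which matches the benign lost-write case the commit message describes.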
namastex888 added a commit that referenced this pull request Apr 29, 2026
* fix(hooks): persist session-sync cache to disk — Mac CPU fix E

* fix(hooks): skip disk cache I/O in test mode (post-rebase fix-up)

After rebasing fix E onto current dev (post #1479/#1482/#1483/#1484 merges),
the pre-existing session-sync (Gap 2) tests started failing because
loadDiskCache() on first call would read the production
~/.genie/cache/session-sync.json — populated by earlier real-mode runs of
the handler. Stale (executorId, sessionId) entries caused the in-memory
cache to short-circuit BEFORE the test fixtures' mocked DB calls were
made, so audit-event emissions never fired.

Fix: loadDiskCache() and persistDiskCache() now skip ALL disk I/O when:
- ANY _deps field is non-null (test installed mocks), OR
- NODE_ENV=test or BUN_ENV=test
UNLESS the test explicitly set __GENIE_SESSION_SYNC_CACHE_FILE via
_setCacheFileForTest (the new fix-E tests opt in this way).

Mirrors the existing test-mode skip in shouldSkipSync.

Validation: 18/18 src/hooks/__tests__/session-sync.test.ts pass. Both the
14 pre-existing tests and the 4 new fix-E tests work — pre-existing tests
get clean cache state; fix-E tests use their isolated tmp-dir cache file.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
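The test-mode rule from the fix-up commit can be written as one pure predicate. This is a sketch under assumptions: the name `shouldSkipDiskCache` and the input shape are hypothetical, and only the three conditions (mocked `_deps`, `NODE_ENV`/`BUN_ENV` set to `test`, explicit cache-file opt-in) come from the commit message.

```typescript
interface DiskCacheGateInput {
  env: Record<string, string | undefined>;
  hasMockedDeps: boolean; // true when any _deps field is non-null
  explicitCacheFile: string | undefined; // set via _setCacheFileForTest
}

/** Return true when loadDiskCache/persistDiskCache should do no disk I/O. */
function shouldSkipDiskCache(input: DiskCacheGateInput): boolean {
  // An explicit cache file is an opt-in and overrides both skip conditions.
  if (input.explicitCacheFile) return false;
  if (input.hasMockedDeps) return true;
  return input.env.NODE_ENV === 'test' || input.env.BUN_ENV === 'test';
}
```

Keeping the decision in a pure function makes it straightforward to cover every combination in a unit test without touching the real ~/.genie/cache directory.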