Skip to content

fix(convert): mtime check before derive_project_slug (#612) — v1.3.74#687

Merged
Pratiyush merged 1 commit into
masterfrom
refactor/612-split-convert-all
Apr 27, 2026
Merged

fix(convert): mtime check before derive_project_slug (#612) — v1.3.74#687
Pratiyush merged 1 commit into
masterfrom
refactor/612-split-convert-all

Conversation

@Pratiyush

Copy link
Copy Markdown
Owner

Summary

`convert_all` no longer calls `adapter.derive_project_slug(path)` on every session before the mtime check. On Codex CLI that helper opens every `.jsonl` to read the `session_meta` `cwd` field — so a no-op sync of a 5k-session corpus paid 5k needless file opens. The fix reorders the per-session loop so the cheap mtime check runs first; slug derivation, project filter, and ignore filter only run for sessions that actually need conversion.

Closes #612 (`#arch-h9`).

What changed

  • `llmwiki/convert.py` — moved `path.stat()` + state-mtime check above the `derive_project_slug` call inside the per-session loop. No behavior change for sessions that DO get converted; the work is just reordered.
  • `tests/test_convert_no_op_perf.py` — new regression test exercising a counting adapter that fails CI if a future change re-introduces the early-derive pattern.

What's new

Surface Change
File opens per no-op Codex sync (5k corpus) 5,000 → 0
Mtime check ordering `derive_project_slug → filters → mtime`
Test coverage new `test_no_op_sync_does_not_call_derive_project_slug`

Behavioural delta

Before After
Per-session work for `state[key] == mtime` (unchanged) `stat` + `derive_project_slug` (file open on Codex) + project filter + ignore `stat` only
Per-session work for sessions that DO need conversion identical (slug + filters + write) identical (slug + filters + write)
Behavior change visible from outside none none
5k-session no-op sync wall-clock (dominated by 5k file opens) (dominated by stat() — negligible)

How to test it

```bash
python3 -m pytest tests/test_convert_no_op_perf.py -v
python3 -m pytest tests/ -q -m "not slow"

Manual perf sniff on a real Codex corpus:

python3 -m llmwiki sync # first run — converts everything
python3 -m llmwiki sync # second run — should print all-unchanged in <1s
```

Pre-merge checklist

  • One intent — single perf-fix + corresponding regression test
  • All CI checks green — local non-slow suite passes; CI to confirm
  • Linked issue — `Closes high: #arch-h9 — convert_all is 300-line monstrosity with 3 separate write paths. CodexCliAdap... #612` in body
  • Conventional-commit title — `fix(convert): ...`
  • Tests added or updated — new `tests/test_convert_no_op_perf.py` asserts the contract; full existing suite passes
  • CHANGELOG.md updated — `[1.3.74]` entry under Fixed
  • Breaking changes flagged — N/A; no public surface change
  • No new runtime dependencies — N/A
  • No real session data — N/A; test uses an in-memory counting adapter
  • No machine-specific paths — N/A
  • Docs updated — N/A; the change is internal to the sync loop
  • Release notes drafted — see CHANGELOG.md `[1.3.74]` Fixed bullet
  • UI verified — N/A, no UI surface
  • A11y verified — N/A, no UI surface
  • Commits GPG-signed — yes
  • Reviewer has read every changed line — diff is +176 / -12 across 6 files; most of the new lines are the regression test scaffold

Bundle

  • `llmwiki/convert.py` — per-session loop reorder; `stat()`/mtime first, slug+filters second
  • `tests/test_convert_no_op_perf.py` — new test using `_CountingAdapter` that registers temporarily, runs `convert_all` twice, and asserts `derive_project_slug` was not called on the second (no-op) run
  • `llmwiki/init.py`, `pyproject.toml` — version 1.3.73 → 1.3.74
  • `README.md` — version badge bump
  • `CHANGELOG.md` — `[1.3.74]` entry under Fixed

Out of scope / follow-ups

  • The issue title also called `convert_all` a "300-line monstrosity with 3 separate write paths". The two real write paths (markdown-source copy at the start of the loop, and parsed-jsonl render at the bottom) could be extracted into helpers in a future PR — held off here to keep this PR scoped to the perf bug. The reorder above is the part that was actually breaking on real corpora.

Next

After merge: tag `v1.3.74`, then run the README-claim audit (#682) or move directly to a final consolidated PyPI release (will need PyPI trusted-publisher to be configured first per the v1.2.0 plan; flagging that as the only remaining blocker).

Closes #612 (#arch-h9).

convert_all's per-session loop used to call
adapter.derive_project_slug(path) BEFORE the mtime check. On Codex CLI
that helper opens every .jsonl to read the session_meta cwd field — so
a no-op sync of a 5k-session corpus paid 5k needless file opens.

Reordered the loop: path.stat() (cheap) and the state-mtime comparison
run first. Slug derivation, project filter, and ignore filter only
run for sessions that actually need conversion.

New regression test test_no_op_sync_does_not_call_derive_project_slug
exercises a counting adapter and asserts derive_project_slug is NOT
called on the second pass when state already matches mtime.
@Pratiyush Pratiyush merged commit a19a2ba into master Apr 27, 2026
10 checks passed
@Pratiyush Pratiyush deleted the refactor/612-split-convert-all branch April 27, 2026 08:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

high: #arch-h9 — convert_all is 300-line monstrosity with 3 separate write paths. CodexCliAdap...

1 participant