Skip to content

docs: README claim audit + badge-drift regression test (#682) — v1.3.75#688

Merged
Pratiyush merged 1 commit into
masterfrom
docs/682-readme-claim-audit
Apr 27, 2026
Merged

docs: README claim audit + badge-drift regression test (#682) — v1.3.75#688
Pratiyush merged 1 commit into
masterfrom
docs/682-readme-claim-audit

Conversation

@Pratiyush

Copy link
Copy Markdown
Owner

Summary

Walks every numeric/factual claim in `README.md` against the current code + data and fixes the seven that drifted. Pairs the audit with a regression test that fails CI when the test-count badge silently rots more than ±15% behind reality.

Closes #682.

What changed

# Where Claim before Verdict Fix
1 README badge L13 `tests-2363 passing` stale — `pytest --collect-only` reports 2,651 bump badge to 2,651
2 README L300 `pip install -e '.[pdf]'` wrong — `[pdf]` extra was removed in the simplification sweep (memory: PDF adapter pruned deliberately) replace with the real extras (`[graph]`, `[dev]`, `[e2e]`, `[all]`)
3 README L539 `pypdf is an optional extra for PDF ingestion` wrong — same root cause rewrite the line to mention the real extras
4 README L340 `the unit suite (pytest tests/ — 472 tests)` stale — same reason as the badge; off by ~2,200 bump inline mention to 2,651
5 README L162 `Tutorial — every command in 60 seconds` stale — the v1.3.67 VHS recording at `docs/videos/cli-tutorial.gif` runs 31 seconds against an 8-session sandbox; "90 seconds" is the realistic narration time on a real corpus rename to "90 seconds" + link to the recording + tape source
6 README L36 `` stale — the demo gif shipped in v1.3.67 (#248 closed) embed `docs/demo.gif` directly
7 README L351 `~300 MB for Chromium` unprovable — Playwright re-pins Chromium every release; the snapshot rots quickly soften to "several hundred MB"

Verified, no change needed:

  • `16 lint rules` (`len(REGISTRY) == 16`)
  • `12 production tools` (12 unique `name="wiki_..."` entries in `mcp/server.py`)
  • `8 sessions across 3 projects` for the demo data (8 `.md` files under `examples/demo-sessions/`)
  • `161 features across 16 categories` (`docs/feature-matrix.md` totals row)
  • `Adapters` table in the "Works with" section (cross-checked against `llmwiki/adapters/REGISTRY`)
  • `Releases` table (cross-checked against `gh release list`)

What's new

Surface Change
README badge accuracy drifted ~290 tests behind reality (`2363 → 2651`)
Install table references real extras only — no phantom `[pdf]`
Tutorial section links to the v1.3.67 VHS recording + tape source
Design principles "Stdlib first" no longer mentions deleted `pypdf` extra
Demo gif re-embedded from `docs/demo.gif` (shipped 1.3.67)
Tests new regression test pins badge ≤ ±15% drift

Behavioural delta

Before After
Badge accuracy on `master` drifted 2,363 vs 2,651 actual within ±15% always
Install copy-paste `pip install -e '.[pdf]'` errors with "no such extra" every advertised extra is real
Demo gif visibility hidden behind a TODO comment inline at the top of the README
Future drift detection none — manual re-check CI fails when badge drifts >15%

How to test it

```bash
python3 -m pytest tests/test_readme_badges.py -v # 10 pass, including the new test
python3 -m pytest tests/ -q -m "not slow" # full suite green
python3 -m pytest tests/ --collect-only 2>&1 | grep "tests collected"
```

Pre-merge checklist

  • One intent — single doc audit + corresponding regression test, no scope mixing
  • All CI checks green — verified locally; CI to confirm
  • Linked issue — `Closes audit: README claims + analysis statements need verification #682` in body
  • Conventional-commit title — `docs: ...`
  • Tests added or updated — new `test_test_count_badge_within_window_of_actual` pins the contract; full existing badge suite still passes
  • CHANGELOG.md updated — `[1.3.75]` entry under Fixed + Added
  • Breaking changes flagged — N/A; doc-only PR
  • No new runtime dependencies — N/A
  • No real session data — N/A
  • No machine-specific paths — N/A
  • Docs updated — this PR is the docs update
  • Release notes drafted — see CHANGELOG.md `[1.3.75]`
  • UI verified — N/A; no UI surface changes
  • A11y verified — N/A
  • Commits GPG-signed — yes
  • Reviewer has read every changed line — diff is +73 / -11 across 5 files

Bundle

  • `README.md` — 7 textual fixes (badge, install table, design principles, tutorial heading, demo gif, E2E section, Chromium size)
  • `tests/test_readme_badges.py` — new `test_test_count_badge_within_window_of_actual` runs `pytest --collect-only` in a subprocess and compares to the badge with ±15% tolerance
  • `llmwiki/init.py`, `pyproject.toml` — version 1.3.74 → 1.3.75
  • `CHANGELOG.md` — `[1.3.75]` entry under Fixed + Added

Out of scope / follow-ups

  • `docs/benchmarks.md` numbers (M2 MacBook Air, "337 sessions (real wiki)") are old but they correctly call themselves "representative" with a "100 sessions = 8.3 s total" budget. The numbers fluctuate per machine + Python version, so re-measuring them every PR is overkill. Held off here.
  • `docs/feature-matrix.md` "Total: 161" is current but the rating columns ("⭐⭐⭐⭐⭐") are subjective and fall outside this audit's scope.
  • The `Google Fonts` line in `docs/benchmarks.md` ("CDN fonts — Inter and JetBrains Mono load from Google Fonts") technically conflicts with the README "Works offline — no Google fonts" line. Could be a separate small PR; it's adjacent rather than central to the README claim audit.

Next

After merge: tag `v1.3.75`, then move to the Playwright Test Agents epic (#462#467). Per memory: open #462 with a phased plan first, then #463 (decide pytest-playwright vs `npx @playwright/test`) before any code PRs in the family.

Closes #682.

7 stale or wrong claims fixed:
- tests-2363 → 2651 badge (pytest actually collects 2651)
- pip install -e '.[pdf]' (extra was removed in simplification sweep)
  → replaced with real extras [graph]/[dev]/[e2e]/[all]
- "pypdf is an optional extra for PDF ingestion" → real extras list
- "the unit suite (472 tests)" → 2,651 tests
- "every command in 60 seconds" → 90 seconds + link to VHS recording
- TODO re-record demo GIF for v1.3 → embed docs/demo.gif (shipped 1.3.67)
- "~300 MB for Chromium" → "several hundred MB"

New regression test:
test_test_count_badge_within_window_of_actual runs pytest --collect-only
and fails when the badge drifts more than ±15% from the actually-
collected count. Catches the exact rot mode that triggered this audit
(badge silently ~290 tests behind reality through several PR cycles).
@Pratiyush Pratiyush merged commit 567b51e into master Apr 27, 2026
10 checks passed
@Pratiyush Pratiyush deleted the docs/682-readme-claim-audit branch April 27, 2026 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

audit: README claims + analysis statements need verification

1 participant