docs: README claim audit + badge-drift regression test (#682) — v1.3.75 by Pratiyush · Pull Request #688 · Pratiyush/llm-wiki

Pratiyush · 2026-04-27T18:46:28Z

Summary

Walks every numeric/factual claim in `README.md` against the current code + data and fixes the seven that drifted. Pairs the audit with a regression test that fails CI when the test-count badge silently rots more than ±15% behind reality.

Closes #682.

What changed

#	Where	Claim before	Verdict	Fix
1	README badge L13	`tests-2363 passing`	stale — `pytest --collect-only` reports 2,651	bump badge to 2,651
2	README L300	`pip install -e '.[pdf]'`	wrong — `[pdf]` extra was removed in the simplification sweep (memory: PDF adapter pruned deliberately)	replace with the real extras (`[graph]`, `[dev]`, `[e2e]`, `[all]`)
3	README L539	`pypdf is an optional extra for PDF ingestion`	wrong — same root cause	rewrite the line to mention the real extras
4	README L340	`the unit suite (pytest tests/ — 472 tests)`	stale — same reason as the badge; off by ~2,200	bump inline mention to 2,651
5	README L162	`Tutorial — every command in 60 seconds`	stale — the v1.3.67 VHS recording at `docs/videos/cli-tutorial.gif` runs 31 seconds against an 8-session sandbox; "90 seconds" is the realistic narration time on a real corpus	rename to "90 seconds" + link to the recording + tape source
6	README L36	``	stale — the demo gif shipped in v1.3.67 (#248 closed)	embed `docs/demo.gif` directly
7	README L351	`~300 MB for Chromium`	unprovable — Playwright re-pins Chromium every release; the snapshot rots quickly	soften to "several hundred MB"

Verified, no change needed:

`16 lint rules` (`len(REGISTRY) == 16`)
`12 production tools` (12 unique `name="wiki_..."` entries in `mcp/server.py`)
`8 sessions across 3 projects` for the demo data (8 `.md` files under `examples/demo-sessions/`)
`161 features across 16 categories` (`docs/feature-matrix.md` totals row)
`Adapters` table in the "Works with" section (cross-checked against `llmwiki/adapters/REGISTRY`)
`Releases` table (cross-checked against `gh release list`)

What's new

Surface	Change
README badge accuracy	drifted ~290 tests behind reality (`2363 → 2651`)
Install table	references real extras only — no phantom `[pdf]`
Tutorial section	links to the v1.3.67 VHS recording + tape source
Design principles	"Stdlib first" no longer mentions deleted `pypdf` extra
Demo gif	re-embedded from `docs/demo.gif` (shipped 1.3.67)
Tests	new regression test pins badge ≤ ±15% drift

Behavioural delta

	Before	After
Badge accuracy on `master`	drifted 2,363 vs 2,651 actual	within ±15% always
Install copy-paste	`pip install -e '.[pdf]'` errors with "no such extra"	every advertised extra is real
Demo gif visibility	hidden behind a TODO comment	inline at the top of the README
Future drift detection	none — manual re-check	CI fails when badge drifts >15%

How to test it

```bash
python3 -m pytest tests/test_readme_badges.py -v # 10 pass, including the new test
python3 -m pytest tests/ -q -m "not slow" # full suite green
python3 -m pytest tests/ --collect-only 2>&1 | grep "tests collected"
```

Pre-merge checklist

Bundle

`README.md` — 7 textual fixes (badge, install table, design principles, tutorial heading, demo gif, E2E section, Chromium size)
`tests/test_readme_badges.py` — new `test_test_count_badge_within_window_of_actual` runs `pytest --collect-only` in a subprocess and compares to the badge with ±15% tolerance
`llmwiki/init.py`, `pyproject.toml` — version 1.3.74 → 1.3.75
`CHANGELOG.md` — `[1.3.75]` entry under Fixed + Added

Out of scope / follow-ups

`docs/benchmarks.md` numbers (M2 MacBook Air, "337 sessions (real wiki)") are old but they correctly call themselves "representative" with a "100 sessions = 8.3 s total" budget. The numbers fluctuate per machine + Python version, so re-measuring them every PR is overkill. Held off here.
`docs/feature-matrix.md` "Total: 161" is current but the rating columns ("⭐⭐⭐⭐⭐") are subjective and fall outside this audit's scope.
The `Google Fonts` line in `docs/benchmarks.md` ("CDN fonts — Inter and JetBrains Mono load from Google Fonts") technically conflicts with the README "Works offline — no Google fonts" line. Could be a separate small PR; it's adjacent rather than central to the README claim audit.

Closes #682. 7 stale or wrong claims fixed: - tests-2363 → 2651 badge (pytest actually collects 2651) - pip install -e '.[pdf]' (extra was removed in simplification sweep) → replaced with real extras [graph]/[dev]/[e2e]/[all] - "pypdf is an optional extra for PDF ingestion" → real extras list - "the unit suite (472 tests)" → 2,651 tests - "every command in 60 seconds" → 90 seconds + link to VHS recording - TODO re-record demo GIF for v1.3 → embed docs/demo.gif (shipped 1.3.67) - "~300 MB for Chromium" → "several hundred MB" New regression test: test_test_count_badge_within_window_of_actual runs pytest --collect-only and fails when the badge drifts more than ±15% from the actually- collected count. Catches the exact rot mode that triggered this audit (badge silently ~290 tests behind reality through several PR cycles).

Pratiyush merged commit 567b51e into master Apr 27, 2026
10 checks passed

Pratiyush deleted the docs/682-readme-claim-audit branch April 27, 2026 18:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: README claim audit + badge-drift regression test (#682) — v1.3.75#688

docs: README claim audit + badge-drift regression test (#682) — v1.3.75#688
Pratiyush merged 1 commit into
masterfrom
docs/682-readme-claim-audit

Pratiyush commented Apr 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Pratiyush commented Apr 27, 2026

Summary

What changed

What's new

Behavioural delta

How to test it

Pre-merge checklist

Bundle

Out of scope / follow-ups

Next

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant