This repository is the PR-data source backing the blog post Ten Minutes, 16 Megabytes.
The PRs we authored / co-authored in the OpenAI parameter-golf challenge:
| PR | Description | val_bpb |
|---|---|---|
| #1987 | Record: MHA Path + 1855 9-hparam Stack + PR #1948 + PR #1855 (3-seed) | 1.06184 |
| #1948 | Record: Leaky ReLU Slope + GPTQ Reverse-Cholesky Speedup + PR #1938 | 1.06242 |
| #1938 | S0/PR1851 + Cap Tokenizer + LQER + Global TTT | 1.0713 |
| #1867 | (SMT) Train gpt 0427 | 1.078 |
Self-contained visualization of pull-request lineage and BPB (bits-per-byte) evidence for the Ten Minutes, 16 Megabytes parameter-golf challenge.
| Route | Content |
|---|---|
| `/` | PR Lineage & BPB Evidence: per-section lineage DAGs + scatter, plus a unified "All PRs" view |
| `/vis` | PR-DAG visualizations: three SVG renderings of the same 133-cell ablation graph |
| `/vis/dag-chips` | Approach 1: layered DAG with topic chips inside each node (iTOL / IcyTree-style) |
| `/vis/heatmap` | Approach 2: tree gutter + presence/absence trait matrix + BPB heatmap (MetaTree / ggtree-style) |
| `/vis/storyline` | Approach 3: storyline / narrative chart, six topic threads with a shared-x BPB curve (xkcd-657 / d3-layout-narrative-style) |
- Per-section lineage DAGs: parent/child PR graphs for each topic (data / architecture / optimizer / quantization / test-time compute), drawn as a layered DAG with cross-links from a sibling BPB scatter plot
- Unified "All PRs" view: every tracked PR aggregated onto a single PR# × BPB plane, with cross-section edges deduped and pie-sector markers for PRs that touch multiple sections
- Auditable BPB column: every BPB number is backed by a structured provenance record (`data_origin`, `verification`, `confidence`, `methodology`, `merge_status`, `author`, `bpb_as_of`); the schema is documented in `docs/bpb-confidence-schema.md` and rendered inline at the bottom of the page
- PR-DAG visualizations: three static SVG renderings of the 133-cell ablation graph, each adapting a different visualization paradigm from the phylogenetics / narrative-chart literature
- Light / dark theme toggle
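For readers who want to consume the provenance records programmatically, here is a minimal TypeScript sketch of one record plus a completeness check. The seven field names come from the list above; the `string` types, the `BpbProvenance` and `missingProvenanceFields` names, and the rule that a blank value counts as missing are assumptions for illustration, so `docs/bpb-confidence-schema.md` remains the authoritative definition.

```typescript
// Hypothetical shape of one provenance record. Field names come from the
// README; the `string` types are assumptions (see docs/bpb-confidence-schema.md).
type BpbProvenance = {
  data_origin: string;
  verification: string;
  confidence: string;
  methodology: string;
  merge_status: string;
  author: string;
  bpb_as_of: string;
};

// Every field the README lists for a provenance record.
const PROVENANCE_FIELDS: ReadonlyArray<keyof BpbProvenance> = [
  "data_origin",
  "verification",
  "confidence",
  "methodology",
  "merge_status",
  "author",
  "bpb_as_of",
];

// Returns the provenance fields that are absent or blank in a raw row, for
// example a row read from data/pr-bpb.csv (assuming it carries these columns).
// Treating a whitespace-only value as missing is an assumption of this sketch.
function missingProvenanceFields(row: Record<string, string | undefined>): string[] {
  return PROVENANCE_FIELDS.filter((field) => (row[field] ?? "").trim() === "");
}
```

A check like this could run at build time so that a BPB number without a complete audit trail fails loudly instead of silently reaching the page.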
Pre-generated and committed to the repo:

- `data/pr-sections.json`: five PR sections, each with `nodes`, `edges`, and richer `points` for the BPB scatter
- `data/pr-bpb.csv`: one row per PR (or Issue / external reference); the source-of-truth audit trail described in `docs/bpb-confidence-schema.md`
- `data/cells.tsv`: 133 ablation cells (`stage` / `cell_id` / `kind` / `status` / `topic_tags` / `bpb` / `description`); the input to the `/vis/*` figures. This is a reduced view of the original ablation table: only the columns the public figures need are kept (no per-step training curves, no separate pre-quant / post-quant / TTT BPB columns)
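A sketch of reading `data/cells.tsv` into typed rows. It assumes a header row containing the seven column names listed above, tab-separated fields, comma-separated `topic_tags`, and an empty `bpb` for cells without a result; those conventions and the `parseCellsTsv` name are hypothetical, so check the file itself before relying on them.

```typescript
// Hypothetical typed row for data/cells.tsv. Column names come from the
// README; the tag separator and empty-bpb convention are assumptions.
type Cell = {
  stage: string;
  cell_id: string;
  kind: string;
  status: string;
  topic_tags: string[];
  bpb: number | null;
  description: string;
};

const COLUMNS = ["stage", "cell_id", "kind", "status", "topic_tags", "bpb", "description"] as const;
type ColumnName = (typeof COLUMNS)[number];

function parseCellsTsv(text: string): Cell[] {
  const lines = text.split(/\r?\n/).filter((l) => l.trim() !== "");
  if (lines.length === 0) return [];

  // Locate each expected column in the header so column order does not matter.
  const header = lines[0].split("\t");
  const idx = Object.fromEntries(
    COLUMNS.map((c) => {
      const i = header.indexOf(c);
      if (i < 0) throw new Error(`missing column: ${c}`);
      return [c, i] as const;
    }),
  ) as Record<ColumnName, number>;

  return lines.slice(1).map((line) => {
    const fields = line.split("\t");
    const get = (c: ColumnName): string => (fields[idx[c]] ?? "").trim();
    const bpbText = get("bpb");
    const bpb = bpbText === "" ? null : Number(bpbText);
    return {
      stage: get("stage"),
      cell_id: get("cell_id"),
      kind: get("kind"),
      status: get("status"),
      // Assumed comma-separated; blank entries are dropped.
      topic_tags: get("topic_tags").split(",").map((t) => t.trim()).filter(Boolean),
      // Non-numeric or missing BPB values become null rather than NaN.
      bpb: bpb !== null && Number.isFinite(bpb) ? bpb : null,
      description: get("description"),
    };
  });
}
```

Mapping unparseable BPB values to `null` keeps not-yet-run cells in the graph while letting a plotting layer skip them on the BPB axis.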
The PR-section JSON is generated from Mermaid flowcharts plus per-section
CSVs. Those raw inputs are not included in this export. To regenerate
`data/pr-sections.json` from your own raw inputs, populate a `pr_raw_data/`
directory with `01-data.md`, `02-architecture.md`, … plus matching CSVs and
run:

```sh
npm run data:build
```

(See `scripts/build-pr-data.mjs` for the exact format.)
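The exact raw format is defined by `scripts/build-pr-data.mjs`; the sketch below only illustrates the general idea of turning a Mermaid flowchart into a parent/child edge list. It assumes the flowchart sits in a fenced `mermaid` code block and uses simple `A --> B` edges (optionally `A[label]` or `-->|label|`). `parseMermaidEdges` is a hypothetical name, not a function from the repo, and chained edges such as `A --> B --> C` are not handled.

```typescript
// Hypothetical sketch: extract parent -> child edges from the Mermaid
// flowchart inside a section file such as pr_raw_data/01-data.md.
type Edge = { from: string; to: string };

const FENCE = "`".repeat(3);
// Matches `A --> B`, `A[label] --> B`, and `A -->|label| B`; only the first
// hop of a chained edge is captured.
const EDGE_RE = /^\s*(\w+)(?:\[[^\]]*\])?\s*-->\s*(?:\|[^|]*\|\s*)?(\w+)/;

function parseMermaidEdges(markdown: string): Edge[] {
  const edges: Edge[] = [];
  const seen = new Set<string>();
  let inMermaid = false;

  for (const line of markdown.split(/\r?\n/)) {
    const t = line.trim();
    if (!inMermaid) {
      // Only lines inside a mermaid fence are considered.
      if (t.startsWith(FENCE + "mermaid")) inMermaid = true;
      continue;
    }
    if (t.startsWith(FENCE)) {
      inMermaid = false;
      continue;
    }
    const m = EDGE_RE.exec(line);
    if (!m) continue; // e.g. the `flowchart TD` header or style lines
    const key = `${m[1]}->${m[2]}`;
    if (seen.has(key)) continue; // drop duplicate edges
    seen.add(key);
    edges.push({ from: m[1], to: m[2] });
  }
  return edges;
}
```

Deduplicating on a `from->to` key at parse time means the same edge written twice in a flowchart yields a single entry.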
To run locally:

```sh
npm install
npm run dev    # http://localhost:3000
npm run build && npm start    # production build and server
```

Next.js 16 (Turbopack) · React 19 · Tailwind CSS 4 · TypeScript 5. The visualizations are hand-rolled SVG; no charting library.
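To illustrate the hand-rolled SVG approach, the following hypothetical helper computes `<path d="…">` strings for the pie-sector markers drawn for PRs that touch multiple sections. Equal-angle sectors, a 12 o'clock starting angle, clockwise order, and three-decimal rounding are assumptions made for this sketch, not details taken from the repo; `pieSectorPaths` is an illustrative name.

```typescript
// Round to three decimals so paths avoid noise like 6.123e-17.
const fmt = (v: number): string => String(Number(v.toFixed(3)));

// Hypothetical helper: split a circular marker centred at (cx, cy) with
// radius r into n equal pie sectors and return one SVG path per sector.
function pieSectorPaths(cx: number, cy: number, r: number, n: number): string[] {
  if (n < 1) return [];
  if (n === 1) {
    // A single arc command cannot draw a full circle, so use two half arcs.
    return [
      `M ${fmt(cx - r)} ${fmt(cy)} A ${fmt(r)} ${fmt(r)} 0 1 0 ${fmt(cx + r)} ${fmt(cy)} ` +
        `A ${fmt(r)} ${fmt(r)} 0 1 0 ${fmt(cx - r)} ${fmt(cy)} Z`,
    ];
  }
  const step = (2 * Math.PI) / n;
  const paths: string[] = [];
  for (let i = 0; i < n; i++) {
    // Start at 12 o'clock; because SVG's y axis points down, increasing the
    // angle moves clockwise on screen.
    const a0 = i * step - Math.PI / 2;
    const a1 = (i + 1) * step - Math.PI / 2;
    const x0 = cx + r * Math.cos(a0);
    const y0 = cy + r * Math.sin(a0);
    const x1 = cx + r * Math.cos(a1);
    const y1 = cy + r * Math.sin(a1);
    // With n >= 2 each sector spans at most 180 degrees, so large-arc-flag is 0;
    // sweep-flag 1 draws the arc clockwise.
    paths.push(
      `M ${fmt(cx)} ${fmt(cy)} L ${fmt(x0)} ${fmt(y0)} ` +
        `A ${fmt(r)} ${fmt(r)} 0 0 1 ${fmt(x1)} ${fmt(y1)} Z`,
    );
  }
  return paths;
}
```

For example, `pieSectorPaths(10, 10, 5, 4)` yields four quarter wedges, the first running from the top of the circle to its right edge; each string can be passed to a `<path>` element's `d` attribute with a per-section fill colour.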