TimS-ml/openai-parameter-golf-pr-visualization

pr-vis — Parameter-Golf PR Lineage & BPB Evidence

This repository is the PR-data source backing the blog post Ten Minutes, 16 Megabytes.

Our PRs

The PRs we authored or co-authored in the OpenAI parameter-golf challenge:

| PR | Description | val_bpb |
| --- | --- | --- |
| #1987 | Record: MHA Path + 1855 9-hparam Stack + PR #1948 + PR #1855 (3-seed) | 1.06184 |
| #1948 | Record: Leaky ReLU Slope + GPTQ Reverse-Cholesky Speedup + PR #1938 | 1.06242 |
| #1938 | S0/PR1851 + Cap Tokenizer + LQER + Global TTT | 1.0713 |
| #1867 (SMT) | Train gpt 0427 | 1.078 |

What this repo is

Self-contained visualization of pull-request lineage and BPB (bits-per-byte) evidence for the Ten Minutes, 16 Megabytes parameter-golf challenge.

Routes

| Route | Content |
| --- | --- |
| / | PR Lineage & BPB Evidence: per-section lineage DAGs + scatter, plus a unified "All PRs" view |
| /vis | PR-DAG visualizations: three SVG renderings of the same 133-cell ablation graph |
| /vis/dag-chips | Approach 1: layered DAG with topic chips inside each node (iTOL / IcyTree-style) |
| /vis/heatmap | Approach 2: tree gutter + presence/absence trait matrix + BPB heatmap (MetaTree / ggtree-style) |
| /vis/storyline | Approach 3: storyline / narrative chart, six topic threads with shared-x BPB curve (xkcd-657 / d3-layout-narrative-style) |

Features

  • Per-section lineage DAGs — parent/child PR graphs for each topic (data / architecture / optimizer / quantization / test-time compute) drawn as a layered DAG with cross-links from a sibling BPB scatter plot
  • Unified "All PRs" view — every tracked PR aggregated onto a single PR# × BPB plane, with cross-section edges deduped and pie-sector markers for PRs that touch multiple sections
  • Auditable BPB column — every BPB number is backed by a structured provenance record (data_origin, verification, confidence, methodology, merge_status, author, bpb_as_of); the schema is documented in docs/bpb-confidence-schema.md and rendered inline at the bottom of the page
  • PR-DAG visualizations — three static SVG renderings of the 133-cell ablation graph, each adapting a different visualization paradigm from the phylogenetics / narrative-chart literature
  • Light / dark theme toggle
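
The aggregation behind the unified "All PRs" view can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `Section` and `Edge` shapes, the `unify` function, and the section ids are all hypothetical, and the real layout of data/pr-sections.json may differ. It shows one way to dedupe edges that appear in more than one section and to collect, per PR, the sections it touches (the information a pie-sector marker would need).

```typescript
// Hypothetical shapes: the real data/pr-sections.json schema may differ.
interface Edge {
  from: number; // parent PR number
  to: number;   // child PR number
}

interface Section {
  id: string;      // assumed section id, e.g. "data" or "optimizer"
  nodes: number[]; // PR numbers that appear in this section
  edges: Edge[];   // parent -> child links within this section
}

interface UnifiedView {
  edges: Edge[];                       // each (from, to) pair kept once
  sectionsByPr: Map<number, string[]>; // PR number -> sections it touches
}

function unify(sections: Section[]): UnifiedView {
  const seen = new Set<string>();
  const edges: Edge[] = [];
  const sectionsByPr = new Map<number, string[]>();

  for (const section of sections) {
    // Record which sections each PR belongs to.
    for (const pr of section.nodes) {
      const list = sectionsByPr.get(pr) ?? [];
      if (!list.includes(section.id)) list.push(section.id);
      sectionsByPr.set(pr, list);
    }
    // Keep only the first occurrence of each directed edge.
    for (const edge of section.edges) {
      const key = `${edge.from}->${edge.to}`;
      if (seen.has(key)) continue;
      seen.add(key);
      edges.push({ from: edge.from, to: edge.to });
    }
  }
  return { edges, sectionsByPr };
}
```

A PR whose entry in `sectionsByPr` lists more than one section is a candidate for a multi-sector marker; a PR with a single section would get a plain one.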

Data

Pre-generated and committed to the repo:

  • data/pr-sections.json — five PR sections, each with nodes, edges, and richer points for the BPB scatter
  • data/pr-bpb.csv — one row per PR (or Issue / external reference); the source-of-truth audit trail described in docs/bpb-confidence-schema.md
  • data/cells.tsv — 133 ablation cells (stage / cell_id / kind / status / topic_tags / bpb / description); the input to the /vis/* figures. This is a reduced view of the original ablation table — only the columns the public figures need are kept (no per-step training curves, no separate pre-quant / post-quant / TTT BPB columns)
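
For readers who want to consume data/cells.tsv outside the app, a loader might look like the sketch below. The column names are taken from the list above; everything else is an assumption made for illustration: that the file starts with a header row, that topic_tags is comma-separated, and that an empty bpb field means "no score". The `Cell` type and `parseCells` function are hypothetical names, not part of this repository.

```typescript
// Hypothetical row type: column names come from the README, types are assumed.
interface Cell {
  stage: string;
  cell_id: string;
  kind: string;
  status: string;
  topic_tags: string[]; // assumed comma-separated in the file
  bpb: number | null;   // assumed: empty or non-numeric field means no score
  description: string;
}

const COLUMNS = [
  "stage", "cell_id", "kind", "status", "topic_tags", "bpb", "description",
] as const;

function parseCells(tsv: string): Cell[] {
  const lines = tsv.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];

  // Map header names to column positions (assumes a header row is present).
  const index = new Map<string, number>();
  lines[0].split("\t").forEach((name, i) => index.set(name.trim(), i));
  for (const col of COLUMNS) {
    if (!index.has(col)) throw new Error(`missing column: ${col}`);
  }

  return lines.slice(1).map((line) => {
    const fields = line.split("\t");
    const get = (col: string): string => (fields[index.get(col)!] ?? "").trim();
    const rawBpb = get("bpb");
    const bpb = rawBpb === "" ? NaN : Number(rawBpb);
    return {
      stage: get("stage"),
      cell_id: get("cell_id"),
      kind: get("kind"),
      status: get("status"),
      topic_tags: get("topic_tags")
        .split(",")
        .map((tag) => tag.trim())
        .filter((tag) => tag !== ""),
      bpb: Number.isFinite(bpb) ? bpb : null,
      description: get("description"),
    };
  });
}
```

In a Node script this could be fed with something like `parseCells(fs.readFileSync("data/cells.tsv", "utf8"))`. Looking columns up by header name rather than by position keeps the loader working if the column order changes.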

The PR-section JSON is generated from Mermaid flowcharts plus per-section CSVs. Those raw inputs are not included in this export. To regenerate data/pr-sections.json from your own raw inputs, populate a pr_raw_data/ directory with 01-data.md, 02-architecture.md, … plus matching CSVs and run:

npm run data:build

(See scripts/build-pr-data.mjs for the exact format.)

Run

npm install
npm run dev      # http://localhost:3000
npm run build && npm start

Stack

Next.js 16 (Turbopack) · React 19 · Tailwind CSS 4 · TypeScript 5. The visualizations are hand-rolled SVG; no charting library.
