Commit 73c00e6

Add competition leaderboard with validity classifications
Comprehensive leaderboard of openai/parameter-golf record submissions compiled from open PRs. Each entry classified as valid/invalid/suspect based on source code review against PR openai#1017 validity rules.

Key findings:
- Best verified-valid score: 1.0800 BPB (PR openai#1408)
- 3 submissions confirmed invalid (pre-quant TTT, unnormalized n-gram)
- Sub-0.70 BPB submissions violate normalization requirements
- 6 submissions fully code-reviewed and verified valid

https://claude.ai/code/session_017F8GGeKA7MhUoQdqMGcTpg
1 parent 9c977f1 commit 73c00e6

1 file changed

Lines changed: 124 additions & 0 deletions

LEADERBOARD.md

@@ -0,0 +1,124 @@
# Parameter Golf Leaderboard (2026-04-07)

Compiled from open PRs on [openai/parameter-golf](https://github.com/openai/parameter-golf).
Validity assessed against [PR #1017 "A Field Guide to Valid Submissions"](https://github.com/openai/parameter-golf/pull/1017).

## Validity Rules (PR #1017)

1. **Strict Causal Dependence**: predictions depend only on the artifact and the tokens before position t
2. **Full Normalized Distribution**: a complete probability distribution over the full vocabulary is produced before scoring
3. **Score-Before-Update**: state updates happen only after scoring
4. **Single Left-to-Right Pass**: no rescoring, multi-pass, or retrospective revision
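
As a rough illustration of how the four rules fit together, the sketch below scores a token stream with a toy adaptive model. `UnigramCountModel`, `score_stream`, and the byte count passed in are hypothetical stand-ins, not code from any submission or from PR #1017. BPB is assumed here to mean total bits divided by the total bytes of the scored text.

```python
import math


class UnigramCountModel:
    """Toy adaptive model (hypothetical): add-one smoothed unigram counts."""

    def __init__(self, vocab_size):
        self.counts = [1.0] * vocab_size

    def predict(self, prefix):
        # The prediction uses only state built from tokens already scored;
        # `prefix` is accepted to make the causal interface explicit.
        total = sum(self.counts)
        return [c / total for c in self.counts]

    def update(self, token):
        self.counts[token] += 1.0


def score_stream(model, tokens, total_bytes):
    """Return bits per byte for one left-to-right pass over `tokens`."""
    bits = 0.0
    for t, tok in enumerate(tokens):
        probs = model.predict(tokens[:t])        # rule 1: only tokens before t
        assert abs(sum(probs) - 1.0) < 1e-9      # rule 2: full normalized distribution
        bits += -math.log2(probs[tok])           # score position t ...
        model.update(tok)                        # rule 3: ... and only then update
    return bits / total_bytes                    # rule 4: each position scored once
```

A call such as `score_stream(UnigramCountModel(2), [0, 0, 1], total_bytes=3)` walks the stream exactly once; moving `model.update(tok)` above the scoring line would break rule 3.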

---

## Top Record Submissions (Open PRs, sorted by claimed BPB)

### Tier 1: Sub-1.08 BPB (Top Contenders)

| Rank | PR | BPB | Author | Technique | Validity | Notes |
|------|-----|-----|--------|-----------|----------|-------|
| 1 | [#1333](https://github.com/openai/parameter-golf/pull/1333) | 1.0766 | aryanbhosale | SP4096 + Depth Recurrence + Causal SLOT-16 | NEEDS REVIEW | SLOT usage requires scrutiny |
| 2 | [#1423](https://github.com/openai/parameter-golf/pull/1423) | 1.0791 | aryanbhosale | SP8192 + Pre-Quant TTT + QK-Gain 5.0 + Depth Recurrence | INVALID | Pre-quant TTT trains on val data for 6 epochs. Rules 1, 3, 4 violated. Author notified. |
| 3 | [#1416](https://github.com/openai/parameter-golf/pull/1416) | 1.0795 | erichroepke | SP8192 + Pre-Quant TTT | INVALID | Same pre-quant TTT violation. Author acknowledged and withdrew. |
| 4 | [#1408](https://github.com/openai/parameter-golf/pull/1408) | 1.0800 | aamodbhatt | dTTT + BigramHash 3072x112 | VALID | dTTT baked into artifact pre-quant. Frozen model eval. Clean. |
| 5 | [#1420](https://github.com/openai/parameter-golf/pull/1420) | 1.0801 | abaybektursun | Triple Loop + Fused Kernels + Parallel Residuals + N-gram Tilt | VALID (FIXED) | within_hint/word_hint causal bug fixed in 5e2eff8. |
| 6 | [#1437](https://github.com/openai/parameter-golf/pull/1437) | 1.0809 | dexhunter | SP8192 + Parallel Residuals + 3-Layer Recurrence + N-gram Tilt | NEEDS FIX | Same within_hint/word_hint causal bug as #1420. Shared C++ code. |
| 7 | [#1289](https://github.com/openai/parameter-golf/pull/1289) | 1.0819 | MatoTeziTanka | PROTEUS v1.6: Scylla + Parallel Residuals | NEEDS REVIEW | |
| 8 | [#1413](https://github.com/openai/parameter-golf/pull/1413) | 1.0828 | dexhunter | SP8192 + QK-Gain 5 + Legal Score-First TTT | VALID | Score-first TTT: chunks scored under no_grad() before training. Clean. |
| 9 | [#1412](https://github.com/openai/parameter-golf/pull/1412) | 1.0835 | Robby955 | Parallel Residuals + Hessian-Aware SDClip | NEEDS REVIEW | |
| 10 | [#1450](https://github.com/openai/parameter-golf/pull/1450) | 1.0848 | andrewbaggio1 | TMA Megakernel + Triple Loop + Parallel Residuals | NEEDS REVIEW | |
| 11 | [#1257](https://github.com/openai/parameter-golf/pull/1257) | 1.0855 | BoxiYu | 11L Complement Training + TTT + No-JEPA | NEEDS REVIEW | |
| 12 | [#1394](https://github.com/openai/parameter-golf/pull/1394) | 1.0856 | clarkkev | SP8192 + GPTQ Embeddings + Depth Recurrence + MuonEq-R | NEEDS REVIEW | |
| 13 | [#1424](https://github.com/openai/parameter-golf/pull/1424) | 1.0858 | OnlyJundong | Extended Compute Scaling Analysis (50K steps) | NEEDS REVIEW | |
| 14 | [#1406](https://github.com/openai/parameter-golf/pull/1406) | 1.0887 | aamodbhatt | 11L Depth Recurrence + Discriminative Pre-Quant TTT | NEEDS REVIEW | |

### Tier 2: 1.08–1.10 BPB (Competitive)

| Rank | PR | BPB | Author | Technique | Validity | Notes |
|------|-----|-----|--------|-----------|----------|-------|
| 15 | [#1445](https://github.com/openai/parameter-golf/pull/1445) | 1.0889 | X-Abhishek-X | 3-Layer Depth Recurrence + EMA 0.9965 | NEEDS REVIEW | |
| 16 | [#1399](https://github.com/openai/parameter-golf/pull/1399) | 1.0898 | AnubhavBharadwaaj | Pre-Quant TTT + ETLB | NEEDS REVIEW | Pre-quant TTT likely invalid |
| 17 | [#1331](https://github.com/openai/parameter-golf/pull/1331) | 1.0900 | dexhunter | MuonEq-R + 3-Layer Recurrence + All-Int6 | NEEDS REVIEW | |
| 18 | [#1285](https://github.com/openai/parameter-golf/pull/1285) | 1.0912 | dexhunter | MuonEq-R + Depth Recurrence + Mixed Int5/Int6 GPTQ | NEEDS REVIEW | |
| 19 | [#1415](https://github.com/openai/parameter-golf/pull/1415) | 1.0913 | bigbag | SP4096 + 3-Layer Recurrence + GPTQ Embeddings + ETLB | VALID | ETLB bias trained on context only. Clean. |
| 20 | [#1344](https://github.com/openai/parameter-golf/pull/1344) | 1.0923 | Omrigotlieb | SP4096 + Polar Express + MuonEq-R | NEEDS REVIEW | |
| 21 | [#1395](https://github.com/openai/parameter-golf/pull/1395) | 1.0924 | dttdrv | SP4096 + Linear LR + Depth Recurrence | NEEDS REVIEW | |
| 22 | [#1421](https://github.com/openai/parameter-golf/pull/1421) | 1.0925 | X-Abhishek-X | 11L Depth Recurrence + EMA Tuning (0.9965) | VALID | Vanilla sliding window. Frozen model. Clean. |
| 23 | [#1291](https://github.com/openai/parameter-golf/pull/1291) | 1.0925 | dentity007 | Vocab4096 + MLP4.0x + SLOT | NEEDS REVIEW | SLOT requires scrutiny |
| 24 | [#1260](https://github.com/openai/parameter-golf/pull/1260) | 1.0929 | dexhunter | MuonEq-R + Depth Recurrence + Mixed Int5/Int6 GPTQ | NEEDS REVIEW | |
| 25 | [#1339](https://github.com/openai/parameter-golf/pull/1339) | 1.0955 | bigbag | SP2048 + 3-Layer Recurrence + SWA + BigramHash | NEEDS REVIEW | |
| 26 | [#1407](https://github.com/openai/parameter-golf/pull/1407) | 1.0960 | OnlyJundong | Extended Compute Scaling Analysis | NEEDS REVIEW | |
| 27 | [#1446](https://github.com/openai/parameter-golf/pull/1446) | 1.0960 | LauraGomezjurado | 11L gated Krylov + AR GPTQ int6 + lzma | NEEDS REVIEW | |
| 28 | [#1435](https://github.com/openai/parameter-golf/pull/1435) | 1.0980 | AbhayAnandUCSD | 11L Depth Recurrence + BigramHash + EMA 0.9965 | VALID | Standard sliding window. Frozen model. Clean. |

### Tier 3: Sub-1.0 BPB (SLOT/N-gram Heavy, Validity Suspect)

| Rank | PR | BPB | Author | Technique | Validity | Notes |
|------|-----|-----|--------|-----------|----------|-------|
| | [#1430](https://github.com/openai/parameter-golf/pull/1430) | 0.3964 | renqianluo | Per-Sample SLOT + N-gram Order-22 + TTT | INVALID | Improperly normalized n-gram mixer. Below 0.70 theoretical floor. |
| | [#1379](https://github.com/openai/parameter-golf/pull/1379) | 0.4162 | LucasErcolano | Mixed quant ngram | LIKELY INVALID | Below 0.70 floor. Likely normalization issue. |
| | [#1329](https://github.com/openai/parameter-golf/pull/1329) | 0.6361 | renqianluo | Per-Sample SLOT + TTT | LIKELY INVALID | Below 0.70 floor. Same author as #1430. |
| | [#1319](https://github.com/openai/parameter-golf/pull/1319) | 0.6951 | canivel | 11L LeakyReLU² XSA-all GPTQ-AR SLOT64 | SUSPECT | Near theoretical floor. SLOT-64 requires scrutiny. |
| | [#1376](https://github.com/openai/parameter-golf/pull/1376) | 0.7094 | stukenov | SLOT-24 + Pre-quant TTT | SUSPECT | Pre-quant TTT + SLOT combination. |
| | [#1324](https://github.com/openai/parameter-golf/pull/1324) | 0.7271 | yahya010 | SLOT-48 + VRL + QK-Gain 4.0 | SUSPECT | Large SLOT window. |
| | [#1321](https://github.com/openai/parameter-golf/pull/1321) | 0.7406 | anthony-maio | SLOT-48 | SUSPECT | Large SLOT window. |
| | [#1278](https://github.com/openai/parameter-golf/pull/1278) | 0.7736 | GitGeeks | SLOT-32 + Partial Depth Recurrence | SUSPECT | Large SLOT window. |
| | [#1368](https://github.com/openai/parameter-golf/pull/1368) | 0.8503 | JKSNS | Mean-delta warm start + depth recurrence | NEEDS REVIEW | |
| | [#1313](https://github.com/openai/parameter-golf/pull/1313) | 0.8637 | anthony-maio | SLOT-24 Aggressive | SUSPECT | |
| | [#1263](https://github.com/openai/parameter-golf/pull/1263) | 0.9354 | xexyz | 11L LeakyReLU² + XSA-all + Full GPTQ + SLOT | SUSPECT | |
| | [#1303](https://github.com/openai/parameter-golf/pull/1303) | 0.9462 | anthony-maio | SLOT + QK-Gain 4.0 + XSA-11 | SUSPECT | |
| | [#1246](https://github.com/openai/parameter-golf/pull/1246) | 0.9650 | deborahnelson8788726 | Trinity Ternary GPT | NEEDS REVIEW | |
| | [#1241](https://github.com/openai/parameter-golf/pull/1241) | 0.9901 | aiejvn | MDLM Masked Diffusion | NEEDS REVIEW | Different architecture class |
| | [#1318](https://github.com/openai/parameter-golf/pull/1318) | 1.0096 | renqianluo | TTT-AdamW + SLOT L-BFGS25 LogitDelta | NEEDS REVIEW | Same author as #1430 |

### Merged Records (Official Leaderboard)

| PR | BPB | Author | Technique |
|----|-----|--------|-----------|
| [#1019](https://github.com/openai/parameter-golf/pull/1019) | 1.1147 | abaybektursun | AR Self-Gen GPTQ + XSA-all + BigramHash 3072x112 |
| [#549](https://github.com/openai/parameter-golf/pull/549) | 1.1194 | abaybektursun | LeakyReLU² + Legal Score-First TTT + Parallel Muon |
| [#414](https://github.com/openai/parameter-golf/pull/414) | 1.1233 | signalrush | 11L EMA + GPTQ-lite + warmdown3500 + QAT@0.15 |
| [#315](https://github.com/openai/parameter-golf/pull/315) | 1.1248 | jfprincz | 11L Partial RoPE + LN Scale + EMA + XSA4 |
| [#287](https://github.com/openai/parameter-golf/pull/287) | 1.1271 | jfprincz | 11L XSA + EMA + Int6 MLP3x + WD=0.04 |
| [#265](https://github.com/openai/parameter-golf/pull/265) | 1.1307 | unnir | 11L + Efficient Partial XSA |
| [#180](https://github.com/openai/parameter-golf/pull/180) | 1.1428 | thwu1 | 10L Int5-MLP + BigramHash(10240) + SWA(0.4) |
| [#162](https://github.com/openai/parameter-golf/pull/162) | 1.1483 | raahilshah | Int6 MLP3x + SmearGate + BigramHash + MuonWD + SWA |
| [#65](https://github.com/openai/parameter-golf/pull/65) | 1.1556 | aquariouseworkman | Mixed Quant Int6/FP16 + SmearGate + OrthoInit |
| [#63](https://github.com/openai/parameter-golf/pull/63) | 1.1598 | yahya010 | 10L Int6 QAT + Zstd MLP2.6x |

---

## Verified Valid Submissions (Code-Reviewed)

| PR | BPB | Author | Technique | Key Finding |
|----|-----|--------|-----------|-------------|
| [#1408](https://github.com/openai/parameter-golf/pull/1408) | 1.0800 | aamodbhatt | dTTT + BigramHash 3072x112 | dTTT pre-quant only, baked into artifact. Frozen eval. |
| [#1420](https://github.com/openai/parameter-golf/pull/1420) | 1.0801 | abaybektursun | Triple Loop + Fused Kernels + N-gram Tilt | Fixed in 5e2eff8. token_hint clean, within/word_hint fixed. |
| [#1413](https://github.com/openai/parameter-golf/pull/1413) | 1.0828 | dexhunter | SP8192 + QK-Gain 5 + Score-First TTT | Chunks scored under no_grad() before training. |
| [#1415](https://github.com/openai/parameter-golf/pull/1415) | 1.0913 | bigbag | SP4096 + 3-Layer Recurrence + ETLB | ETLB bias trained on context tokens only. |
| [#1421](https://github.com/openai/parameter-golf/pull/1421) | 1.0925 | X-Abhishek-X | 11L Depth Recurrence + EMA | Vanilla sliding window. No eval-time adaptation. |
| [#1435](https://github.com/openai/parameter-golf/pull/1435) | 1.0980 | AbhayAnandUCSD | 11L Depth Recurrence + BigramHash | BigramHash is trained component. Standard frozen eval. |

## Confirmed Invalid Submissions (Code-Reviewed)

| PR | BPB | Author | Violation |
|----|-----|--------|-----------|
| [#1430](https://github.com/openai/parameter-golf/pull/1430) | 0.3964 | renqianluo | Rule 2: N-gram mixer not normalized over full vocab. Explains impossible 0.40 BPB. |
| [#1423](https://github.com/openai/parameter-golf/pull/1423) | 1.0791 | aryanbhosale | Rules 1,3,4: Pre-quant TTT trains on val data 6 epochs before scoring same data. |
| [#1416](https://github.com/openai/parameter-golf/pull/1416) | 1.0795 | erichroepke | Rules 1,3: Same pre-quant TTT pattern. Author acknowledged and withdrew. |
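
To make the pre-quant TTT violation concrete, here is a minimal sketch that compares scoring before updating with training on the validation stream first. It uses a toy add-one count model rather than gradient-based TTT, and the function names and the `val` stream are hypothetical; it is meant only to show why seeing the scored data in advance lowers the reported loss.

```python
import math


def bits(counts, tok):
    """Code length in bits of `tok` under normalized counts."""
    return -math.log2(counts[tok] / sum(counts))


def score_first(tokens, vocab_size):
    """Score each token, then update on it (the score-before-update order)."""
    counts = [1.0] * vocab_size
    total = 0.0
    for tok in tokens:
        total += bits(counts, tok)   # score with state from earlier tokens only
        counts[tok] += 1.0           # update after scoring
    return total


def train_then_score(tokens, vocab_size, epochs):
    """Fit on the whole stream for several epochs, then score the same stream."""
    counts = [1.0] * vocab_size
    for _ in range(epochs):
        for tok in tokens:
            counts[tok] += 1.0       # model has seen every token it will score
    return sum(bits(counts, tok) for tok in tokens)


val = [0, 1, 0, 0, 2, 0, 1, 0]       # hypothetical validation tokens
legal = score_first(val, vocab_size=3)
leaky = train_then_score(val, vocab_size=3, epochs=6)
```

On this stream `leaky` comes out lower than `legal`, even though both use the same model family; the only difference is that the second variant adapted to the data before scoring it.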

---

## Key Observations

1. **Best verified-valid score: 1.0800 BPB** (PR #1408, aamodbhatt)
2. **Sub-0.70 BPB submissions are almost certainly invalid**: the theoretical entropy floor for web text is ~0.70 BPB
3. **Pre-quantization TTT** (training on val data before quantizing into artifact) is the most common serious violation
4. **SLOT** (Score-Optimized Last-layer Tuning) is a gray area: small SLOT windows are generally accepted, but large windows (SLOT-32+) produce suspiciously low scores
5. **Score-first TTT** (score chunk, then train on it) appears to be valid when correctly implemented
6. **N-gram tilt** is valid when properly normalized and causal, but easy to get wrong
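
The normalization point in observation 6 can be sketched as follows. The multiplicative boost used here is an assumed, simplified form of a tilt, and `tilt_unnormalized`, `tilt_normalized`, `hint_token`, and `boost` are hypothetical names rather than anything taken from the reviewed PRs. The hint token is assumed to have been chosen from earlier tokens only.

```python
import math


def tilt_unnormalized(probs, hint_token, boost):
    """Boost one token's probability without renormalizing (violates rule 2)."""
    out = list(probs)
    out[hint_token] *= boost
    return out


def tilt_normalized(probs, hint_token, boost):
    """Boost one token, then renormalize over the full vocabulary."""
    out = tilt_unnormalized(probs, hint_token, boost)
    z = sum(out)
    return [p / z for p in out]


probs = [0.5, 0.25, 0.25]                       # base model distribution
bad = tilt_unnormalized(probs, hint_token=0, boost=1.5)
good = tilt_normalized(probs, hint_token=0, boost=1.5)

# Scoring token 0: the unnormalized version reports fewer bits than it should,
# because its "probabilities" sum to more than 1.
bad_bits = -math.log2(bad[0])
good_bits = -math.log2(good[0])
```

Skipping the division by `z` makes every boosted token look cheaper to encode, which is one way a mixer can report a BPB below what any proper distribution could achieve.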

---

*Generated 2026-04-07. Validity assessments based on source code review against PR #1017 rules.*
