
Record: val_bpb: 1.14020 [tested 3x on 8xh100] #267

Open
andrewgcodes wants to merge 17 commits into openai:main from andrewgcodes:devin/1774040790-causal-ttt-submission

Conversation


andrewgcodes commented Mar 20, 2026

Flagging that this submission performs TTT (test-time training) during validation, but compliantly. @0hq

I believe the following properties make it allowed:

  1. No training before evaluation: each chunk is scored first and its loss recorded; only then does training on that chunk occur.
  2. No re-evaluation: tokens are scored exactly once, so training on chunk N cannot affect the recorded scores for chunks 0..N.
  3. No multiple passes: the validation set is processed in a single sequential pass (32 chunks); a minimal sketch of the loop follows.
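
A minimal sketch of that ordering, with placeholder names (`model`, `val_chunks`, `bpb_from_nll`) rather than the literal train_gpt.py loop; the SGD optimizer with momentum=0.9 matches what the community review below reports:

```python
import torch

# Score-first-per-chunk causal TTT sketch. All names here are placeholders;
# the model is assumed to return an object with a .loss attribute.
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
total_nll, total_tokens = 0.0, 0

for chunk in val_chunks:             # single sequential pass (32 chunks)
    with torch.no_grad():            # 1) score first; the loss is recorded
        nll = model(chunk).loss
    total_nll += nll.item() * chunk.numel()
    total_tokens += chunk.numel()

    opt.zero_grad()                  # 2) only then train on this chunk, so
    model(chunk).loss.backward()     #    scores for chunks 0..N stay fixed
    opt.step()

val_bpb = bpb_from_nll(total_nll / total_tokens)  # each token scored once
```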

@andrewgcodes andrewgcodes changed the title Record: val_bpb: 1.14020 Record: val_bpb: 1.14020 [tested 3x on 8xh100] Mar 20, 2026
romainsantoli-web pushed a commit to romainsantoli-web/parameter-golf that referenced this pull request Mar 21, 2026
…its)

Combines techniques from PR openai#162, openai#180, openai#267, openai#281:
- 11-layer GPT with U-Net skip connections, GQA
- SmearGate + BigramHash(10240)
- Mixed int5/int6 quantization + 3% magnitude pruning
- Causal TTT at eval time
- SWA(frac=0.4), WD=0.042, Z-loss
- Target: sub-1.135 val_bpb

Awaiting RunPod 8xH100 credits for 3-seed validation.
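
The quantization bullet above is terse, so here is a hypothetical sketch of mixed-bit symmetric integer quantization combined with magnitude pruning. The bit widths and the 3% prune fraction come from the commit message; the per-tensor scaling and the prune-then-quantize ordering are assumptions, not the repo's actual code:

```python
import torch

def prune_and_quantize(w: torch.Tensor, bits: int = 6, prune_frac: float = 0.03):
    """Hypothetical sketch: magnitude-prune, then symmetrically quantize."""
    # Magnitude pruning: zero the smallest `prune_frac` of weights by |w|.
    k = int(prune_frac * w.numel())
    if k > 0:
        threshold = w.abs().flatten().kthvalue(k).values
        w = torch.where(w.abs() <= threshold, torch.zeros_like(w), w)
    # Symmetric per-tensor quantization onto a signed `bits`-bit grid.
    qmax = 2 ** (bits - 1) - 1                 # 15 for int5, 31 for int6
    scale = w.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q.to(torch.int8), scale             # int8 container for sub-8-bit values

# "Mixed" would mean choosing bits=5 for some matrices and bits=6 for others.
q, scale = prune_and_quantize(torch.randn(1024, 1024), bits=5)
w_dequant = q.float() * scale
```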
@MatoTeziTanka

Community Review — Record: val_bpb: 1.14020 [tested 3x on 8xh100]

Compliance: LOOKS CLEAN — legal score-first-per-chunk TTT (PR #1413 pattern)


PR #267 — "Record: val_bpb: 1.14020 [tested 3x on 8xh100]"
Head SHA: 7940226
Submission dir: records/track_10min_16mb/2026-03-20_CausalTTT_Int5MLP_BigramHash_SWA


Check 1: N-gram family bug (CLOSE trigger)

CLEAN. BigramHashEmbedding.bigram_hash (line 693–699) computes:

out[..., 1:] = torch.bitwise_xor(36313 * t[..., 1:], 27191 * t[..., :-1]) % mod

The hash key for position i is (t[i], t[i-1]) — current token XOR'd with previous token. The target token (t[i+1]) is never in the lookup key. This is standard causal bigram context; no future-token leakage. NOT the CLOSE-triggering bug.
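
A quick, self-contained causality check of that claim; the constants and the modulus come from the snippet above and the commit message's BigramHash(10240), while the shapes and vocab size are made up:

```python
import torch

def bigram_hash(t: torch.Tensor, mod: int = 10240) -> torch.Tensor:
    out = torch.zeros_like(t)
    out[..., 1:] = torch.bitwise_xor(36313 * t[..., 1:], 27191 * t[..., :-1]) % mod
    return out

t = torch.randint(0, 50257, (1, 16))
h = bigram_hash(t)
# Perturbing a future token leaves all earlier hash keys untouched:
t2 = t.clone(); t2[0, 10] = (t2[0, 10] + 1) % 50257
assert torch.equal(bigram_hash(t2)[0, :10], h[0, :10])
```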


Check 2: Pre-Quant TTT (CLOSE trigger)

CLEAN. The TTT optimizer is torch.optim.SGD (line 1435), not AdamW. The Pre-Quant TTT CLOSE trigger requires multi-epoch AdamW on val_tokens without score-first. This submission uses SGD with momentum=0.9, post-quantization, after scoring each chunk. Does not meet the CLOSE criteria.
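
For contrast, a purely hypothetical sketch of the pattern that would meet the CLOSE trigger: multiple AdamW epochs over the validation tokens, with scoring only afterwards. All names are placeholders; this is not code from the PR:

```python
import torch

# DISALLOWED pattern (illustrative only): train on val tokens first,
# across multiple epochs, then score data the model has already seen.
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
for epoch in range(3):                 # multiple passes over the val set
    for chunk in val_chunks:
        opt.zero_grad()
        model(chunk).loss.backward()   # train first ...
        opt.step()
val_bpb = evaluate(model, val_chunks)  # ... score afterwards: CLOSE trigger
```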


Check 3: Legal TTT (CLEAN)

CONFIRMED LEGAL. The causal TTT loop (lines 1446–1529) follows strict score-first-per-chunk ordering: each chunk's loss is computed and recorded toward the BPB total before any optimizer step runs on that chunk's tokens.

Note: The sliding-window clamping logic is novel — specifically how clamped_start/clamped_end partition scored tokens. The TTT implementation itself follows legal score-first discipline, so this is a MERGE recommendation with a note that the BPB accounting math could benefit from maintainer spot-check.
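
Since that clamping math is exactly what is flagged for spot-check, here is a hypothetical illustration of what such a partition typically looks like; `scored_span` and its arguments are invented names, not the PR's actual clamped_start/clamped_end code:

```python
def scored_span(win_start: int, win_end: int, already_scored_upto: int, seq_len: int):
    # Clamp the window's scored range so no token is scored twice and
    # nothing past the end of the sequence is counted.
    clamped_start = max(win_start, already_scored_upto)
    clamped_end = min(win_end, seq_len)
    return clamped_start, clamped_end

# Windows of length 8 sliding by 4 over a 16-token sequence:
scored_upto = 0
for start in range(0, 16, 4):
    lo, hi = scored_span(start, start + 8, scored_upto, 16)
    scored_upto = max(scored_upto, hi)
    # tokens lo..hi-1 are scored exactly once across all windows
```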

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks and a quick look at the sliding-window clamping math.


Reviewed by @MatoTeziTanka (The Agora). Compliance audit via an LLM agent (Sonnet) reviewing the full train_gpt.py source, cross-checked against a deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.
