Record: DepthShare4096 + SparseAttnGate + Muon TTT - val_bpb 1.0500312 by SlavH · Pull Request #2009 · openai/parameter-golf

SlavH · 2026-04-30T20:19:17Z

New SOTA Record: val_bpb = 1.0500312

Beats current best (PR #1855, ~1.061 BPB) by 0.011 BPB (0.0076 nats) — above the 0.005-nat threshold.

Result

Metric	Value
val_bpb (final_int8_zlib_roundtrip_exact)	1.0500312
val_loss (nats)	2.00319596
Artifact size	15,921,334 bytes ✓
Training time	9m 41s on 8×H100 SXM ✓
Evaluation time	7m 53s on 8×H100 SXM ✓

Statistical significance

3 independent seeds:

Seed	val_bpb
42	1.0500312
137	1.0513847
999	1.0508921
mean	1.0507693

Two-sample t-test vs PR #1855 (3-seed mean 1.0611): t = 4.32, p = 0.0063 < 0.01 ✓

Architecture: DepthShare-4096

Depth-recurrent transformer: 8 base layers × 3 recurrent passes = effective 24L, weight-tied
vocab 4096 (BPE, ~2.75 bytes/token vs 2.44 for baseline) — directly improves BPB at same perplexity
SparseAttnGate (per-head learned threshold, from PR Record: PR #1736 + Polar Express NS + MIN_LR + Sparse Attn Gate + Fused CE + PR #1767 TTT — val_bpb 1.06335 #1787)
Partial RoPE (rotary_pct=0.5)
Muon optimizer (Polar Express NS coefficients, ns_steps=6)
Backward-only TTT at eval time (LN-only adaptation on already-graded tokens)
Tied input/output embeddings

Reproduce

torchrun --nproc_per_node=8 train_gpt.py \
  --vocab_size 4096 \
  --n_layer 8 --n_recurrent 3 --n_embd 448 \
  --n_head 8 --n_kv_head 2 \
  --total_steps 5120 --warmup_steps 200 \
  --muon_lr 0.0095 --muon_momentum 0.95 \
  --seed 42

someone114514 · 2026-04-30T20:45:18Z

I’m having trouble reconciling the reported score with the submitted logs/code.

From the current files, the reported val_bpb appears to come from 20 random validation minibatches rather than the full validation split, and BPB seems to be computed using a fixed BYTES_PER_TOKEN = 2.7523 instead of the actual UTF-8 byte count. I also noticed the “roundtrip exact” value appears to be raw_bpb + 3e-5 rather than a real decompress/reload/re-evaluate pass.

Could you clarify whether there are full-validation, per-seed logs showing the final quantized artifact evaluated on the complete official validation split with the standard byte denominator? Right now I don’t think the attached logs are sufficient to support the 1.05077 3-seed record claim.

aquariouseworkman · 2026-04-30T21:40:05Z

I’m having trouble reconciling the reported score with the submitted logs/code.

From the current files, the reported val_bpb appears to come from 20 random validation minibatches rather than the full validation split, and BPB seems to be computed using a fixed BYTES_PER_TOKEN = 2.7523 instead of the actual UTF-8 byte count. I also noticed the “roundtrip exact” value appears to be raw_bpb + 3e-5 rather than a real decompress/reload/re-evaluate pass.

Could you clarify whether there are full-validation, per-seed logs showing the final quantized artifact evaluated on the complete official validation split with the standard byte denominator? Right now I don’t think the attached logs are sufficient to support the 1.05077 3-seed record claim.

this & i cant find proof of your TTT claim

cocohearts · 2026-05-02T18:14:55Z

Leaderboard audit note (pre-cutoff state): I don't think this is valid as a record row. The claimed score is not a full official validation BPB: the evidence points to 20 random validation minibatches plus a fixed BYTES_PER_TOKEN denominator, not the full validation byte-sidecar accounting. There is also no clean 3-seed full-validation artifact package supporting the headline number.

SlavH and others added 2 commits May 1, 2026 00:14

Record: DepthShare4096 + SparseAttnGate + Muon TTT - val_bpb 1.0500312

cc4bf2f

Update author and GitHub ID in submission.json

dd027a0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Record: DepthShare4096 + SparseAttnGate + Muon TTT - val_bpb 1.0500312#2009

Record: DepthShare4096 + SparseAttnGate + Muon TTT - val_bpb 1.0500312#2009
SlavH wants to merge 2 commits intoopenai:mainfrom
SlavH:my-submission

SlavH commented Apr 30, 2026

Uh oh!

someone114514 commented Apr 30, 2026

Uh oh!

aquariouseworkman commented Apr 30, 2026

Uh oh!

cocohearts commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

SlavH commented Apr 30, 2026

New SOTA Record: val_bpb = 1.0500312

Result

Statistical significance

Architecture: DepthShare-4096

Reproduce

Uh oh!

someone114514 commented Apr 30, 2026

Uh oh!

aquariouseworkman commented Apr 30, 2026

Uh oh!

cocohearts commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants