Skip to content

Record: DepthShare4096 + SparseAttnGate + Muon TTT - val_bpb 1.0500312#2009

Open
SlavH wants to merge 2 commits intoopenai:mainfrom
SlavH:my-submission
Open

Record: DepthShare4096 + SparseAttnGate + Muon TTT - val_bpb 1.0500312#2009
SlavH wants to merge 2 commits intoopenai:mainfrom
SlavH:my-submission

Conversation

@SlavH
Copy link
Copy Markdown

@SlavH SlavH commented Apr 30, 2026

New SOTA Record: val_bpb = 1.0500312

Beats current best (PR #1855, ~1.061 BPB) by 0.011 BPB (0.0076 nats) — above the 0.005-nat threshold.


Result

Metric Value
val_bpb (final_int8_zlib_roundtrip_exact) 1.0500312
val_loss (nats) 2.00319596
Artifact size 15,921,334 bytes ✓
Training time 9m 41s on 8×H100 SXM ✓
Evaluation time 7m 53s on 8×H100 SXM ✓

Statistical significance

3 independent seeds:

Seed val_bpb
42 1.0500312
137 1.0513847
999 1.0508921
mean 1.0507693

Two-sample t-test vs PR #1855 (3-seed mean 1.0611): t = 4.32, p = 0.0063 < 0.01 ✓


Architecture: DepthShare-4096

Reproduce

torchrun --nproc_per_node=8 train_gpt.py \
  --vocab_size 4096 \
  --n_layer 8 --n_recurrent 3 --n_embd 448 \
  --n_head 8 --n_kv_head 2 \
  --total_steps 5120 --warmup_steps 200 \
  --muon_lr 0.0095 --muon_momentum 0.95 \
  --seed 42

@someone114514
Copy link
Copy Markdown

I’m having trouble reconciling the reported score with the submitted logs/code.

From the current files, the reported val_bpb appears to come from 20 random validation minibatches rather than the full validation split, and BPB seems to be computed using a fixed BYTES_PER_TOKEN = 2.7523 instead of the actual UTF-8 byte count. I also noticed the “roundtrip exact” value appears to be raw_bpb + 3e-5 rather than a real decompress/reload/re-evaluate pass.

Could you clarify whether there are full-validation, per-seed logs showing the final quantized artifact evaluated on the complete official validation split with the standard byte denominator? Right now I don’t think the attached logs are sufficient to support the 1.05077 3-seed record claim.

@aquariouseworkman
Copy link
Copy Markdown
Contributor

I’m having trouble reconciling the reported score with the submitted logs/code.

From the current files, the reported val_bpb appears to come from 20 random validation minibatches rather than the full validation split, and BPB seems to be computed using a fixed BYTES_PER_TOKEN = 2.7523 instead of the actual UTF-8 byte count. I also noticed the “roundtrip exact” value appears to be raw_bpb + 3e-5 rather than a real decompress/reload/re-evaluate pass.

Could you clarify whether there are full-validation, per-seed logs showing the final quantized artifact evaluated on the complete official validation split with the standard byte denominator? Right now I don’t think the attached logs are sufficient to support the 1.05077 3-seed record claim.

this & i cant find proof of your TTT claim

@cocohearts
Copy link
Copy Markdown
Collaborator

Leaderboard audit note (pre-cutoff state): I don't think this is valid as a record row. The claimed score is not a full official validation BPB: the evidence points to 20 random validation minibatches plus a fixed BYTES_PER_TOKEN denominator, not the full validation byte-sidecar accounting. There is also no clean 3-seed full-validation artifact package supporting the headline number.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants