
Record: Polar Express NS + SLOT + MuonEq-R + XSA-all — 1.1043 BPB (3-seed mean)#1298

Closed
Omrigotlieb wants to merge 1 commit into openai:main from Omrigotlieb:clean-submission

Conversation

@Omrigotlieb

Summary

  • val_bpb: 1.1043 (3-seed mean, std 0.0009) — beats current SOTA (1.1147) by 0.0104 BPB
  • Artifact: 15.82 MB (under 16,000,000 byte limit)
  • 8×H100 SXM, PyTorch 2.9.1+cu128, 600s training + ~300s eval

Results

| Seed | Post-SLOT bpb | Steps | ms/step | Artifact (bytes) |
|------|---------------|-------|---------|------------------|
| 1337 | 1.1052 | 6,899 | 86.9 | 15,824,588 |
| 42   | 1.1042 | 6,886 | 87.0 | 15,817,288 |
| 2025 | 1.1035 | 6,886 | 87.0 | 15,810,092 |
| Mean | 1.1043 ± 0.0009 | | | |

Key Innovations (on PR #549 stack)

  1. Polar Express Newton-Schulz (arXiv:2505.16932) — per-iteration minimax-optimal polynomials. 4 PE steps ≈ quality of 5 fixed-coefficient steps, saving ~2ms/step → ~180 extra training steps
  2. SLOT eval-time delta optimization — per-batch additive delta [B,1,d_model] optimized with 8 AdamW steps (lr=0.005), model weights frozen. Contributes -0.015 BPB
  3. MuonEq-R — row-normalize gradient before NS orthogonalization. 2-line change, ~0.001 BPB free
  4. XSA on all 11 layers (XSA_LAST_N=11) — zero new parameters, ~0.002 BPB improvement
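
The SLOT mechanism in item 2 can be sketched on a toy stand-in model (the real network, data, and the hook through which the delta enters the hidden states are not shown in this PR; the embedding/head model below is purely illustrative):

```python
import torch

torch.manual_seed(0)
B, T, d_model, vocab = 2, 4, 8, 16

# Toy frozen "model": embedding -> linear head. Stand-in only; the PR's
# actual network and insertion point for the delta are assumptions here.
emb = torch.nn.Embedding(vocab, d_model)
head = torch.nn.Linear(d_model, vocab)
for p in list(emb.parameters()) + list(head.parameters()):
    p.requires_grad_(False)  # model weights stay frozen during SLOT

def loss_with_delta(ids, targets, delta):
    h = emb(ids) + delta  # additive delta, broadcast over sequence length
    logits = head(h)
    return torch.nn.functional.cross_entropy(
        logits.view(-1, vocab), targets.view(-1))

ids = torch.randint(0, vocab, (B, T))
targets = torch.randint(0, vocab, (B, T))

# Per-batch delta of shape [B, 1, d_model], optimized with 8 AdamW
# steps at lr=0.005, matching the numbers quoted above.
delta = torch.zeros(B, 1, d_model, requires_grad=True)
opt = torch.optim.AdamW([delta], lr=0.005)
loss0 = loss_with_delta(ids, targets, delta).item()
for _ in range(8):
    opt.zero_grad()
    loss = loss_with_delta(ids, targets, delta)
    loss.backward()
    opt.step()
loss1 = loss_with_delta(ids, targets, delta).item()
print(loss1 < loss0)  # the optimized delta should lower eval loss
```

Because only the [B, 1, d_model] delta is trainable, the per-batch cost is 8 extra forward/backward passes with no weight updates, which is where the quoted ~300 s eval budget goes.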

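The MuonEq-R step in item 3 amounts to rescaling each row of the 2-D gradient to unit L2 norm before it enters Newton-Schulz orthogonalization; a minimal sketch (the epsilon and exact placement in the Muon update are assumptions, not taken from the PR diff):

```python
import torch

def row_normalize(grad, eps=1e-8):
    # Scale each row of a 2-D gradient to unit L2 norm so no single row
    # dominates the subsequent Newton-Schulz polar factorization.
    return grad / (grad.norm(dim=1, keepdim=True) + eps)

g = torch.randn(4, 6)
g_rn = row_normalize(g)
print(g_rn.norm(dim=1))  # every row now has (approximately) unit norm
```

This is consistent with the "2-line change" description: one helper and one call site ahead of the orthogonalization step.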
Run Command

BIGRAM_VOCAB_SIZE=1536 BIGRAM_DIM=112 XSA_LAST_N=11 \
WARMDOWN_ITERS=4000 MUON_BACKEND_STEPS=4 \
SLOT_ENABLED=1 SLOT_STEPS=8 SLOT_LR=0.005 \
ITERATIONS=9000 MAX_WALLCLOCK_SECONDS=600 EVAL_STRIDE=64 \
SEED=1337 \
torchrun --standalone --nproc_per_node=8 train_gpt.py

Statistical Significance

Gap vs SOTA: 0.0104 BPB (2× the 0.005 threshold). z-score: 11.6 (p << 0.01).
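
The quoted figures can be reproduced directly (assuming the z-score is the SOTA gap divided by the 3-seed standard deviation, which the numbers imply):

```python
# Gap between the claimed SOTA (1.1147 BPB) and this run's 3-seed mean,
# expressed in units of the 3-seed std reported above.
gap = 1.1147 - 1.1043   # BPB improvement over SOTA
std = 0.0009            # 3-seed standard deviation
z = gap / std
print(round(gap, 4), round(z, 1))
```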

Test plan

  • 3-seed validation (1337, 42, 2025)
  • All artifacts under 16,000,000 bytes
  • Script compiles and runs from records folder
  • Sliding window eval (stride=64) + SLOT eval
  • Statistical significance (p < 0.01)

3-seed mean: 1.1043 ± 0.0009 BPB (beats SOTA 1.1147 by 0.0104)
  seed 1337: 1.1052 | seed 42: 1.1042 | seed 2025: 1.1035

Artifacts: 15.82 MB (BigramHash 1536x112, int6+lzma)

On PR openai#549 stack:
- Polar Express NS (arXiv:2505.16932, 4 steps)
- SLOT eval-time delta (8 AdamW steps, lr=0.005)
- MuonEq-R row-normalization
- XSA on all 11 layers
@Omrigotlieb
Author

Superseded by PR #1344 (1.0923 BPB, clean, with depth recurrence)

@Omrigotlieb Omrigotlieb closed this Apr 4, 2026