Commit 4a73033


Document Run 4 (PR1851 + 9 hparams + wd_strong + AR) — best q_ttt yet

Run 4 results, single seed s42:

  pre      = 1.06331 (best pre of session, beats Run 0's 1.06429 by 0.00098)
  q        = 1.07239 (q_gap 0.00908, tightest gap of session)
  q_ttt    = 1.05950 (best q_ttt of session, beats PR openai#1855 published s42 1.05989 by 0.00039)
  artifact = 16,140,607 B (BUSTS 16 MB cap by 140,607 B with brotli; PR openai#1855's pergroup compressor saves ~280 KB, which is needed for this hparam stack to fit)

Three findings:

1. The 9 hparams transfer cleanly through to final EMA model quality. Contrast with paired-head Muon NS (Run 3): also gave a striking mid-train signal (-0.0046 at step 4000) but that gain converged out by pre-quant time (+0.00038 vs Run 0). Run 4's mid-train gain (-0.0059) carried through to pre-quant (-0.00098). Mechanism: the 9 hparams change *what's actually being trained* (tighter clipping preserves outliers, longer warmdown reshapes convergence, tuned TTT-LoRA reshapes recovery), not just the optimizer's update direction.

2. Tightest quant gap of the session (0.00908). Tighter MLP/EMBED clipping (11.5/14.0) preserves outliers that LQER asymmetric int4 rank-4 correction can exploit, on top of AR's narrowing.

3. Artifact busts cap with brotli alone: confirms PR openai#1855's claim that their pergroup compressor saves ~280 KB on this stack. With brotli, even PR openai#1855 itself would land ~16,180,000 B. They needed pergroup; we need pergroup.

This run made the case to pivot to PR openai#1855 base for Run 5. Earlier session's choice of PR openai#1851 (yesterday's "no lrzip dispute" reasoning) overturned by Run 4's evidence: PR openai#1855 is 0.00037 BPB ahead at 3-seed mean, ships the pergroup compressor we need to fit cap, and the 9 hparams we manually applied transfer cleanly.

Run 5 (queued, auto-launch when Run 4 GPUs free) = PR openai#1855's full env stack + our wd_strong + AR + COMPRESSOR=pergroup. Expected q_ttt ~1.0590-1.0595 single-seed; 3-seed mean ~1.0593 ± 0.001.
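
The derived Run 4 figures (q_gap, the margins over Run 0 and PR openai#1855, and the cap overshoot) can be re-checked with a few lines of Python. This is only a sketch: the variable names are illustrative, the cap is assumed to be decimal (16,000,000 B, which is what the quoted 140,607 B overshoot implies), and the pergroup saving is taken as a round 280,000 B from the "~280 KB" estimate.

```python
# Figures copied from the Run 4 results in this commit message (seed s42).
pre = 1.06331
q = 1.07239
q_ttt = 1.05950
run0_pre = 1.06429
pr1855_s42_q_ttt = 1.05989
artifact_bytes = 16_140_607

# Assumption: "16 MB" means 16,000,000 bytes (decimal), not 16 MiB.
cap_bytes = 16_000_000
# Assumption: "~280 KB" rounded to 280,000 bytes for a rough fit check.
pergroup_saving_bytes = 280_000

q_gap = round(q - pre, 5)                        # quantization gap
pre_gain = round(run0_pre - pre, 5)              # margin over Run 0 pre
q_ttt_gain = round(pr1855_s42_q_ttt - q_ttt, 5)  # margin over PR 1855 s42
overshoot_bytes = artifact_bytes - cap_bytes     # bytes over the cap with brotli
fits_with_pergroup = artifact_bytes - pergroup_saving_bytes <= cap_bytes

print(f"q_gap={q_gap:.5f} pre_gain={pre_gain:.5f} q_ttt_gain={q_ttt_gain:.5f}")
print(f"overshoot={overshoot_bytes} B, fits with pergroup: {fits_with_pergroup}")
```

The fit check is only as good as the ~280 KB estimate; it does not measure the pergroup compressor on this artifact.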
Honest acceptance-bar math:

  SOTA            = 1.06108 (PR openai#1855 3-seed mean)
  Bar             = SOTA - 0.005 nats ≈ 1.0588
  Run 4 single    = 1.05950, +0.00070 short of bar
  Run 5 predicted = 1.0590-1.0595, still 0.0002-0.0007 short

Even best-case Run 5 likely just misses the record bar by ~half a sigma. Best plausible outcome is non-record submission with documented findings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
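
The shortfall arithmetic in the acceptance-bar math can be reproduced as follows. This is a sketch under stated assumptions: the bar of 1.0588 is taken as given, because the message does not spell out how 0.005 nats converts to the reported metric, and the ± 0.001 on the predicted 3-seed mean is treated as one standard deviation. All variable names are illustrative.

```python
# Bar taken as stated in the message; the nats conversion is not re-derived here.
bar = 1.0588
run4_q_ttt = 1.05950
run5_predicted = (1.0590, 1.0595)  # single-seed range from the message
run5_mean = 1.0593                 # predicted 3-seed mean
run5_sigma = 0.001                 # assumption: the "± 0.001" is one sigma

# Positive values mean the run lands above (worse than) the bar.
run4_shortfall = round(run4_q_ttt - bar, 5)
run5_shortfalls = tuple(round(x - bar, 5) for x in run5_predicted)
sigmas_short = round((run5_mean - bar) / run5_sigma, 2)

print(f"Run 4 short of bar by {run4_shortfall:.5f}")
print(f"Run 5 short of bar by {run5_shortfalls[0]:.4f} to {run5_shortfalls[1]:.4f}")
print(f"Predicted 3-seed mean is {sigmas_short} sigma above the bar")
```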

1 parent 611b598 commit 4a73033

3 files changed

logs
- top_pr1855_hparams_s42.stdout
- top_pr1855_hparams_s42.txt
top_run4_session.md
