Commit b5f990a
Piyush Datta
Improve val_bpb from 1.2244 to 1.2296 with multiple training enhancements
Key changes:
- Turbo-Muon optimizer (AOL preconditioning, Polar Express coefficients, 4 NS steps)
- Soft-round QAT with sigmoid alpha ramp (1→16), starting at 40% wallclock
- SWA bug fix (was gated by EMA), start_frac=0.7, every=5 steps
- Higher LRs matching baseline: matrix_lr=0.04, scalar_lr=0.04, tied_embed_lr=0.05
- QK_GAIN_INIT=4.0 (PR openai#1125), embed_beta1=0.7, head_beta1=0.7
- Sqrt cooldown schedule, lr_floor=0.05, warmdown_iters=600 for 4xA100
- Int6 quantization (QUANT_BITS=6) with Full Hessian GPTQ
- Best result: exp132 val_bpb=1.2296 (GATED_ATTENTION=0, 1222 steps)
1 parent fc60bca
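
The two schedules named in the list above (the sqrt cooldown with lr_floor and warmdown_iters, and the sigmoid alpha ramp for soft-round QAT) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function names, the "1 - sqrt" shape of the cooldown, the normalised-sigmoid shape of the ramp, the `steepness` parameter, and the tanh-based soft-round formula are all assumptions. Only the values lr_floor=0.05, warmdown_iters=600, the 1→16 alpha range, the 40% start point, and the 1222-step run length come from the commit message.

```python
import math


def sqrt_cooldown(step, total_steps, warmdown_iters=600, lr_floor=0.05):
    """Hypothetical LR multiplier: 1.0 until the last `warmdown_iters` steps,
    then a (1 - sqrt) decay that ends at `lr_floor`. The exact shape used in
    the commit is an assumption."""
    warmdown_start = total_steps - warmdown_iters
    if step <= warmdown_start:
        return 1.0
    frac = min(1.0, (step - warmdown_start) / warmdown_iters)
    return lr_floor + (1.0 - lr_floor) * (1.0 - math.sqrt(frac))


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def qat_alpha(progress, start_frac=0.4, alpha_min=1.0, alpha_max=16.0,
              steepness=10.0):
    """Hypothetical sigmoid ramp for the soft-round sharpness alpha.
    `progress` is the elapsed fraction of the wallclock budget in [0, 1].
    Returns None before `start_frac`, meaning QAT is not yet active."""
    if progress < start_frac:
        return None
    t = min(1.0, (progress - start_frac) / (1.0 - start_frac))
    # Rescale the sigmoid so the ramp hits alpha_min at t=0 and alpha_max at t=1.
    lo, hi = _sigmoid(-steepness / 2.0), _sigmoid(steepness / 2.0)
    s = (_sigmoid(steepness * (t - 0.5)) - lo) / (hi - lo)
    return alpha_min + (alpha_max - alpha_min) * s


def soft_round(x, alpha):
    """A common differentiable rounding surrogate (assumed here, not confirmed
    by the commit): close to the identity for small alpha, close to hard
    rounding for large alpha."""
    m = math.floor(x) + 0.5
    return m + math.tanh(alpha * (x - m)) / (2.0 * math.tanh(alpha / 2.0))


if __name__ == "__main__":
    total = 1222  # step count reported for exp132
    for step in (0, 622, 772, 1222):
        print(step, round(sqrt_cooldown(step, total), 4))
    for p in (0.2, 0.4, 0.7, 1.0):
        print(p, qat_alpha(p))
```

Under these assumptions, a larger alpha pushes `soft_round` toward hard rounding, so ramping alpha late in training would let the weights settle onto the quantization grid gradually rather than all at once.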
2 files changed
Lines changed: 425 additions & 208 deletions
File tree
- records/track_10min_16mb/2026-03-20_PiyushDattaSubmission