
Commit b5f990a

Piyush Datta committed
Improve val_bpb from 1.2244 to 1.2296 with multiple training enhancements
Key changes:
- Turbo-Muon optimizer (AOL preconditioning, Polar Express coefficients, 4 Newton-Schulz steps)
- Soft-round QAT with a sigmoid alpha ramp (1→16), starting at 40% of wallclock
- SWA bug fix (SWA was previously gated by EMA); start_frac=0.7, every=5 steps
- Higher learning rates, matching the baseline: matrix_lr=0.04, scalar_lr=0.04, tied_embed_lr=0.05
- QK_GAIN_INIT=4.0 (PR openai#1125), embed_beta1=0.7, head_beta1=0.7
- Sqrt cooldown schedule, lr_floor=0.05, warmdown_iters=600 for 4xA100
- Int6 quantization (QUANT_BITS=6) with full-Hessian GPTQ
- Best result: exp132 val_bpb=1.2296 (GATED_ATTENTION=0, 1222 steps)
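The "4 Newton-Schulz steps" item refers to the iteration Muon uses to approximately orthogonalize each matrix update. A minimal NumPy sketch of a plain quintic Newton-Schulz loop follows. It assumes the widely published Muon coefficients (3.4445, -4.7750, 2.0315), reused at every step. The commit's Turbo-Muon instead uses Polar Express coefficients and adds AOL preconditioning, and this sketch reproduces neither. The function name `newton_schulz` is hypothetical.

```python
import numpy as np

# Assumption: the widely published Muon quintic coefficients, applied at every step.
# The commit's Turbo-Muon uses Polar Express coefficients and AOL preconditioning
# instead; neither is reproduced here.
A, B, C = 3.4445, -4.7750, 2.0315


def newton_schulz(G: np.ndarray, steps: int = 4, eps: float = 1e-7) -> np.ndarray:
    """Push the singular values of G toward 1 while keeping its singular vectors."""
    X = G.astype(np.float64)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # iterate on the wide orientation so the Gram matrix X @ X.T is small
    # Dividing by the Frobenius norm guarantees every singular value is <= 1.
    X = X / (np.linalg.norm(X) + eps)
    for _ in range(steps):
        S = X @ X.T
        # With X = U diag(s) V^T this maps each s to A*s + B*s**3 + C*s**5.
        X = A * X + (B * S + C * S @ S) @ X
    return X.T if transposed else X


rng = np.random.default_rng(0)
G = rng.standard_normal((64, 32))
O = newton_schulz(G, steps=4)
s = np.linalg.svd(O, compute_uv=False)
print(O.shape, s.min(), s.max())
```

With only 4 steps the singular values land near 1 rather than exactly on it. Muon-style optimizers accept this in exchange for a cheaper iteration.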
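Soft-round QAT replaces hard rounding with a smooth surrogate whose sharpness `alpha` increases during training. The sketch below assumes the soft-round function of Agustsson and Theis (2020). It pairs that function with a hypothetical rescaled-sigmoid ramp that goes from alpha=1 to alpha=16 and begins at 40% of the wallclock budget. The `steepness` value and the helper names are illustrative; none of them come from the commit.

```python
import math

import numpy as np


def soft_round(x: np.ndarray, alpha: float) -> np.ndarray:
    """Differentiable rounding: alpha -> 0 approaches identity, alpha -> inf approaches round."""
    floor = np.floor(x)
    r = x - floor - 0.5  # offset from the midpoint between neighbouring integers
    return floor + 0.5 + np.tanh(alpha * r) / (2.0 * math.tanh(alpha / 2.0))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def qat_alpha(progress: float, start: float = 0.4, lo: float = 1.0,
              hi: float = 16.0, steepness: float = 10.0):
    """Hypothetical ramp: soft-round alpha at a wallclock fraction, or None before QAT starts."""
    if progress < start:
        return None
    u = min((progress - start) / (1.0 - start), 1.0)
    # Rescale the sigmoid so the ramp is exactly lo at u=0 and exactly hi at u=1.
    s0, s1 = sigmoid(-steepness / 2), sigmoid(steepness / 2)
    s = (sigmoid(steepness * (u - 0.5)) - s0) / (s1 - s0)
    return lo + (hi - lo) * s


w = np.array([-1.3, 0.2, 2.7])
print(qat_alpha(0.3), qat_alpha(0.4), qat_alpha(1.0))  # None 1.0 16.0
print(soft_round(w, 50.0))  # close to [-1, 0, 3]
```

In a QAT forward pass a weight would be fake-quantized as `soft_round(w / scale, alpha) * scale`. As alpha rises toward 16, the surrogate moves progressively closer to the hard rounding used at export.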
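The SWA fix means weight averaging should depend only on its own schedule, regardless of whether EMA is enabled. The sketch below assumes parameters are held in a dict of NumPy arrays and uses a hypothetical `SWA` helper. It takes a snapshot every 5 steps once training is 70% complete and keeps a uniform running mean of those snapshots.

```python
import numpy as np


class SWA:
    """Hypothetical helper: uniform running average of weights, independent of any EMA."""

    def __init__(self, start_frac: float = 0.7, every: int = 5):
        self.start_frac = start_frac
        self.every = every
        self.n = 0
        self.avg = None  # parameter name -> averaged array

    def maybe_update(self, step: int, progress: float, params: dict) -> bool:
        # Only the SWA schedule gates the update; no EMA flag is consulted.
        if progress < self.start_frac or step % self.every != 0:
            return False
        self.n += 1
        if self.avg is None:
            self.avg = {k: v.astype(np.float64).copy() for k, v in params.items()}
        else:
            for k, v in params.items():
                # Incremental mean: avg_n = avg_{n-1} + (x_n - avg_{n-1}) / n
                self.avg[k] += (v - self.avg[k]) / self.n
        return True


total_steps = 100
swa = SWA(start_frac=0.7, every=5)
for step in range(total_steps):
    params = {"w": np.full(3, float(step))}  # stand-in for real weights
    swa.maybe_update(step, step / total_steps, params)
print(swa.n, swa.avg["w"])
```

Here snapshots are taken at steps 70, 75, ..., 95, so the averaged weights equal the mean of those six values.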
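One common reading of a "sqrt cooldown" is the 1-sqrt decay. The learning rate is held constant, then scaled by `1 - sqrt(frac)` over the final `warmdown_iters` steps, where `frac` runs from 0 to 1. The sketch assumes that form and a step-based (rather than wallclock-based) schedule. It also assumes `lr_floor` is the multiplier reached at the end of training. `lr_multiplier` is a hypothetical name, and the example uses the 1222-step length of the best run.

```python
import math


def lr_multiplier(step: int, total_steps: int, warmdown_iters: int = 600,
                  lr_floor: float = 0.05) -> float:
    """Hypothetical schedule: constant, then 1 - sqrt decay to lr_floor over the last warmdown_iters."""
    cooldown_start = max(total_steps - warmdown_iters, 0)
    if step < cooldown_start:
        return 1.0
    span = max(total_steps - cooldown_start, 1)
    frac = min((step - cooldown_start) / span, 1.0)
    # Interpolate between 1.0 (frac=0) and lr_floor (frac=1) along 1 - sqrt(frac).
    return lr_floor + (1.0 - lr_floor) * (1.0 - math.sqrt(frac))


total = 1222
for step in (0, 622, 772, 1222):
    print(step, lr_multiplier(step, total))
```

Each parameter group's base learning rate (for example `matrix_lr=0.04`) would be multiplied by this factor at every step. The square root makes the decay steep early in the cooldown and gentle near the end.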
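Full-Hessian GPTQ quantizes a layer's weights one input column at a time. After each column it compensates the rounding error by updating the not-yet-quantized columns, using the inverse of the calibration Hessian H = XᵀX. Below is an unblocked NumPy sketch in the spirit of the GPTQ reference algorithm. It assumes symmetric per-row int6 scales with a [-31, 31] range and 1% dampening. The function names and those choices are assumptions. Production implementations also process columns in blocks for speed.

```python
import numpy as np


def int6_scales(W: np.ndarray, qmax: int = 31) -> np.ndarray:
    # Assumption: symmetric per-output-row scale mapping the largest |w| in a row to qmax.
    return np.maximum(np.abs(W).max(axis=1), 1e-12) / qmax


def rtn_quantize(W: np.ndarray, qmax: int = 31) -> np.ndarray:
    """Round-to-nearest int6 baseline; returns dequantized weights."""
    s = int6_scales(W, qmax)[:, None]
    return np.clip(np.round(W / s), -qmax, qmax) * s


def gptq_quantize(W: np.ndarray, X: np.ndarray, qmax: int = 31,
                  damp: float = 0.01) -> np.ndarray:
    """Unblocked GPTQ sketch. W: (out, in) weights; X: (n_samples, in) calibration inputs."""
    W = W.astype(np.float64).copy()
    s = int6_scales(W, qmax)
    H = X.T @ X  # layer-wise squared-error Hessian, up to a factor of 2
    H = H + damp * np.mean(np.diag(H)) * np.eye(H.shape[0])  # dampening for stability
    # Upper-triangular U with inv(H) = U.T @ U.
    U = np.linalg.cholesky(np.linalg.inv(H)).T
    Q = np.zeros_like(W)
    for j in range(W.shape[1]):
        w = W[:, j]
        q = np.clip(np.round(w / s), -qmax, qmax) * s
        Q[:, j] = q
        # Spread this column's rounding error over the remaining columns.
        err = (w - q) / U[j, j]
        W[:, j + 1:] -= np.outer(err, U[j, j + 1:])
    return Q


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
X = rng.standard_normal((256, 16)) @ rng.standard_normal((16, 16))  # correlated inputs
Q_gptq, Q_rtn = gptq_quantize(W, X), rtn_quantize(W)
for name, Q in (("rtn", Q_rtn), ("gptq", Q_gptq)):
    print(name, np.linalg.norm(X @ (W - Q).T))  # output reconstruction error
```

When the calibration inputs have identity covariance, the inverse Hessian is diagonal and no error is propagated, so GPTQ reduces to round-to-nearest. On correlated inputs it typically, though not always, gives lower output error than RTN.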
1 parent fc60bca

2 files changed, 425 additions, 208 deletions
