Commit b5f990a

Piyush Datta committed

Improve val_bpb from 1.2244 to 1.2296 with multiple training enhancements

Key changes:
- Turbo-Muon optimizer (AOL preconditioning, Polar Express coefficients, 4 Newton-Schulz steps)
- Soft-round QAT with a sigmoid alpha ramp (1→16), starting at 40% of wallclock
- SWA bug fix (SWA was previously gated by EMA); start_frac=0.7, every=5 steps
- Higher learning rates matching the baseline: matrix_lr=0.04, scalar_lr=0.04, tied_embed_lr=0.05
- QK_GAIN_INIT=4.0 (PR openai#1125), embed_beta1=0.7, head_beta1=0.7
- Sqrt cooldown schedule with lr_floor=0.05 and warmdown_iters=600 for 4xA100
- Int6 quantization (QUANT_BITS=6) with Full Hessian GPTQ
- Best result: exp132 val_bpb=1.2296 (GATED_ATTENTION=0, 1222 steps)
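Muon-style optimizers orthogonalize each 2-D update with a few Newton-Schulz (NS) iterations. The sketch below shows only that core step, in NumPy, using the fixed quintic coefficients from the public Muon reference implementation and plain Frobenius normalization. It does not reproduce Turbo-Muon's AOL preconditioning or the Polar Express per-step coefficients, which this commit names but does not spell out; only `steps=4` comes from the commit, and the function name is hypothetical.

```python
import numpy as np

# Fixed quintic coefficients from the public Muon reference implementation.
# ASSUMPTION: stand-ins for the Polar Express per-step coefficients, which are not given here.
NS_COEFFS = (3.4445, -4.7750, 2.0315)


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 4, eps: float = 1e-7) -> np.ndarray:
    """Approximately replace g by its polar factor (all singular values pushed toward 1)."""
    a, b, c = NS_COEFFS
    # Frobenius normalization bounds the spectral norm, so every singular value starts <= 1.
    # (Turbo-Muon reportedly uses AOL preconditioning here instead; not shown.)
    x = g / (np.linalg.norm(g) + eps)
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # work in the wide orientation so the Gram matrix x @ x.T is the smaller one
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        poly = b * gram + c * gram @ gram
        x = a * x + poly @ x  # odd quintic in x: a*x + b*(x x^T) x + c*(x x^T)^2 x
    return x.T if transposed else x
```

With these coefficients the iteration does not converge to exactly 1; it drives singular values into a band of roughly 0.7 to 1.2, which Muon accepts in exchange for fewer steps.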
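Soft-round QAT replaces the hard `round` in fake quantization with a smooth surrogate whose sharpness `alpha` grows during training. A minimal sketch, assuming the soft-round function of Agustsson & Theis (2020) and a hypothetical logistic ramp: the commit gives only the 1→16 range and the 40% start, so the midpoint-centred shape, `steepness=10`, and both function names are assumptions.

```python
import math


def soft_round(x: float, alpha: float) -> float:
    """Smooth surrogate for round(x): near-identity as alpha -> 0, near-hard rounding for large alpha."""
    base = math.floor(x)
    r = x - base - 0.5  # offset from the midpoint between two integers, in [-0.5, 0.5)
    return base + 0.5 + math.tanh(alpha * r) / (2.0 * math.tanh(alpha / 2.0))


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def qat_alpha(frac: float, start_frac: float = 0.4, alpha_min: float = 1.0,
              alpha_max: float = 16.0, steepness: float = 10.0):
    """Soft-round sharpness at a given fraction of the wallclock budget; None before QAT starts.

    ASSUMPTION: a logistic curve centred on the middle of the QAT window,
    rescaled so it hits alpha_min and alpha_max exactly at the window ends.
    """
    if frac < start_frac:
        return None
    p = min((frac - start_frac) / (1.0 - start_frac), 1.0)
    lo, hi = _sigmoid(-steepness / 2), _sigmoid(steepness / 2)
    w = (_sigmoid(steepness * (p - 0.5)) - lo) / (hi - lo)  # w(0) = 0, w(1) = 1
    return alpha_min + (alpha_max - alpha_min) * w
```

In a QAT forward pass the surrogate would be applied element-wise as `scale * soft_round(w / scale, alpha)`. At the top of the ramp it is already close to hard rounding: `soft_round(2.3, 16.0)` is about 2.0017.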
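The SWA fix decouples weight averaging from EMA. A minimal sketch of an SWA accumulator that depends only on the step count and the wallclock fraction, using the commit's `start_frac=0.7` and `every=5`; the class name, the dict-of-arrays parameter layout, and the `frac` argument (fraction of the wallclock budget used so far) are assumptions for illustration.

```python
import numpy as np


class SWA:
    """Equal-weight average of parameter snapshots (stochastic weight averaging).

    Snapshots are taken every `every` steps once the wallclock fraction reaches
    `start_frac`. Nothing here consults EMA state.
    """

    def __init__(self, start_frac: float = 0.7, every: int = 5):
        self.start_frac = start_frac
        self.every = every
        self.avg = None  # hypothetical layout: dict of parameter name -> running-mean array
        self.count = 0

    def maybe_update(self, step: int, frac: float, params: dict) -> bool:
        """Fold `params` into the running mean if this is a snapshot step."""
        if frac < self.start_frac or step % self.every != 0:
            return False
        self.count += 1
        if self.avg is None:
            self.avg = {k: np.array(v, dtype=np.float64) for k, v in params.items()}
        else:
            for k, v in params.items():
                # Incremental mean: avg_n = avg_{n-1} + (x_n - avg_{n-1}) / n
                self.avg[k] += (v - self.avg[k]) / self.count
        return True
```

At the end of training the averaged weights in `swa.avg` would typically be loaded in place of the live weights before evaluation.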
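The commit does not define its "sqrt cooldown". A common reading is the 1 - sqrt decay, where the LR multiplier falls as 1 - sqrt(p) over the cooldown progress p. The sketch assumes that shape, a constant LR before the last `warmdown_iters` steps, and that `lr_floor` is the multiplier reached at the end of training rather than a clamp. It also needs `total_steps`, which in a wallclock-capped run would itself have to be estimated; the function name is hypothetical.

```python
import math


def lr_multiplier(step: int, total_steps: int, warmdown_iters: int = 600,
                  lr_floor: float = 0.05) -> float:
    """Constant 1.0, then a 1 - sqrt cooldown over the final warmdown_iters steps down to lr_floor."""
    warmdown_start = max(total_steps - warmdown_iters, 0)
    if step < warmdown_start:
        return 1.0
    # Cooldown progress p in [0, 1]; clamped so overshooting total_steps stays at the floor.
    p = min((step - warmdown_start) / max(warmdown_iters, 1), 1.0)
    return lr_floor + (1.0 - lr_floor) * (1.0 - math.sqrt(p))
```

Each parameter group's LR would then be its base value times this multiplier, e.g. `0.04 * lr_multiplier(step, total_steps)` for `matrix_lr`. Compared with a linear warmdown, the square root drops the LR faster at the start of the cooldown and more slowly near the end.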
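`QUANT_BITS=6` stores each weight as a 6-bit signed integer plus a floating-point scale. The sketch shows only the simplest form of that: symmetric, per-row round-to-nearest quantization, restricted here to [-31, 31] so the range is symmetric. Full Hessian GPTQ, which the commit uses, goes further by quantizing columns one at a time and using the inverse Hessian of the layer inputs to push each column's rounding error onto the columns not yet quantized; that error-compensation step is not shown. Function names are hypothetical.

```python
import numpy as np


def quantize_per_row(w: np.ndarray, bits: int = 6):
    """Symmetric round-to-nearest quantization with one scale per output row."""
    qmax = 2 ** (bits - 1) - 1  # 31 for int6
    absmax = np.abs(w).max(axis=1, keepdims=True)
    # All-zero rows get scale 1 so the division below is always defined.
    scale = np.where(absmax > 0, absmax / qmax, 1.0)
    # 6-bit values held in int8 containers; bit-packing for the 16 MB artifact is not shown.
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Map integer codes back to floats with the per-row scales."""
    return q.astype(np.float32) * scale
```

Round-to-nearest bounds each weight's error by half its row's scale; GPTQ keeps the same storage format but chooses the codes to reduce the layer's output error instead.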

1 parent fc60bca, commit b5f990a

2 files changed

records/track_10min_16mb/2026-03-20_PiyushDattaSubmission
- claude_agents_task_board.md
- train_gpt.py
