
Non-record: Systematic Hyperparameter Search (val_bpb=1.2075)#141

Open
nglain wants to merge 1 commit into openai:main from nglain:submission/systematic-search

Conversation

@nglain

@nglain nglain commented Mar 20, 2026

Summary

Metric                 Value
Post-quant val_bpb     1.2075
Pre-quant val_bpb      1.2008
Compressed artifact    ~15.2 MB
Training steps         7,390
Training time          600 s (8×H100 SXM)

Approach

Methodical hyperparameter search through 33 experiments across three GPU tiers (A40 → 1×H100 → 8×H100), using fixed-seed paired comparisons (SEED=1337) so that deltas of roughly ±0.001 BPB are reliably measurable.
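The paired-comparison protocol can be sketched as below; `seed_everything` and `paired_delta` are illustrative names (not functions from this PR's `train_gpt.py`), and the baseline score in the example is hypothetical:

```python
import random

SEED = 1337  # the fixed seed used for every paired comparison


def seed_everything(seed: int = SEED) -> None:
    """Pin the RNG so a baseline run and a variant run see identical
    weight init and data order. A real training script would also pin
    numpy and torch (np.random.seed, torch.manual_seed)."""
    random.seed(seed)


def paired_delta(baseline_bpb: float, variant_bpb: float) -> float:
    """val_bpb delta for one A/B pair; negative means the variant wins.
    With matched seeds, deltas of about +/-0.001 BPB are resolvable."""
    return variant_bpb - baseline_bpb


# Hypothetical pair: a variant scoring 1.2075 against a 1.2125 baseline
# reproduces the -0.005 BPB delta reported for the Muon settings.
print(round(paired_delta(1.2125, 1.2075), 4))  # -0.005
```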

What works

  • Muon optimizer (lr=0.02, momentum=0.99, warmdown=3000): -0.005 BPB
  • ROPE_BASE=200000: -0.003 BPB
  • seq_len=4096: -0.006 BPB

What doesn't work

  • int6 STE + Muon: the straight-through estimator conflicts with Muon's update (+0.007 BPB worse)
  • 12 layers: slower per step, so fewer steps fit the wallclock budget
  • Larger batch (786K tokens): the loss from fewer steps outweighs the per-step quality gain
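The wallclock tradeoff behind the last two bullets can be made concrete. Only the 600 s budget and 7,390-step count come from the PR; the step times below are hypothetical:

```python
WALLCLOCK_S = 600  # fixed training budget from the PR


def steps_in_budget(step_time_s: float, budget_s: float = WALLCLOCK_S) -> int:
    """Optimizer steps that fit in a fixed wallclock budget."""
    return int(budget_s / step_time_s)


# Hypothetical 250 ms/step baseline vs. a config (deeper model or
# larger batch) whose steps take twice as long on a saturated GPU:
baseline_steps = steps_in_budget(0.25)  # 2400
doubled_steps = steps_in_budget(0.50)   # 1200

# The halved step count must be paid for by per-step quality gains;
# per the search, neither 12 layers nor the 786K batch broke even.
```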

Key insight

Optimal hyperparameters differ dramatically across compute budgets. The optimal LR on A40/2min (0.10) is 5× the optimal on 8×H100/10min (0.02). Parameters must be re-validated at target compute scale.
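As a concrete check of that ratio (both endpoint values are reported above; the tier labels are just illustrative keys):

```python
# Best MATRIX_LR found at each compute tier in this search; only the
# two endpoint tiers are reported in the PR.
BEST_LR = {
    "A40/2min": 0.10,
    "8xH100/10min": 0.02,
}

ratio = BEST_LR["A40/2min"] / BEST_LR["8xH100/10min"]
print(round(ratio, 2))  # 5.0 -- the 5x gap that motivates re-validation
```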

Changes from baseline

Only hyperparameters: MATRIX_LR=0.02, MUON_MOMENTUM=0.99, WARMDOWN_ITERS=3000, ROPE_BASE=200000, TRAIN_SEQ_LEN=4096. No architectural changes.
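The full diff from the baseline, expressed as the module-level constants the PR names (whether `train_gpt.py` actually stores them as plain constants is an assumption):

```python
# All five hyperparameter overrides from this PR; no architectural changes.
MATRIX_LR = 0.02       # Muon learning rate for matrix (2D) parameters
MUON_MOMENTUM = 0.99   # Muon momentum
WARMDOWN_ITERS = 3000  # LR warmdown span at the end of training
ROPE_BASE = 200_000    # RoPE base frequency
TRAIN_SEQ_LEN = 4096   # training sequence length
```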

Test plan

  • Trained on 8×H100 SXM, 600s wallclock
  • final_int8_zlib_roundtrip val_bpb: 1.2075
  • Artifact under 16,000,000 bytes
  • train_gpt.py compiles and runs from records folder
  • train.log included
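A minimal sketch of the size gate only, assuming symmetric per-tensor int8 quantization before zlib; the actual `final_int8_zlib_roundtrip` harness may quantize and pack differently:

```python
import zlib

CAP_BYTES = 16_000_000  # artifact cap from the test plan


def int8_zlib_size(weights: list[float]) -> int:
    """Quantize one weight tensor to int8 with a symmetric per-tensor
    scale, zlib-compress it, and return the compressed size in bytes."""
    scale = max((abs(w) for w in weights), default=1.0) / 127.0 or 1.0
    q = bytes(max(-127, min(127, round(w / scale))) & 0xFF for w in weights)
    return len(zlib.compress(q, 9))


# Example: 10,000 small repetitive weights compress far below the cap.
size = int8_zlib_size([0.001 * (i % 7 - 3) for i in range(10_000)])
assert size < CAP_BYTES
```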

Methodical search through 33 experiments across A40, 1xH100, 8xH100.
Fixed-seed paired comparison (SEED=1337) for reliable delta measurement.

Key findings:
- Muon optimizer (lr=0.02, momentum=0.99, warmdown=3000): -0.005 BPB
- ROPE_BASE=200000: -0.003 BPB
- seq_len=4096: -0.006 BPB
- int6 STE conflicts with Muon optimizer (+0.007 worse)
- Hyperparameter transfer across compute scales is unreliable

val_bpb: 1.2075 (post-quant roundtrip)
Artifact: ~15.2 MB (under 16 MB cap)
Trained on 8xH100 SXM, 600s wallclock, 7390 steps
@MatoTeziTanka

PR #141 Review

Title: Non-record: Systematic Hyperparameter Search (val_bpb=1.2075)
State: open
Date Reviewed: 2026-04-11

Code Analysis

train_gpt.py Checks

  • target-in-key pattern: not found
  • TTT (Temporal Token Tagging): not found
  • SLOT (Slot MoE): not found
  • Custom Tokenizer: not found

Verdict

Classification: PURE_NEURAL_CLEAN

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: ✓ MERGE


Draft by @MatoTeziTanka for parameter-golf review sweep (2026-04-11)


Reviewed by @MatoTeziTanka / The Agora. Classification via sibling-session agent (Haiku-backed). This review was drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.

