Record support: 3-seed reproduction of PR #2101 + 3 ablations — val_b… by JulianTang2027 · Pull Request #2117 · openai/parameter-golf

JulianTang2027 · 2026-05-01T09:54:15Z

…pb 1.05879 (3-seed mean)

Independent 3-seed reproduction of PR #2101 (@OnlyJundong) lands at 1.05879 ± 0.00098 (sample std), within +0.00033 of the author's 3-seed mean (1.05845 ± 0.00058). Per-seed gaps are roughly symmetric around zero (one better, one matching, one worse), so the main difference is the wider spread (~1.7x the author's std) rather than a systematic shift. All 3 seeds individually beat merged SOTA (PR #1868, 1.06141) by 0.00156-0.00349 BPB. The mean improvement is 0.00181 nats, below the 0.005-nat strict record threshold.
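The headline figures can be cross-checked from the quoted numbers alone. The sketch below is illustrative, not part of the submission: the variable names are made up, and the BPB-to-nats step assumes the nats figure is the BPB gap multiplied by ln 2, which this PR does not state explicitly.

```python
import math

# Figures quoted in this PR description (already rounded to 5 decimals).
repro_mean, repro_std = 1.05879, 0.00098    # this reproduction, 3 seeds
author_mean, author_std = 1.05845, 0.00058  # PR #2101 author, 3 seeds
sota_bpb = 1.06141                          # merged SOTA, PR #1868
record_threshold_nats = 0.005               # strict record threshold

gap_to_author = repro_mean - author_mean    # positive = reproduction is worse
std_ratio = repro_std / author_std          # how much wider the spread is
gain_bpb = sota_bpb - repro_mean            # mean improvement over SOTA in BPB

# Assumption: 1 bit = ln(2) nats, so a BPB gap converts to nats per byte
# by multiplying with ln 2.
gain_nats = gain_bpb * math.log(2)

print(f"gap to author mean: {gap_to_author:+.5f} BPB")
print(f"std ratio:          {std_ratio:.2f}x")
print(f"gain over SOTA:     {gain_bpb:.5f} BPB = {gain_nats:.5f} nats")
print(f"clears threshold:   {gain_nats >= record_threshold_nats}")
```

Because the inputs are already rounded, the recomputed values can differ from the quoted ones in the last digit.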

Three seed-42 ablations of unused author-introduced knobs:

LABEL_SMOOTH=0.05: +0.00008 BPB (null, within noise)
GRAD_CENTRALIZE=1: +0.00004 BPB (null, within noise)
TTT_LOCAL_LR_MULT=0.85: +0.00058 BPB (worse, local optimum is in [0.75, 0.80])

Both author-introduced knobs (LABEL_SMOOTH, GRAD_CENTRALIZE) are nulls on this stack, a useful negative result for future contributors. train_gpt.py is byte-identical to PR #2101 (md5 5606a60541ef66315ac6991e8cc16de8); no new ML technique is introduced.
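The byte-identity claim can be re-checked locally by hashing the script and comparing against the quoted digest. This is a minimal sketch: the helper names `md5_of` and `matches_pr2101` are hypothetical, and the default path assumes `train_gpt.py` sits in the current directory.

```python
import hashlib

# Digest quoted in this PR for train_gpt.py.
EXPECTED_MD5 = "5606a60541ef66315ac6991e8cc16de8"

def md5_of(path):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_pr2101(path="train_gpt.py"):
    """True if the file at `path` hashes to the digest quoted in this PR."""
    return md5_of(path) == EXPECTED_MD5
```

MD5 is used here only as a cheap identity check against the quoted value, not as a security guarantee.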

Compliance: max train_wallclock 592.14 s, max ttt_eval 447.3 s, max artifact 15,983,343 B (min slack 16,657 B to 16M cap). All 782 phased-TTT eval batches drained on every seed.
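The slack figure follows directly from the cap and the largest artifact. The sketch below recomputes it and shows one way to check a run against limits. The `SeedRun` type and both helpers are hypothetical, and the 600 s time limits in the usage lines are an assumption for illustration, since this PR quotes only the measured maxima and the 16M artifact cap.

```python
from dataclasses import dataclass

ARTIFACT_CAP_BYTES = 16_000_000  # the "16M cap", read as decimal megabytes

@dataclass
class SeedRun:
    train_wallclock_s: float
    ttt_eval_s: float
    artifact_bytes: int

def artifact_slack(runs):
    """Smallest remaining headroom under the artifact cap across runs."""
    return ARTIFACT_CAP_BYTES - max(r.artifact_bytes for r in runs)

def within_limits(runs, train_limit_s, eval_limit_s):
    """True if every run respects the time limits and the artifact cap."""
    return all(
        r.train_wallclock_s <= train_limit_s
        and r.ttt_eval_s <= eval_limit_s
        and r.artifact_bytes <= ARTIFACT_CAP_BYTES
        for r in runs
    )

# Only per-metric maxima are quoted, so a single worst-case record is used.
worst_case = SeedRun(592.14, 447.3, 15_983_343)
print(artifact_slack([worst_case]))  # 16657
# Assumed limits (not stated in this PR): 600 s for training and for eval.
print(within_limits([worst_case], train_limit_s=600.0, eval_limit_s=600.0))
```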

@OnlyJundong


leon2k2k2k mentioned this pull request May 1, 2026

Train/val data leakage in CaseOps records — prepare_caseops_data.py default overlaps 80% of val docs with training data #2127

Open

JulianTang2027 wants to merge 1 commit into openai:main from JulianTang2027:record-support/pr2101-3seed-ablations
