
Record support: 3-seed reproduction of PR #2101 + 3 ablations — val_b…#2117

Open
JulianTang2027 wants to merge 1 commit intoopenai:mainfrom
JulianTang2027:record-support/pr2101-3seed-ablations

@JulianTang2027

val_bpb 1.05879 (3-seed mean)

Independent 3-seed reproduction of PR #2101 (@OnlyJundong) lands at 1.05879 ± 0.00098 (sample std), within +0.00033 of the author's 3-seed mean (1.05845 ± 0.00058). The per-seed gap is roughly symmetric around zero: one seed beats the author's, one matches, and one lands above. The difference therefore comes mainly from larger seed-to-seed spread (~1.7x the author's std), not from a systematic offset. All 3 seeds individually beat merged SOTA (PR #1868, 1.06141) by 0.00156 to 0.00349 BPB. The mean improvement is 0.00181 nats, below the 0.005-nat strict record threshold.
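The arithmetic behind these summary figures can be sketched in Python. This minimal sketch uses only the rounded numbers quoted above. It assumes the nats figure is the BPB gap expressed in nats (1 bit = ln 2 nats), which reproduces 0.00181 to within rounding. `sample_mean_std` is a hypothetical helper that shows the n - 1 ("sample") convention one would apply to per-seed logs; it is not taken from train_gpt.py.

```python
import math
import statistics

# Rounded summary figures quoted in the PR description.
REPRO_STD = 0.00098           # this reproduction, 3 seeds
AUTHOR_STD = 0.00058          # PR #2101 author, 3 seeds
REPRO_MEAN = 1.05879
MERGED_SOTA = 1.06141         # PR #1868
RECORD_THRESHOLD_NATS = 0.005


def sample_mean_std(values):
    """Mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)


def bpb_to_nats(delta_bpb):
    """Express a bits-per-byte gap in nats (1 bit = ln 2 nats).

    Assumption: this is how the PR's nats figure relates to BPB.
    """
    return delta_bpb * math.log(2)


spread_ratio = REPRO_STD / AUTHOR_STD
improvement_bpb = MERGED_SOTA - REPRO_MEAN
improvement_nats = bpb_to_nats(improvement_bpb)
is_record = improvement_nats >= RECORD_THRESHOLD_NATS

print(f"spread ratio vs author: {spread_ratio:.2f}x")
print(f"mean improvement: {improvement_bpb:.5f} BPB = {improvement_nats:.5f} nats")
print(f"clears strict record threshold: {is_record}")
```

The inputs are rounded, so the last printed digit can differ slightly from the PR's figures, which were presumably computed from unrounded values.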

Three seed-42 ablations: the two author-introduced knobs that the submitted run leaves unused, plus the TTT local LR multiplier:

  • LABEL_SMOOTH=0.05: +0.00008 BPB (null, within noise)
  • GRAD_CENTRALIZE=1: +0.00004 BPB (null, within noise)
  • TTT_LOCAL_LR_MULT=0.85: +0.00058 BPB (worse; the local optimum is in [0.75, 0.80])
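The PR does not describe what LABEL_SMOOTH and GRAD_CENTRALIZE do inside train_gpt.py. Assuming they implement the textbook techniques (label-smoothed cross-entropy and gradient centralization), the two knobs can be sketched in dependency-free Python as below. The function names are hypothetical, and this is not the PR #2101 code.

```python
import math


def label_smoothed_ce(logits, target, eps):
    """Cross-entropy against the target distribution
    (1 - eps) * one_hot(target) + eps * uniform(K),
    the same distribution PyTorch's CrossEntropyLoss(label_smoothing=eps) uses."""
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_z for x in logits]
    nll = -log_probs[target]                        # ordinary cross-entropy term
    uniform_nll = -sum(log_probs) / len(log_probs)  # mean NLL over all K classes
    return (1.0 - eps) * nll + eps * uniform_nll


def centralize_rows(grad):
    """Gradient centralization for a 2-D (out, in) weight gradient:
    subtract each output row's mean so that every row sums to zero."""
    out = []
    for row in grad:
        mean = sum(row) / len(row)
        out.append([g - mean for g in row])
    return out
```

For example, `centralize_rows([[1.0, 2.0, 3.0], [0.0, 0.0, 6.0]])` returns `[[-1.0, 0.0, 1.0], [-2.0, -2.0, 4.0]]`. With `eps=0.0`, `label_smoothed_ce` reduces to plain cross-entropy, which is the "knob off" baseline in the ablation.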

Both author-introduced knobs (LABEL_SMOOTH and GRAD_CENTRALIZE) are null on this stack, a useful negative result for future contributors. train_gpt.py is byte-identical to PR #2101 (md5 5606a60541ef66315ac6991e8cc16de8), so no new ML technique is introduced.
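Byte-identity can be checked locally by hashing the file and comparing the result against the digest quoted above. The following is a minimal sketch using Python's standard `hashlib`. The local path in the usage note is an assumption about where the file sits in a checkout.

```python
import hashlib

# Digest quoted in this PR for PR #2101's train_gpt.py.
EXPECTED_MD5 = "5606a60541ef66315ac6991e8cc16de8"


def file_md5(path, chunk_size=1 << 20):
    """Stream a file through MD5 in 1 MiB chunks and return the hex digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_byte_identical(path, expected=EXPECTED_MD5):
    """True when the file's MD5 matches the expected digest."""
    return file_md5(path) == expected
```

Usage: `is_byte_identical("train_gpt.py")` returns `True` when the local copy's MD5 matches the quoted digest.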

Compliance: max train_wallclock 592.14 s, max ttt_eval 447.3 s, max artifact 15,983,343 B (minimum slack 16,657 B under the 16,000,000 B cap). All 782 phased-TTT eval batches were drained on every seed.
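The sketch below shows one way to aggregate the compliance summary from per-seed logs: take the worst case across seeds for each limit, then report the remaining artifact headroom. The 16,000,000-byte cap matches the quoted slack (16,000,000 - 15,983,343 = 16,657). The 600 s time limits are placeholder assumptions not stated in this PR. `RunStats` and `compliance_report` are hypothetical names. The usage line feeds in a single synthetic worst-case run built from the quoted maxima, not real per-seed data.

```python
from dataclasses import dataclass

ARTIFACT_CAP_BYTES = 16_000_000  # decimal cap, consistent with the quoted slack
TRAIN_CAP_S = 600.0              # assumption: placeholder, not stated in this PR
EVAL_CAP_S = 600.0               # assumption: placeholder, not stated in this PR
TTT_BATCHES = 782                # phased-TTT eval batches per seed


@dataclass
class RunStats:
    train_wallclock_s: float
    ttt_eval_s: float
    artifact_bytes: int
    ttt_batches_drained: int


def compliance_report(runs):
    """Worst case across seeds for each limit, plus artifact headroom."""
    max_train = max(r.train_wallclock_s for r in runs)
    max_eval = max(r.ttt_eval_s for r in runs)
    max_artifact = max(r.artifact_bytes for r in runs)
    all_drained = all(r.ttt_batches_drained == TTT_BATCHES for r in runs)
    return {
        "max_train_s": max_train,
        "max_eval_s": max_eval,
        "max_artifact_bytes": max_artifact,
        "min_slack_bytes": ARTIFACT_CAP_BYTES - max_artifact,
        "ok": (max_train <= TRAIN_CAP_S
               and max_eval <= EVAL_CAP_S
               and max_artifact <= ARTIFACT_CAP_BYTES
               and all_drained),
    }


# Synthetic worst case assembled from the maxima quoted above.
worst_case = RunStats(592.14, 447.3, 15_983_343, 782)
report = compliance_report([worst_case])
print(report)
```

Taking the maximum per limit, rather than per run, gives a conservative bound even when different seeds hit different maxima.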
