Record support: 3-seed reproduction of PR #2101 + 3 ablations — val_b…#2117
Open
JulianTang2027 wants to merge 1 commit into openai:main
… val_bpb 1.05879 (3-seed mean)

Independent 3-seed reproduction of PR openai#2101 (@OnlyJundong) lands at 1.05879 ± 0.00098 (sample std), within +0.00033 of the author's 3-seed mean (1.05845 ± 0.00058). The per-seed gap is roughly symmetric around zero (one seed better, one matching, one worse), so spread (~1.7x the author's std) is the dominant story.

All 3 seeds individually beat merged SOTA (PR openai#1868, 1.06141) by 0.00156-0.00349 BPB; the mean improvement is 0.00181 nats, below the 0.005-nat strict record threshold.

Three seed-42 ablations of unused author-introduced knobs:
- LABEL_SMOOTH=0.05: +0.00008 BPB (null, within noise)
- GRAD_CENTRALIZE=1: +0.00004 BPB (null, within noise)
- TTT_LOCAL_LR_MULT=0.85: +0.00058 BPB (worse; local optimum is in [0.75, 0.80])

Both author-introduced knobs are nulls on this stack, a useful negative result for future contributors. train_gpt.py is byte-identical to PR openai#2101 (md5 5606a60541ef66315ac6991e8cc16de8); no new ML technique is introduced.

Compliance: max train_wallclock 592.14 s, max ttt_eval 447.3 s, max artifact 15,983,343 B (min slack 16,657 B to the 16M cap). All 782 phased-TTT eval batches drained on every seed.
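For readers who want to re-derive the headline comparison, here is a minimal sketch that uses only the figures quoted above. It assumes the nats figure is the BPB gap scaled by ln 2 (bits to nats); that conversion is my reading of the numbers, not something the PR states. All constant and function names are illustrative.

```python
import math

# Figures quoted in the summary above (validation bits per byte).
REPRO_MEAN_BPB = 1.05879
REPRO_STD_BPB = 0.00098
AUTHOR_STD_BPB = 0.00058
MERGED_SOTA_BPB = 1.06141
RECORD_THRESHOLD_NATS = 0.005


def bpb_gap_to_nats(gap_bpb: float) -> float:
    """Convert a bits-per-byte gap to nats per byte.

    Assumption: the PR's nats figure is the BPB gap multiplied by ln 2.
    """
    return gap_bpb * math.log(2)


# Mean improvement over merged SOTA, in BPB and in nats.
gap_bpb = MERGED_SOTA_BPB - REPRO_MEAN_BPB
gap_nats = bpb_gap_to_nats(gap_bpb)

# How much wider this reproduction's seed spread is than the author's.
std_ratio = REPRO_STD_BPB / AUTHOR_STD_BPB

# Strict record rule: the improvement must reach the threshold.
clears_threshold = gap_nats >= RECORD_THRESHOLD_NATS

print(f"gap: {gap_bpb:.5f} BPB, {gap_nats:.5f} nats")
print(f"std ratio: {std_ratio:.2f}x")
print(f"clears strict record threshold: {clears_threshold}")
```

Under that assumption the gap comes out at roughly 0.0018 nats, consistent with the quoted 0.00181 up to rounding of the inputs, and well short of the 0.005-nat threshold.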
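The byte-identity claim and the artifact-size slack can be checked locally with something like the sketch below. The helper names are hypothetical, the file path is whatever your checkout uses, and the cap is taken to be 16,000,000 bytes, which is an assumption that matches the quoted 16,657 B slack.

```python
import hashlib
from pathlib import Path

# md5 quoted above for train_gpt.py in PR #2101.
EXPECTED_MD5 = "5606a60541ef66315ac6991e8cc16de8"

# "16M cap" read as decimal bytes (assumption; consistent with the quoted slack).
ARTIFACT_CAP_BYTES = 16_000_000
# Largest artifact reported across the three seeds.
MAX_ARTIFACT_BYTES = 15_983_343


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_byte_identical(path: Path, expected_md5: str = EXPECTED_MD5) -> bool:
    """True if the file's md5 matches the expected digest."""
    return md5_of_file(path) == expected_md5.lower()


def artifact_slack(artifact_bytes: int, cap_bytes: int = ARTIFACT_CAP_BYTES) -> int:
    """Bytes remaining under the cap; a negative value means the cap is exceeded."""
    return cap_bytes - artifact_bytes


print(f"min slack: {artifact_slack(MAX_ARTIFACT_BYTES):,} B")
```

To check a local copy, call `is_byte_identical(Path("train_gpt.py"))` from the directory that holds the script. An md5 match is a convenience check for accidental edits, not a security guarantee.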