Record: PR1851 + 9-hparam stack + wd_strong + GPTQ AR + pergroup - val_bpb 1.05957 (1 seed) #2020
Open
Itssshikhar wants to merge 3 commits into openai:main from
Collaborator
Leaderboard audit note (pre-cutoff state): I don't think this is record-ready as submitted. The headline is a single-seed result with no std/p-value evidence. For a score this close to the existing frontier, it needs a matching 3-seed package and significance evidence before it can be treated as a leaderboard row.
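One common way to produce the requested std and significance evidence is Welch's t-test over per-seed val_bpb values. Below is a minimal sketch that uses only the Python standard library. It assumes one list of per-seed scores for the candidate and one for the baseline, and the function name `welch_t` is hypothetical.

```python
import math
import statistics


def welch_t(candidate: list[float], baseline: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Negative t means the candidate's mean val_bpb is lower (better).
    Each list needs at least two seeds and non-zero spread.
    """
    na, nb = len(candidate), len(baseline)
    va = statistics.variance(candidate) / na  # squared standard error, candidate
    vb = statistics.variance(baseline) / nb   # squared standard error, baseline
    t = (statistics.mean(candidate) - statistics.mean(baseline)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (na - 1) + vb**2 / (nb - 1))
    return t, df
```

`statistics.stdev` on each list gives the per-side std. `scipy.stats.ttest_ind(candidate, baseline, equal_var=False, alternative="less")` returns the same statistic along with a one-sided p-value. With only three seeds per side, df is small. A gap of a few ten-thousandths of a BPB therefore needs tight per-seed spread before the result reaches significance.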
Author
Hey @cocohearts, I'm aware of how close this seed is to the current top, but as I mentioned in the description, I'm out of RunPod credits. Is there a way to run a 3-seed mean on this submission to make it decisive?
3-seed mean val_bpb = 1.06017968 (seeds 42, 0, 1234) on the published train_gpt.py + env block. Seed 42 reproduces (1.05948583 vs README 1.05956571). Honest delta vs PR #1855 3-seed mean is -0.00090, not the README's headline -0.00151 (which compares this candidate's best seed to PR #1855's mean). All artifacts under the 16 MB cap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Full per-seed verbose logs (seed42.log, seed0.log, seed1234.log) with hparams + source dump + per-step training trace + TTT phase details, matching parent record's train_seedXX.log convention.
- submission.json with machine-readable per-seed and 3-seed-mean numbers, artifact bytes (max 15,909,242 / cap 16,777,216), and per-seed deltas vs PR #1855.
- Banner at top of parent README pointing readers to three_seed_eval/ so the corrected 1.06018 mean is visible alongside the 1.05957 headline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
I was able to run the 3-seed mean for the submission with no additional changes, and added the logs. Here are the numbers:
Per-seed results
@cocohearts, let me know if this is enough to finalize the leaderboard score.
Summary
val_bpb: 1.05957 (seed 42) | 15,901,624 bytes | 8xH100 SXM, 600s | Phased LoRA TTT
Built on the PR #1851 stack. Key additions: PR #1855's 9-hparam stack, a stronger Muon weight-decay schedule, GPTQ all-rank Hessian averaging, and PR #1855-style pergroup lrzip+brotli compression ported into the PR #1851 graph.
I couldn't run a 3-seed mean because my RunPod credits ran out. I took into account the discussion of CaseOps in previous PRs, but since those PRs were merged, I went ahead with it.
Results (8xH100 80GB SXM, 600s, phased TTT)
Delta vs PR #1855 seed 42 (1.05989): -0.00033 BPB.
Delta vs PR #1855 3-seed mean (1.06108): -0.00151 BPB.
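The -0.00151 delta above compares a single seed against a 3-seed mean. The rerun later in this thread reports a 3-seed mean of 1.06017968 (seeds 42, 0, 1234) for this candidate. Here is a quick check of both comparisons that uses only numbers reported in this PR:

```python
# val_bpb values reported in this PR (lower is better).
candidate_seed42 = 1.05956571      # README headline, seed 42
candidate_3seed_mean = 1.06017968  # rerun, seeds 42, 0, 1234
baseline_3seed_mean = 1.06108      # PR #1855 3-seed mean

headline_delta = candidate_seed42 - baseline_3seed_mean    # one seed vs a mean
honest_delta = candidate_3seed_mean - baseline_3seed_mean  # mean vs mean

print(f"{headline_delta:+.5f}")  # -0.00151
print(f"{honest_delta:+.5f}")    # -0.00090
```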
Key Techniques
PR #1851 graph preserved - keeps the BOS-fixed SmearGate + LQER asymmetric + PR #1787 SparseAttnGate/PolarNS/FusedCE stack.
PR #1855 9-hparam stack - transfers the accepted greedy hparam overrides onto the PR #1851-derived graph.
wd_strong - stronger Muon WD schedule with low=0.5 and high=1.75.
GPTQ all-rank Hessian averaging - averages GPTQ calibration Hessians across ranks.
Pergroup compression port - ports PR #1855's lrzip+brotli per-group compressor into the PR #1851 graph, bringing the artifact under the 16MB cap.
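GPTQ quantizes each layer using a proxy Hessian built from calibration activations, commonly H = 2 X^T X / n, where n counts calibration rows (scaling conventions vary between implementations). The PR does not show its averaging code, so the sketch below rests on two assumptions. First, each rank accumulates H from its own calibration batches. Second, the average is weighted by sample count, which makes it equal to the Hessian of the pooled calibration data. On real GPUs the two sums would presumably be `torch.distributed.all_reduce` calls with `ReduceOp.SUM`. Here the ranks are simulated with a list so the example runs on one machine. Both function names are hypothetical.

```python
import numpy as np


def local_hessian(x: np.ndarray) -> tuple[np.ndarray, int]:
    """Per-rank GPTQ-style proxy Hessian H = 2 X^T X / n for activations x of shape (n, d)."""
    n = x.shape[0]
    return 2.0 / n * (x.T @ x), n


def average_hessians(local: list[tuple[np.ndarray, int]]) -> np.ndarray:
    """Sample-count-weighted average of per-rank (H, n) pairs.

    Multiplying each H by its n undoes the local 2/n scaling, so the result
    equals the Hessian of all ranks' calibration rows pooled together.
    """
    total_n = sum(n for _, n in local)
    weighted_sum = sum(h * n for h, n in local)
    return weighted_sum / total_n


# Simulate four ranks with unequal calibration batch sizes.
rng = np.random.default_rng(0)
shards = [rng.standard_normal((n, 8)) for n in (64, 64, 32, 96)]
H = average_hessians([local_hessian(s) for s in shards])
full, _ = local_hessian(np.concatenate(shards))
assert np.allclose(H, full)
```

When every rank sees the same number of rows, this reduces to a plain mean of the per-rank Hessians.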
Numbers
The compressor swap costs only +0.00006 BPB while saving 238,983 bytes in total, which brings the submission under the size cap.
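The PR does not describe the pergroup format in detail. The sketch below illustrates the general idea under one assumption: "pergroup" means each group of serialized quantized tensors is compressed on its own, keeping whichever codec produces the smaller blob. Python's `zlib` and `lzma` stand in for brotli (a third-party package) and lrzip (an external CLI), and the group names are hypothetical.

```python
import lzma
import zlib

# Stand-in codecs: (compress, decompress) pairs from the standard library.
CODECS = {
    "zlib": (lambda b: zlib.compress(b, 9), zlib.decompress),
    "lzma": (lambda b: lzma.compress(b, preset=9), lzma.decompress),
}


def compress_groups(groups: dict[str, bytes]) -> dict[str, tuple[str, bytes]]:
    """Compress each group separately and keep the smallest (codec, blob) pair."""
    packed = {}
    for name, payload in groups.items():
        candidates = [(codec, enc(payload)) for codec, (enc, _) in CODECS.items()]
        packed[name] = min(candidates, key=lambda cb: len(cb[1]))
    return packed


def decompress_groups(packed: dict[str, tuple[str, bytes]]) -> dict[str, bytes]:
    """Invert compress_groups using the codec name stored with each blob."""
    return {name: CODECS[codec][1](blob) for name, (codec, blob) in packed.items()}


# Hypothetical groups of serialized quantized weights.
groups = {"attn_q": bytes(range(256)) * 64, "mlp_up": bytes(4096)}
packed = compress_groups(groups)
assert decompress_groups(packed) == groups
print({name: (codec, len(blob)) for name, (codec, blob) in packed.items()})
```

Storing the codec name next to each blob lets the loader decompress every group independently. Choosing per group helps when groups differ in structure, which is presumably why grouping can shave bytes compared with compressing one monolithic blob.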
Reproduction