
Record: PR1851 + 9-hparam stack + wd_strong + GPTQ AR + pergroup - val_bpb 1.05957 (1 seed)#2020

Open
Itssshikhar wants to merge 3 commits intoopenai:mainfrom
Itssshikhar:run6-pergroup-record

Conversation

@Itssshikhar

Summary

val_bpb: 1.05957 (seed 42) | 15,901,624 bytes | 8xH100 SXM, 600s | Phased LoRA TTT

Built on the PR #1851 stack. Key additions: PR #1855's 9-hparam stack, stronger Muon weight-decay schedule, GPTQ all-rank Hessian averaging, and PR #1855-style pergroup
lrzip+brotli compression ported into the PR #1851 graph.

Couldn't make this a 3-seed mean as Runpod credits ran out. I took into account the discussion on CaseOps in previous PRs; since those were merged, I went ahead with it.

Results (8xH100 80GB SXM, 600s, phased TTT)

| Seed | Steps | ms/step | Pre-quant BPB | Quant BPB | TTT BPB | Artifact (bytes) |
|------|-------|---------|---------------|-----------|---------|------------------|
| 42   | 4,844 | 122.2   | 1.06335       | 1.07246   | 1.05957 | 15,901,624       |

Delta vs PR #1855 seed 42 (1.05989): -0.00033 BPB.
Delta vs PR #1855 3-seed mean (1.06108): -0.00151 BPB.

Key Techniques

  1. PR #1851 graph preserved - keeps the BOS-fixed SmearGate + LQER asymmetric + phased TTT graph, including the PR #1787 SparseAttnGate/PolarNS/FusedCE stack.

  2. PR #1855 9-hparam stack - transfers the accepted greedy hparam overrides onto the PR #1851-derived graph.

  3. wd_strong - stronger Muon WD schedule with low=0.5 and high=1.75.

  4. GPTQ all-rank Hessian averaging - averages GPTQ calibration Hessians across ranks.

  5. Pergroup compression port - ports PR #1855's lrzip+brotli per-group compressor into the PR #1851 graph, bringing the artifact under the 16 MB cap.
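The wd_strong schedule (item 3) can be sketched as follows. The exact schedule shape used in the run is not shown in this PR, so this is a hedged sketch that assumes a simple linear ramp of the Muon weight decay between the two published factors (`WD_SCHED_LOW_FACTOR=0.5`, `WD_SCHED_HIGH_FACTOR=1.75`); the function name and signature are illustrative, not taken from `train_gpt.py`.

```python
def wd_strong(step: int, total_steps: int, base_wd: float,
              low_factor: float = 0.5, high_factor: float = 1.75) -> float:
    """Sketch: ramp weight decay linearly from low_factor * base_wd
    at step 0 up to high_factor * base_wd at the final step.
    The real schedule in train_gpt.py may differ in shape."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return base_wd * (low_factor + (high_factor - low_factor) * frac)
```

A schedule like this starts with weaker regularization while the loss is falling quickly and tightens it toward the end of the 600s budget.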
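GPTQ all-rank Hessian averaging (item 4) amounts to an all-reduce over per-rank calibration Hessians. A minimal single-process sketch, simulating the collective with a plain list (in the real run this would be a `torch.distributed.all_reduce` with SUM followed by a divide by world size); the helper names are hypothetical:

```python
import numpy as np

def local_hessian(x: np.ndarray) -> np.ndarray:
    """Per-rank GPTQ-style Hessian estimate from a
    (samples, features) activation matrix: H ~ 2 * X^T X / n."""
    return 2.0 * x.T @ x / len(x)

def all_rank_average(hessians: list) -> np.ndarray:
    """Stand-in for all_reduce(SUM) / world_size: each rank ends up
    with the Hessian averaged over every rank's calibration shard."""
    return sum(hessians) / len(hessians)
```

Averaging across ranks lets GPTQ calibrate against the union of all data-parallel shards instead of a single rank's slice, which should reduce quantization noise.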
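The per-group compressor (item 5) compresses each quantization group independently and length-prefixes the streams so they can be decompressed in isolation. A hedged sketch using stdlib `zlib` as a stand-in, since the actual lrzip+brotli codecs from PR #1855 are not stdlib; the framing format here is illustrative only:

```python
import struct
import zlib

def compress_pergroup(groups: list, level: int = 9) -> bytes:
    """Compress each group's bytes independently, each prefixed
    with a little-endian u32 length of its compressed stream."""
    out = bytearray()
    for g in groups:
        c = zlib.compress(g, level)
        out += struct.pack("<I", len(c)) + c
    return bytes(out)

def decompress_pergroup(blob: bytes) -> list:
    """Invert compress_pergroup: walk the length-prefixed streams."""
    groups, i = [], 0
    while i < len(blob):
        (n,) = struct.unpack_from("<I", blob, i)
        i += 4
        groups.append(zlib.decompress(blob[i:i + n]))
        i += n
    return groups
```

Per-group framing trades a few bytes of overhead per group for random access and better ratio on groups with similar value distributions.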

Numbers

| Run     | Graph                           | Compressor | TTT BPB | Artifact (bytes) | Valid |
|---------|---------------------------------|------------|---------|------------------|-------|
| Run-1   | PR #1851 + 9hp + wd_strong + AR | brotli     | 1.05950 | 16,140,607       | No    |
| This PR | PR #1851 + 9hp + wd_strong + AR | pergroup   | 1.05957 | 15,901,624       | Yes   |

The compressor swap costs only +0.00006 BPB while saving 238,983 bytes, which brings the artifact under the 16 MB cap and makes this run valid.

Reproduction

```shell
RUN_ID=top_pr1855_hparams_s42_pergroup SEED=42 \
CASEOPS_ENABLED=1 EMBED_BITS=7 \
SMEAR_GATE_ENABLED=1 SPARSE_ATTN_GATE_ENABLED=1 \
MIN_LR=0.1 GPTQ_RESERVE_SECONDS=8.0 \
PHASED_TTT_NUM_PHASES=3 GPTQ_ALL_REDUCE=1 \
WD_SCHEDULE_ENABLED=1 WD_SCHED_LOW_FACTOR=0.5 WD_SCHED_HIGH_FACTOR=1.75 \
EMBED_CLIP_SIGMAS=14.0 MLP_CLIP_SIGMAS=11.5 \
WARMDOWN_FRAC=0.85 BETA2=0.99 \
TTT_BETA2=0.99 TTT_WEIGHT_DECAY=0.5 TTT_LORA_RANK=80 \
SPARSE_ATTN_GATE_SCALE=0.5 PHASED_TTT_PREFIX_DOCS=2500 \
COMPRESSOR=pergroup \
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Test Plan

- Seed 42 validation on 8xH100 SXM
- Artifact under 16,000,000 bytes
- Training wallclock stop at 592.1s
- Full pipeline: train -> EMA -> GPTQ/LQER -> pergroup compress -> decompress -> quant eval -> phased TTT eval

@cocohearts
Collaborator

Leaderboard audit note (pre-cutoff state): I don't think this is record-ready as submitted. The headline is a single-seed result with no std/p-value evidence. For a score this close to the existing frontier, it needs a matching 3-seed package and significance evidence before it can be treated as a leaderboard row.

@Itssshikhar
Author

Hey @cocohearts, I'm aware of how close this seed is to the current top, but as I mentioned in the description, I'm out of Runpod credits. Is there a way to run a 3-seed mean on this submission to make it decisive?

Itssshikhar and others added 2 commits May 3, 2026 18:39
3-seed mean val_bpb = 1.06017968 (seeds 42, 0, 1234) on the published
train_gpt.py + env block. Seed 42 reproduces (1.05948583 vs README
1.05956571). Honest delta vs PR openai#1855 3-seed mean is -0.00090, not
the README's headline -0.00151 (which compares this candidate's best
seed to PR openai#1855's mean). All artifacts under the 16 MB cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Full per-seed verbose logs (seed42.log, seed0.log, seed1234.log) with
  hparams + source dump + per-step training trace + TTT phase details,
  matching parent record's train_seedXX.log convention.
- submission.json with machine-readable per-seed and 3-seed-mean numbers,
  artifact bytes (max 15,909,242 / cap 16,777,216), and per-seed deltas
  vs PR openai#1855.
- Banner at top of parent README pointing readers to three_seed_eval/
  so the corrected 1.06018 mean is visible alongside the 1.05957 headline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Itssshikhar
Author

I was able to run a 3-seed mean for the submission with no additional changes, plus logs. Here are the numbers:

Per-seed results

| Seed | Pre-quant | Post-quant | Post-TTT | Artifact (bytes) | Steps |
|------|-----------|------------|----------|------------------|-------|
| 42   | 1.06324   | 1.07238    | 1.05949  | 15,899,339       | 4,862 |
| 0    | 1.06411   | 1.07321    | 1.06029  | 15,903,214       | 4,849 |
| 1234 | 1.06430   | 1.07363    | 1.06076  | 15,909,242       | 4,878 |
| mean |           |            | 1.06018  | 15,903,932       |       |

- 3-seed stdev: 0.00064
- 3-seed spread (max − min): 0.00127
- All artifacts under cap. Tightest margin is seed 1234, with 867,974 B of headroom.
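The summary statistics above can be recomputed directly from the per-seed post-TTT BPB values (this is only a recomputation of the reported numbers, not new data):

```python
import statistics

# Per-seed post-TTT BPB values from the table above.
bpb = {"42": 1.05949, "0": 1.06029, "1234": 1.06076}

mean = statistics.mean(bpb.values())            # 3-seed mean
stdev = statistics.stdev(bpb.values())          # sample stdev (n-1)
spread = max(bpb.values()) - min(bpb.values())  # max - min
```

Rounded to five decimals this reproduces the reported mean 1.06018, stdev 0.00064, and spread 0.00127.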

@cocohearts let me know if this is enough to close the leaderboard score.

