
Non-record: Checkpointed AR Self-Gen GPTQ + XSA-all + BigramHash 3072x112 (val_bpb=1.13072, seed 314)#1475

Open
Jaksenc wants to merge 4 commits into openai:main from Jaksenc:codex/parameter-golf-submission-baseline

Conversation

@Jaksenc Jaksenc commented Apr 8, 2026

Non-record: Checkpointed AR Self-Gen GPTQ + XSA-all + BigramHash 3072x112

val_bpb: 1.13071788 (saved 1-seed result, seed 314) | 15,651,808 bytes | Stage 1: 8xH100 80GB | Stage 2: 1xH100 80GB

This PR packages my strongest saved run on the public AR self-generated GPTQ + XSA-all + BigramHash 3072x112 stack. I am not submitting it as a leaderboard claim. The local contribution is a checkpointed two-stage execution path: Stage 1 trains and saves final_model.pt on 8xH100, and Stage 2 runs GPTQ, artifact packing, and final evaluation on 1xH100.
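For orientation, here is a minimal sketch of the Stage 1 save / Stage 2 load hand-off, assuming a plain state_dict checkpoint. Only final_model.pt and the CHECKPOINT_LOAD_PATH variable come from this PR; the function names and everything else below are illustrative placeholders rather than the actual train_gpt.py / run_gptq.py code.

# Hypothetical sketch of the two-stage checkpoint hand-off.
# Only final_model.pt and CHECKPOINT_LOAD_PATH come from the PR itself;
# the function names and save format here are placeholders.
import os
import torch

CKPT_PATH = "/data/parameter-golf/checkpoints/record_seed314/final_model.pt"

def stage1_save(model: torch.nn.Module) -> None:
    # Stage 1 (8xH100): after training, persist the final weights once.
    os.makedirs(os.path.dirname(CKPT_PATH), exist_ok=True)
    torch.save(model.state_dict(), CKPT_PATH)

def stage2_load(model: torch.nn.Module) -> torch.nn.Module:
    # Stage 2 (1xH100): reload the saved weights before GPTQ and final eval.
    path = os.environ.get("CHECKPOINT_LOAD_PATH", CKPT_PATH)
    model.load_state_dict(torch.load(path, map_location="cpu"))
    return model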

Results

Seed | Stage 1 steps | ms/step | Post-EMA BPB | Roundtrip BPB | Sliding BPB | Artifact (bytes)
314  | 4,783         | ~124    | 1.1501       | 1.15442828    | 1.13071788  | 15,651,808

Changes from the PR #1019 lineage

Everything in the inherited modeling stack comes from the public 2026-03-25_ValCalib_GPTQ_XSA_BigramHash3072 folder and PR #1019. This PR makes the following local additions:

Change                                            | Why it matters
Checkpointed two-stage 8xH100 -> 1xH100 execution | Moves GPTQ and final eval off the expensive 8xH100 box without changing the scored model path.
Recovered raw Stage 1 and Stage 2 logs            | Preserves direct evidence for the saved seed-314 run.
Clean non-record packaging                        | Adds a single folder under records/track_non_record_16mb/... with code, metadata, summaries, and logs.

Quantization pipeline

Stage                    | BPB
Post-EMA diagnostic      | 1.1501
Post-GPTQ int6 roundtrip | 1.15442828
Post-GPTQ sliding        | 1.13071788
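To make the roundtrip row concrete, the snippet below shows a naive symmetric per-channel int6 quantize/dequantize pass. The actual run_gptq.py applies GPTQ (error-compensated quantization against calibration activations); this only illustrates, at the tensor level, what quantizing weights to 6 bits and re-measuring BPB on the dequantized copy means.

# Illustrative symmetric per-channel int6 roundtrip; the real pipeline uses
# GPTQ against AR self-generated calibration data, not naive rounding.
import torch

def int6_roundtrip(weight: torch.Tensor) -> torch.Tensor:
    # Signed 6-bit range is [-32, 31]; scale each output channel separately.
    qmax = 31
    scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q * scale  # dequantized weights used for the roundtrip BPB check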

Compliance / scope

  • No new eval-time adaptation, no TTT, no n-gram cache, and no multi-pass scoring are added in this PR.
  • The inherited stack uses AR self-generated calibration; Stage 2 loads the saved checkpoint and runs the same GPTQ/eval path on 1xH100.
  • As of April 9, 2026, this saved result does not beat the current merged rank-1 README entry (1.1147), so I am submitting it as a non-record contribution.

Reproduction

# Stage 1 (8xH100): train and save final_model.pt, skipping quantization
SEED=314 SKIP_QUANTIZE=1 torchrun --standalone --nproc_per_node=8 train_gpt.py

# Stage 2 (1xH100): load the saved checkpoint, then run GPTQ and the final eval
export CHECKPOINT_LOAD_PATH=/data/parameter-golf/checkpoints/record_seed314/final_model.pt
torchrun --standalone --nproc_per_node=1 run_gptq.py
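A sketch of how the environment variables in these commands might be consumed on the Python side, assuming straightforward os.environ parsing; the variable names match the commands above, but the defaults and structure are guesses rather than the scripts' actual code.

# Hypothetical env-var parsing matching the reproduction commands above;
# defaults and structure are assumptions, not the scripts' actual code.
import os

SEED = int(os.environ.get("SEED", "314"))
SKIP_QUANTIZE = os.environ.get("SKIP_QUANTIZE", "0") == "1"    # Stage 1 only trains
CHECKPOINT_LOAD_PATH = os.environ.get("CHECKPOINT_LOAD_PATH")  # set for Stage 2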

Files

This PR adds only records/track_non_record_16mb/2026-04-08_8xH100_TwoStage_GPTQ_Baseline/, containing:

  • README.md
  • submission.json
  • proxy_results.md
  • train_gpt.py
  • run_gptq.py
  • stock.env
  • requirements.txt
  • stage1_modal_seed314.log
  • stage2_modal_seed314.log

Credits

  • PR #1019: direct public record lineage this checkpointed baseline preserves.
  • PR #549: legal leaderboard base underneath that stack.
  • PR #609: inherited full-GPTQ and selective-pruning lineage used by this stack.
  • PR #478: XSA-all idea used by the inherited modeling stack.

@Jaksenc Jaksenc force-pushed the codex/parameter-golf-submission-baseline branch from 4ccea0e to cbd65d9 on April 8, 2026 15:21
@Jaksenc Jaksenc marked this pull request as ready for review April 8, 2026 17:06
@Jaksenc Jaksenc changed the title from "Non-record: 8xH100->1xH100 Two-Stage GPTQ Baseline — val_bpb 1.13072, 15,651,808 bytes" to "Non-record: Checkpointed 8xH100->1xH100 GPTQ Baseline — val_bpb 1.13072, 15,651,808 bytes" on Apr 8, 2026
@Jaksenc Jaksenc changed the title from "Non-record: Checkpointed 8xH100->1xH100 GPTQ Baseline — val_bpb 1.13072, 15,651,808 bytes" to "Non-record: Checkpointed AR Self-Gen GPTQ + XSA-all + BigramHash 3072x112 (val_bpb=1.13072, seed 314)" on Apr 9, 2026
@MatoTeziTanka

Community Review — Non-record: Checkpointed AR Self-Gen GPTQ + XSA-all + BigramHash 3072x112 (val_bpb=1.13072, seed 314)

BPB: 1.13072 | Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

What I found in the code (head SHA 5a29a9b7c32a, file records/track_non_record_16mb/2026-04-08_8xH100_TwoStage_GPTQ_Baseline/train_gpt.py):

Static code review found no TTT adaptation function, no SLOT optimization loop, no n-gram-cache class, and no pre-quant val-token fine-tune. The eval path uses the standard sliding-window stride-64 pattern. The submission is a pure-neural architecture iteration on the standard SP1024/SP4096/SP8192 baseline.
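For reference, the sliding-window stride-64 pattern referred to above scores each token once with long left context while advancing the window 64 tokens at a time; a minimal sketch is below, where the window length, model interface, and bits-per-byte bookkeeping are assumptions rather than the submission's actual eval code.

# Minimal sliding-window eval sketch (stride 64); window size, model API,
# and byte accounting are assumptions, not the submission's eval code.
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_bpb(model, tokens: torch.Tensor, n_bytes: int,
                window: int = 1024, stride: int = 64) -> float:
    total_nll = 0.0
    for end in range(window, tokens.size(0) + 1, stride):
        chunk = tokens[end - window:end]
        logits = model(chunk[:-1].unsqueeze(0))   # (1, window-1, vocab)
        # Score only the last `stride` targets so each token is counted once
        # (the first window-stride tokens of the stream are skipped here).
        nll = F.cross_entropy(logits[0, -stride:], chunk[-stride:],
                              reduction="sum")
        total_nll += nll.item()
    return total_nll / (math.log(2) * n_bytes)    # nats -> bits per byte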

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.05s, dim=512, layers=11, vocab=1024, code=106816 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the classification pass — this looks like a clean pure-neural iteration on the standard baseline.

Auto-classification caveat: this review was drafted by the AST-based classifier. If there's a non-standard eval mechanism (logit postprocessing, hedge mixing, etc.) that I missed because it's factored into a helper file or a non-standard function name, please flag it and I'll re-run the audit manually.
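As a point of reference for what an AST pass like this can and cannot see, here is a minimal sketch; classify_prs.py itself is not included in this PR, so the suspect-name bank and the definition-name heuristic below are assumptions about how such a classifier might work.

# Illustrative AST scan for eval-time-adaptation patterns; the real
# classify_prs.py pattern bank is not public here, so these names and the
# name-matching heuristic are assumptions.
import ast

SUSPECT_NAMES = {"ttt", "test_time_train", "ngram_cache", "slot"}

def flag_suspect_defs(source: str) -> list[str]:
    # Return function/class names whose lowercased name matches the bank.
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if any(key in node.name.lower() for key in SUSPECT_NAMES):
                hits.append(node.name)
    return hits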


Reviewed by @MatoTeziTanka (The Agora). Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.
