
Non-record: Checkpointed AR Self-Gen GPTQ + XSA-all + BigramHash 3072x112 (val_bpb=1.13072, seed 314)#1475

Open
Jaksenc wants to merge 4 commits into openai:main from Jaksenc:codex/parameter-golf-submission-baseline

Conversation

@Jaksenc Jaksenc commented Apr 8, 2026

Non-record: Checkpointed AR Self-Gen GPTQ + XSA-all + BigramHash 3072x112

val_bpb: 1.13071788 (saved 1-seed result, seed 314) | 15,651,808 bytes | Stage 1: 8xH100 80GB | Stage 2: 1xH100 80GB

This PR packages my strongest saved run on the public AR self-generated GPTQ + XSA-all + BigramHash 3072x112 stack. I am not submitting it as a leaderboard claim. The local contribution is a checkpointed two-stage execution path: Stage 1 trains and saves final_model.pt on 8xH100, and Stage 2 runs GPTQ, artifact packing, and final evaluation on 1xH100.
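For orientation, here is a minimal sketch of the Stage 1 save / Stage 2 load hand-off, assuming a plain state_dict checkpoint. Only final_model.pt and the CHECKPOINT_LOAD_PATH variable come from this PR; the function names and everything else below are illustrative placeholders rather than the actual train_gpt.py / run_gptq.py code.

# Hypothetical sketch of the two-stage checkpoint hand-off.
# Only final_model.pt and CHECKPOINT_LOAD_PATH come from the PR itself;
# the function names and save format here are placeholders.
import os
import torch

CKPT_PATH = "/data/parameter-golf/checkpoints/record_seed314/final_model.pt"

def stage1_save(model: torch.nn.Module) -> None:
    # Stage 1 (8xH100): after training, persist the final weights once.
    os.makedirs(os.path.dirname(CKPT_PATH), exist_ok=True)
    torch.save(model.state_dict(), CKPT_PATH)

def stage2_load(model: torch.nn.Module) -> torch.nn.Module:
    # Stage 2 (1xH100): reload the saved weights before GPTQ and final eval.
    path = os.environ.get("CHECKPOINT_LOAD_PATH", CKPT_PATH)
    model.load_state_dict(torch.load(path, map_location="cpu"))
    return model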

Results

Seed | Stage 1 steps | ms/step | Post-EMA BPB | Roundtrip BPB | Sliding BPB | Artifact (bytes)
314  | 4,783         | ~124    | 1.1501       | 1.15442828    | 1.13071788  | 15,651,808

Changes from the PR #1019 lineage

Everything in the inherited modeling stack comes from the public 2026-03-25_ValCalib_GPTQ_XSA_BigramHash3072 folder and PR #1019. This PR makes the following local additions:

Change                                            | Why it matters
Checkpointed two-stage 8xH100 -> 1xH100 execution | Moves GPTQ and final eval off the expensive 8xH100 box without changing the scored model path.
Recovered raw Stage 1 and Stage 2 logs            | Preserves direct evidence for the saved seed-314 run.
Clean non-record packaging                        | Adds a single folder under records/track_non_record_16mb/... with code, metadata, summaries, and logs.

Quantization pipeline

Stage                    | BPB
Post-EMA diagnostic      | 1.1501
Post-GPTQ int6 roundtrip | 1.15442828
Post-GPTQ sliding        | 1.13071788
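To make the roundtrip row concrete, the snippet below shows a naive symmetric per-channel int6 quantize/dequantize pass. The actual run_gptq.py applies GPTQ (error-compensated quantization against calibration activations); this only illustrates, at the tensor level, what quantizing weights to 6 bits and re-measuring BPB on the dequantized copy means.

# Illustrative symmetric per-channel int6 roundtrip; the real pipeline uses
# GPTQ against AR self-generated calibration data, not naive rounding.
import torch

def int6_roundtrip(weight: torch.Tensor) -> torch.Tensor:
    # Signed 6-bit range is [-32, 31]; scale each output channel separately.
    qmax = 31
    scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q * scale  # dequantized weights used for the roundtrip BPB check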

Compliance / scope

  • No new eval-time adaptation, no TTT, no n-gram cache, and no multi-pass scoring are added in this PR.
  • The inherited stack uses AR self-generated calibration; Stage 2 loads the saved checkpoint and runs the same GPTQ/eval path on 1xH100.
  • As of April 9, 2026, this saved result does not beat the current merged rank-1 README entry (1.1147), so I am submitting it as a non-record contribution.

Reproduction

# Stage 1 (8xH100): train and save final_model.pt, skipping quantization
SEED=314 SKIP_QUANTIZE=1 torchrun --standalone --nproc_per_node=8 train_gpt.py

# Stage 2 (1xH100): load the saved checkpoint, then run GPTQ and the final eval
export CHECKPOINT_LOAD_PATH=/data/parameter-golf/checkpoints/record_seed314/final_model.pt
torchrun --standalone --nproc_per_node=1 run_gptq.py
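A sketch of how the environment variables in these commands might be consumed on the Python side, assuming straightforward os.environ parsing; the variable names match the commands above, but the defaults and structure are guesses rather than the scripts' actual code.

# Hypothetical env-var parsing matching the reproduction commands above;
# defaults and structure are assumptions, not the scripts' actual code.
import os

SEED = int(os.environ.get("SEED", "314"))
SKIP_QUANTIZE = os.environ.get("SKIP_QUANTIZE", "0") == "1"    # Stage 1 only trains
CHECKPOINT_LOAD_PATH = os.environ.get("CHECKPOINT_LOAD_PATH")  # set for Stage 2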

Files

This PR adds only records/track_non_record_16mb/2026-04-08_8xH100_TwoStage_GPTQ_Baseline/, containing:

  • README.md
  • submission.json
  • proxy_results.md
  • train_gpt.py
  • run_gptq.py
  • stock.env
  • requirements.txt
  • stage1_modal_seed314.log
  • stage2_modal_seed314.log

Credits

  • PR #1019: direct public record lineage this checkpointed baseline preserves.
  • PR #549: legal leaderboard base underneath that stack.
  • PR #609: inherited full-GPTQ and selective-pruning lineage used by this stack.
  • PR #478: XSA-all idea used by the inherited modeling stack.

@Jaksenc Jaksenc force-pushed the codex/parameter-golf-submission-baseline branch from 4ccea0e to cbd65d9 on April 8, 2026 15:21
@Jaksenc Jaksenc marked this pull request as ready for review April 8, 2026 17:06
@Jaksenc Jaksenc changed the title from "Non-record: 8xH100->1xH100 Two-Stage GPTQ Baseline — val_bpb 1.13072, 15,651,808 bytes" to "Non-record: Checkpointed 8xH100->1xH100 GPTQ Baseline — val_bpb 1.13072, 15,651,808 bytes" on Apr 8, 2026
@Jaksenc Jaksenc changed the title from "Non-record: Checkpointed 8xH100->1xH100 GPTQ Baseline — val_bpb 1.13072, 15,651,808 bytes" to "Non-record: Checkpointed AR Self-Gen GPTQ + XSA-all + BigramHash 3072x112 (val_bpb=1.13072, seed 314)" on Apr 9, 2026
@MatoTeziTanka

Community Review — Non-record: Checkpointed AR Self-Gen GPTQ + XSA-all + BigramHash 3072x112 (val_bpb=1.13072, seed 314)

BPB: 1.13072 | Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

What I found in the code (head SHA 5a29a9b7c32a, file records/track_non_record_16mb/2026-04-08_8xH100_TwoStage_GPTQ_Baseline/train_gpt.py):

Static code review found no TTT adaptation function, no SLOT optimization loop, no n-gram-cache class, and no pre-quant val-token fine-tune. The eval path uses the standard sliding-window stride-64 pattern. The submission is a pure-neural architecture iteration on the standard SP1024/SP4096/SP8192 baseline.
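For reference, the sliding-window stride-64 pattern referred to above scores each token once with long left context while advancing the window 64 tokens at a time; a minimal sketch is below, where the window length, model interface, and bits-per-byte bookkeeping are assumptions rather than the submission's actual eval code.

# Minimal sliding-window eval sketch (stride 64); window size, model API,
# and byte accounting are assumptions, not the submission's eval code.
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_bpb(model, tokens: torch.Tensor, n_bytes: int,
                window: int = 1024, stride: int = 64) -> float:
    total_nll = 0.0
    for end in range(window, tokens.size(0) + 1, stride):
        chunk = tokens[end - window:end]
        logits = model(chunk[:-1].unsqueeze(0))   # (1, window-1, vocab)
        # Score only the last `stride` targets so each token is counted once
        # (the first window-stride tokens of the stream are skipped here).
        nll = F.cross_entropy(logits[0, -stride:], chunk[-stride:],
                              reduction="sum")
        total_nll += nll.item()
    return total_nll / (math.log(2) * n_bytes)    # nats -> bits per byte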

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.05s, dim=512, layers=11, vocab=1024, code=106816 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the classification pass — this looks like a clean pure-neural iteration on the standard baseline.

Auto-classification caveat: this review was drafted by the AST-based classifier. If there's a non-standard eval mechanism (logit postprocessing, hedge mixing, etc.) that I missed because it's factored into a helper file or a non-standard function name, please flag it and I'll re-run the audit manually.
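As a point of reference for what an AST pass like this can and cannot see, here is a minimal sketch; classify_prs.py itself is not included in this PR, so the suspect-name bank and the definition-name heuristic below are assumptions about how such a classifier might work.

# Illustrative AST scan for eval-time-adaptation patterns; the real
# classify_prs.py pattern bank is not public here, so these names and the
# name-matching heuristic are assumptions.
import ast

SUSPECT_NAMES = {"ttt", "test_time_train", "ngram_cache", "slot"}

def flag_suspect_defs(source: str) -> list[str]:
    # Return function/class names whose lowercased name matches the bank.
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if any(key in node.name.lower() for key in SUSPECT_NAMES):
                hits.append(node.name)
    return hits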


Reviewed by @MatoTeziTanka (The Agora). Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.
