Non-record: Checkpointed AR Self-Gen GPTQ + XSA-all + BigramHash 3072x112 (val_bpb=1.13072, seed 314) #1475
Community Review — Non-record: Checkpointed AR Self-Gen GPTQ + XSA-all + BigramHash 3072x112 (val_bpb=1.13072, seed 314)

BPB: 1.13072 | Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

What I found in the code (head SHA cbd65d9): Static code review found no TTT adaptation function, no SLOT optimization loop, no n-gram-cache class, and no pre-quant val-token fine-tune. The eval path uses the standard sliding-window stride-64 pattern. The submission is a pure-neural architecture iteration on the standard SP1024/SP4096/SP8192 baseline.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.05s, dim=512, layers=11, vocab=1024, code=106816 B, SMOKE_TEST_PASS.

Verdict: LOOKS CLEAN. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the classification pass — this looks like a clean pure-neural iteration on the standard baseline.

Auto-classification caveat: this review was drafted by the AST-based classifier. If there is a non-standard eval mechanism (logit postprocessing, hedge mixing, etc.) that I missed because it is factored into a helper file or a non-standard function name, please flag it and I will re-run the audit manually.

Reviewed by @MatoTeziTanka — The Agora. Classification via the deterministic AST-based classifier.
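For context on the eval pattern the review cites, here is a minimal sketch of stride-64 sliding-window bits-per-byte scoring. It is an illustration only: the model interface, context length, and token-to-byte accounting are assumptions, not the harness's actual eval code.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_window_bpb(model, tokens, token_bytes, ctx_len=1024, stride=64):
    """Hedged sketch: score every target token exactly once, advancing a
    window of up to ctx_len tokens by `stride` so late tokens keep long
    left context. token_bytes[t] = byte length of token id t (assumed)."""
    model.eval()
    total_nll, total_bytes, prev_end = 0.0, 0, 0
    for start in range(0, len(tokens) - 1, stride):
        end = min(start + ctx_len + 1, len(tokens))
        x = torch.tensor(tokens[start:end - 1]).unsqueeze(0)
        y = torch.tensor(tokens[start + 1:end]).unsqueeze(0)
        logits = model(x)  # assumed: (1, T) token ids -> (1, T, vocab)
        nll = F.cross_entropy(logits.squeeze(0), y.squeeze(0), reduction="none")
        new = (end - 1) - prev_end  # targets not scored by earlier windows
        total_nll += nll[-new:].sum().item()
        total_bytes += sum(token_bytes[t] for t in tokens[end - new:end])
        prev_end = end - 1
        if end == len(tokens):
            break
    return total_nll / (math.log(2) * total_bytes)  # nats -> bits per byte
```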
Non-record: Checkpointed AR Self-Gen GPTQ + XSA-all + BigramHash 3072x112
val_bpb: 1.13071788 (saved 1-seed result, seed 314) | 15,651,808 bytes | Stage 1: 8xH100 80GB | Stage 2: 1xH100 80GB
This PR packages my strongest saved run on the public AR self-generated GPTQ + XSA-all + BigramHash 3072x112 stack. I am not submitting it as a leaderboard claim. The local contribution is a checkpointed two-stage execution path: Stage 1 trains and saves final_model.pt on 8xH100, and Stage 2 runs GPTQ, artifact packing, and final evaluation on 1xH100.
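Below is a minimal sketch of what the checkpoint seam between the two stages might look like. It is illustrative only: the function name and exact save logic are not from this PR, though the final_model.pt path and the SKIP_QUANTIZE/CHECKPOINT_LOAD_PATH flags match the reproduction commands further down.

```python
import os
import torch

SAVE_PATH = "/data/parameter-golf/checkpoints/record_seed314/final_model.pt"

def checkpoint_seam(model):
    """Hedged sketch of the Stage 1 / Stage 2 boundary.

    Stage 1 (8xH100): after training, save final_model.pt and, when
    SKIP_QUANTIZE=1, exit before the GPTQ step.
    Stage 2 (1xH100): skip training and load the saved weights via
    CHECKPOINT_LOAD_PATH instead."""
    load_path = os.environ.get("CHECKPOINT_LOAD_PATH")
    if load_path:
        # Stage 2: restore the Stage-1 weights on a single GPU.
        model.load_state_dict(torch.load(load_path, map_location="cuda"))
        return model

    # Stage 1: training just finished; persist the scored model.
    # (In the real torchrun job, only rank 0 would write the file.)
    torch.save(model.state_dict(), SAVE_PATH)
    if os.environ.get("SKIP_QUANTIZE") == "1":
        raise SystemExit(0)  # stop here; Stage 2 picks the checkpoint up
    return model
```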
Results

val_bpb 1.13071788 (saved 1-seed result, seed 314); packed artifact 15,651,808 bytes; Stage 1 on 8xH100 80GB, Stage 2 on 1xH100 80GB.

Change from PR #1019 lineage
Everything in the inherited modeling stack comes from the public 2026-03-25_ValCalib_GPTQ_XSA_BigramHash3072 folder and PR #1019. This PR makes the following local additions:

- An 8xH100 -> 1xH100 execution split: Stage 2 runs on a single GPU instead of the full 8xH100 box, without changing the scored model path.
- A new submission folder under records/track_non_record_16mb/... with code, metadata, summaries, and logs.

Quantization pipeline
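The pipeline itself is inherited from the PR #1019 lineage; for readers who want a concrete picture of what the GPTQ step does, here is a minimal single-layer sketch of the algorithm (Frantar et al.). It is not this repo's run_gptq.py: the bit width, symmetric per-row scales, and calibration interface are all assumptions.

```python
import torch

def gptq_quantize_layer(W, X, bits=4, damp=0.01):
    """Hedged GPTQ sketch for one linear layer.

    W: (out_features, in_features) weight matrix.
    X: (n_calib, in_features) calibration activations.
    Quantizes columns left to right, folding each column's rounding
    error into the not-yet-quantized columns via the inverse Hessian
    H^-1 with H = X^T X (dampened for numerical stability)."""
    W = W.clone().float()
    H = X.T @ X
    H += damp * H.diag().mean() * torch.eye(H.shape[0])
    Hinv = torch.linalg.inv(H)
    # Symmetric per-output-row scales (an assumption, not the repo's scheme).
    qmax = 2 ** (bits - 1) - 1
    scale = W.abs().max(dim=1, keepdim=True).values / qmax
    Q = torch.zeros_like(W)
    for i in range(W.shape[1]):
        w = W[:, i]
        q = torch.clamp(torch.round(w / scale[:, 0]), -qmax - 1, qmax)
        Q[:, i] = q
        # Error feedback: spread this column's rounding error forward.
        err = (w - q * scale[:, 0]) / Hinv[i, i]
        W[:, i + 1:] -= err.unsqueeze(1) * Hinv[i, i + 1:].unsqueeze(0)
    return Q, scale  # dequantize with Q * scale
```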
Compliance / scope
Stage 2 (quantization, packing, and final evaluation) runs on 1xH100. The saved result (val_bpb 1.13072) does not beat the current record (1.1147), so I am submitting it as a non-record contribution.

Reproduction
Stage 1 (8xH100):
SEED=314 SKIP_QUANTIZE=1 torchrun --standalone --nproc_per_node=8 train_gpt.py

Stage 2 (1xH100):
export CHECKPOINT_LOAD_PATH=/data/parameter-golf/checkpoints/record_seed314/final_model.pt
torchrun --standalone --nproc_per_node=1 run_gptq.py
Files

Only adds records/track_non_record_16mb/2026-04-08_8xH100_TwoStage_GPTQ_Baseline/ with:

- README.md
- submission.json
- proxy_results.md
- train_gpt.py
- run_gptq.py
- stock.env
- requirements.txt
- stage1_modal_seed314.log
- stage2_modal_seed314.log

Credits