
Non-record: 11L GEPA + 12k Steps + Pure Int6 + Legal TTT (val_bpb=1.1079) #612

Open

Christopher-Lee-McClendon wants to merge 1 commit into openai:main from Christopher-Lee-McClendon:submission/11L-gepa-12k-pure-int6-legal-ttt

Conversation

@Christopher-Lee-McClendon
Contributor

Non-Record Submission: 11L GEPA + 12k Steps + Pure Int6 + Legal TTT

val_bpb = 1.1079 (1.10788263 exact) | Pre-TTT float: 1.1268 | Int6 quant: ~1.154 | TTT gain: −0.046 | Artifact: 14.79 MiB (15,510,640 bytes)

Summary

GEPA architecture (11L, 27M params) trained for 12,000 steps (7k at peak LR + 5k warmdown) on 4×A100-40GB. Pure int6 per-row quantization with a 15-candidate GPTQ-lite clip search, compressed with zstd-22. Legal score-first TTT (SGD, momentum 0.9, lr=0.002, 10 epochs, first 2 blocks frozen).
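As a rough illustration of the quantization step, here is a minimal sketch of per-row symmetric int6 quantization with a small clip-candidate search, assuming a 2-D weight tensor. The function name and the 0.7–1.0 candidate range are illustrative, not taken from the submission's train_gpt.py:

```python
import torch

def quantize_int6_per_row(w: torch.Tensor, n_candidates: int = 15):
    # Symmetric int6: quantized values live in [-31, 31].
    qmax = 31
    absmax = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8)
    best_err = torch.full((w.shape[0], 1), float("inf"))
    best_q = torch.zeros_like(w, dtype=torch.int8)
    best_scale = absmax / qmax
    # Candidate clip thresholds as fractions of each row's absmax;
    # keep the scale that minimizes squared reconstruction error.
    for frac in torch.linspace(0.7, 1.0, n_candidates):
        scale = (absmax * frac) / qmax
        q = (w / scale).round().clamp(-qmax, qmax)
        err = ((q * scale - w) ** 2).sum(dim=1, keepdim=True)
        better = err < best_err
        best_err = torch.where(better, err, best_err)
        best_q = torch.where(better, q.to(torch.int8), best_q)
        best_scale = torch.where(better, scale, best_scale)
    return best_q, best_scale  # dequantize as best_q.float() * best_scale
```

A full GPTQ would also compensate quantization error column-by-column using second-order information; the sketch above only searches the clip threshold, which is presumably what "GPTQ-lite" abbreviates here.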

Key Results

| Metric | Value |
| --- | --- |
| Final val_bpb | 1.10788263 |
| Pre-TTT float (step 12000) | 1.1268 |
| Post-quantization pre-TTT | ~1.154 |
| TTT improvement | −0.046 |
| Model bytes | 15,432,359 |
| Code bytes | 78,281 |
| Total artifact | 15,510,640 (< 16 MB ✅) |
| Training time | ~100 min (4×A100) |
| Eval time | 2072 s |

Novel Contributions

  1. 12k-step training with a 5k-step warmdown — exploits the unlimited-compute track (schedule sketch below)
  2. Pure int6 per-row quantization (no mixed int6/int8) with a 15-candidate GPTQ-lite clip search
  3. Legal score-first TTT with LR warmup and SGD momentum
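For concreteness, a trapezoidal schedule matching the 7k-peak + 5k-warmdown split might look like the following; the peak LR value is not stated in this PR and is left as a parameter:

```python
def lr_at_step(step: int, peak_lr: float,
               total_steps: int = 12_000, warmdown_steps: int = 5_000) -> float:
    """Constant at peak_lr for the first 7k steps, then linear
    warmdown to zero over the final 5k steps (assumed shape)."""
    decay_start = total_steps - warmdown_steps  # step 7000
    if step < decay_start:
        return peak_lr
    return peak_lr * (total_steps - step) / warmdown_steps
```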

Track

track_non_record_16mb — unlimited compute, 16 MB artifact limit.

Checklist

  • README.md with approach description
  • submission.json with correct metadata
  • train.log demonstrating results
  • train_gpt.py (self-contained, no network calls)
  • Total submission ≤ 16 MB
  • Legal TTT (score-first, no val data cheating)
  • GPG-signed commit

Non-record submission: 11L GEPA architecture trained for 12000 steps
(7k peak-LR + 5k warmdown) on 4xA100-40GB with pure int6 per-row
quantization using 15-candidate GPTQ-lite clip search and zstd-22
compression. Legal score-first TTT (SGD, 10 epochs, momentum 0.9)
drives final BPB from 1.154 (int6 quant) to 1.1079.

- Pre-TTT float base: 1.1268 (step 12000)
- Post-quant pre-TTT: ~1.154
- Final with legal TTT: 1.10788263
- Artifact: 15.51 MB (15.43 MB model + 78 KB code)
- 27M parameters, pure int6 quantization
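As an aside on the artifact packing: int6 rows compressed with zstd at level 22 could be serialized along these lines (a sketch using the `zstandard` package; the submission's actual serialization format is not shown in this thread):

```python
import numpy as np
import zstandard as zstd

def compress_int6(q: np.ndarray) -> bytes:
    # q holds int6 values in [-31, 31] stored one-per-byte as int8.
    # Shifting into [1, 63] keeps every byte in a 6-bit range so
    # zstd's entropy stage can exploit it; explicit bit-packing
    # (4 values per 3 bytes) would shave a bit more before compression.
    raw = (q.astype(np.int16) + 32).astype(np.uint8).tobytes()
    return zstd.ZstdCompressor(level=22).compress(raw)
```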
@MatoTeziTanka

Community Review — Non-record: 11L GEPA + 12k Steps + Pure Int6 + Legal TTT (val_bpb=1.1079)

BPB: 1.1079 | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1416/#1423 pattern)

What I found in the code (head SHA 3bf3fefe03bf, file records/track_non_record_16mb/2026-03-24_11L_GEPA_12kSteps_PureInt6_LegalTTT/train_gpt.py):

The TTT path at line 399 implements the score-first-per-chunk pattern: each chunk is scored under torch.no_grad() / inference_mode() before the base_model.train() + SGD adaptation runs on that same chunk, with an is_last_chunk guard so the final chunk gets no adaptation pass. This is the structural shape the legal frontier uses (PRs #1416 erichroepke, #1423 aryanbhosale).

Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here — chunk ci is scored under weights adapted only on chunks 0..ci-1. No prequant_ttt_adapt_adamw(val_tokens, ...) multi-epoch fine-tune, no scored-region SLOT, no target-in-key n-gram cache.
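For readers unfamiliar with the pattern, here is a minimal sketch of a score-first-per-chunk loop. This is illustrative only: the `model(x, targets=y)` scalar-loss interface is an assumption, and the PR's LR warmup, per-chunk adaptation epochs, and 2-block freeze are omitted for brevity:

```python
import torch

def score_first_ttt(model, chunks, lr=0.002, momentum=0.9):
    # Score-first-per-chunk TTT: chunk i is scored under weights adapted
    # only on chunks 0..i-1, then the model adapts on chunk i. The last
    # chunk is scored but never trained on (the is_last_chunk guard).
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    total_loss, total_tokens = 0.0, 0
    for i, (x, y) in enumerate(chunks):
        model.eval()
        with torch.no_grad():           # score BEFORE adapting on this chunk
            total_loss += model(x, targets=y).item() * y.numel()
        total_tokens += y.numel()
        if i == len(chunks) - 1:        # final chunk: no adaptation pass
            break
        model.train()                   # adapt on the chunk just scored
        opt.zero_grad()
        model(x, targets=y).backward()
        opt.step()
    return total_loss / total_tokens    # average per-token loss
```

The key property is that no token's score ever depends on weights that have seen that token, which is what makes the pattern legal under the cited issues.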

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.03s, dim=512, layers=11, vocab=1024, code=78281 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.


Reviewed by @MatoTeziTanka (The Agora). Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting — if the template misread your code, please call it out so I can iterate the classifier.

