Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1326 (15.51 MB) by JoeProAI · Pull Request #861 · openai/parameter-golf

JoeProAI · 2026-03-26T15:58:39Z

11L U-Net + Int5 QAT + Score-First Legal TTT

3-seed mean val_bpb: 1.13391 (std 0.00153) | 15.51 MiB (16,265,723 bytes) | 8xH100 (~37 min)

What's different

Built on the PR #549 stack. Key additions:

Int5 QAT — weights quantized to [-15, 15] per-row (stored int8 + float16 scale). Tighter than int6, better zstd compression ratio.
Score-first TTT: AdamW on MLP-only params (up_proj, down_proj, gate_proj, scale), lr=0.0004, 1 epoch. Each chunk is scored first and only then used for adaptation. Legal per the recipe in PR #461 (Non-record: 11L Depth Recurrence + High-Yield Legal TTT, 1.14458 BPB).
MLP_HIDDEN=1536 — reduced from 1792 to fit artifact under 16 MB with int5.
15% weight pruning — zero smallest weights pre-quantization for better zstd compression.
Bigram hash embedding — 4096 buckets, 128-dim, added to token embeddings.
XSA on all 11 layers — full U-Net cross-layer shared attention.
Warmdown 6000 steps — longer QAT phase for better weight clustering near int5 boundaries.
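
The pruning and Int5 steps listed above can be illustrated with a small round-trip. This is a minimal sketch in plain Python, not the code from train_gpt.py: the function names are hypothetical, pruning is applied per row (the description does not say whether the 15% threshold is per-row or global), and the scale is kept as a Python float rather than float16. It covers only the quantization grid, not the training-time part of QAT.

```python
def prune_smallest(row, fraction):
    """Zero the `fraction` of entries with the smallest absolute value."""
    k = int(len(row) * fraction)
    if k == 0:
        return list(row)
    # Indices of the k smallest-magnitude weights.
    order = sorted(range(len(row)), key=lambda i: abs(row[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(row)]


def quantize_row_int5(row, qmax=15):
    """Symmetric per-row quantization to integers in [-qmax, qmax]."""
    amax = max((abs(w) for w in row), default=0.0)
    scale = amax / qmax if amax > 0 else 1.0
    # Python's round() rounds exact halves to the nearest even integer.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q, scale):
    """Map integer codes back to approximate float weights."""
    return [v * scale for v in q]


row = [0.30, -0.02, 0.11, -0.45, 0.01, 0.27, -0.08, 0.19]
pruned = prune_smallest(row, 0.15)   # 15% of 8 entries -> 1 weight zeroed
q, scale = quantize_row_int5(pruned)
restored = dequantize_row(q, scale)
print(q, scale)
```

Each integer code fits in an int8 slot, and every row carries one scale. The repeated zeros and the small alphabet of 31 values are what a general-purpose compressor such as zstd can exploit.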

3-Seed Results

Seed	val_bpb	Artifact
42 (submitted artifact)	1.13256182	15.51 MiB
314	1.13557402	15.60 MiB
2025	1.13360681	15.59 MiB
Mean	1.13391
Std	0.00153

All three seeds individually trail the official SOTA (#549, 1.1194) by more than 0.01 BPB (lower is better), which is why this is submitted as a non-record entry. All artifacts are under 16 MiB.
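
The summary statistics can be rechecked from the per-seed values in the table. A short sketch using only the standard library; it assumes the reported std is the sample standard deviation (n - 1 denominator), which is the convention that reproduces 0.00153.

```python
import statistics

# Per-seed val_bpb values copied from the results table.
val_bpb = {42: 1.13256182, 314: 1.13557402, 2025: 1.13360681}
scores = list(val_bpb.values())

mean = statistics.mean(scores)
std = statistics.stdev(scores)   # sample std, n - 1 denominator

# Size of the submitted artifact in binary and decimal megabytes.
artifact_bytes = 16_265_723
artifact_mib = artifact_bytes / 2**20
artifact_mb = artifact_bytes / 10**6

print(f"mean={mean:.5f} std={std:.5f}")
print(f"{artifact_mib:.2f} MiB = {artifact_mb:.2f} MB")
```

The byte count corresponds to 15.51 MiB, so the size figures here are binary megabytes rather than decimal ones.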

Architecture

Param	Value
Layers	11
Model dim	512
Heads	8
MLP hidden	1536
Bigram buckets	4096
Bigram embed dim	128
Vocab size	256
Tie embeddings	false
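
To make the bigram rows of the table concrete, here is one way an ordered token pair could be mapped to one of the 4096 buckets. This is a sketch under assumptions: the hash function, its constants, and the BOS placeholder are hypothetical and not taken from the submission. How the 128-dim bigram vector is reconciled with the 512-dim token embedding is also not stated here, so the sketch stops at the bucket index.

```python
NUM_BUCKETS = 4096   # "Bigram buckets" in the architecture table
EMBED_DIM = 128      # "Bigram embed dim"; each bucket would own one such vector


def bigram_bucket(prev_token, token, num_buckets=NUM_BUCKETS):
    """Map an ordered token pair to a bucket id (illustrative hash)."""
    # Mix the two ids with distinct odd multipliers, then fold into range.
    h = (prev_token * 1_000_003) ^ (token * 998_244_353)
    return h % num_buckets


def bigram_ids(tokens, bos=0):
    """Bucket id per position; position 0 is paired with a BOS placeholder."""
    prev = [bos] + list(tokens[:-1])
    return [bigram_bucket(p, t) for p, t in zip(prev, tokens)]


tokens = [72, 101, 108, 108, 111]
ids = bigram_ids(tokens)
print(ids)
```

Because the pair space is hashed, distinct bigrams can collide in one bucket; the bucket count trades table size against collision rate.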

Rule Compliance

Score-first TTT: tokens scored under inference_mode() before training on them
No val tokens used in artifact or training
No pre-eval adaptation
Submitted artifact: 15.51 MiB (under 16 MiB limit)
All validation artifacts under 16 MiB
Training time: ~37 min | Eval time: ~192s (under 600s budget)
3-seed validation (seeds 42, 314, 2025)

Train log, submission.json, and training script included.

…g to fit int6 under 16MB
- INT6_CLIP_PERCENTILE now reads from env (default 99.99984, wave46 uses 99.0)
- PRUNE_PCT added to 1.0677 script (was missing, wave46 uses 0.25)
- Modal harness wave46_clip_prune.py for detached runs
- Both levers push zeros into weight tensors for better zstd compression
- Base architecture: SwiGLU + U-Net + XSA4 + BigramHash(8192) = 1.0677 BPB pre-compression
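
The two environment-driven settings named in this commit message could be wired up roughly as follows. This is a sketch, not the script's code: it assumes INT6_CLIP_PERCENTILE is a percentile of absolute weight values used as a clipping threshold and that PRUNE_PCT is a fraction. The helper names and the PRUNE_PCT default of 0.0 are hypothetical.

```python
import math
import os


def read_float_env(name, default):
    """Return the environment variable `name` as a float, else `default`."""
    raw = os.environ.get(name)
    return float(raw) if raw is not None else default


def clip_to_percentile(values, pct):
    """Clamp a non-empty list to +/- the nearest-rank percentile of |values|."""
    mags = sorted(abs(v) for v in values)
    rank = min(len(mags), max(1, math.ceil(len(mags) * pct / 100.0)))
    limit = mags[rank - 1]
    return [max(-limit, min(limit, v)) for v in values], limit


# The clip default is the one quoted in the commit message; the PRUNE_PCT
# default of 0.0 (no pruning) is an assumption made for this sketch.
clip_pct = read_float_env("INT6_CLIP_PERCENTILE", 99.99984)
prune_pct = read_float_env("PRUNE_PCT", 0.0)

weights = [float(i) for i in range(1, 100)] + [-1000.0]
clipped, limit = clip_to_percentile(weights, 99.0)   # the wave46 setting
print(limit, clipped[-1])
```

With the 99.0 setting the single outlier is clamped to the magnitude of the 99th-ranked weight, while the other values pass through unchanged.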

JoeProAI · 2026-03-31T17:39:38Z

Friendly bump in case this got buried in the queue. Just wanted to check whether PR #861 is missing any required artifacts, metadata, or formatting on our end. If it looks complete and is simply waiting for review, no rush at all — happy to wait our turn. Thanks.

JoeProAI · 2026-04-02T22:47:01Z

Reopened after an accidental auto-close (branch cleanup on our end). This submission took a significant investment of compute time and resources (~$1,000 in GPU costs) to get right, so we wanted to make sure it's properly in the queue.

Submission is complete and compliant:

3-seed validation (seeds 42, 314, 2025) — mean val_bpb 1.13391 (std 0.00153)
Ranked non-record — competitive submission in the upper tier of the leaderboard
All artifacts under 16MB (submitted artifact: 15.51 MiB)
Score-first TTT compliant (tokens scored under inference_mode() before training)
No val tokens used in artifact or training
Training time ~37 min on 8xH100s, eval ~192s (both within budget)
Train log, submission.json, and training script all included

Happy to address any questions from the maintainers. Ready for review whenever the team has bandwidth. Thanks.

MatoTeziTanka · 2026-04-11T20:07:29Z

Community Review — Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1326 (15.51 MB)

BPB: 1.1326 | Compliance: LOOKS CLEAN — score-first-per-chunk TTT (legal #1416/#1423 pattern)

What I found in the code (head SHA 28d45d26f589, file records/track_10min_16mb/2026-03-26_JoeProAI_11L_Int5_TTT_1.1326/train_gpt.py):

The TTT path at line 1012 implements the score-first-per-chunk pattern: each chunk is scored under torch.no_grad() / inference_mode() before the base_model.train() + SGD adaptation runs on that same chunk, with an is_last_chunk guard so the final chunk gets no adaptation pass. This is the structural shape the legal frontier uses (PRs #1416 erichroepke, #1423 aryanbhosale).

Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here — chunk ci is scored under weights adapted only on chunks 0..ci-1. No prequant_ttt_adapt_adamw(val_tokens, ...) multi-epoch fine-tune, no scored-region SLOT, no target-in-key n-gram cache.
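
The ordering described above (score chunk ci with weights adapted only on chunks 0..ci-1, then adapt, with no adaptation after the final chunk) can be shown with a toy model. This is a minimal sketch, not the reviewed code: a Laplace-smoothed unigram counter stands in for the network, and a count update stands in for the optimizer step. All class and function names are hypothetical.

```python
import math


class AdaptiveUnigram:
    """Toy stand-in for the model: Laplace-smoothed symbol counts."""

    def __init__(self, vocab_size):
        self.counts = [1] * vocab_size
        self.total = vocab_size

    def score_bits(self, chunk):
        """Total code length of `chunk` in bits; does not modify state."""
        return sum(-math.log2(self.counts[t] / self.total) for t in chunk)

    def adapt(self, chunk):
        """Update step: count tokens that have already been scored."""
        for t in chunk:
            self.counts[t] += 1
            self.total += 1


def score_first_eval(model, tokens, chunk_size):
    """Return bits per token under the score-first-per-chunk ordering."""
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    total_bits = 0.0
    for ci, chunk in enumerate(chunks):
        # 1) Score chunk ci with a model adapted only on chunks 0..ci-1.
        total_bits += model.score_bits(chunk)
        # 2) Adapt on the chunk just scored; skip this after the final chunk.
        is_last_chunk = ci == len(chunks) - 1
        if not is_last_chunk:
            model.adapt(chunk)
    return total_bits / len(tokens)


tokens = [0, 0, 1, 0, 0, 0, 1, 0]
model = AdaptiveUnigram(vocab_size=2)
bits_per_token = score_first_eval(model, tokens, chunk_size=4)
print(bits_per_token)
```

The property that matters for compliance is visible in the loop body: no token contributes to an update before its own loss has been added to the total.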

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.06s, dim=512, layers=11, vocab=1024, code=75440 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path — e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function — please flag it and I'll re-run the audit manually.

Reviewed by @MatoTeziTanka, The Agora. Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.

JoeProAI · 2026-05-07T09:43:22Z

Final follow-up on PR #861.

The competition is now over, but this PR remains open without any formal maintainer review or acknowledgment. Since the event has concluded, I’d appreciate a definitive status update on whether this submission will still be reviewed, or whether non-record submissions like this are effectively being left unresolved.

I spent $1,600 in compute getting this into compliant shape because there was no clear signal that effort at this level would simply end with no resolution or communication. I’m not asking for special treatment, just closure and a clear statement of process so participants know how to interpret open submissions after the competition ends.

Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1326 (15.51 MB)

3d6acb1

JoeProAI mentioned this pull request Mar 26, 2026

Record: SwiGLU+VE128+NoTTT val_bpb=1.1181 (3-seed mean) #505

Closed

JoeProAI added 2 commits March 26, 2026 18:15

Add RESULTS.md, requirements.txt, and run_training.sh to PR openai#861 …

b68b95d

…submission

This was referenced Mar 28, 2026

Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1336 (15.59 MiB) #1040

Closed

Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1356 (15.60 MiB) #1041

Closed

JoeProAI wants to merge 3 commits into openai:main from JoeProAI:submission/joeproai-11l-int5-ttt-1.1326
