Non-record: 33.6M Int5 GPTQ + Legal s_0-only TTT (val_bpb=1.1182) #1004
ibarrajo wants to merge 5 commits into openai:main
Conversation
Train larger (33.6M params, d=576, MLP 3.5x), quantize harder (int5 GPTQ). Legal score-first TTT (AdamW, cosine LR, 3 epochs) + post-TTT temperature calibration (T=0.98). 3-seed mean 1.1145 BPB (std 0.0003). Based on PR openai#576.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
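The calibration step in this commit amounts to dividing logits by T before the softmax; with T=0.98 the predictive distribution is sharpened slightly. A minimal sketch, assuming summed NLL over PyTorch logits; the helper name is illustrative, not the PR's actual code (later commits drop calibration on re-scored tokens):

```python
import torch
import torch.nn.functional as F

def nll_with_temperature(logits: torch.Tensor, targets: torch.Tensor,
                         T: float = 0.98) -> torch.Tensor:
    # Scale logits by 1/T before the softmax; T < 1 sharpens the
    # distribution, T > 1 flattens it. Returns summed NLL in nats.
    return F.cross_entropy(logits / T, targets, reduction="sum")
```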
- Train 590s + GPTQ 3.8s = 593.9s < 600s (within budget)
- 3% pruning → artifact 15.3MB with 711KB headroom
- Added assertions: artifact < 16MB, train+gptq < 600s, eval < 600s
- Seed 1337: val_bpb=1.1148

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
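The added assertions are plain budget guards. A hedged sketch of the shape they might take, with the function name and argument names assumed rather than taken from the PR:

```python
import os

ARTIFACT_CAP = 16 * 1024 * 1024  # 16 MB artifact cap (MiB assumed)
BUDGET_S = 600.0                 # wall-clock budget for train+GPTQ and for eval

def check_budgets(artifact_path: str, train_gptq_s: float, eval_s: float) -> None:
    # Fail fast if any speedrun constraint is violated.
    artifact_bytes = os.path.getsize(artifact_path)
    assert artifact_bytes < ARTIFACT_CAP, f"artifact {artifact_bytes}B >= {ARTIFACT_CAP}B"
    assert train_gptq_s < BUDGET_S, f"train+gptq {train_gptq_s:.1f}s >= {BUDGET_S}s"
    assert eval_s < BUDGET_S, f"eval {eval_s:.1f}s >= {BUDGET_S}s"
```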
Seed 1337: 1.1148 BPB, artifact 15.3MB, train+gptq 593.9s
Seed 42: 1.1154 BPB, artifact 15.3MB, train+gptq 593.7s
Seed 2025: 1.1148 BPB, artifact 15.8MB, train+gptq 593.9s
Mean: 1.1150 (std 0.0003)

All seeds: artifact < 16MB, train+gptq < 600s, eval < 600s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
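The quoted mean and std can be reproduced directly from the three seed scores; a quick check, assuming the std is a population (not sample) standard deviation, which matches the quoted 0.0003:

```python
import statistics

seed_bpb = {1337: 1.1148, 42: 1.1154, 2025: 1.1148}  # from the run log above
vals = list(seed_bpb.values())
print(f"mean={statistics.mean(vals):.4f}")   # mean=1.1150
print(f"std={statistics.pstdev(vals):.4f}")  # std=0.0003 (population std)
```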
- Reports ONLY s_0 (cumulative first-pass score) — no re-eval after TTT
- 5% pruning → artifact 15.5MB (465KB headroom)
- Train+GPTQ: 593.8s < 600s
- Eval (sliding + TTT): ~414s < 600s
- Addresses PR openai#991 closure: removed illegal post-TTT re-scoring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Non-record. All assertions pass. Legal s_0-only TTT. Artifact 15.5MB (516KB headroom). Train+GPTQ 593.7s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
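For readers new to the discipline: "score-first" means every chunk contributes to the reported s_0 before the model is ever updated on it. A minimal sketch of such a loop, assuming a PyTorch model and pre-batched (inputs, targets) chunks; the function name and shapes are illustrative, not the actual train_gpt.py code, and the guard reflects one plausible reading of the is_last_chunk check mentioned in the review below:

```python
import torch
import torch.nn.functional as F

def eval_with_score_first_ttt(model, optimizer, chunks):
    """Score each chunk under weights never updated on it, then adapt (TTT).

    Only the accumulated first-pass NLL (s_0) is returned; nothing is
    re-scored after an update. Illustrative sketch, not the PR's code.
    """
    total_nll, total_tokens = 0.0, 0
    for i, (inputs, targets) in enumerate(chunks):
        # 1) Score first: this chunk has not influenced the weights yet.
        model.eval()
        with torch.no_grad():
            logits = model(inputs)  # (B, T, V) assumed
            nll = F.cross_entropy(logits.flatten(0, -2),
                                  targets.flatten(), reduction="sum")
        total_nll += nll.item()
        total_tokens += targets.numel()

        # 2) Then adapt. The guard skips the update on the final chunk:
        #    with nothing left to score, it could not change s_0 anyway.
        if i < len(chunks) - 1:
            model.train()
            loss = F.cross_entropy(model(inputs).flatten(0, -2),
                                   targets.flatten())
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return total_nll, total_tokens
```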
Updated with B6 run: val_bpb = 1.1179 (s_0 TTT, seed 1337). Changes from previous:

Non-record submission (SOTA is 1.1147).
Updated results summary — B6 is our best legal approach at val_bpb=1.1179 (s_0 TTT, seed 1337). We tested 12 approaches this session, exploring the design space:
Key findings:
All approaches use GPTQ within the training budget. Non-record (SOTA is 1.1147). Needs 2 more seeds for statistical validation.
Community Review — Non-record: 33.6M Int5 GPTQ + Legal s_0-only TTT (val_bpb=1.1182)

Compliance: LOOKS CLEAN — legal score-first-per-chunk TTT (PR #1413 pattern). GPTQ on train data, score-first TTT with is_last_chunk guard. Clean.

Verdict: LOOKS CLEAN — legal TTT implementation matching the PR #1413 (dexhunter) pattern: each chunk is scored under weights not yet updated on that chunk.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). The TTT implementation follows the legal score-first discipline.

Reviewed by @MatoTeziTanka — The Agora. Compliance audit via an LLM agent (Sonnet) reviewing the full train_gpt.py source, cross-checked against a deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.
Summary
val_bpb: 1.1182 (s_0 score only, single seed — additional seeds pending)
Resubmission addressing PR #991's closure. Key fix: reports ONLY the cumulative s_0 score from the first scoring pass. No post-TTT re-evaluation. No temperature calibration on re-scored tokens.
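For context, bits-per-byte is the summed token NLL converted from nats to bits and normalized by the raw byte length of the validation text; a one-function sketch, with the function name illustrative:

```python
import math

def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    # val_bpb = (sum of token NLL in nats) / (ln 2 * number of raw text bytes)
    return total_nll_nats / (math.log(2) * total_bytes)
```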
What changed from PR #991
Results
Rule compliance
Architecture
33.6M params (d=576, MLP 3.5x=1792, 11L), int5 GPTQ, XSA-all(11), BigramHash(8192), EMA(0.997), 5% magnitude pruning. Based on PR #576 by @cmcdnd.
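Two of these ingredients are simple enough to sketch in plain PyTorch; both helpers are illustrative, not the PR's implementation:

```python
import torch

@torch.no_grad()
def magnitude_prune_(w: torch.Tensor, frac: float = 0.05) -> None:
    """Zero the smallest-magnitude `frac` of entries in place (5% here)."""
    k = int(w.numel() * frac)
    if k == 0:
        return
    threshold = w.abs().flatten().kthvalue(k).values
    w[w.abs() <= threshold] = 0.0

@torch.no_grad()
def ema_update_(ema_params, params, decay: float = 0.997) -> None:
    """Weight EMA: ema <- decay * ema + (1 - decay) * w, per tensor."""
    for e, p in zip(ema_params, params):
        e.mul_(decay).add_(p, alpha=1.0 - decay)
```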
🤖 Generated with Claude Code