Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1326 (15.51 MB)#861
JoeProAI wants to merge 3 commits
…g to fit int6 under 16MB
- INT6_CLIP_PERCENTILE now reads from env (default 99.99984, wave46 uses 99.0)
- PRUNE_PCT added to 1.0677 script (was missing, wave46 uses 0.25)
- Modal harness wave46_clip_prune.py for detached runs
- Both levers push zeros into weight tensors for better zstd compression
- Base architecture: SwiGLU + U-Net + XSA4 + BigramHash(8192) = 1.0677 BPB pre-compression
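As a rough illustration of why pruning helps the compressed size, here is a minimal sketch. It is not the submission's code. The helper names, the symmetric int6 range of [-31, 31], the reading of PRUNE_PCT as a fraction, and the use of `zlib` in place of zstd (to stay within the standard library) are all assumptions made for the example. It only measures the pruning lever; the clip percentile is exposed as a parameter but left at its default.

```python
import random
import zlib


def quantize_int6(weights, clip_percentile=99.99984, prune_frac=0.0):
    """Symmetric int6 quantization with percentile clipping and magnitude pruning.

    clip_percentile mirrors INT6_CLIP_PERCENTILE; prune_frac stands in for
    PRUNE_PCT and is ASSUMED here to be a fraction in [0, 1].
    """
    mags = sorted(abs(w) for w in weights)
    # The magnitude at the requested percentile becomes the clipping range.
    clip_idx = min(len(mags) - 1, int(len(mags) * clip_percentile / 100.0))
    clip = mags[clip_idx] or 1.0
    scale = clip / 31.0  # assumed int6 symmetric range: [-31, 31]
    # Weights at or below this magnitude are pruned to exactly zero.
    n_prune = int(len(mags) * prune_frac)
    prune_thresh = mags[n_prune - 1] if n_prune > 0 else -1.0
    quantized = []
    for w in weights:
        if abs(w) <= prune_thresh:
            quantized.append(0)
        else:
            clipped = max(-clip, min(clip, w))
            quantized.append(int(round(clipped / scale)))
    return quantized, scale


def compressed_size(quantized):
    """Bytes after zlib compression (a stand-in for zstd in this sketch)."""
    return len(zlib.compress(bytes(q & 0xFF for q in quantized), 9))


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(20000)]
dense, _ = quantize_int6(weights)
pruned, _ = quantize_int6(weights, prune_frac=0.25)
print("dense :", dense.count(0), "zeros,", compressed_size(dense), "bytes")
print("pruned:", pruned.count(0), "zeros,", compressed_size(pruned), "bytes")
```

On synthetic Gaussian weights the pruned tensor contains far more exact zeros, so the entropy coder has less to encode. The real effect on a trained model depends on its weight distribution and on zstd's settings.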
Friendly bump in case this got buried in the queue. Just wanted to check whether PR #861 is missing any required artifacts, metadata, or formatting on our end. If it looks complete and is simply waiting for review, no rush at all; happy to wait our turn. Thanks.
Reopened after accidental auto-close (branch cleanup on our end). This submission represents a significant investment of compute time and resources (~$1,000 in GPU costs) to get right, so wanted to make sure it's properly in the queue. Submission is complete and compliant:
Happy to address any questions from the maintainers. Ready for review whenever the team has bandwidth. Thanks.
Community Review — Non-record: 11L Int5 QAT + Score-First TTT — val_bpb 1.1326 (15.51 MB)

BPB: 1.1326 | Compliance: LOOKS CLEAN (score-first-per-chunk TTT, legal #1416/#1423 pattern)

What I found in the code (head SHA …): The TTT path at line 1012 implements the score-first-per-chunk pattern: each chunk is scored under …

Per Issue #402 and Issue #677, TTT is legal when each token is scored before the adapter updates on it, and that's what the code does here: chunk …

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.06s, dim=512, layers=11, vocab=1024, code=75440 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard checks (3-seed validation, 16MB artifact cap, 10-min wallclock on 8×H100 SXM). The compliance picture matches the legal reference frontier and no flags were raised by the classification pass.

Auto-classification caveat: this review was drafted by the AST-based classifier against a template derived from manually-reviewed cluster PRs (#1420, #1450, #1487, #1541, #1529, #1533, #1518). If I've misread a subtlety in your eval path (e.g., multi-epoch TTT that I mistook for single-pass, or a target-in-key lookup I missed in a helper function), please flag it and I'll re-run the audit manually.

Reviewed by @MatoTeziTanka (The Agora). Classification via deterministic AST-based …
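The score-first-per-chunk rule described in the review can be sketched with a toy model. This is an illustration of the ordering constraint only, not the code in this PR. `AdaptiveUnigram` is a hypothetical stand-in (Laplace-smoothed token counts) for a network with a test-time adapter, and `score_first_ttt` is a made-up name.

```python
import math


class AdaptiveUnigram:
    """Toy stand-in for a model with a test-time adapter (hypothetical)."""

    def __init__(self, vocab_size):
        self.counts = [1] * vocab_size  # Laplace smoothing: start at 1
        self.total = vocab_size

    def score(self, tokens):
        """Total bits for `tokens` under the current state; no mutation."""
        return sum(-math.log2(self.counts[t] / self.total) for t in tokens)

    def update(self, tokens):
        """Adapt on tokens that have already been scored."""
        for t in tokens:
            self.counts[t] += 1
            self.total += 1


def score_first_ttt(model, tokens, chunk_size):
    """Score each chunk before the model is allowed to train on it."""
    total_bits = 0.0
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        total_bits += model.score(chunk)  # 1) score under the frozen state
        model.update(chunk)               # 2) only then adapt on this chunk
    return total_bits


stream = [0, 0, 0, 1] * 8
adaptive_bits = score_first_ttt(AdaptiveUnigram(4), stream, chunk_size=4)
static_bits = AdaptiveUnigram(4).score(stream)
print(adaptive_bits, static_bits)
```

Because every chunk is scored before the update that uses it, no token's loss benefits from the model having seen that token. The gain over the static model comes only from earlier chunks.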
Final follow-up on PR #861. The competition is now over, but this PR remains open without any formal maintainer review or acknowledgment. Since the event has concluded, I’d appreciate a definitive status update on whether this submission will still be reviewed, or whether non-record submissions like this are effectively being left unresolved. I spent $1,600 in compute getting this into compliant shape because there was no clear signal that effort at this level would simply end with no resolution or communication. I’m not asking for special treatment, just closure and a clear statement of process so participants know how to interpret open submissions after the competition ends.
11L U-Net + Int5 QAT + Score-First Legal TTT
3-seed mean val_bpb: 1.13391 (std 0.00153) | 15.51 MB (16,265,723 bytes) | 8xH100 (~37 min)
What's different
Built on the PR #549 stack. Key additions:
3-Seed Results
All three seeds individually beat official SOTA (#549, 1.1194) by >0.01 BPB. All artifacts under 16 MiB.
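The reported artifact size can be cross-checked against the 16 MiB figure with a few lines of arithmetic. This sketch assumes the cap is binary (16 × 1024 × 1024 bytes), as the "MiB" wording suggests; `artifact_headroom` is a hypothetical helper name.

```python
MIB = 1024 * 1024


def artifact_headroom(artifact_bytes, cap_mib=16):
    """Return (size in MiB, bytes of headroom under an assumed MiB cap)."""
    return artifact_bytes / MIB, cap_mib * MIB - artifact_bytes


size_mib, headroom = artifact_headroom(16_265_723)
print(f"{size_mib:.2f} MiB, {headroom} bytes under the cap")
```

Note that the same byte count would exceed a decimal cap of 16,000,000 bytes, so which unit the rules intend matters here.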
Architecture
Rule Compliance
- … inference_mode() before training on them
- Train log, submission.json, and training script included.