Exploratory: PR315-derived candidate and looped-depth gate#453
Exploratory: PR315-derived candidate and looped-depth gate#453Divyesh-Thirukonda wants to merge 2 commits intoopenai:mainfrom
Conversation
Community Review — Exploratory: PR315-derived candidate and looped-depth gateBPB: 1.1689 | Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache What I found in the code (head SHA Static code review found no TTT adaptation function, no SLOT optimization loop, no n-gram-cache class, and no pre-quant val-token fine-tune. The eval path uses the standard sliding-window stride-64 pattern. The submission is a pure-neural architecture iteration on the standard SP1024/SP4096/SP8192 baseline. CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.09s, dim=512, layers=9, vocab=1024, code=67617 B, SMOKE_TEST_PASS Verdict: LOOKS CLEAN. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the classification pass — this looks like a clean pure-neural iteration on the standard baseline. Auto-classification caveat: this review was drafted by the AST-based classifier. If there's a non-standard eval mechanism (logit postprocessing, hedge mixing, etc.) that I missed because it's factored into a helper file or a non-standard function name, please flag it and I'll re-run the audit manually. Reviewed by @MatoTeziTanka — The Agora. CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.09s, dim=512, layers=9, vocab=1024, code=67617 B, SMOKE_TEST_PASS. Classification via deterministic AST-based |
Status
This PR is exploratory and is not claiming a new leaderboard record.
The attached
8xH100run for the main candidate is valid under the size cap, but it does not beat the existing PR315 frontier reference.What Is In This PR
Two record folders:
records/track_10min_16mb/2026-03-22_11L_XSA4_EMA_PartialRoPE_LNScale_Entropy_LongDocTTTrecords/track_non_record_16mb/2026-03-22_PR315_LoopedDepth_GateThe first folder captures a PR315-derived candidate with experimental codec and TTT branches behind flags. The second folder keeps looped-depth gate work separate from the primary path.
Official 8xH100 Result For The Main Candidate
From
train_seed42.log:step 4625on the600.037swallclock cap26152 MiBallocated /26526 MiBreserved15,733,011bytesfinal_quant_roundtrip_exact val_bpb = 1.16892776final_quant_sliding_window_exact val_bpb = 1.14586586This is worse than the checked-in PR315 reference (
1.1248sliding-windowval_bpb), so this PR should be treated as implementation and investigation work only.Next Step Outside This PR
The actual leaderboard path is now a separate exact-reproduction effort: recover PR315 throughput and score parity on the official image first, then make one change at a time.