
Exploratory: PR315-derived candidate and looped-depth gate#453

Open
Divyesh-Thirukonda wants to merge 2 commits into openai:main from Divyesh-Thirukonda:codex/pr315-frontier-records

Conversation


@Divyesh-Thirukonda Divyesh-Thirukonda commented Mar 22, 2026

Status

This PR is exploratory and is not claiming a new leaderboard record.

The attached 8xH100 run for the main candidate is valid under the size cap, but it does not beat the existing PR315 frontier reference.

What Is In This PR

Two record folders:

  • records/track_10min_16mb/2026-03-22_11L_XSA4_EMA_PartialRoPE_LNScale_Entropy_LongDocTTT
  • records/track_non_record_16mb/2026-03-22_PR315_LoopedDepth_Gate

The first folder captures a PR315-derived candidate with experimental codec and TTT branches behind flags. The second folder keeps looped-depth gate work separate from the primary path.

Official 8xH100 Result For The Main Candidate

From train_seed42.log:

  • stopped at step 4625 on the 600.037s wallclock cap
  • peak memory 26152 MiB allocated / 26526 MiB reserved
  • total submission size 15,733,011 bytes
  • final_quant_roundtrip_exact val_bpb = 1.16892776
  • final_quant_sliding_window_exact val_bpb = 1.14586586

This is worse than the checked-in PR315 reference (1.1248 sliding-window val_bpb), so this PR should be treated as implementation and investigation work only.
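For quick reference, the size-cap and score comparison above can be restated as a small check. The constants below are copied from the figures reported in this PR; the 16 MiB cap is the track's stated artifact limit, and no new measurements are introduced:

```python
# Hedged sanity check restating figures already reported in this PR.
SIZE_CAP_BYTES = 16 * 1024 * 1024      # 16 MiB = 16,777,216 bytes

submission_bytes = 15_733_011          # reported total submission size
candidate_bpb = 1.14586586             # final_quant_sliding_window_exact
reference_bpb = 1.1248                 # checked-in PR315 reference

headroom = SIZE_CAP_BYTES - submission_bytes
print(headroom)                              # 1044205 bytes of cap headroom
print(submission_bytes <= SIZE_CAP_BYTES)    # True: valid under the cap
print(candidate_bpb < reference_bpb)         # False: ~0.021 bpb behind PR315
```

The two booleans summarize the PR's status in one place: valid under the cap, but not a new frontier point.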

Next Step Outside This PR

The actual leaderboard path is now a separate exact-reproduction effort: recover PR315 throughput and score parity on the official image first, then make one change at a time.

@Divyesh-Thirukonda Divyesh-Thirukonda changed the title from "Add PR315-derived candidate record and looped-depth gate" to "Exploratory: PR315-derived candidate and looped-depth gate" on Mar 22, 2026
@MatoTeziTanka

Community Review — Exploratory: PR315-derived candidate and looped-depth gate

BPB: 1.1689 | Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

What I found in the code (head SHA c39d7a34387a, file records/track_10min_16mb/2026-03-21_11L_XSA4_EMA_PartialRoPE_LateQAT_1.1248/train_gpt.py):

Static code review found no TTT adaptation function, no SLOT optimization loop, no n-gram-cache class, and no pre-quant val-token fine-tune. The eval path uses the standard sliding-window stride-64 pattern. The submission is a pure-neural architecture iteration on the standard SP1024/SP4096/SP8192 baseline.
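For readers unfamiliar with it, the "standard sliding-window stride-64 pattern" referenced above can be sketched roughly as follows. All names, the `logprob_fn(context, target)` interface, and the per-token scoring loop are illustrative assumptions for exposition, not the submission's actual code:

```python
import math

def sliding_window_bpb(tokens, logprob_fn, window=1024, stride=64):
    """Hedged sketch of strided sliding-window eval: the context window is
    re-aligned every `stride` tokens, so each target token is predicted with
    at least `window - stride` tokens of left context. `logprob_fn` is an
    assumed interface returning the model's natural-log probability of
    `target` given `context`; a real harness would batch each window into a
    single forward pass rather than scoring tokens one at a time."""
    total_nats, n = 0.0, 0
    for t in range(1, len(tokens)):
        start = max(0, ((t - 1) // stride) * stride - (window - stride))
        total_nats -= logprob_fn(tokens[start:t], tokens[t])
        n += 1
    return total_nats / (n * math.log(2))  # convert nats to bits per token

# toy check: a uniform model over two symbols should score ~1.0 bits/token
toy_bpb = sliding_window_bpb(list(range(200)),
                             lambda ctx, tgt: math.log(0.5),
                             window=64, stride=8)
```

The stride trades compute for context: smaller strides give every token near-full context at the cost of more forward passes.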

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.09s, dim=512, layers=9, vocab=1024, code=67617 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the classification pass — this looks like a clean pure-neural iteration on the standard baseline.

Auto-classification caveat: this review was drafted by the AST-based classifier. If there's a non-standard eval mechanism (logit postprocessing, hedge mixing, etc.) that I missed because it's factored into a helper file or a non-standard function name, please flag it and I'll re-run the audit manually.


Reviewed by @MatoTeziTanka (The Agora). Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate on the classifier.

