Add Deep14x416 KV2 non-record MLX submission (val_bpb=1.8440) #56
cschubiner wants to merge 4 commits into openai:main from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c7ab65cd40
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f9ce10b899
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 91592141cf
Addressed the remaining reproducibility gap in e13e8db. The record now includes
Community Review — Add Deep14x416 KV2 non-record MLX submission (val_bpb=1.8440)

Compliance: NEEDS AUTHOR ACTION

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

```
ModuleNotFoundError: No module named 'mlx'
```

A few of the common patterns I've seen for this class of error in the 2026-04-11 sweep:

Recommendation: Could you run … Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet, because the audit halts at the import step.

Reviewed by @MatoTeziTanka — The Agora.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — ModuleNotFoundError: No module named 'mlx'. Classification via
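The import-step classification described in this review can be sketched roughly as follows. This is an illustration only: `classify_smoke_test` and its label strings are hypothetical, not The Agora's actual audit tooling; the probe simply checks whether each required module is resolvable on the host without executing any submission code.

```python
import importlib.util

def classify_smoke_test(required_modules):
    """Probe whether each module a submission imports is resolvable
    on this host, without executing any submission code."""
    for name in required_modules:
        if importlib.util.find_spec(name) is None:
            # Mirrors the IMPORT_FAIL outcome seen on CT2038, where
            # 'mlx' is unavailable on a non-Apple CPU host.
            return f"IMPORT_FAIL — ModuleNotFoundError: No module named '{name}'"
    return "IMPORT_OK"

print(classify_smoke_test(["zlib", "json"]))  # → IMPORT_OK
```

Probing with `importlib.util.find_spec` rather than a bare `import` keeps the audit side-effect free, which matters when the module under test would run GPU/Metal initialization at import time.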
After PR openai#61 (byte-disjoint corpus split + assert_train_val_disjoint guard) shipped, fix-verify-s43 ran end-to-end on the post-fix pipeline and produced BPB 1.5492 at step=12000 — well below Gate-2 threshold 1.85 (margin +0.30).

## What this commit changes

- README.md: leads with the honest Gate-2 pass; revised 5-way taxonomy
- LEAK_INVESTIGATION.md: retraction header explaining the 216-row overcount
- trios-igla-1/README.md + config.yaml: updated to point at fix-verify-s43
- ledger_2026-04-30.sql.gz: refreshed snapshot with new last_error markers

## 5-way reclassification (Neon last_error column)

| classification | count |
|---|--:|
| post-openai#61 honest Gate-2 pass | 1 |
| post-openai#61 early-stopped < step 9000 | 4 |
| pre-openai#61 W-6 numerical collapse | 46 |
| **pre-openai#61 leak (real)** | 42 |
| **warmup artifact (NOT a leak)** | 179 |

The 179 'warmup artifact' rows are early-stopped runs whose printed val_bpb stayed at 0.0000 for steps 1-8000 due to a trainer-side eval-loop bug (filed as trios-trainer-igla#62). On the post-openai#61 image, fix-verify-s43 escaped warmup at step=9000 and converged to 1.5492 by step=12000 — proving the artifact is trainer-side, not data-side.

## Pipeline as flown for fix-verify-s43

- trios-trainer-igla: commit 9517980d (post-openai#61 byte-disjoint corpus)
- trios-railway: commit 69c3467 (no --ctx flag) + openai#56 --ctx accept on trainer + openai#58 smoke_train + stdout.flush() + openai#59 panic hook + startup diagnostic

## Refs

- trios-trainer-igla#56, openai#58, openai#59, openai#60, openai#61, openai#62 (all merged or filed)
- trios-railway@69c3467
- trios-railway#100, openai#101, openai#105 (Scarabaeus Engine track)

R5-honest. We retract the 216-row mass leak flag and submit fix-verify-s43 as our first honest Gate-2 pass candidate. Anchor: phi^2 + phi^-2 = 3.
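The Gate-2 arithmetic quoted above is easy to restate in code. This is a sketch, not project tooling: `gate2_margin` is an illustrative helper, and 1.85 is simply the threshold quoted in the comment.

```python
GATE2_THRESHOLD_BPB = 1.85  # Gate-2 validation bits-per-byte threshold (from the comment)

def gate2_margin(val_bpb: float, threshold: float = GATE2_THRESHOLD_BPB):
    """Return (passed, margin), where margin is the distance below threshold."""
    return val_bpb < threshold, round(threshold - val_bpb, 4)

# fix-verify-s43 at step=12000: BPB 1.5492, reported margin +0.30
print(gate2_margin(1.5492))  # → (True, 0.3008)
```

Note that a strict `<` comparison is assumed here; if the gate treats the threshold as inclusive, the comparison would be `<=` instead.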
This PR adds a non-record unlimited-compute submission under `records/track_non_record_16mb/`. The user-facing effect is a new reproducible Apple Silicon MLX result in the repository: a deeper/narrower SP-1024 model with 14 layers at width 416 and 2 KV heads, trained locally on an Apple M5 Max for 750 steps against a 10-shard FineWeb subset. The final post-quantized roundtrip metric recorded in the included log is `val_bpb=1.84404368`, with an int8+zlib model payload of 12,339,367 bytes and a total submission size of 12,388,989 bytes.

The underlying motivation was to explore a simple parameter-budget trade: reduce width slightly, add depth, and use more aggressive KV sharing while staying well under the 16 MB artifact limit. This submission keeps the trainer straightforward by reusing the repository `train_gpt_mlx.py` snapshot exactly, changing the runtime configuration only through environment variables. To make full validation tractable on local Apple Silicon hardware, the run also uses a larger validation batch and logit chunking; these settings affect execution efficiency, not the metric definition itself.

The root cause this PR addresses is not a bug in the repo but a gap in the records folder: there was no local Apple Silicon submission documenting this deeper/narrower 14x416 KV2 configuration and its measured result. The fix is therefore additive only: a new record folder containing the copied training script, the exact train log, a README with command/config details, and `submission.json` metadata.

Validation for this PR was done by running the training job to completion locally, then checking that the copied script compiles with `python -m py_compile`. The included `train.log` contains the full training trace, the pre-quant validation result, the compressed model size, and the final `final_int8_zlib_roundtrip_exact` metric.
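The artifact-size budget described above can be sanity-checked with a few lines of Python. This is an illustrative sketch: `compressed_payload_size` and the synthetic weight blob are made up, and it assumes the 16 MB limit is counted as 16*1024*1024 raw bytes; the real submission reports a 12,339,367-byte int8+zlib payload.

```python
import zlib

LIMIT_BYTES = 16 * 1024 * 1024  # 16 MB artifact limit, assumed here to mean 16 MiB

def compressed_payload_size(int8_weights: bytes, level: int = 9) -> int:
    """Size in bytes of the zlib-compressed int8 weight payload."""
    return len(zlib.compress(int8_weights, level))

# Illustrative stand-in for a quantized checkpoint, not the real model:
fake_int8_blob = bytes(range(256)) * 50_000  # ~12.8 MB of raw int8 data
payload = compressed_payload_size(fake_int8_blob)
print(payload <= LIMIT_BYTES)  # → True (repetitive data compresses far below the cap)
```

A check like this is worth running before packaging, since zlib's ratio on real quantized weights is much lower than on a repetitive blob and the margin under the cap is what actually matters.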