Add Deep14x416 KV2 non-record MLX submission (val_bpb=1.8440) #56
cschubiner wants to merge 4 commits into openai:main from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c7ab65cd40
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f9ce10b899
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 91592141cf
Addressed the remaining reproducibility gap in e13e8db. The record now includes
Community Review — Add Deep14x416 KV2 non-record MLX submission (val_bpb=1.8440)

Compliance: NEEDS AUTHOR ACTION

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

```
ModuleNotFoundError: No module named 'mlx'
```

A few of the common patterns I've seen for this class of error in the 2026-04-11 sweep:

Recommendation: Could you run … Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet, because the audit halts at the import step.

Reviewed by @MatoTeziTanka — The Agora.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — ModuleNotFoundError: No module named 'mlx'. Classification via
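The import-step classification described in this review can be sketched roughly as follows. This is an illustration only: `classify_smoke_test` and its label strings are hypothetical, not The Agora's actual audit tooling; the probe simply checks whether each required module is resolvable on the host without executing any submission code.

```python
import importlib.util

def classify_smoke_test(required_modules):
    """Probe whether each module a submission imports is resolvable
    on this host, without executing any submission code."""
    for name in required_modules:
        if importlib.util.find_spec(name) is None:
            # Mirrors the IMPORT_FAIL outcome seen on CT2038, where
            # 'mlx' is unavailable on a non-Apple CPU host.
            return f"IMPORT_FAIL — ModuleNotFoundError: No module named '{name}'"
    return "IMPORT_OK"

print(classify_smoke_test(["zlib", "json"]))  # → IMPORT_OK
```

Probing with `importlib.util.find_spec` rather than a bare `import` keeps the audit side-effect free, which matters when the module under test would run GPU/Metal initialization at import time.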
After PR openai#61 (byte-disjoint corpus split + assert_train_val_disjoint guard) shipped, fix-verify-s43 ran end-to-end on the post-fix pipeline and produced BPB 1.5492 at step=12000 — well below Gate-2 threshold 1.85 (margin +0.30).

## What this commit changes

- README.md: leads with the honest Gate-2 pass; revised 5-way taxonomy
- LEAK_INVESTIGATION.md: retraction header explaining the 216-row overcount
- trios-igla-1/README.md + config.yaml: updated to point at fix-verify-s43
- ledger_2026-04-30.sql.gz: refreshed snapshot with new last_error markers

## 5-way reclassification (Neon last_error column)

| classification | count |
|---|--:|
| post-openai#61 honest Gate-2 pass | 1 |
| post-openai#61 early-stopped < step 9000 | 4 |
| pre-openai#61 W-6 numerical collapse | 46 |
| **pre-openai#61 leak (real)** | 42 |
| **warmup artifact (NOT a leak)** | 179 |

The 179 'warmup artifact' rows are early-stopped runs whose printed val_bpb stayed at 0.0000 for steps 1-8000 due to a trainer-side eval-loop bug (filed as trios-trainer-igla#62). On the post-openai#61 image, fix-verify-s43 escaped warmup at step=9000 and converged to 1.5492 by step=12000 — proving the artifact is trainer-side, not data-side.

## Pipeline as flown for fix-verify-s43

- trios-trainer-igla: commit 9517980d (post-openai#61 byte-disjoint corpus)
- trios-railway: commit 69c3467 (no --ctx flag) + openai#56 --ctx accept on trainer + openai#58 smoke_train + stdout.flush() + openai#59 panic hook + startup diagnostic

## Refs

- trios-trainer-igla#56, openai#58, openai#59, openai#60, openai#61, openai#62 (all merged or filed)
- trios-railway@69c3467
- trios-railway#100, openai#101, openai#105 (Scarabaeus Engine track)

R5-honest. We retract the 216-row mass leak flag and submit fix-verify-s43 as our first honest Gate-2 pass candidate. Anchor: phi^2 + phi^-2 = 3.
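The Gate-2 arithmetic quoted above is easy to restate in code. This is a sketch, not project tooling: `gate2_margin` is an illustrative helper, and 1.85 is simply the threshold quoted in the comment.

```python
GATE2_THRESHOLD_BPB = 1.85  # Gate-2 validation bits-per-byte threshold (from the comment)

def gate2_margin(val_bpb: float, threshold: float = GATE2_THRESHOLD_BPB):
    """Return (passed, margin), where margin is the distance below threshold."""
    return val_bpb < threshold, round(threshold - val_bpb, 4)

# fix-verify-s43 at step=12000: BPB 1.5492, reported margin +0.30
print(gate2_margin(1.5492))  # → (True, 0.3008)
```

Note that a strict `<` comparison is assumed here; if the gate treats the threshold as inclusive, the comparison would be `<=` instead.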
This PR adds a non-record unlimited-compute submission under `records/track_non_record_16mb/`. The user-facing effect is a new reproducible Apple Silicon MLX result in the repository: a deeper/narrower SP-1024 model with 14 layers at width 416 and 2 KV heads, trained locally on an Apple M5 Max for 750 steps against a 10-shard FineWeb subset. The final post-quantized roundtrip metric recorded in the included log is `val_bpb=1.84404368`, with an int8+zlib model payload of 12,339,367 bytes and a total submission size of 12,388,989 bytes.

The underlying motivation was to explore a simple parameter-budget trade: reduce width slightly, add depth, and use more aggressive KV sharing while staying well under the 16 MB artifact limit. This submission keeps the trainer straightforward by reusing the repository `train_gpt_mlx.py` snapshot exactly, changing the runtime configuration only through environment variables. To make full validation tractable on local Apple Silicon hardware, the run also uses a larger validation batch and logit chunking; these settings affect execution efficiency, not the metric definition itself.

The root cause this PR addresses is not a bug in the repo but a gap in the records folder: there was no local Apple Silicon submission documenting this deeper/narrower 14x416 KV2 configuration and its measured result. The fix is therefore additive only: a new record folder containing the copied training script, the exact train log, a README with command/config details, and `submission.json` metadata.

Validation for this PR was done by running the training job to completion locally, then checking that the copied script compiles with `python -m py_compile`. The included `train.log` contains the full training trace, the pre-quant validation result, the compressed model size, and the final `final_int8_zlib_roundtrip_exact` metric.
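The artifact-size budget described above can be sanity-checked with a few lines of Python. This is an illustrative sketch: `compressed_payload_size` and the synthetic weight blob are made up, and it assumes the 16 MB limit is counted as 16*1024*1024 raw bytes; the real submission reports a 12,339,367-byte int8+zlib payload.

```python
import zlib

LIMIT_BYTES = 16 * 1024 * 1024  # 16 MB artifact limit, assumed here to mean 16 MiB

def compressed_payload_size(int8_weights: bytes, level: int = 9) -> int:
    """Size in bytes of the zlib-compressed int8 weight payload."""
    return len(zlib.compress(int8_weights, level))

# Illustrative stand-in for a quantized checkpoint, not the real model:
fake_int8_blob = bytes(range(256)) * 50_000  # ~12.8 MB of raw int8 data
payload = compressed_payload_size(fake_int8_blob)
print(payload <= LIMIT_BYTES)  # → True (repetitive data compresses far below the cap)
```

A check like this is worth running before packaging, since zlib's ratio on real quantized weights is much lower than on a repetitive blob and the margin under the cap is what actually matters.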