
Add Deep14x416 KV2 non-record MLX submission (val_bpb=1.8440)#56

Open
cschubiner wants to merge 4 commits into openai:main from cschubiner:codex/deep14-416-kv2-mlx-submission

Conversation

@cschubiner

This PR adds a non-record unlimited-compute submission under records/track_non_record_16mb/.

The user-facing effect is a new reproducible Apple Silicon MLX result in the repository: a deeper/narrower SP-1024 model with 14 layers at width 416 and 2 KV heads, trained locally on an Apple M5 Max for 750 steps against a 10-shard FineWeb subset. The final post-quantized roundtrip metric recorded in the included log is val_bpb=1.84404368, with an int8+zlib model payload of 12,339,367 bytes and total submission size of 12,388,989 bytes.
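The int8+zlib payload described above can be illustrated with a minimal sketch. This is not the repository's actual quantizer: the per-tensor symmetric scaling, function names, and tensor shape are all assumptions chosen to show the shape of an int8 quantize → zlib compress → decompress → dequantize roundtrip like the one the `final_int8_zlib_roundtrip_exact` metric is measured against.

```python
# Hedged sketch of an int8 + zlib roundtrip; the quantization scheme and
# names here are illustrative, not the repo's implementation.
import zlib

import numpy as np


def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization; returns codes and scale."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale


w = np.random.default_rng(0).normal(size=(416, 416)).astype(np.float32)
q, scale = quantize_int8(w)

# The shipped payload would be the zlib-compressed int8 codes plus scale
# metadata; its length is what counts against the 16 MB artifact limit.
payload = zlib.compress(q.tobytes(), level=9)
print(len(payload))

# "Roundtrip": decompress to bit-identical codes, then dequantize and
# evaluate with the reconstructed weights.
q2 = np.frombuffer(zlib.decompress(payload), dtype=np.int8).reshape(q.shape)
assert np.array_equal(q, q2)
```

With max-based symmetric scaling, the per-element dequantization error is bounded by half the scale, which is why the roundtrip can be validated exactly at the int8 level even though the float weights are lossy.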

The underlying motivation was to explore a simple parameter-budget trade: reduce width slightly, add depth, and use more aggressive KV sharing while staying well under the 16 MB artifact limit. This submission keeps the trainer straightforward by reusing the repository train_gpt_mlx.py snapshot exactly, and only changes the runtime configuration through environment variables. To make full validation tractable on local Apple Silicon hardware, the run also uses a larger validation batch and logit chunking; these settings affect execution efficiency, not the metric definition itself.

The root cause this PR addresses is not a bug in the repo but a gap in the records folder: there was no local Apple Silicon submission documenting this deeper/narrower 14x416 KV2 configuration and its measured result. The fix is therefore additive only: a new record folder containing the copied training script, the exact train log, a README with command/config details, and submission.json metadata.

Validation for this PR was done by actually running the training job to completion locally, then checking the copied script compiles with python -m py_compile. The included train.log contains the full training trace, the pre-quant validation result, the compressed model size, and the final final_int8_zlib_roundtrip_exact metric.
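The compile check above can also be run programmatically. A minimal sketch against a throwaway file (in the actual validation the target is the copied `train_gpt_mlx.py` snapshot in the record folder):

```python
# Hedged sketch of the py_compile check; the throwaway file stands in for
# the record folder's train_gpt_mlx.py snapshot.
import os
import py_compile
import tempfile

src = tempfile.NamedTemporaryFile("w", suffix=".py", delete=False)
src.write('print("hello")\n')
src.close()

# Equivalent to `python -m py_compile <file>`; doraise=True surfaces a
# syntax error as py_compile.PyCompileError instead of printing it.
py_compile.compile(src.name, doraise=True)
os.unlink(src.name)
```

Because `py_compile` only parses and byte-compiles, this check catches syntax errors but not missing runtime dependencies such as an absent `mlx` module.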

@cschubiner cschubiner marked this pull request as ready for review March 19, 2026 06:39
@cschubiner cschubiner changed the title [codex] add deep14 mlx submission Add Deep14x416 KV2 non-record MLX submission (val_bpb=1.8440) Mar 19, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c7ab65cd40

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@cschubiner
Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f9ce10b899


@cschubiner
Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91592141cf


@cschubiner
Author

Addressed the remaining reproducibility gap in e13e8db. The record now includes train_shards.txt with the exact 10 FineWeb train shards used for the run, and the README stages a local data_subset containing only those train shards plus the fixed validation shard before invoking train_gpt.py.
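The staging step described above can be sketched as follows. The toy shard names, the `data/` source directory, and the validation-shard filename are all assumptions standing in for the real FineWeb shard layout; only `train_shards.txt` and `data_subset/` come from the comment itself.

```shell
# Hedged sketch of staging a local data_subset from train_shards.txt.
# Toy filenames stand in for the real FineWeb shards.
mkdir -p data data_subset
printf 'shard_a.bin\nshard_b.bin\n' > train_shards.txt
touch data/shard_a.bin data/shard_b.bin data/fineweb_val_000000.bin

# Copy only the train shards listed in train_shards.txt...
while read -r shard; do
  cp "data/$shard" data_subset/
done < train_shards.txt

# ...plus the fixed validation shard (name assumed), then train against it.
cp data/fineweb_val_000000.bin data_subset/
ls data_subset
```

Pinning the exact shard list this way means a reviewer reproduces the run against the same 10-shard subset rather than whatever FineWeb shards happen to be present locally.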

@MatoTeziTanka

MatoTeziTanka commented Apr 11, 2026

Community Review — Add Deep14x416 KV2 non-record MLX submission (val_bpb=1.8440)

Compliance: NEEDS AUTHOR ACTION — train_gpt.py fails to import on CT2038 (Python 3.10 / torch 2.10.0+cpu)

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

ModuleNotFoundError: No module named 'mlx'

A few of the common patterns I've seen for this class of error in the 2026-04-11 sweep:

Recommendation: Could you run python3 -c "import py_compile; py_compile.compile('train_gpt.py')" on your records-folder train_gpt.py under Python 3.10 specifically? The eval image is Python 3.10 per Issue #17 / the README, so any parse error on 3.10 blocks the submission at import time before any of the scored-eval logic runs.

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — ModuleNotFoundError: No module named 'mlx'. Classification via classify_prs.py AST-based classifier; full compliance audit deferred until the import issue is resolved. Auto-drafted from a template and spot-checked before posting.

gHashTag pushed a commit to gHashTag/parameter-golf that referenced this pull request Apr 30, 2026
After PR openai#61 (byte-disjoint corpus split + assert_train_val_disjoint guard)
shipped, fix-verify-s43 ran end-to-end on the post-fix pipeline and produced
BPB 1.5492 at step=12000 — well below Gate-2 threshold 1.85 (margin +0.30).

## What this commit changes

- README.md  : leads with the honest Gate-2 pass; revised 5-way taxonomy
- LEAK_INVESTIGATION.md : retraction header explaining the 216-row overcount
- trios-igla-1/README.md + config.yaml : updated to point at fix-verify-s43
- ledger_2026-04-30.sql.gz : refreshed snapshot with new last_error markers

## 5-way reclassification (Neon last_error column)

| classification                       | count |
|---|--:|
| post-openai#61 honest Gate-2 pass          |   1 |
| post-openai#61 early-stopped < step 9000   |   4 |
| pre-openai#61 W-6 numerical collapse       |  46 |
| **pre-openai#61 leak (real)**              |  42 |
| **warmup artifact (NOT a leak)**     | 179 |

The 179 'warmup artifact' rows are early-stopped runs whose printed
val_bpb stayed at 0.0000 for steps 1-8000 due to a trainer-side eval-loop
bug (filed as trios-trainer-igla#62). On the post-openai#61 image, fix-verify-s43
escaped warmup at step=9000 and converged to 1.5492 by step=12000 —
proving the artifact is trainer-side, not data-side.

## Pipeline as flown for fix-verify-s43

  trios-trainer-igla : commit 9517980d (post-openai#61 byte-disjoint corpus)
  trios-railway      : commit 69c3467 (no --ctx flag)
  + openai#56 --ctx accept on trainer
  + openai#58 smoke_train + stdout.flush()
  + openai#59 panic hook + startup diagnostic

## Refs

  trios-trainer-igla#56,openai#58,openai#59,openai#60,openai#61,openai#62 (all merged or filed)
  trios-railway@69c3467
  trios-railway#100,openai#101,openai#105 (Scarabaeus Engine track)

R5-honest. We retract the 216-row mass leak flag and submit fix-verify-s43
as our first honest Gate-2 pass candidate.

Anchor: phi^2 + phi^-2 = 3.
