Add mirrored-recurrence MLX non-record submission #84
cschubiner wants to merge 1 commit into openai:main from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6853b49028
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Configuration:
- Hardware: Apple `M5 Max`, MLX `0.31.1`
- Data: published `fineweb10B_sp1024` export, full validation split, `1/195` training shards
Specify the exact single training shard used
This README says the run used the published fineweb10B_sp1024 export with 1/195 train shards, but it never identifies which shard was kept or how DATA_PATH was prepared. The checked-in train.log shows the actual run only saw one shard (train_shards:1/195), so rerunning the documented command against a normal fineweb10B_sp1024 export will train on all 195 shards and produce a materially different experiment. As written, the submission is not reproducible.
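One way to close this gap is to check in the exact preparation step. Below is a minimal sketch of building a single-shard `DATA_PATH` from the full export; the directory names, glob patterns, and `KEPT_SHARD_INDEX` are all assumptions (the submission documents none of them), so the README would need to pin the real values.

```python
# Hypothetical sketch: build a single-shard DATA_PATH from a full
# fineweb10B_sp1024 export so a rerun matches train_shards:1/195.
# Directory names, glob patterns, and KEPT_SHARD_INDEX are assumptions,
# not values documented by the submission.
import shutil
from pathlib import Path

FULL_EXPORT = Path("fineweb10B_sp1024")        # full 195-shard export
DATA_PATH = Path("fineweb10B_sp1024_1shard")   # what the rerun should point at
KEPT_SHARD_INDEX = 0                           # assumed; must match train.log

DATA_PATH.mkdir(parents=True, exist_ok=True)

# Keep the full validation split unchanged.
for f in sorted(FULL_EXPORT.glob("*val*")):
    shutil.copy2(f, DATA_PATH / f.name)

# Keep exactly one training shard.
train_shards = sorted(FULL_EXPORT.glob("*train*"))
assert len(train_shards) == 195, f"expected 195 train shards, got {len(train_shards)}"
kept = train_shards[KEPT_SHARD_INDEX]
shutil.copy2(kept, DATA_PATH / kept.name)
print(f"kept {kept.name}; DATA_PATH={DATA_PATH}")
```

Documenting the kept shard (or checking in a script like this) would let the documented command reproduce the logged `train_shards:1/195` run.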
```python
self.encoder_schedule = [i % self.unique_layers for i in range(self.num_encoder_layers)]
mirrored_decoder = list(reversed(self.encoder_schedule[: self.num_skip_weights]))
trailing_decoder = [i % self.unique_layers for i in range(self.num_skip_weights, self.num_decoder_layers)]
self.decoder_schedule = mirrored_decoder + trailing_decoder
```
Reject UNIQUE_LAYERS settings that leave blocks unused
This schedule only uses every allocated block when UNIQUE_LAYERS <= NUM_LAYERS // 2 or UNIQUE_LAYERS == NUM_LAYERS. For configurations the constructor currently accepts between those ranges (for example NUM_LAYERS=18, UNIQUE_LAYERS=12), encoder_schedule is still 0..8 and decoder_schedule becomes 8..0, so blocks[9:] are dead parameters that never participate in the forward pass but still count toward the 16 MB budget. Either the schedule needs to cover all unique blocks, or those settings should be rejected.
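A self-contained sketch of such a guard is below. The four schedule lines are lifted from the diff; how `num_encoder_layers`, `num_decoder_layers`, and `num_skip_weights` are derived is assumed, not taken from the submission, so the exact set of accepted configurations may differ in the real constructor.

```python
# Standalone sketch of the mirrored-recurrence schedule from the diff,
# plus the coverage guard the review asks for. Everything outside the
# four schedule lines (the layer-count split, free-function form) is an
# assumption about the surrounding constructor.

def build_schedules(num_layers: int, unique_layers: int, num_skip_weights: int):
    num_encoder_layers = num_layers // 2                 # assumed split
    num_decoder_layers = num_layers - num_encoder_layers

    # The four lines from the submission, with `self.` dropped:
    encoder_schedule = [i % unique_layers for i in range(num_encoder_layers)]
    mirrored_decoder = list(reversed(encoder_schedule[:num_skip_weights]))
    trailing_decoder = [i % unique_layers
                        for i in range(num_skip_weights, num_decoder_layers)]
    decoder_schedule = mirrored_decoder + trailing_decoder

    # Guard: every allocated block must appear in at least one schedule,
    # otherwise the unused blocks are dead parameters that still count
    # toward the 16 MB budget.
    used = set(encoder_schedule) | set(decoder_schedule)
    unused = sorted(set(range(unique_layers)) - used)
    if unused:
        raise ValueError(
            f"UNIQUE_LAYERS={unique_layers} leaves blocks {unused} unused "
            f"with NUM_LAYERS={num_layers} and NUM_SKIP_WEIGHTS={num_skip_weights}"
        )
    return encoder_schedule, decoder_schedule

# The review's example: NUM_LAYERS=18, UNIQUE_LAYERS=12 (9-layer encoder,
# fully mirrored decoder) is rejected because blocks 9..11 never run.
# build_schedules(18, 12, 9)  # -> ValueError
```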
Community Review — Add mirrored-recurrence MLX non-record submission
Compliance: NEEDS AUTHOR ACTION
What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with ModuleNotFoundError: No module named 'mlx'. This matches a class of failure seen across the 2026-04-11 sweep: the audit host cannot import the submission's framework, so the audit halts before any compliance checks run.
Recommendation: Could you fix the import failure, or document MLX as a hard platform requirement, so the script can at least be parsed on the audit machine? Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags have been identified yet because the audit halts at the import step.
Reviewed by @MatoTeziTanka — The Agora.
CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — ModuleNotFoundError: No module named 'mlx'.
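For what it's worth, a common pattern for keeping a script parseable on hosts that cannot install the framework is a guarded import. This is an illustrative sketch, not something the submission currently does:

```python
# Illustrative pattern (not from the submission): defer the hard MLX
# dependency so the module can be imported and statically audited on
# hosts where mlx cannot be installed (e.g., Linux CPU boxes).
try:
    import mlx.core as mx
    import mlx.nn as nn
except ImportError:  # hit on non-Apple-silicon audit machines
    mx = None
    nn = None

def require_mlx() -> None:
    """Raise a clear error only when MLX is actually needed at runtime."""
    if mx is None:
        raise RuntimeError(
            "This submission requires Apple's MLX framework; "
            "install mlx on Apple silicon to run training."
        )
```

A pattern like this would let the compliance audit get past the import step even on machines where training itself cannot run.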
Adds a non-record mirrored-recurrence submission under `records/track_non_record_16mb/2026-03-19_MirrorRecurrence_MLX_M5Max_sp1024`.
Summary: This PR only adds the new records folder for the submission.