Use eager mx.eval() to fix running train script on 16GB Mac devices #100
Merged
0hq merged 1 commit into openai:main on Mar 19, 2026
Conversation
…aluating the graph after each sub-batch step
maxivione pushed a commit to maxivione/parameter-golf that referenced this pull request on Mar 20, 2026: Use eager mx.eval() to fix running train script on 16GB Mac devices
This was referenced Mar 26, 2026
This adds a new flag, `MLX_EAGER_EVAL`, to Hyperparameters, which forces the train script to materialize the loss/grad graph for each sub-batch (as defined by `MLX_MAX_MICROBATCH_TOKENS`). It is enabled by default.

This fixes a crash I encountered while attempting to run the train script on a 16GB M4 MacBook Air, caused by MLX building the full computation graph when `loss_and_grad_chunked` was called during warmup. With this change, the train script executed successfully (warmup, training for 12 epochs before reaching the wall-time limit, and validation). With all other settings at their defaults, the run used a peak of 6.5 GB of RAM, and the GPU stayed at 100% utilization.
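The memory effect of the flag comes from MLX's lazy evaluation: without an explicit `mx.eval()`, the deferred graph for every sub-batch accumulates until the whole batch is forced at once. The pure-Python sketch below models that behavior with a hypothetical `LazyGraph` counter (none of these names come from the repository; in real MLX code the flush would be a call to `mx.eval(...)` on the loss and gradients):

```python
# Pure-Python model of lazy evaluation with an optional eager flush,
# illustrating why evaluating after each sub-batch bounds peak memory.
# All names here are hypothetical; in MLX the flush would be mx.eval(...).

class LazyGraph:
    """Counts deferred nodes, standing in for MLX's unevaluated arrays."""
    def __init__(self):
        self.pending = 0        # nodes queued but not yet materialized
        self.peak_pending = 0   # high-water mark, a proxy for peak memory

    def add_subbatch(self, n_nodes):
        # Each sub-batch appends its loss/grad nodes to the deferred graph.
        self.pending += n_nodes
        self.peak_pending = max(self.peak_pending, self.pending)

    def eval(self):
        # Materialize everything queued so far (like mx.eval()).
        self.pending = 0


def run_batch(num_subbatches, nodes_per_subbatch, eager):
    g = LazyGraph()
    for _ in range(num_subbatches):
        g.add_subbatch(nodes_per_subbatch)
        if eager:
            g.eval()   # forces the graph for this sub-batch only
    g.eval()           # final materialization happens either way
    return g.peak_pending


lazy_peak = run_batch(8, 100, eager=False)   # whole batch deferred at once
eager_peak = run_batch(8, 100, eager=True)   # one sub-batch at a time
print(lazy_peak, eager_peak)                 # → 800 100
```

With eager evaluation the high-water mark is one sub-batch's worth of graph instead of the whole batch's, which is why the crash on the 16GB machine disappears while throughput is unaffected as long as each sub-batch still saturates the GPU.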