
Use eager mx.eval() to fix running train script on 16GB Mac devices #100

Merged
0hq merged 1 commit into openai:main from sandsevenone:mlx_eager_eval on Mar 19, 2026
Conversation

@sandsevenone
Contributor

This PR adds a new flag, MLX_EAGER_EVAL, to Hyperparameters. When set, it forces the train script to materialize the loss/grad graph after each sub-batch (as defined by MLX_MAX_MICROBATCH_TOKENS). The flag is enabled by default.

This fixes a crash I hit while running the train script on a 16GB M4 MacBook Air: MLX tried to build the full computation graph when loss_and_grad_chunked was called during warmup. With the fix, the train script ran to completion (warmup, 12 epochs of training before reaching the wall-time limit, and validation). With all other settings at their defaults, the run peaked at 6.5GB of RAM and the GPU stayed at 100% utilization.
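To illustrate why forcing evaluation per sub-batch bounds memory, here is a toy model of lazy evaluation (not the MLX API; `LazyArray` and `graph_size` are hypothetical names for this sketch). Each operation only appends a node to a deferred graph; calling `eval()` materializes the node and drops its parent references, so the pending graph never grows with the number of sub-batches:

```python
# Toy lazy-evaluation model (hypothetical, for illustration only).
# Mirrors the pattern: without eager eval, each sub-batch extends an
# unevaluated graph; evaluating after every step keeps it tiny.

class LazyArray:
    def __init__(self, value=None, parents=(), op=None):
        self.value = value      # None means "not yet materialized"
        self.parents = parents
        self.op = op

    def __add__(self, other):
        # Build a deferred node instead of computing immediately.
        return LazyArray(parents=(self, other), op=lambda a, b: a + b)

    def eval(self):
        if self.value is None:
            args = [p.eval() for p in self.parents]
            self.value = self.op(*args)
            self.parents = ()   # release the subgraph so it can be freed
        return self.value

def graph_size(node):
    """Count unevaluated nodes reachable from `node`."""
    seen = set()
    def walk(n):
        if id(n) in seen or n.value is not None:
            return
        seen.add(id(n))
        for p in n.parents:
            walk(p)
    walk(node)
    return len(seen)

# Lazy: accumulating 8 sub-batch losses leaves 8 pending nodes.
total = LazyArray(0.0)
for loss in [1.0] * 8:
    total = total + LazyArray(loss)
print(graph_size(total))  # → 8, grows with the number of sub-batches

# Eager: evaluating after each sub-batch keeps the pending graph empty.
total = LazyArray(0.0)
for loss in [1.0] * 8:
    total = total + LazyArray(loss)
    total.eval()              # analogous to calling mx.eval() per step
print(graph_size(total))  # → 0
```

In MLX the analogous call is `mx.eval(loss, grads)` after each sub-batch, which materializes the arrays instead of letting the deferred graph span the whole batch.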

…aluating the graph after each sub-batch step
Contributor

@0hq 0hq left a comment


Thanks!

@0hq 0hq merged commit 2081ba1 into openai:main Mar 19, 2026
@sandsevenone sandsevenone deleted the mlx_eager_eval branch March 20, 2026 03:34
maxivione pushed a commit to maxivione/parameter-golf that referenced this pull request Mar 20, 2026
Use eager mx.eval() to fix running train script on 16GB Mac devices


2 participants