
Fix MLX multi-batch validation memory growth #32

Merged

0hq merged 2 commits into openai:main from yhn112:fix-mlx-eval-memory-growth on Mar 18, 2026

Conversation

@yhn112 (Contributor) commented Mar 18, 2026

Summary

  • materialize each MLX validation batch loss before accumulating it, instead of building one lazy graph across the entire validation split
  • keep lightweight validation progress logging so long validation passes do not look hung

Why this happens

This only shows up when validation spans multiple batches. In eval_val(), total_loss was accumulated as an mx.array, so MLX kept extending a single lazy graph until the final mx.eval(...). Because nothing forced evaluation per batch, every batch's intermediates stayed live in that graph, so memory grew steadily during validation and the script appeared to hang.

This is especially easy to hit in the MLX smoke flow because validation always runs over the full fineweb_val_* split, while the README example uses a small VAL_BATCH_SIZE. In practice the first visible symptom is often after the final training step, which makes it look like a post-training hang.

Single-batch validation does not exhibit the problem.
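
In simplified pseudocode, the problematic shape looked roughly like this (loss_fn is an illustrative helper; this is not the repo's exact code):

    total_loss = mx.array(0.0)
    for x, y in val_batches:
        # each iteration only extends the lazy graph; nothing is computed yet
        total_loss = total_loss + loss_fn(model, x, y)
    mx.eval(total_loss)  # one deferred evaluation over the whole split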

Logging

The progress logging is intentional here: once validation is split across many batches, a long final validation pass can otherwise look indistinguishable from a freeze.
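
A minimal sketch of the fixed loop, with the progress logging folded in (again assuming an illustrative loss_fn; names are not the repo's exact code):

    import mlx.core as mx

    def eval_val(model, val_batches, log_every=50):
        total_loss, n = 0.0, 0
        for i, (x, y) in enumerate(val_batches):
            loss = loss_fn(model, x, y)  # lazy mx.array
            mx.eval(loss)                # materialize this batch's loss now
            total_loss += loss.item()    # accumulate as a plain Python float
            n += 1
            if i % log_every == 0:
                print(f"val batch {i}: running mean loss {total_loss / n:.4f}")
        return total_loss / n

Because total_loss is a plain Python float, no graph ever spans batches, and peak memory is bounded by a single batch's intermediates.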

Validation

  • python -m py_compile train_gpt_mlx.py
  • reproduced locally with a short MLX run, confirming that memory growth began only once final validation started, then verified that the fix stopped the growth

@0hq (Contributor) left a comment

Thanks! Missed this.

@0hq merged commit 8253577 into openai:main on Mar 18, 2026
kxddry pushed a commit to kxddry/parameter-golf that referenced this pull request Mar 19, 2026
Fix MLX multi-batch validation memory growth
@yhn112 deleted the fix-mlx-eval-memory-growth branch Mar 19, 2026 10:39
gHashTag added a commit to gHashTag/parameter-golf that referenced this pull request Apr 30, 2026
… fix) (openai#32)

Anchor: phi^2 + phi^-2 = 3.

The first scheduled hourly run failed with:

    jq: parse error: Invalid numeric literal at line 1, column 2

Root cause: 'tail -n 1' on the merged 2>&1 stream caught the
tracing-subscriber INFO line ('audit triplet sealed experience=...')
that prints AFTER the --json blob, instead of the JSON itself.

Fix:

  1. Split stdout (json + text summary) and stderr (tracing) into
     separate files: '> /tmp/accN.txt 2> /tmp/accN.log'.
     The Rust binary already writes tracing to stderr in main.rs
     (with_writer(io::stderr)), so the separation is honest, not
     a workaround.

  2. Replace 'tail -n 1' with 'grep -E "^\s*\{" | tail -n 1'.
     The --json blob is a single line whose first non-whitespace char
     is '{', and tri-railway prints exactly one such line. This is
     robust against future text additions to the human summary.

  3. Synthesize a DRIFT-shaped fallback JSON when grep finds nothing
     (network error, etc.) so the digest step never crashes; the
     workflow goes red on combined exit=1 instead.

  4. Echo both stdout and stderr to the workflow log for triage.

Verified locally against the live IGLA project (Acc1 token):

    grep -E '^\s*\{' /tmp/test_stdout.txt | tail -n 1 > /tmp/test.json
    jq -r '.verdict, .services, .exit_code'
    => NOT YET, 18, 2

Refs openai#16.

Co-authored-by: Perplexity Computer <computer@perplexity.ai>
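
A minimal shell sketch of the extraction that commit describes (file paths follow the commit message; the tri-railway invocation and the surrounding workflow YAML are assumptions):

    # stdout (json + human summary) and stderr (tracing) into separate files (step 1)
    tri-railway --json > /tmp/acc1.txt 2> /tmp/acc1.log || true

    # keep only the single-line JSON blob: first non-space char is '{' (step 2)
    grep -E '^\s*\{' /tmp/acc1.txt | tail -n 1 > /tmp/acc1.json

    # DRIFT-shaped fallback so the digest step's jq never crashes (step 3)
    if ! [ -s /tmp/acc1.json ]; then
        echo '{"verdict":"DRIFT","services":0,"exit_code":1}' > /tmp/acc1.json
    fi

    # echo both streams into the workflow log for triage (step 4)
    cat /tmp/acc1.txt /tmp/acc1.log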
gHashTag added a commit to gHashTag/parameter-golf that referenced this pull request Apr 30, 2026
…penai#33)

The previous fix (PR openai#32) extracted the JSON correctly but then piped
the raw verdict string ('NOT YET') into $GITHUB_OUTPUT without a key,
which the runner rejects:

    Unable to process file command 'output' successfully.
    Invalid format 'NOT YET'

Fix: write 'verdict=<value>' instead. Also replace the space inside
the verdict ('GATE-2 PASS', 'NOT YET') with an underscore so the value
is a single token, since GITHUB_OUTPUT doesn't accept multi-word
unencoded values without the multiline EOF marker.

This output is informational only — the digest step reads from the
JSON file directly via jq, so the encoding change has no downstream
effect.

Refs openai#16.

Co-authored-by: Perplexity Computer <computer@perplexity.ai>
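
The corrected $GITHUB_OUTPUT write, sketched (file and variable names are illustrative):

    # single-token key=value form, as the runner expects
    verdict=$(jq -r '.verdict' /tmp/acc1.json | tr ' ' '_')  # 'NOT YET' -> 'NOT_YET'
    echo "verdict=${verdict}" >> "$GITHUB_OUTPUT"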