Fix MLX multi-batch validation memory growth #32
Merged
0hq merged 2 commits into openai:main on Mar 18, 2026
Conversation
kxddry pushed a commit to kxddry/parameter-golf that referenced this pull request on Mar 19, 2026
Fix MLX multi-batch validation memory growth
gHashTag added a commit to gHashTag/parameter-golf that referenced this pull request on Apr 30, 2026
… fix) (openai#32)

Anchor: phi^2 + phi^-2 = 3.

The first scheduled hourly run failed with:

    jq: parse error: Invalid numeric literal at line 1, column 2

Root cause: 'tail -n 1' on the merged 2>&1 stream caught the tracing-subscriber INFO line ('audit triplet sealed experience=...') that prints AFTER the --json blob, instead of the JSON itself.

Fix:

1. Split stdout (json + text summary) and stderr (tracing) into separate files: '> /tmp/accN.txt 2> /tmp/accN.log'. The Rust binary already writes tracing to stderr in main.rs (with_writer(io::stderr)), so the separation is honest, not a workaround.
2. Replace 'tail -n 1' with 'grep -E "^\s*\{" | tail -n 1'. The --json blob is a single line whose first non-whitespace char is '{', and tri-railway prints exactly one such line. This is robust against future text additions to the human summary.
3. Synthesize a DRIFT-shaped fallback JSON when grep finds nothing (network error, etc.) so the digest step never crashes; the workflow goes red on combined exit=1 instead.
4. Echo both stdout and stderr to the workflow log for triage.

Verified locally against the live IGLA project (Acc1 token):

    grep -E '^\s*\{' /tmp/test_stdout.txt | tail -n 1 > /tmp/test.json
    jq -r '.verdict, .services, .exit_code'
    => NOT YET, 18, 2

Refs openai#16.

Co-authored-by: Perplexity Computer <computer@perplexity.ai>
gHashTag added a commit to gHashTag/parameter-golf that referenced this pull request on Apr 30, 2026
…penai#33)

The previous fix (PR openai#32) extracted the JSON correctly but then piped the raw verdict string ('NOT YET') into $GITHUB_OUTPUT without a key, which the runner rejects:

    Unable to process file command 'output' successfully.
    Invalid format 'NOT YET'

Fix: write 'verdict=<value>' instead. Also replace the space inside the verdict ('GATE-2 PASS', 'NOT YET') with an underscore so the value is a single token, since GITHUB_OUTPUT doesn't accept multi-word unencoded values without the multiline EOF marker.

This output is informational only; the digest step reads from the JSON file directly via jq, so the encoding change has no downstream effect.

Refs openai#16.

Co-authored-by: Perplexity Computer <computer@perplexity.ai>
Summary

Fix memory growth during multi-batch validation in `eval_val()` by evaluating the accumulated loss per batch instead of deferring one ever-growing lazy graph to the final `mx.eval(...)`, and add progress logging so long validation passes are visibly making progress.
Why this happens
This only shows up when validation spans multiple batches. In `eval_val()`, `total_loss` was accumulated as an `mx.array`, so MLX kept extending a lazy graph until the final `mx.eval(...)`. Memory usage then grew steadily during validation, and the script looked like it hung.

This is especially easy to hit in the MLX smoke flow because validation always runs over the full `fineweb_val_*` split, while the README example uses a small `VAL_BATCH_SIZE`. In practice the first visible symptom often appears after the final training step, which makes it look like a post-training hang. Single-batch validation does not exhibit the problem.
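For illustration, here is a minimal sketch of the fixed accumulation pattern; `loss_fn` and `val_batches` are placeholder names, not the script's actual API. Accumulating into a plain Python float and forcing evaluation every batch keeps the pending graph one batch deep:

```python
import mlx.core as mx

def eval_val(model, loss_fn, val_batches):
    """Average validation loss without building one giant lazy graph."""
    total_loss = 0.0  # plain Python float, not an mx.array
    n = 0
    for x, y in val_batches:
        loss = loss_fn(model, x, y)  # scalar mx.array
        # .item() forces evaluation of this batch's graph and converts it
        # to a Python float, so intermediates are freed before the next batch
        total_loss += loss.item()
        n += 1
    return total_loss / max(n, 1)
```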
Logging
The progress logging is intentional here: once validation is split across many batches, a long final validation pass can otherwise look indistinguishable from a freeze.
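As a sketch of what such a progress line might look like (the interval and message format here are illustrative, not the script's actual output; `n` and `num_batches` are assumed loop variables):

```python
# inside the validation loop, after each batch completes
if n % 50 == 0:
    print(f"val {n}/{num_batches} batches, avg loss {total_loss / max(n, 1):.4f}",
          flush=True)  # flush so progress is visible even when piped to a log
```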
Validation
python -m py_compile train_gpt_mlx.py
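Beyond the syntax check, the growth mechanism is easy to reproduce in isolation. This self-contained snippet (illustrative, not part of the PR) contrasts the two accumulation patterns:

```python
import mlx.core as mx

# Before: accumulating into an mx.array defers every step into one lazy
# graph that only materializes at the final mx.eval, so memory climbs.
total = mx.array(0.0)
for _ in range(1000):
    total = total + mx.mean(mx.ones((512, 512)))
mx.eval(total)  # all 1000 steps evaluated here at once

# After: evaluating per step keeps the pending graph one batch deep.
total = 0.0
for _ in range(1000):
    total += mx.mean(mx.ones((512, 512))).item()
```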