docs: increase MLX smoke validation batch size #36
brendanboyle87 wants to merge 1 commit into openai:main
Conversation
??? this is increasing val batch size??
Sorry if I was off base here. This was based on the fact that this script is for local MLX dev: there was no intermediate output, so I was trying to figure out how long validation would take. Codex gave an estimate in hours vs. minutes: "On this machine, a full validation with the old VAL_BATCH_SIZE=8192 is roughly a 5 to 6+ hour job. With VAL_BATCH_SIZE=524288, it is about 5 minutes. The reason is in train_gpt_mlx.py:766: validation uses VAL_BATCH_SIZE // GRAD_ACCUM_STEPS. With GRAD_ACCUM_STEPS=8 and TRAIN_SEQ_LEN=1024, 8192 means only 1024 eval tokens per batch, which is exactly 1 sequence. 524288 means 65536 eval tokens, or 64 sequences per batch."
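The arithmetic behind that comment can be sketched in a few lines. This is a standalone illustration, not the actual code at train_gpt_mlx.py:766; the constant names mirror the script, but the helper function is hypothetical.

```python
# Constants as described in the comment above (assumed, mirroring train_gpt_mlx.py).
GRAD_ACCUM_STEPS = 8
TRAIN_SEQ_LEN = 1024

def eval_sequences_per_batch(val_batch_size: int) -> int:
    """Sequences evaluated per validation batch for a given VAL_BATCH_SIZE."""
    # Validation reportedly uses VAL_BATCH_SIZE // GRAD_ACCUM_STEPS tokens per batch.
    eval_tokens = val_batch_size // GRAD_ACCUM_STEPS
    return eval_tokens // TRAIN_SEQ_LEN

print(eval_sequences_per_batch(8192))    # old documented value -> 1 sequence per batch
print(eval_sequences_per_batch(524288))  # new documented value -> 64 sequences per batch
```

With only 1 sequence per batch, validation launches one tiny kernel at a time, which is where the hours-vs-minutes gap on local hardware comes from.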
Summary
VAL_BATCH_SIZE=524288

Why

The default validation batch size in the README trial run takes a very long time locally on an M4 Max Mac Studio with 128 GB, so this raises the documented MLX smoke-test value to a more practical local setting.