
Non-record: Turbo-Muon + EngramLite(10240) + VE(8,9,10) — val_bpb 1.1431 #1205

Open

SergheiBrinza wants to merge 2 commits into openai:main from SergheiBrinza:submission/2026-04-01_TurboMuon_EngramLite_Improved

Conversation


@SergheiBrinza commented Apr 1, 2026

Summary

Non-record submission based on the PR #1089 Turbo-Muon + EngramLite stack with hyperparameter tuning.

val_bpb: 1.1431 (3-seed mean, std 0.0007)

Seed   val_bpb (sliding)
1337   1.1425
42     1.1438
2024   1.1431
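
For reference, the reported mean and spread follow directly from the table above (a quick check, assuming the quoted std is the sample standard deviation):

```python
import statistics

seed_bpb = {1337: 1.1425, 42: 1.1438, 2024: 1.1431}  # per-seed sliding val_bpb
mean = statistics.mean(seed_bpb.values())
std = statistics.stdev(seed_bpb.values())
print(f"mean={mean:.4f} std={std:.4f}")  # mean=1.1431 std=0.0007
```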

Changes from PR #1089

  • Higher LR (0.030 vs 0.025) for faster convergence
  • Wider EngramLite (10240x48 vs 8192x32) for more n-gram coverage
  • VE on layers 8,9,10 (vs 9,10) for additional token identity injection
  • Warmdown 4500 (vs 3500) for smoother weight averaging
  • Muon momentum warmup 1000 steps (vs 1500)
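
Expressed as a config delta (hypothetical parameter names; the actual flags in train_gpt.py may be spelled differently), the tuning amounts to:

```python
# Hypothetical parameter names; the real train_gpt.py config may differ.
pr_1089 = dict(
    muon_lr=0.025,
    engram_vocab=8192,  engram_dim=32,
    ve_layers=(9, 10),
    warmdown_steps=3500,
    muon_momentum_warmup_steps=1500,
)

this_pr = dict(
    pr_1089,
    muon_lr=0.030,                       # faster convergence
    engram_vocab=10240, engram_dim=48,   # wider n-gram coverage
    ve_layers=(8, 9, 10),                # one more VE layer
    warmdown_steps=4500,                 # smoother weight averaging
    muon_momentum_warmup_steps=1000,     # shorter momentum warmup
)
```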

Key Finding

The increased model size (~31.6M vs 30.7M params) pushed the artifact to 16.36MB pre-compression, forcing all 66 weight groups into int5 with 0 promotions to int6/int7 and 20.5% selective pruning. This aggressive quantization likely offset the architectural gains. The 16MB budget is extremely tight — even small parameter increases can cascade into significant quality loss through the quantization pipeline.
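
A back-of-envelope illustration of that cascade (raw weight bits only; the real pipeline's codebooks, group metadata, and index overhead are not modeled here):

```python
def artifact_mb(n_params, bits, prune_frac=0.0):
    """Rough artifact size in MB from raw quantized weights alone."""
    return n_params * (1.0 - prune_frac) * bits / 8 / 1e6

print(artifact_mb(31.6e6, 5))                    # ~19.8 MB: int5 alone is still over budget
print(artifact_mb(31.6e6, 5, prune_frac=0.205))  # ~15.7 MB: int5 + 20.5% pruning squeezes under 16 MB
```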

Hardware

8xH100 80GB SXM, 600s training, ~5550 steps at 106ms/step.
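
The step count is consistent with the wall-clock budget (simple arithmetic, assuming the quoted 106 ms/step average):

```python
budget_s, step_s, steps = 600, 0.106, 5550
print(steps * step_s)          # ~588 s of pure stepping
print(int(budget_s / step_s))  # ~5660-step ceiling if all 600 s went to stepping
```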

… 1.1431

Based on PR openai#1089 stack with hyperparameter tuning:
- Higher LR (0.030 vs 0.025) for faster convergence
- Wider EngramLite (10240x48 vs 8192x32)
- VE on layers 8,9,10 (vs 9,10)
- Warmdown 4500 (vs 3500)
- Muon momentum warmup 1000 steps (vs 1500)

3-seed mean: 1.1431 (std 0.0007)
Seeds: 1337=1.1425, 42=1.1438, 2024=1.1431
@SergheiBrinza force-pushed the submission/2026-04-01_TurboMuon_EngramLite_Improved branch from 2d2f0d7 to 974948e on April 1, 2026 at 01:21
@MatoTeziTanka

Community Review — Non-record: Turbo-Muon + EngramLite(10240) + VE(8,9,10) — val_bpb 1.1431

Compliance: LOOKS CLEAN — legal score-first-per-chunk TTT (PR #1413 pattern)

PR #1205 Audit — Two Submissions
Head SHA: 974948e

---

## Submission 1: 2026-03-21_MixedQuant_BigramHash_SWA (val_bpb: 1.2421)

BigramHash implementation (lines 525–527):

```python
prev = F.pad(input_ids[:, :-1], (1, 0), value=0)
bh = (prev * 7919 + input_ids) % self.bigram_hash_size
x = self.tok_emb(input_ids) + self.bigram_proj(self.bigram_embed(bh))
```

Hash key uses prev (context shift of input) and input_ids (current input token). Target IDs (target_ids) are NOT XOR'd into the hash key. This is the correct BigramHash pattern — no illegal target-leakage into the hash. No XOR anywhere in the hash construction.

No TTT: eval_val() (lines 179–211) runs entirely under torch.inference_mode(), calls no optimizer, performs no backward pass. No TTT variables or functions present in submission 1. No scored-region SLOT detected. No multi-epoch val training.

VERDICT: PURE_NEURAL_CLEAN

---

## Submission 2: 2026-04-01_TurboMuon_EngramLite_Improved (val_bpb: 1.1431)

EngramLite n-gram hash (lines 887–907): Multi-head bigram+trigram hashing over input_ids and prev_ids (shifted context). Example:

```python
bi_h0 = (prev_ids * 1009 + input_ids) % B
tri_h0 = ((pp_ids * 36313) ^ (prev_ids * 27191) ^ (input_ids * 4903)) % B
```

XOR is used for hash mixing but only among context tokens (pp_ids, prev_ids, input_ids). target_ids is never incorporated into the hash key. No illegal target-leakage.

TTT (lines 1261–1562): eval_val_sliding_ttt() implements score-first TTT. Structure is the canonical PR #1413 pattern:

- PHASE 1 (lines 1384–1456): Score each chunk under torch.no_grad() / train(False) — loss accumulated before any weight update.
- PHASE 2 (lines 1469–1535): Train on scored chunk only; guarded by is_last = ci == num_chunks - 1 / `if not is_last...

Verdict: LOOKS CLEAN — legal TTT implementation matching the PR #1413 (dexhunter) pattern: each chunk is scored under torch.no_grad() before optimizer.step(), with an is_last guard preventing adaptation on the final scored chunk.
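
For readers unfamiliar with the pattern, a minimal sketch of the score-first-per-chunk discipline (illustrative only, not the submission's eval_val_sliding_ttt(); it assumes a model call that returns the mean loss for a chunk):

```python
import torch

def score_first_ttt_sketch(model, chunks, optimizer):
    """Each chunk is scored before any optimizer step that could have seen it;
    no adaptation happens on the final scored chunk."""
    total_loss, total_tokens = 0.0, 0
    num_chunks = len(chunks)
    for ci, (inputs, targets) in enumerate(chunks):
        # PHASE 1: score the chunk with frozen weights.
        model.eval()
        with torch.no_grad():
            loss = model(inputs, targets)  # assumed: returns mean loss
        total_loss += loss.item() * targets.numel()
        total_tokens += targets.numel()

        # PHASE 2: adapt on the chunk just scored, skipping the last one.
        is_last = ci == num_chunks - 1
        if not is_last:
            model.train()
            optimizer.zero_grad(set_to_none=True)
            model(inputs, targets).backward()
            optimizer.step()
    return total_loss / total_tokens
```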

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). TTT implementation follows the legal score-first discipline.


Reviewed by @MatoTeziTanka (The Agora). Compliance audit via an LLM agent (Sonnet) reviewing the full train_gpt.py source, cross-checked against a deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.
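
For transparency, a stripped-down illustration of the kind of check involved (illustrative only, not the actual classifier): flag any modulo-hash expression whose subtree mentions the target tensor name.

```python
import ast

def flags_target_leak(source: str, target_name: str = "target_ids") -> list[int]:
    """Return line numbers of hash-style expressions (anything reduced with %)
    whose subtree references the target tensor name."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mod):
            names = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            if target_name in names:
                hits.append(node.lineno)
    return hits

# The legal EngramLite hash produces no hits; mixing target_ids in would be flagged.
legal = "bi_h0 = (prev_ids * 1009 + input_ids) % B"
leaky = "bi_h0 = ((prev_ids * 1009) ^ target_ids) % B"
print(flags_target_leak(legal), flags_target_leak(leaky))  # [] [1]
```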

SergheiBrinza added a commit to SergheiBrinza/parameter-golf that referenced this pull request Apr 24, 2026
A personal case study of my participation in the OpenAI Model Craft
Challenge, plus the April Turbo-Muon submission brought to main so
internal links resolve.

Contents:
- README.md: personal narrative and results tables
- docs/METHODS.md: technical breakdown of each technique used
- docs/EXPERIMENTS.md: verified runs and post-mortem of 020_ultimate
- docs/UPSTREAM_README.md: original OpenAI README preserved for context
- scripts/plot_curves.py: build training curves from train_*.log
- assets/loss_curves.png: training dynamics of both submissions
- Rewritten README for the 2026-03-21 submission
- Full 2026-04-01 Turbo-Muon submission ported from the PR branch:
  README, submission.json, train_gpt.py, three seed logs

Results on main:
- 2026-03-21 Mixed Quantization + BigramHash + SWA: val_bpb 1.2421
- 2026-04-01 Turbo-Muon + EngramLite (3 seeds, std 0.0007): val_bpb 1.1431

Upstream PRs:
- openai#370
- openai#1205