Record: SwiGLU + MLP 3x + Int6 + LoRA TTT, val_bpb=1.1670 (8xH100) #81

Open
polarizedfortnite-cpu wants to merge 18 commits into openai:main from polarizedfortnite-cpu:main

Conversation

@polarizedfortnite-cpu

No description provided.

@polarizedfortnite-cpu changed the title from "Non-record: Depth Recurrence 4x3 + SwiGLU + Int6 Quant + Sliding Window, val_bpb=1.2269 (4xH100)" to "Record: SwiGLU + MLP 3x + Int6 + LoRA TTT, val_bpb=1.1670 (8xH100)" on Mar 20, 2026
@MatoTeziTanka

Community Review — Record: SwiGLU + MLP 3x + Int6 + LoRA TTT, val_bpb=1.1670 (8xH100)

Compliance: LOOKS CLEAN — legal score-first-per-chunk TTT (PR #1413 pattern)

PR #81 — flux_depth_recurrence_int6

Head SHA: 942a986
Claimed BPB: 1.1670 (raw 1.2004, TTT improvement 0.0334)
Techniques: SwiGLU, 10 layers, int6 quantization, zstd22, LoRA TTT eval, QAT last 25%, document-isolated eval
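
(The PR source is not reproduced in this review, but for orientation: a symmetric per-channel int6 weight quantizer along the following lines would pair with zstd-level-22 compression to stay under the artifact cap, and QAT over the last 25% of steps would train through its round-trip. All names below are illustrative, not from train_gpt.py.)

```python
import torch

def quantize_int6(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # Symmetric per-output-channel int6: codes live in [-31, 31]
    # (2**(6-1) - 1), so scale * q round-trips with no zero-point.
    # Actual 6-bit packing before zstd compression is elided; codes
    # are stored in int8 here.
    qmax = 31
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax).to(torch.int8)
    return q, scale

def dequantize_int6(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale
```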


Check 1 — N-gram family bug (CLOSE trigger: target token in hash key)

No n-gram or hash-based models are present anywhere in train_gpt.py: no BigramHash, no n-gram LUT, no token-keyed hash lookup. This check does not apply. CLEAN.
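
For readers unfamiliar with this bug class, a hedged illustration (hypothetical names, not code from this PR): an n-gram lookup table is legal only when its hash key is built from context tokens; folding the target token into the key lets the table memorize the very answer it is scored on.

```python
import torch

TABLE_SIZE = 1 << 20

def ngram_keys_legal(ctx: torch.Tensor) -> torch.Tensor:
    # Keys for predicting token t use only tokens at positions < t,
    # so the prediction target never enters its own lookup key.
    return (ctx[:-2] * 31 + ctx[1:-1]) % TABLE_SIZE

# The CLOSE-trigger variant would compute something like
#   keys = (ctx[1:-1] * 31 + ctx[2:]) % TABLE_SIZE
# when predicting ctx[2:], i.e. hash the target token into the key
# that indexes the logit table scoring that same target.
```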

Check 2 — Pre-Quant TTT (CLOSE trigger: multi-epoch AdamW on val_tokens without score-first)

The TTT optimizer is torch.optim.Adam rather than AdamW, and it is constructed without weight decay. Each document is processed in a single pass, with no epoch loop, and the optimizer is reset per document batch via _reset_ttt_optimizer / lora.reset() before each document group. There is no multi-epoch loop over val_tokens. CLEAN.
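
A minimal sketch of that reset pattern, assuming LoRA adapter tensors trained with Adam; the helper name and body below are illustrative, not the PR's _reset_ttt_optimizer:

```python
import torch

def reset_ttt_state(lora_params: list[torch.nn.Parameter],
                    lora_init: list[torch.Tensor],
                    lr: float = 1e-3) -> torch.optim.Adam:
    # Restore adapters to their pre-eval values and rebuild the
    # optimizer, so neither weights nor Adam moments carry any
    # information across document boundaries.
    with torch.no_grad():
        for p, init in zip(lora_params, lora_init):
            p.copy_(init)
    return torch.optim.Adam(lora_params, lr=lr)
```

Rebuilding the optimizer, rather than merely zeroing gradients, is the important part: stale first and second moments would otherwise leak state between documents and break document-isolated eval.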

Check 3 — Legal TTT / Score-first-per-chunk

The TTT loop in eval_val_ttt_lora (lines 967–1016) follows score-first discipline correctly:

  1. Forward pass runs on chunk ci (lines 991–996).
  2. Score is accumulated immediately after forward, unconditionally (lines 998–1006, _accumulate_bpb called before any gradient step).
  3. Train step only runs if needs_train (lines 1008–1016), which requires ci < nc - 1 — i.e., the model only trains on chunks that are not the final chunk of a document. The final chunk's loss is scored but never trained on.

This is the correct legal pattern matching PR #1413. Score happens before training for every chunk. CLEAN.
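
Putting Checks 2 and 3 together, the legal loop reads roughly as below. This is a sketch under stated assumptions (the model returns flat logits over the vocab; tokens are bytes, so nats / ln 2 per token gives bpb), with illustrative names rather than the PR's eval_val_ttt_lora:

```python
import math
import torch
import torch.nn.functional as F

def eval_doc_score_first(model, opt, chunks):
    """Score-first-per-chunk TTT over one document's (inputs, targets) chunks."""
    total_nats, total_tokens = 0.0, 0
    nc = len(chunks)
    for ci, (x, y) in enumerate(chunks):
        # 1) Score first: this chunk's loss is banked before any
        #    gradient step has seen its targets.
        with torch.no_grad():
            logits = model(x)
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        total_nats += loss.item() * y.numel()
        total_tokens += y.numel()
        # 2) Adapt only on non-final chunks: the last chunk is scored
        #    but never trained on.
        if ci < nc - 1:
            opt.zero_grad(set_to_none=True)
            logits = model(x)
            F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1)).backward()
            opt.step()
    return total_nats / (total_tokens * math.log(2))  # bits per byte
```

Chunk ci+1 is thus scored under weights adapted on chunks up to ci, which is where the claimed 0.0334 bpb TTT improvement comes from, while no chunk is ever scored after the model has trained on it.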

Check 4 — Scored-region SLOT

No scored-region manipulation detected. The BPB accumulation (_accumulate_bpb) operates over the natural document boundaries found by _find_docs. No selective region skipping or windowed re-scoring of only favorable regions. Eval covers all val documents assigned to each rank. HOLD not triggered. CLEAN.
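
For completeness, the document-boundary split that _accumulate_bpb relies on (attributed above to _find_docs) amounts to cutting the rank's val shard at end-of-text markers; a hypothetical stand-in:

```python
import torch

def find_docs(tokens: torch.Tensor, eot_id: int) -> list[torch.Tensor]:
    # Cut the 1-D token shard at EOT markers so every token lands in
    # exactly one document and is scored exactly once (EOT markers
    # themselves are dropped here; that convention varies).
    bounds = (tokens == eot_id).nonzero().flatten().tolist()
    starts = [0] + [b + 1 for b in bounds]
    ends = bounds + [len(tokens)]
    return [tokens[s:e] for s, e in zip(starts, ends) if e > s]
```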

Verdict: LOOKS CLEAN — legal TTT implementation matching the PR #1413 (dexhunter) pattern: each chunk scored under torch.no_grad() before optimizer.step(), with is_last_chunk guard preventing adaptation on the final scored chunk.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). TTT implementation follows the legal score-first discipline.


Reviewed by @MatoTeziTanka (The Agora). Compliance audit performed by an LLM agent (Sonnet) reviewing the full train_gpt.py source, cross-checked against a deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.

gHashTag added a commit to gHashTag/parameter-golf that referenced this pull request on Apr 30, 2026
Consolidate all Neon schema into canonical migrations.rs:
- Add igla_race_trials + gardener_runs DDL (were orphaned in neon.rs/event.rs)
- Add run_migrate() for direct Neon DDL application
- Add tri-railway audit migrate subcommand (applies via NEON_DATABASE_URL)
- Create versioned SQL files: migrations/0001, 0002
- Deprecate neon::GARDENER_DDL in favor of canonical source
- 3 new tests (igla_race_trials index, gardener_runs index, expanded canonical check)

Closes openai#81
Agent: GENERAL
gHashTag added a commit to gHashTag/parameter-golf that referenced this pull request on Apr 30, 2026
Critical fixes from ADR-001 audit:
- Add spawn_heartbeat() background task (60s interval) — prevents
  gardener stale-eviction during long experiments
- Change telemetry from 100-step to 10-step reporting per ADR-001
- Fix DDL pull-queue index from ASC to DESC (matches claim SQL)
- Add openssl TLS to run_migrate() for Neon SSL connections
- Make igla_race_trials index creation safe for divergent schemas
- Arc<Client> wrapper for thread-safe heartbeat access

42 tests green (19 audit + 23 seed-agent).

Closes openai#81
Agent: GENERAL
gHashTag added a commit to gHashTag/parameter-golf that referenced this pull request on Apr 30, 2026
- bin/seed-agent: full worker crate (claim, early-stop, telemetry, trainer)
- crates/trios-igla-race: queue/DB abstraction (neon, pull_queue, ASHA, status)
- bin/tri-gardener/bpb_source: leaderboard query source
- crates/trios-railway-core/multiclient: multi-account Railway client
- .gitignore: exclude **/target/, .env, worker logs, scripts
- 72 experiments completed (budget 2K/5K), power-law analysis done
- GF16-E0090 identified as top candidate for long GPU runs

Agent: GENERAL
Closes openai#81
gHashTag pushed a commit to gHashTag/parameter-golf that referenced this pull request on Apr 30, 2026
All 123 'done' experiments confirmed as MockTrainer simulations.
Real trainer (trios-igla-race/seed_agent.rs) never deployed to Railway.
MCP tools 6/6 operational.

Closes openai#81
Agent: GENERAL