Record: SwiGLU + MLP 3x + Int6 + LoRA TTT, val_bpb=1.1670 (8xH100) #81

Open
polarizedfortnite-cpu wants to merge 18 commits into openai:main from polarizedfortnite-cpu:main

Conversation

@polarizedfortnite-cpu

No description provided.

@polarizedfortnite-cpu changed the title from "Non-record: Depth Recurrence 4x3 + SwiGLU + Int6 Quant + Sliding Window, val_bpb=1.2269 (4xH100)" to "Record: SwiGLU + MLP 3x + Int6 + LoRA TTT, val_bpb=1.1670 (8xH100)" on Mar 20, 2026
@MatoTeziTanka

Community Review — Record: SwiGLU + MLP 3x + Int6 + LoRA TTT, val_bpb=1.1670 (8xH100)

Compliance: LOOKS CLEAN — legal score-first-per-chunk TTT (PR #1413 pattern)

PR #81 — flux_depth_recurrence_int6

Head SHA: 942a986
Claimed BPB: 1.1670 (raw 1.2004, TTT improvement 0.0334)
Techniques: SwiGLU, 10 layers, int6 quantization, zstd22, LoRA TTT eval, QAT last 25%, document-isolated eval
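
(The PR source is not reproduced in this review, but for orientation: a symmetric per-channel int6 weight quantizer along the following lines would pair with zstd-level-22 compression to stay under the artifact cap, and QAT over the last 25% of steps would train through its round-trip. All names below are illustrative, not from train_gpt.py.)

```python
import torch

def quantize_int6(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # Symmetric per-output-channel int6: codes live in [-31, 31]
    # (2**(6-1) - 1), so scale * q round-trips with no zero-point.
    # Actual 6-bit packing before zstd compression is elided; codes
    # are stored in int8 here.
    qmax = 31
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax).to(torch.int8)
    return q, scale

def dequantize_int6(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale
```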


Check 1 — N-gram family bug (CLOSE trigger: target token in hash key)

No n-gram or hash-based models are present anywhere in train_gpt.py: no BigramHash, no n-gram LUT, no token-keyed hash lookup. This check does not apply. CLEAN.
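
For readers unfamiliar with this bug class, a hedged illustration (hypothetical names, not code from this PR): an n-gram lookup table is legal only when its hash key is built from context tokens; folding the target token into the key lets the table memorize the very answer it is scored on.

```python
import torch

TABLE_SIZE = 1 << 20

def ngram_keys_legal(ctx: torch.Tensor) -> torch.Tensor:
    # Keys for predicting token t use only tokens at positions < t,
    # so the prediction target never enters its own lookup key.
    return (ctx[:-2] * 31 + ctx[1:-1]) % TABLE_SIZE

# The CLOSE-trigger variant would compute something like
#   keys = (ctx[1:-1] * 31 + ctx[2:]) % TABLE_SIZE
# when predicting ctx[2:], i.e. hash the target token into the key
# that indexes the logit table scoring that same target.
```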

Check 2 — Pre-Quant TTT (CLOSE trigger: multi-epoch AdamW on val_tokens without score-first)

The TTT optimizer is torch.optim.Adam rather than AdamW, and it is constructed without weight decay. Each document is processed in a single pass, with no epoch loop, and the optimizer is reset per document batch via _reset_ttt_optimizer / lora.reset() before each document group. There is no multi-epoch loop over val_tokens. CLEAN.
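
A minimal sketch of that reset pattern, assuming LoRA adapter tensors trained with Adam; the helper name and body below are illustrative, not the PR's _reset_ttt_optimizer:

```python
import torch

def reset_ttt_state(lora_params: list[torch.nn.Parameter],
                    lora_init: list[torch.Tensor],
                    lr: float = 1e-3) -> torch.optim.Adam:
    # Restore adapters to their pre-eval values and rebuild the
    # optimizer, so neither weights nor Adam moments carry any
    # information across document boundaries.
    with torch.no_grad():
        for p, init in zip(lora_params, lora_init):
            p.copy_(init)
    return torch.optim.Adam(lora_params, lr=lr)
```

Rebuilding the optimizer, rather than merely zeroing gradients, is the important part: stale first and second moments would otherwise leak state between documents and break document-isolated eval.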

Check 3 — Legal TTT / Score-first-per-chunk

The TTT loop in eval_val_ttt_lora (lines 967–1016) follows score-first discipline correctly:

  1. Forward pass runs on chunk ci (lines 991–996).
  2. Score is accumulated immediately after forward, unconditionally (lines 998–1006, _accumulate_bpb called before any gradient step).
  3. Train step only runs if needs_train (lines 1008–1016), which requires ci < nc - 1 — i.e., the model only trains on chunks that are not the final chunk of a document. The final chunk's loss is scored but never trained on.

This is the correct legal pattern matching PR #1413. Score happens before training for every chunk. CLEAN.
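
Putting Checks 2 and 3 together, the legal loop reads roughly as below. This is a sketch under stated assumptions (the model returns flat logits over the vocab; tokens are bytes, so nats / ln 2 per token gives bpb), with illustrative names rather than the PR's eval_val_ttt_lora:

```python
import math
import torch
import torch.nn.functional as F

def eval_doc_score_first(model, opt, chunks):
    """Score-first-per-chunk TTT over one document's (inputs, targets) chunks."""
    total_nats, total_tokens = 0.0, 0
    nc = len(chunks)
    for ci, (x, y) in enumerate(chunks):
        # 1) Score first: this chunk's loss is banked before any
        #    gradient step has seen its targets.
        with torch.no_grad():
            logits = model(x)
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        total_nats += loss.item() * y.numel()
        total_tokens += y.numel()
        # 2) Adapt only on non-final chunks: the last chunk is scored
        #    but never trained on.
        if ci < nc - 1:
            opt.zero_grad(set_to_none=True)
            logits = model(x)
            F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1)).backward()
            opt.step()
    return total_nats / (total_tokens * math.log(2))  # bits per byte
```

Chunk ci+1 is thus scored under weights adapted on chunks up to ci, which is where the claimed 0.0334 bpb TTT improvement comes from, while no chunk is ever scored after the model has trained on it.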

Check 4 — Scored-region SLOT

No scored-region manipulation detected. The BPB accumulation (_accumulate_bpb) operates over the natural document boundaries found by _find_docs. No selective region skipping or windowed re-scoring of only favorable regions. Eval covers all val documents assigned to each rank. HOLD not triggered. CLEAN.
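
For completeness, the document-boundary split that _accumulate_bpb relies on (attributed above to _find_docs) amounts to cutting the rank's val shard at end-of-text markers; a hypothetical stand-in:

```python
import torch

def find_docs(tokens: torch.Tensor, eot_id: int) -> list[torch.Tensor]:
    # Cut the 1-D token shard at EOT markers so every token lands in
    # exactly one document and is scored exactly once (EOT markers
    # themselves are dropped here; that convention varies).
    bounds = (tokens == eot_id).nonzero().flatten().tolist()
    starts = [0] + [b + 1 for b in bounds]
    ends = bounds + [len(tokens)]
    return [tokens[s:e] for s, e in zip(starts, ends) if e > s]
```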

Verdict: LOOKS CLEAN — legal TTT implementation matching the PR #1413 (dexhunter) pattern: each chunk scored under torch.no_grad() before optimizer.step(), with is_last_chunk guard preventing adaptation on the final scored chunk.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). TTT implementation follows the legal score-first discipline.


Reviewed by @MatoTeziTanka (The Agora). Compliance audit performed by an LLM agent (Sonnet) reviewing the full train_gpt.py source, cross-checked against a deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.

gHashTag added a commit to gHashTag/parameter-golf that referenced this pull request on Apr 30, 2026
Consolidate all Neon schema into canonical migrations.rs:
- Add igla_race_trials + gardener_runs DDL (were orphaned in neon.rs/event.rs)
- Add run_migrate() for direct Neon DDL application
- Add tri-railway audit migrate subcommand (applies via NEON_DATABASE_URL)
- Create versioned SQL files: migrations/0001, 0002
- Deprecate neon::GARDENER_DDL in favor of canonical source
- 3 new tests (igla_race_trials index, gardener_runs index, expanded canonical check)

Closes openai#81
Agent: GENERAL
gHashTag added a commit to gHashTag/parameter-golf that referenced this pull request on Apr 30, 2026
Critical fixes from ADR-001 audit:
- Add spawn_heartbeat() background task (60s interval) — prevents
  gardener stale-eviction during long experiments
- Change telemetry from 100-step to 10-step reporting per ADR-001
- Fix DDL pull-queue index from ASC to DESC (matches claim SQL)
- Add openssl TLS to run_migrate() for Neon SSL connections
- Make igla_race_trials index creation safe for divergent schemas
- Arc<Client> wrapper for thread-safe heartbeat access

42 tests green (19 audit + 23 seed-agent).

Closes openai#81
Agent: GENERAL
gHashTag added a commit to gHashTag/parameter-golf that referenced this pull request on Apr 30, 2026
- bin/seed-agent: full worker crate (claim, early-stop, telemetry, trainer)
- crates/trios-igla-race: queue/DB abstraction (neon, pull_queue, ASHA, status)
- bin/tri-gardener/bpb_source: leaderboard query source
- crates/trios-railway-core/multiclient: multi-account Railway client
- .gitignore: exclude **/target/, .env, worker logs, scripts
- 72 experiments completed (budget 2K/5K), power-law analysis done
- GF16-E0090 identified as top candidate for long GPU runs

Agent: GENERAL
Closes openai#81
gHashTag pushed a commit to gHashTag/parameter-golf that referenced this pull request on Apr 30, 2026
All 123 'done' experiments confirmed as MockTrainer simulations.
Real trainer (trios-igla-race/seed_agent.rs) never deployed to Railway.
MCP tools 6/6 operational.

Closes openai#81
Agent: GENERAL