
Record: 11L Int6 + SmearGate + BigramHash + Depth Recurrence #268

Open
brn-mwai wants to merge 2 commits into openai:main from brn-mwai:submission/brn-mwai-v1

Conversation

@brn-mwai

Summary

Competitive recipe with a novel depth recurrence option for the 10-min 16MB track.

Architecture

  • 11 layers, 512 dim, 3x MLP expansion (hidden=1536)
  • 8 attention heads, 4 KV heads (GQA)
  • ~27M parameters, quantized to int6 + zstd-22 (layout summarized in the config sketch below)
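
In code form, the bullets above correspond roughly to the configuration below; a sketch only, with assumed field names rather than the PR's actual hyperparameter object.

# Assumed configuration mirroring the architecture bullets; field names are illustrative.
from dataclasses import dataclass

@dataclass
class ModelConfig:
    n_layer: int = 11        # transformer blocks
    n_embd: int = 512        # model width
    mlp_hidden: int = 1536   # 3x MLP expansion
    n_head: int = 8          # query heads
    n_kv_head: int = 4       # KV heads shared across query groups (GQA)
    head_dim: int = 64       # n_embd // n_head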

Techniques

Technique | Source
Int6 per-row quantization + zstd-22 | Competition meta
11 layers with 3x MLP | Funded by int6 byte savings
SmearGate (learned token blending) | PR #135
BigramHash (4096x128 hash table) | PR #162
Muon optimizer with weight decay (0.03) | PR #179
Stochastic Weight Averaging (last 50%) | PR #162
Sliding window eval (stride=64) | PR #56
FP16 embedding passthrough | PR #42
Orthogonal initialization | PR #135
Depth recurrence (optional mode) | Novel

Novel: Depth Recurrence

Optional mode that shares transformer blocks across multiple loops (see the sketch after this list):

  • 2 prelude + 1 shared (looped 7x) + 2 coda = 5 unique blocks, 11 effective depth
  • Per-iteration embeddings tell the shared block which pass it's on
  • Freed parameter budget can go to wider model (640+ dim)
  • Motivated by MDL theory: lower L(model) frees bits for L(data|model)
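
A minimal sketch of this mode, assuming PyTorch and a generic block constructor make_block(dim); how the per-iteration embedding is injected is an assumption, not necessarily the PR's exact mechanism.

# Illustrative depth recurrence: 2 prelude + 1 shared block looped 7x + 2 coda
# = 5 unique blocks, 11 effective layers. Names here are assumptions.
import torch
import torch.nn as nn

class DepthRecurrentStack(nn.Module):
    def __init__(self, make_block, dim=512, n_prelude=2, n_loops=7, n_coda=2):
        super().__init__()
        self.prelude = nn.ModuleList(make_block(dim) for _ in range(n_prelude))
        self.shared = make_block(dim)                 # one set of weights reused every loop
        self.coda = nn.ModuleList(make_block(dim) for _ in range(n_coda))
        self.iter_emb = nn.Embedding(n_loops, dim)    # tells the shared block which pass it is on
        self.n_loops = n_loops

    def forward(self, x):                             # x: (batch, seq, dim)
        for block in self.prelude:
            x = block(x)
        for i in range(self.n_loops):
            x = self.shared(x + self.iter_emb.weight[i])   # inject the loop index
        for block in self.coda:
            x = block(x)
        return x

With 5 unique blocks instead of 11, the parameter budget saved on layers is what can be spent on a wider model, as noted above.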

Vectorized Int6 Packing

Rewrote the int6 bit-packing from Python loops to vectorized NumPy ops. 27M params pack in ~90ms instead of minutes.
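
A minimal sketch of what vectorized int6 packing can look like in NumPy, assuming per-row symmetric quantization and a flat code array; function and variable names are illustrative, not the PR's actual API.

# Illustrative int6 quantize-and-pack sketch (assumed layout, not the PR's exact code).
import numpy as np

def quantize_int6_per_row(w: np.ndarray):
    """Row-wise symmetric quantization of a 2-D float matrix to 6-bit codes."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 31.0        # one scale per row
    scale = np.where(scale == 0, 1.0, scale)                   # guard all-zero rows
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return (q + 31).astype(np.uint8), scale.astype(np.float16)

def pack_int6(codes: np.ndarray) -> np.ndarray:
    """Pack 6-bit codes (0..63) into bytes, 4 codes -> 3 bytes, no Python-level loop."""
    flat = codes.reshape(-1)
    pad = (-flat.size) % 4                                      # pad to a multiple of 4 codes
    if pad:
        flat = np.concatenate([flat, np.zeros(pad, dtype=np.uint8)])
    v = flat.reshape(-1, 4).astype(np.uint16)
    out = np.empty((v.shape[0], 3), dtype=np.uint8)
    out[:, 0] = (v[:, 0] << 2) | (v[:, 1] >> 4)
    out[:, 1] = ((v[:, 1] & 0xF) << 4) | (v[:, 2] >> 2)
    out[:, 2] = ((v[:, 2] & 0x3) << 6) | v[:, 3]
    return out.reshape(-1)

The packed byte stream is what zstd level 22 compresses afterwards (e.g. zstandard.ZstdCompressor(level=22).compress(packed.tobytes())), and a single pass of array arithmetic over ~27M codes is consistent with the ~90ms figure above.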

Validation

  • BPB: Pending 8xH100 validation (applying for compute grant)
  • Code has been syntax-verified and int6 roundtrip tested locally

Checklist

  • Submission folder in records/track_10min_16mb/
  • README.md with approach description
  • submission.json with metadata
  • train_gpt.py (single file, self-contained)
  • Training log (pending compute)
  • BPB score (pending compute)

… pending)

Competitive recipe with novel depth recurrence option:
- 11 layers, 512 dim, 3x MLP, Int6 quant + zstd-22
- SmearGate, BigramHash, Muon WD, SWA, sliding window eval
- Optional depth recurrence: 5 unique blocks, 11 effective depth
- Vectorized int6 packing, FP16 embedding passthrough

BPB pending 8xH100 validation run.
@mohosy

mohosy commented Mar 21, 2026

yoo depth recurrence is lowkey smart, sharing blocks like that frees up so much param budget. lmk when you get a score on this im curious how it stacks up

@MatoTeziTanka

Community Review — Record: 11L Int6 + SmearGate + BigramHash + Depth Recurrence

Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache


PR #268 — Brian Mwai (@brn-mwai)
Title: Record: 11L Int6 + SmearGate + BigramHash + Depth Recurrence
Head SHA: 498bec4
Score: Not submitted (all null in submission.json)


Check 1: N-gram Family Bug (CLOSE trigger)

Result: CLEAN

BigramHash.forward (lines 682–688):

prev = F.pad(input_ids[:, :-1], (1, 0))   # prev[t] = input_ids[t-1]
h = (prev.long() * 2654435761 + input_ids.long()) % self.table_size

input_ids is the input context sequence (x). At position t, input_ids[t] is the current context token (known), not the target. The target is y[t] = input_ids[t+1]. The hash key is (context[t-1], context[t]) — a bigram of two already-seen tokens. No target token leakage. This is the legal BigramHash pattern, not the CLOSE variant.
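
To make the alignment argument concrete, here is a standalone re-run of the quoted two lines on a toy sequence (table size 4096 as in the PR; the token values are made up).

# Toy check of the quoted hashing pattern: the key at position t uses only x[t-1] and x[t].
import torch
import torch.nn.functional as F

input_ids = torch.tensor([[5, 9, 2, 7]])                   # context tokens x[0..3]
prev = F.pad(input_ids[:, :-1], (1, 0))                    # [[0, 5, 9, 2]], i.e. prev[t] = x[t-1]
h = (prev.long() * 2654435761 + input_ids.long()) % 4096   # one bigram key per position
print(h)                                                   # shape (1, 4), values in [0, 4096)
# The target y[t] = x[t+1] never enters the hash; the 4096x128 table is then just an
# embedding lookup on h (e.g. nn.Embedding(4096, 128)(h)), which is the legal pattern.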


Check 2: Pre-Quant TTT (CLOSE trigger)

Result: CLEAN — no TTT present

There is no test-time training in this submission. val_tokens is only ever consumed in eval_val and eval_val_standard, both of which run under torch.inference_mode() with model.eval(). No gradient step is taken on val data at any point. No AdamW, no fine-tuning loop on val tokens. The submission uses SWA (stochastic weight averaging over train checkpoints) and sliding-window evaluation — both legal.
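
For reference, the legal sliding-window evaluation pattern described here looks roughly like the sketch below, assuming a causal LM that returns per-token logits and a 1-D tensor of validation token IDs; names and the block size are illustrative, and converting bits per token into bits per byte still needs the tokens-to-bytes ratio.

# Illustrative sliding-window eval (stride=64): every target token is scored exactly once,
# with up to block_size tokens of preceding context, and no gradient is ever taken.
import math
import torch
import torch.nn.functional as F

@torch.inference_mode()
def sliding_window_bits_per_token(model, val_tokens, block_size=512, stride=64):
    model.eval()
    n = val_tokens.numel()
    total_nll, total_targets, prev_end = 0.0, 0, 0
    for begin in range(0, n - 1, stride):
        end = min(begin + block_size, n - 1)
        x = val_tokens[begin:end].unsqueeze(0)          # context window
        y = val_tokens[begin + 1:end + 1].unsqueeze(0)  # next-token targets
        logits = model(x)                               # (1, T, vocab)
        nll = F.cross_entropy(logits[0], y[0], reduction="none")
        new = end - prev_end                            # targets not scored by an earlier window
        total_nll += nll[-new:].sum().item()
        total_targets += new
        prev_end = end
        if end == n - 1:
            break
    return total_nll / total_targets / math.log(2)      # nats -> bits per token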

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the audit — this looks like a clean pure-neural submission.


Reviewed by @MatoTeziTanka (The Agora). Compliance audit via LLM agent (Sonnet) reviewing the full train_gpt.py source, cross-checked against a deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.

