Record: 11L Int6 + SmearGate + BigramHash + Depth Recurrence #268
brn-mwai wants to merge 2 commits into openai:main
Conversation
Competitive recipe with a novel depth recurrence option:
- 11 layers, 512 dim, 3x MLP, Int6 quant + zstd-22
- SmearGate, BigramHash, Muon WD, SWA, sliding-window eval
- Optional depth recurrence: 5 unique blocks, 11 effective depth
- Vectorized int6 packing, FP16 embedding passthrough

BPB pending 8xH100 validation run.
yoo depth recurrence is lowkey smart, sharing blocks like that frees up so much param budget. lmk when you get a score on this, im curious how it stacks up
Community Review — Record: 11L Int6 + SmearGate + BigramHash + Depth Recurrence
PR #268 — Brian Mwai (@brn-mwai)
Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

Check 1: N-gram Family Bug (CLOSE trigger)
Result: CLEAN

```python
prev = F.pad(input_ids[:, :-1], (1, 0))  # prev[t] = input_ids[t-1]
h = (prev.long() * 2654435761 + input_ids.long()) % self.table_size
```

Check 2: Pre-Quant TTT (CLOSE trigger)
Result: CLEAN — no test-time training in this submission.

Verdict: LOOKS CLEAN. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the audit; this looks like a clean pure-neural submission.

Reviewed by @MatoTeziTanka — The Agora. Compliance audit via LLM agent (Sonnet) reviewing the full train_gpt.py source, cross-checked against a deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.
Summary
Competitive recipe with a novel depth recurrence option for the 10-min 16MB track.
Architecture
Techniques
Novel: Depth Recurrence
Optional mode that reuses a small set of unique transformer blocks across multiple passes: 5 unique blocks are looped to an effective depth of 11, so only 5 blocks' worth of parameters count against the 16MB artifact cap.
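A minimal sketch of the idea, with blocks modeled as plain callables. The specific loop schedule below (first pass through all 5 blocks, then the middle blocks repeated) is an assumption for illustration; the PR's actual repeat pattern is not reproduced here.

```python
# Depth recurrence: 5 unique blocks reused to reach effective depth 11.
NUM_UNIQUE = 5
# Effective layer i applies unique block SCHEDULE[i].
# NOTE: this particular schedule is a hypothetical example.
SCHEDULE = [0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4]  # 11 entries, 5 unique ids

def forward(x, blocks):
    """Run the shared blocks in schedule order; `blocks` is a list of callables."""
    assert len(blocks) == NUM_UNIQUE
    for block_id in SCHEDULE:
        x = blocks[block_id](x)
    return x
```

The trade this buys: parameter count (and hence packed artifact size) scales with the number of unique blocks, while compute and effective depth scale with the schedule length.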
Vectorized Int6 Packing
Rewrote the int6 bit-packing from Python loops to vectorized NumPy ops. 27M params pack in ~90ms instead of minutes.
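A minimal vectorized sketch of the kind of packing involved, assuming the int6 values are stored as unsigned offsets in [0, 63] and packed 4 values per 3 bytes; the PR's actual bit layout and quantization offsets may differ.

```python
import numpy as np

def pack_int6(vals: np.ndarray) -> np.ndarray:
    """Pack uint6 values (0..63) into bytes, 4 values per 3 bytes, no Python loops."""
    assert vals.size % 4 == 0, "pad to a multiple of 4 values first"
    v = vals.astype(np.uint32).reshape(-1, 4)
    # Merge four 6-bit fields into one 24-bit word per group.
    word = (v[:, 0] << 18) | (v[:, 1] << 12) | (v[:, 2] << 6) | v[:, 3]
    out = np.empty((v.shape[0], 3), dtype=np.uint8)
    out[:, 0] = (word >> 16) & 0xFF
    out[:, 1] = (word >> 8) & 0xFF
    out[:, 2] = word & 0xFF
    return out.reshape(-1)

def unpack_int6(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_int6: recover the 6-bit values from the byte stream."""
    b = packed.astype(np.uint32).reshape(-1, 3)
    word = (b[:, 0] << 16) | (b[:, 1] << 8) | b[:, 2]
    v = np.stack([(word >> 18) & 0x3F, (word >> 12) & 0x3F,
                  (word >> 6) & 0x3F, word & 0x3F], axis=1)
    return v.reshape(-1).astype(np.uint8)
```

Everything here is a handful of whole-array shift/mask ops, which is what turns a minutes-long Python loop into tens of milliseconds for ~27M parameters.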
Validation
Checklist
- records/track_10min_16mb/README.md with approach description
- submission.json with metadata
- train_gpt.py (single file, self-contained)