submission: LeakyReLU2 + TrigramHashEmbedding (1.1448 bpb)#884
BhatiaUday wants to merge 1 commit into openai:main
Novel hash-based trigram embedding on the PR openai#414 stack with LeakyReLU(0.5)². 3-seed mean sliding-window val_bpb: 1.14485; all artifacts under 16 MB. Validated on 1×H100 NVL with proportional wallclock (4054 s ≈ 600 s × 6.76). Track: non-record (1×H100 validation).
Force-pushed from 70c9ab9 to 16e1e8c
Community Review — submission: LeakyReLU2 + TrigramHashEmbedding (1.1448 bpb)

BPB: 1.1448 | Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

What I found in the code (at the head SHA): static code review found no TTT adaptation function, no SLOT optimization loop, no n-gram-cache class, and no pre-quant val-token fine-tune. The eval path uses the standard sliding-window stride-64 pattern. The submission is a pure-neural architecture iteration on the standard SP1024/SP4096/SP8192 baseline.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.22 s, dim=512, layers=11, vocab=1024, code=70472 B, SMOKE_TEST_PASS.

Verdict: LOOKS CLEAN. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the classification pass — this looks like a clean pure-neural iteration on the standard baseline.

Auto-classification caveat: this review was drafted by the deterministic AST-based classifier. If there is a non-standard eval mechanism (logit postprocessing, hedge mixing, etc.) that it missed because the mechanism is factored into a helper file or hidden behind a non-standard function name, please flag it and I'll re-run the audit manually.

Reviewed by @MatoTeziTanka — The Agora.
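For reference, the "standard sliding-window stride-64" eval pattern mentioned in the review can be sketched as follows. This is a hypothetical illustration, not the repo's actual eval code: the function name, the window size, and the score-only-the-tail detail are assumptions; only the stride-64 sliding-window idea comes from the review.

```python
import math

def sliding_window_bpb(tokens, logprob_fn, window=1024, stride=64):
    """Score tokens with overlapping windows, counting each token once.

    Each window of `window` tokens is scored only on its final `stride`
    positions, so every scored token is predicted with a (nearly) full
    left context. `logprob_fn(ctx)` returns one log-prob (nats) per
    position of `ctx`. (A real script would also handle the first
    window's unscored prefix; omitted here for brevity.)
    """
    total_nats, total_tokens = 0.0, 0
    for start in range(0, len(tokens) - window + 1, stride):
        ctx = tokens[start:start + window]
        lps = logprob_fn(ctx)
        # only the last `stride` predictions are "fresh" in this window
        for lp in lps[-stride:]:
            total_nats -= lp
            total_tokens += 1
    return total_nats / total_tokens / math.log(2)  # nats -> bits per token
```

For a byte-level vocabulary, bits per token is bits per byte (bpb).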
LeakyReLU² + TrigramHashEmbedding (non-record track)
val_bpb: 1.1448 (3-seed mean, sliding window stride 64) | ~15.6 MB
Summary
Novel hash-based TrigramHashEmbedding on PR #414 stack (11L EMA + GPTQ-lite) with LeakyReLU(0.5)² from PR #549. The trigram embedding extends BigramHashEmbedding (PR #198) from 2-grams to 3-grams using XOR prime hashing into 2048 buckets, capturing richer local context at the input layer.
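A minimal sketch of how such a hashed trigram embedding can work. The prime constants, function names, and fallback behavior below are illustrative assumptions, not the actual `train_gpt.py` code; only "XOR prime hashing into 2048 buckets" comes from the description above.

```python
import numpy as np

N_BUCKETS = 2048
# illustrative odd multipliers; the actual constants in train_gpt.py may differ
P1, P2, P3 = 5023921, 1990921, 2654435761

def trigram_bucket(a, b, c):
    """Hash a token trigram (t[i-2], t[i-1], t[i]) into one of 2048
    buckets by XOR-mixing prime-multiplied token ids."""
    return ((a * P1) ^ (b * P2) ^ (c * P3)) % N_BUCKETS

def trigram_hash_embed(tokens, tok_emb, tri_emb):
    """Input embedding = token embedding + hashed-trigram embedding.

    tok_emb: (vocab, dim) table; tri_emb: (N_BUCKETS, dim) table.
    Positions without a full trigram (i < 2) use only the token embedding.
    """
    out = tok_emb[tokens].copy()
    for i in range(2, len(tokens)):
        out[i] += tri_emb[trigram_bucket(tokens[i - 2], tokens[i - 1], tokens[i])]
    return out
```

Because collisions are resolved only by the hash, distinct trigrams can share a bucket; with 2048 buckets the table stays small enough to fit the 16 MB artifact cap while still adding local-context signal at the input layer.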
Results (3 seeds, 1×H100 NVL, proportional wallclock)
Compute Note
Validated on 1×H100 NVL 96 GB with proportional wallclock (4054 s ≈ 600 s × 6.76) to match the 8×H100 training trajectory. The script uses `grad_accum = 8 // world_size` (auto: 1 on 8-GPU, 8 on 1-GPU) for an identical effective batch size. Defaults to `MAX_WALLCLOCK_SECONDS=600` on 8×H100.

Key Changes from Base
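The grad-accumulation and proportional-wallclock rule can be sketched as below. This is a minimal illustration: `run_config` and its defaults are hypothetical names, and the 6.76 ratio is the hardware-specific 1×-vs-8×H100 figure from this PR, not a general constant.

```python
def run_config(world_size, base_wallclock=600, base_accum=8, base_ratio=6.76):
    """Scale grad accumulation and wallclock so a 1-GPU run matches the
    8-GPU effective batch size and training trajectory.

    Effective batch = world_size * grad_accum * per_gpu_batch, so dividing
    the accumulation steps by world_size keeps it constant.
    """
    grad_accum = max(1, base_accum // world_size)   # 1 on 8-GPU, 8 on 1-GPU
    # stretch the wallclock budget by the measured per-step slowdown
    wallclock = base_wallclock if world_size == 8 else round(base_wallclock * base_ratio)
    return grad_accum, wallclock
```

With these defaults, `run_config(8)` yields the standard 600 s budget and `run_config(1)` yields the stretched single-GPU budget.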
Files
records/track_non_record_16mb/2026-03-26_LeakyReLU2_TrigramHash/train_gpt.py
records/track_non_record_16mb/2026-03-26_LeakyReLU2_TrigramHash/submission.json
records/track_non_record_16mb/2026-03-26_LeakyReLU2_TrigramHash/requirements.txt
records/track_non_record_16mb/2026-03-26_LeakyReLU2_TrigramHash/README.md