submission: LeakyReLU2 + TrigramHashEmbedding (1.1448 bpb)#884
BhatiaUday wants to merge 1 commit into openai:main
Novel hash-based trigram embedding on the PR openai#414 stack with LeakyReLU(0.5)². 3-seed mean sliding-window val_bpb: 1.14485; all artifacts under 16 MB. Validated on 1×H100 NVL with proportional wallclock (4054 s ≈ 600 s × 6.76). Track: non-record (1×H100 validation).
Force-pushed from 70c9ab9 to 16e1e8c
Community Review — submission: LeakyReLU2 + TrigramHashEmbedding (1.1448 bpb)

BPB: 1.1448 | Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

What I found in the code (at the head SHA): static code review found no TTT adaptation function, no SLOT optimization loop, no n-gram-cache class, and no pre-quant val-token fine-tune. The eval path uses the standard sliding-window stride-64 pattern. The submission is a pure-neural architecture iteration on the standard SP1024/SP4096/SP8192 baseline.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.22 s, dim=512, layers=11, vocab=1024, code=70472 B, SMOKE_TEST_PASS.

Verdict: LOOKS CLEAN. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the classification pass — this looks like a clean pure-neural iteration on the standard baseline.

Auto-classification caveat: this review was drafted by the deterministic AST-based classifier. If there is a non-standard eval mechanism (logit postprocessing, hedge mixing, etc.) that it missed because the mechanism is factored into a helper file or hidden behind a non-standard function name, please flag it and I'll re-run the audit manually.

Reviewed by @MatoTeziTanka — The Agora.
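For reference, the "standard sliding-window stride-64" eval pattern mentioned in the review can be sketched as follows. This is a hypothetical illustration, not the repo's actual eval code: the function name, the window size, and the score-only-the-tail detail are assumptions; only the stride-64 sliding-window idea comes from the review.

```python
import math

def sliding_window_bpb(tokens, logprob_fn, window=1024, stride=64):
    """Score tokens with overlapping windows, counting each token once.

    Each window of `window` tokens is scored only on its final `stride`
    positions, so every scored token is predicted with a (nearly) full
    left context. `logprob_fn(ctx)` returns one log-prob (nats) per
    position of `ctx`. (A real script would also handle the first
    window's unscored prefix; omitted here for brevity.)
    """
    total_nats, total_tokens = 0.0, 0
    for start in range(0, len(tokens) - window + 1, stride):
        ctx = tokens[start:start + window]
        lps = logprob_fn(ctx)
        # only the last `stride` predictions are "fresh" in this window
        for lp in lps[-stride:]:
            total_nats -= lp
            total_tokens += 1
    return total_nats / total_tokens / math.log(2)  # nats -> bits per token
```

For a byte-level vocabulary, bits per token is bits per byte (bpb).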
LeakyReLU² + TrigramHashEmbedding (non-record track)
val_bpb: 1.1448 (3-seed mean, sliding window stride 64) | ~15.6 MB
Summary
Novel hash-based TrigramHashEmbedding on PR #414 stack (11L EMA + GPTQ-lite) with LeakyReLU(0.5)² from PR #549. The trigram embedding extends BigramHashEmbedding (PR #198) from 2-grams to 3-grams using XOR prime hashing into 2048 buckets, capturing richer local context at the input layer.
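A minimal sketch of how such a hashed trigram embedding can work. The prime constants, function names, and fallback behavior below are illustrative assumptions, not the actual `train_gpt.py` code; only "XOR prime hashing into 2048 buckets" comes from the description above.

```python
import numpy as np

N_BUCKETS = 2048
# illustrative odd multipliers; the actual constants in train_gpt.py may differ
P1, P2, P3 = 5023921, 1990921, 2654435761

def trigram_bucket(a, b, c):
    """Hash a token trigram (t[i-2], t[i-1], t[i]) into one of 2048
    buckets by XOR-mixing prime-multiplied token ids."""
    return ((a * P1) ^ (b * P2) ^ (c * P3)) % N_BUCKETS

def trigram_hash_embed(tokens, tok_emb, tri_emb):
    """Input embedding = token embedding + hashed-trigram embedding.

    tok_emb: (vocab, dim) table; tri_emb: (N_BUCKETS, dim) table.
    Positions without a full trigram (i < 2) use only the token embedding.
    """
    out = tok_emb[tokens].copy()
    for i in range(2, len(tokens)):
        out[i] += tri_emb[trigram_bucket(tokens[i - 2], tokens[i - 1], tokens[i])]
    return out
```

Because collisions are resolved only by the hash, distinct trigrams can share a bucket; with 2048 buckets the table stays small enough to fit the 16 MB artifact cap while still adding local-context signal at the input layer.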
Results (3 seeds, 1×H100 NVL, proportional wallclock)
Compute Note
Validated on 1×H100 NVL 96 GB with proportional wallclock (4054 s ≈ 600 s × 6.76) to match the 8×H100 training trajectory. The script uses `grad_accum = 8 // world_size` (auto: 1 on 8-GPU, 8 on 1-GPU) for an identical effective batch size. Defaults to `MAX_WALLCLOCK_SECONDS=600` on 8×H100.

Key Changes from Base
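The grad-accumulation and proportional-wallclock rule can be sketched as below. This is a minimal illustration: `run_config` and its defaults are hypothetical names, and the 6.76 ratio is the hardware-specific 1×-vs-8×H100 figure from this PR, not a general constant.

```python
def run_config(world_size, base_wallclock=600, base_accum=8, base_ratio=6.76):
    """Scale grad accumulation and wallclock so a 1-GPU run matches the
    8-GPU effective batch size and training trajectory.

    Effective batch = world_size * grad_accum * per_gpu_batch, so dividing
    the accumulation steps by world_size keeps it constant.
    """
    grad_accum = max(1, base_accum // world_size)   # 1 on 8-GPU, 8 on 1-GPU
    # stretch the wallclock budget by the measured per-step slowdown
    wallclock = base_wallclock if world_size == 8 else round(base_wallclock * base_ratio)
    return grad_accum, wallclock
```

With these defaults, `run_config(8)` yields the standard 600 s budget and `run_config(1)` yields the stretched single-GPU budget.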
Files
records/track_non_record_16mb/2026-03-26_LeakyReLU2_TrigramHash/train_gpt.py
records/track_non_record_16mb/2026-03-26_LeakyReLU2_TrigramHash/submission.json
records/track_non_record_16mb/2026-03-26_LeakyReLU2_TrigramHash/requirements.txt
records/track_non_record_16mb/2026-03-26_LeakyReLU2_TrigramHash/README.md