Non-record: 11L LeakyReLU² + Int6 + EMA (~1.1200 BPB) by vibhu1510 · Pull Request #2 · vibhu1510/parameter-golf-vibhu

vibhu1510 · 2026-03-25T02:59:47Z

Summary

Non-record submission applying LeakyReLU(0.5)² activation to the PR openai#414 base (1.1233 BPB), targeting ~1.1200 BPB. Single activation change provides ~-0.003 BPB by preserving negative gradient flow while maintaining relu² inductive bias.

Architecture

Component	Setting
Layers	11 (512d, 8H, 4KV)
MLP	3× with LeakyReLU(0.5)²
BigramHash	2048
XSA	Last 4 layers
RoPE	Partial (16/64 dims)
LN Scale	1/√(layer+1)
VE128	Layers 9-10
Warmdown	3500 steps
EMA	decay=0.997
Late QAT	scale < 0.15
Quantization	Int6 per-row + zstd-22
Eval	Sliding window stride=64

Key change

# Before (relu²):
x = torch.relu(self.fc(x)).square()

# After (leaky relu²):
x = F.leaky_relu(self.fc(x), negative_slope=0.5).square()

Reproduction

RUN_ID=leakyrelu2_seed1337 \
SEED=1337 \
DATA_PATH=./data/datasets/fineweb10B_sp1024/ \
TOKENIZER_PATH=./data/tokenizers/fineweb_1024_bpe.model \
VOCAB_SIZE=1024 \
torchrun --standalone --nproc_per_node=8 \
  records/track_10min_16mb/2026-03-25_11L_LeakyReLU2_PartialRoPE_Int6_EMA/train_gpt.py

Train logs will be added after 8×H100 SXM runs are completed.

Credits

Base model: PR #414 by @signalrush (1.1233 BPB)
LeakyReLU² activation: PR #493 by @parinzee, PR #518 by @sofiabod

Checklist

README.md with submission details
submission.json with metadata
train_gpt.py script (compiles and runs within records folder)
Train logs from 3-seed runs on 8×H100 SXM (pending compute access)
PR only adds files to /records subfolder

https://claude.ai/code/session_01NxwYaRCHETG1Spm3Ag8hiw

Apply LeakyReLU(0.5)² activation to PR openai#414 base (1.1233 BPB). Single activation change provides ~-0.003 BPB by preserving negative gradient flow while maintaining relu² inductive bias. Built on PR openai#414 stack with int6 GPTQ-lite quantization, EMA(0.997), partial RoPE (16/64), LN scale, XSA last 4, warmdown 3500, late QAT. https://claude.ai/code/session_01NxwYaRCHETG1Spm3Ag8hiw

vibhu1510 merged commit 0fad402 into main Mar 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-record: 11L LeakyReLU² + Int6 + EMA (~1.1200 BPB)#2

Non-record: 11L LeakyReLU² + Int6 + EMA (~1.1200 BPB)#2
vibhu1510 merged 1 commit intomainfrom
submission/11L-leakyrelu2-int6-ema

vibhu1510 commented Mar 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

vibhu1510 commented Mar 25, 2026

Summary

Architecture

Key change

Reproduction

Credits

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants