
adaLN_recurrence [val_bpb=1.255 on 4 x H100] #1944

Open

dmitriymyan1 wants to merge 1 commit into openai:main from dmitriymyan1:adaLN_recurrence

Conversation

@dmitriymyan1

Summary

  • Adds adaLN (adaptive layer norm) conditioned on recurrence iteration to the Parallel Residuals + Mini Depth Recurrence baseline
  • Allows weight-tied recurrent layers (4, 5) to distinguish their first vs second pass via lightweight per-channel affine modulation (~6.6K extra parameters, ~zero compute overhead)
  • Zero-initialized projection ensures training starts identically to the baseline (see the sketch below)
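
A minimal PyTorch sketch of the mechanism, with some assumptions: the module and parameter names are illustrative, and mapping a one-hot step indicator through a zero-initialized projection to per-channel (scale, shift) is one plausible reading of the summary. The train_gpt.py in this PR is the authoritative implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StepAdaLN(nn.Module):
    """Per-channel affine modulation conditioned on the recurrence step.

    Illustrative sketch: the projection is zero-initialized, so at
    initialization the module reduces to a plain (affine-free) layer
    norm and training starts identically to the baseline.
    """

    def __init__(self, dim: int, num_steps: int = 2):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        # 2 * dim parameters per recurrence step: a scale and a shift
        # for every channel
        self.proj = nn.Linear(num_steps, 2 * dim, bias=False)
        nn.init.zeros_(self.proj.weight)  # exact no-op at init

    def forward(self, x: torch.Tensor, step: int) -> torch.Tensor:
        # x: (batch, seq, dim); step is the recurrence iteration (0 or 1)
        onehot = F.one_hot(
            torch.tensor(step, device=x.device), self.proj.in_features
        ).to(self.proj.weight.dtype)
        scale, shift = self.proj(onehot).chunk(2, dim=-1)
        return self.norm(x) * (1.0 + scale) + shift
```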

Early Result

Smoke test on 4×H100 / 600 s (50% of submission compute): val_bpb 1.2551 (val_loss 2.1193 nats), 15.26 MB quantized artifact. Only ~400 recurrent training steps ran before the wallclock cap; the loss curve was still descending cleanly at cutoff. The full 8×H100 run is pending.

Files

  • train_gpt.py — training script with adaLN support (enabled via FILM_ENABLED=1; see the recurrence-loop sketch after this list)
  • README.md — approach description and reproducibility instructions
  • requirements.txt — dependencies (adds brotli)
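
Hypothetical sketch of how the recurrence loop might consult the modulation, gated by the FILM_ENABLED flag. The loop structure, two-pass count, and function names are assumptions drawn from the baseline description, not the submission's exact code:

```python
import os

FILM_ENABLED = os.environ.get("FILM_ENABLED", "0") == "1"

def recurrent_forward(x, shared_blocks, adalns, num_steps=2):
    """Run the weight-tied blocks num_steps times (mini depth recurrence).

    shared_blocks: the tied transformer blocks (layers 4 and 5 here);
    adalns: one StepAdaLN per shared block. All names are illustrative.
    """
    for step in range(num_steps):
        for block, adaln in zip(shared_blocks, adalns):
            h = adaln(x, step) if FILM_ENABLED else x
            x = x + block(h)  # residual update on step-conditioned input
    return x
```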

Add adaLN (adaptive layer norm) conditioned on recurrence iteration to the
Parallel Residuals + Mini Depth Recurrence baseline. Allows weight-tied
recurrent layers to distinguish first vs second pass with ~zero compute
overhead (~6.6K extra parameters).

Early result: val_bpb 1.2551 on 4xH100/600s (half compute, only ~400
recurrent steps before wallclock cap).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>