[Non-record] Universal Transformer Depth Recurrence INT6 by thestbobo · Pull Request #1640 · openai/parameter-golf

thestbobo · 2026-04-15T10:59:46Z

Universal Transformer with Depth Recurrence (non-record track, 16 MB)

Status: Draft — architecture submitted, awaiting 8×H100 compute run.
Requesting compute credits to run the full 4-hour training.

Why this is interesting (per the repo wishlist)

This implements Universal Transformers (Dehghani et al., ICLR 2019), which
are explicitly listed on the challenge wishlist. The key idea: K=6 unique blocks
applied R=4 times each = 24 effective layers at the parameter cost of 6.

Architecture summary

Depth recurrence: 6 unique blocks × 4 steps = 24 effective layers
FiLM conditioning: each block learns step-specific (γ, β) ∈ R^[4×512]
so shared weights can express different transformations at each depth
U-Net skips adapted for the recurrence loop (first 12 steps store, last 12 pop)
BigramHash(2048, 512): bigram context embeddings for cheap local structure
LeakyReLU(0.5)² activations (more expressive than ReLU² for weight-tied models)
INT6 QAT with STE starting after 10% of training
GPTQ-style per-row optimal clipping (grid search over {85,90,95,100}th pct)
Muon WD=0.04, 4-hour wallclock budget

Results

Pending full compute run. Smoke test (50 iterations, single T4) confirms:

code compiles and runs
loss decreases
artifact ≤ 16 MB

…[non_record_16mb, DRAFT]

Three teams proposed iteration embeddings (openai#1552, openai#1554, openai#1640) — all open, no results yet. MLP-only loop confirmed novel across all ~300 PRs scanned. Residual 1/L partially covered by frozen alpha in openai#1779 baseline. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

submission: Universal Transformer depth recurrence + FiLM + INT6 QAT …

4abc20c

…[non_record_16mb, DRAFT]

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Non-record] Universal Transformer Depth Recurrence INT6#1640

[Non-record] Universal Transformer Depth Recurrence INT6#1640
thestbobo wants to merge 1 commit intoopenai:mainfrom
thestbobo:submission/ut-depth-recurrence-int6-film

thestbobo commented Apr 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

thestbobo commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Universal Transformer with Depth Recurrence (non-record track, 16 MB)

Why this is interesting (per the repo wishlist)

Architecture summary

Results

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

thestbobo commented Apr 15, 2026 •

edited

Loading