[Non-record] Universal Transformer Depth Recurrence INT6#1640

Draft
thestbobo wants to merge 1 commit into openai:main from thestbobo:submission/ut-depth-recurrence-int6-film

Conversation

@thestbobo thestbobo commented Apr 15, 2026

Universal Transformer with Depth Recurrence (non-record track, 16 MB)

Status: Draft — architecture submitted, awaiting 8×H100 compute run.
Requesting compute credits to run the full 4-hour training.

Why this is interesting (per the repo wishlist)

This implements Universal Transformers (Dehghani et al., ICLR 2019), which
are explicitly listed on the challenge wishlist. The key idea: applying K=6 unique
blocks R=4 times each gives 24 effective layers at the parameter cost of 6.

Architecture summary

  • Depth recurrence: 6 unique blocks × 4 steps = 24 effective layers
  • FiLM conditioning: each block learns step-specific (γ, β) ∈ R^{4×512},
    so the shared weights can express a different transformation at each depth
  • U-Net skips adapted for the recurrence loop (the first 12 effective layers
    store activations, the last 12 pop them); see the forward-pass sketch after this list
  • BigramHash(2048, 512): hashed bigram context embeddings for cheap local
    structure (sketched after this list)
  • LeakyReLU(0.5)² activations (more expressive than ReLU² for weight-tied models)
  • INT6 QAT with STE, enabled after the first 10% of training
  • GPTQ-style per-row optimal clipping (grid search over the {85, 90, 95, 100}th
    percentiles); see the fake-quantization sketch after this list
  • Muon WD=0.04, 4-hour wallclock budget
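
For concreteness, here is a minimal PyTorch sketch of how the depth-recurrent forward pass could compose the shared blocks, FiLM conditioning, the U-Net-style skips, and the LeakyReLU(0.5)² MLP. Class names (`UTBlock`, `DepthRecurrentStack`), the use of `nn.MultiheadAttention`, and all hyperparameters besides the 6×4 structure are illustrative, not the submitted implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D_MODEL, N_BLOCKS, N_STEPS = 512, 6, 4          # 6 unique blocks x 4 recurrence steps

class UTBlock(nn.Module):
    """One shared block; only the per-step FiLM (gamma, beta) differ across depth."""
    def __init__(self, d=D_MODEL, n_steps=N_STEPS):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
        self.fc1, self.fc2 = nn.Linear(d, 4 * d), nn.Linear(4 * d, d)
        self.film_gamma = nn.Parameter(torch.ones(n_steps, d))   # (gamma, beta) in R^{4 x 512}
        self.film_beta = nn.Parameter(torch.zeros(n_steps, d))

    def forward(self, x, step):
        g, b = self.film_gamma[step], self.film_beta[step]
        h = g * F.rms_norm(x, (x.size(-1),)) + b                  # FiLM on the normalized stream
        h, _ = self.attn(h, h, h, need_weights=False)             # causal masking omitted for brevity
        x = x + h
        h = F.leaky_relu(self.fc1(F.rms_norm(x, (x.size(-1),))), 0.5) ** 2  # LeakyReLU(0.5)^2
        return x + self.fc2(h)

class DepthRecurrentStack(nn.Module):
    """Apply the 6 unique blocks N_STEPS times; U-Net skips pair layer i with layer 23 - i."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList(UTBlock() for _ in range(N_BLOCKS))

    def forward(self, x):
        skips, layer = [], 0
        for step in range(N_STEPS):
            for block in self.blocks:
                if layer < N_BLOCKS * N_STEPS // 2:   # first 12 effective layers: store
                    skips.append(x)
                else:                                  # last 12: pop in reverse order
                    x = x + skips.pop()
                x = block(x, step)
                layer += 1
        return x
```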
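
The bigram path could look something like the following. The PR only names the table shape (2048 × 512); the roll-based pairing and the hashing constant are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class BigramHash(nn.Module):
    """Hashed bigram embedding: map each (prev_token, token) pair to one of
    n_buckets rows and add that row to the token embedding, giving cheap
    local-context structure for very few parameters."""
    def __init__(self, n_buckets=2048, d_model=512):
        super().__init__()
        self.n_buckets = n_buckets
        self.emb = nn.Embedding(n_buckets, d_model)

    def forward(self, idx):                        # idx: (batch, seq) token ids
        prev = torch.roll(idx, shifts=1, dims=1)
        prev[:, 0] = 0                             # no predecessor at the first position
        # simple multiplicative hash of the (prev, current) pair into the bucket table
        h = (prev * 1000003 + idx) % self.n_buckets
        return self.emb(h)
```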
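
And a sketch of the INT6 fake-quantization step with a straight-through estimator and the per-row clipping search. The percentile grid matches the list above; the symmetric quantizer and the per-row MSE objective are assumptions about what "GPTQ-style optimal clipping" means here. Presumably this would be applied to linear weights in the forward pass only after the 10% warm-up, with master weights kept in higher precision.

```python
import torch

def int6_fake_quant(w: torch.Tensor, clip_pcts=(85., 90., 95., 100.)) -> torch.Tensor:
    """Symmetric per-row INT6 fake quantization with a straight-through estimator.

    For each weight row, grid-search a clipping percentile and keep the one that
    minimizes reconstruction MSE, then quantize to 6-bit integers and dequantize.
    """
    qmax = 2 ** (6 - 1) - 1                      # symmetric INT6 range: [-31, 31]
    best_err, best_deq = None, None
    for pct in clip_pcts:
        clip = torch.quantile(w.abs(), pct / 100.0, dim=1, keepdim=True)  # per-row clip value
        scale = clip.clamp(min=1e-8) / qmax
        q = (w / scale).round().clamp(-qmax, qmax)
        deq = q * scale
        err = (deq - w).pow(2).mean(dim=1, keepdim=True)                  # per-row MSE
        if best_err is None:
            best_err, best_deq = err, deq
        else:
            better = err < best_err
            best_err = torch.where(better, err, best_err)
            best_deq = torch.where(better, deq, best_deq)
    # straight-through estimator: forward uses the quantized weights,
    # backward sees the identity, so gradients flow to the full-precision weights
    return w + (best_deq - w).detach()
```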

Results

Pending full compute run. Smoke test (50 iterations, single T4) confirms:

  • code compiles and runs
  • loss decreases
  • artifact ≤ 16 MB

leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 26, 2026
Three teams proposed iteration embeddings (openai#1552, openai#1554, openai#1640) — all open,
no results yet. MLP-only loop confirmed novel across all ~300 PRs scanned.
Residual 1/L partially covered by frozen alpha in openai#1779 baseline.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>