
Non-record: MDLM Diffusion — val_var_bpb 1.1465 (first diffusion to beat AR baseline)#1106

Merged
valerio-oai merged 1 commit into openai:main from agalimova:submission/mdlm-diffusion on May 3, 2026

Conversation

@agalimova
Contributor

Summary

val_var_bpb: 1.1465 (512 eval steps) | 33M params | 2xH100 80GB HBM3 | Non-record

First discrete diffusion model to beat the AR baseline (1.22 BPB). Improves on the previous best diffusion submissions (#1053 at 1.360 BPB; #820 at 1.625) by more than 0.21 BPB.

Results

| Model | BPB |
| --- | --- |
| AR SOTA (merged #1) | 1.1194 |
| This (MDLM) | 1.1465 |
| AR baseline | 1.2244 |
| #1053 MDLM | 1.360 |
| #820 MDLM | 1.625 |

Approach

MDLM (Sahoo et al., 2024) with a log-linear noise schedule, adaLN timestep conditioning, frozen visible-token logits, antithetic timestep sampling, and a discrete absorbing-mask ELBO for evaluation. 11 layers, 512 hidden dim; 6000 steps with AdamW.
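As a minimal sketch of the forward process (assuming the standard MDLM parameterization; the submission's actual schedule code is not shown in this PR), the log-linear schedule with masking floor `eps` gives a per-token masking probability of (1 - eps) · t:

```python
import torch

def mask_tokens(x, t, mask_id, eps=0.1):
    # Log-linear noise: sigma(t) = -log(1 - (1 - eps) * t), so the
    # per-token masking probability is 1 - exp(-sigma(t)) = (1 - eps) * t.
    # At t = 1 a fraction eps of tokens stays visible as anchors.
    p_mask = (1 - eps) * t                     # shape (B, 1)
    masked = torch.rand(x.shape) < p_mask      # broadcast over sequence length
    x_t = torch.where(masked, torch.full_like(x, mask_id), x)
    return x_t, masked
```

With `eps=0.1`, roughly 10% of tokens survive even at t = 1, which is the "anchors for denoising" effect the eps finding below refers to.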

Key findings from 27 experiments

  • Masking eps=0.1 >> 0.001 (biggest single win)
  • Eval method matters: MC ELBO gives 2.41 BPB vs. 1.15 for the discrete ELBO on the same model
  • AR tricks that don't transfer: LeakyReLU^2, BigramHash
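The "frozen visible-token logits" trick from the approach can be sketched as follows (a hypothetical implementation of the standard MDLM carry-over idea, not the submission's actual code): for unmasked positions, the predicted distribution is forced onto the observed token, so visible tokens are never re-sampled and contribute nothing to the loss.

```python
import torch

def freeze_visible_logits(logits, x_t, mask_id, neg=-1e9):
    # Visible (unmasked) positions get logits forced to a point mass on
    # the observed token: neg everywhere, 0 at the observed index.
    # Masked positions keep the model's logits unchanged.
    visible = (x_t != mask_id)                                   # (B, L)
    idx = x_t.clamp(max=logits.size(-1) - 1)                     # guard: mask_id may lie outside the vocab
    forced = torch.full_like(logits, neg)                        # (B, L, V)
    forced.scatter_(-1, idx.unsqueeze(-1), 0.0)
    return torch.where(visible.unsqueeze(-1), forced, logits)
```

The clamp is only a safety guard for the common convention where `mask_id` is an extra token outside the output vocabulary; masked positions discard the forced logits anyway.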

Hardware

Developed on a GB10 (Project DIGITS). Validated on 2xH100 (TensorPool), 31 min of training. 8xH100 was unavailable (#821); extrapolated runtime is ~8 min on 8xH100.

First discrete diffusion model to beat the AR baseline (1.22) in
parameter-golf. MDLM with log-linear noise, adaLN, frozen visible-token
logits, discrete ELBO eval. 27 hyperparameter experiments. Validated on
2xH100 (TensorPool), 31 min training.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
He-Wenhao pushed a commit to He-Wenhao/parameter-golf that referenced this pull request Apr 10, 2026
PR openai#1106 found eps=0.1 >> 0.001 was the single biggest improvement.
With eps=0.1, 10% of tokens remain visible at t=1, giving the model
anchors for denoising. Larger terminal KL but much easier task.

Also revert lr=1e-3, warmdown=1000 (v8's lr=2e-3 made artifact >16MB).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
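The eps effect described in that commit can be sanity-checked numerically (assuming the log-linear schedule, under which the masking probability at time t is (1 - eps) · t):

```python
# Visible-token fraction at t = 1 under p_mask(t) = (1 - eps) * t:
# the surviving fraction is exactly eps.
for eps in (0.001, 0.1):
    visible = 1 - (1 - eps) * 1.0
    print(f"eps={eps}: {visible:.1%} of tokens visible at t=1")
```

With eps=0.1 about 10% of tokens survive as anchors, versus 0.1% with eps=0.001, matching the "10% of tokens remain visible at t=1" observation above.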
He-Wenhao pushed a commit to He-Wenhao/parameter-golf that referenced this pull request Apr 10, 2026
- train_mdlm_combined.py: full MDLM training script (PR openai#1053 infra + PR openai#1106 MDLM + our innovations)
- sweep.sh/sweep2.sh: 12-experiment hyperparameter sweep (eps, arch, loss, seq_len)
- results.tsv: updated with v10-v13 experiments, corrected descriptions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@valerio-oai valerio-oai left a comment


Selected for the notable non-record submissions section.

@valerio-oai valerio-oai merged commit 16af8e1 into openai:main May 3, 2026