
Record: SP8192 + BOS-Fix SmearGate + LQER Asym + Phased TTT (10L) — val_bpb 1.07171 #2072

Open
wfproc wants to merge 3 commits into openai:main from wfproc:submission/sp8192-sota-10l-ttt-1.07171

Conversation

wfproc commented May 1, 2026

Summary

Applies the full SOTA stack from PR #1851 (BOS-fixed SmearGate + LQER Asymmetric + Phased TTT + layer looping) with the SP8192 tokenizer. Uses 10 transformer layers instead of 11 to fit the larger 8192-vocab embedding table under the 16MB artifact limit with brotli compression.
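For scale, a back-of-envelope check of the embedding budget (illustrative only: the d_model and int6 packing below are assumptions, not values from this PR):

```python
# Rough budget math for the 8192-vocab embedding table. d_model and the
# int6 packing are assumptions for illustration, not values from this PR.
vocab, d_model = 8192, 768
embed_params = vocab * d_model        # 6,291,456 parameters
embed_bytes = embed_params * 6 // 8   # 4,718,592 bytes before brotli
print(f"~{embed_bytes / 2**20:.1f} MiB for the embedding table alone")
```

Even before compression, a vocabulary 8x larger than SP1024's leaves noticeably less of the 16MB budget for transformer layers, hence 10 layers instead of 11.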

val_bpb: 1.07171 | 15.37 MB | 8xH100 SXM, 596s | Seed 314
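For reference, bits-per-byte is the summed validation cross-entropy converted from nats to bits and normalized by the raw byte count. A minimal sketch of the standard definition (not code from this repo):

```python
import math

def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    # Cross-entropy summed over all validation tokens, in nats,
    # converted nats -> bits and normalized per UTF-8 byte.
    return total_nats / (math.log(2) * total_bytes)
```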

Results

Metric              Value
Pre-quant val_bpb   1.07399
Post-GPTQ val_bpb   1.08251
Post-TTT val_bpb    1.07171
Artifact size       15,373,365 bytes
Training steps      5,218
Training time       596s
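The GPTQ int6 + LQER Asymmetric step pairs quantization with a low-rank reconstruction of the quantization error, which is what keeps the post-GPTQ degradation above small. A minimal sketch of the plain LQER idea, assuming a weight matrix `W` and its quantized counterpart `W_q`; the asymmetric variant used here presumably treats the error differently, so read this as a sketch of the family, not the exact method:

```python
import torch

def lqer_factors(W: torch.Tensor, W_q: torch.Tensor, rank: int = 8):
    # SVD of the quantization error E = W - W_q, truncated to `rank`.
    # The layer can then compute x @ (W_q + U_r @ V_r).T at inference,
    # storing only 2 * rank * dim extra values alongside the int6 weights.
    E = (W - W_q).float()
    U, S, Vh = torch.linalg.svd(E, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]   # absorb singular values into U
    V_r = Vh[:rank, :]
    return U_r, V_r                # E ≈ U_r @ V_r
```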

Changes vs PR #1851

  • SP8192 tokenizer instead of SP1024
  • 10 layers instead of 11 (required to fit under 16MB with 8192-vocab embedding)
  • All other settings identical: BOS-fixed SmearGate, GPTQ int6 + LQER Asymmetric, Phased TTT (1 phase, 2000 prefix docs; see the sketch after this list), layer looping, SparseAttnGate
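A minimal sketch of one TTT phase, under stated assumptions: `model` returns a scalar next-token loss, `opt` is its optimizer, and `prefix_docs` yields the tokenized validation-prefix documents (all names and the calling convention are hypothetical; the real train_gpt.py may differ):

```python
def ttt_phase(model, opt, prefix_docs):
    # One phase: a single gradient pass over the held-out prefix docs,
    # after which the model is frozen and scored on the rest of the split.
    model.train()
    for doc in prefix_docs:                      # e.g. 2000 prefix docs
        loss = model(doc[:-1], targets=doc[1:])  # next-token objective
        loss.backward()
        opt.step()
        opt.zero_grad(set_to_none=True)
    model.eval()
```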

Run Command

TORCHINDUCTOR_CACHE_DIR=/workspace/inductor_cache \
RUN_ID=sota_sp8192_10L SEED=314 VOCAB_SIZE=8192 NUM_LAYERS=10 \
SMEAR_GATE_ENABLED=1 SPARSE_ATTN_GATE_ENABLED=1 SPARSE_ATTN_GATE_SCALE=0.5 \
LQER_ENABLED=1 LQER_ASYM_ENABLED=1 \
MLP_CLIP_SIGMAS=11.5 EMBED_CLIP_SIGMAS=14.0 \
WARMDOWN_FRAC=0.85 MIN_LR=0.1 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
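For readers tracing the flags, a hypothetical sketch of how such env-var toggles are typically read inside train_gpt.py; the variable names match the command above, but the parsing code and defaults are assumptions, not quoted from the repo:

```python
import os

# Defaults below are guesses; only the env var names come from the run command.
SMEAR_GATE_ENABLED = bool(int(os.environ.get("SMEAR_GATE_ENABLED", "0")))
SPARSE_ATTN_GATE_SCALE = float(os.environ.get("SPARSE_ATTN_GATE_SCALE", "1.0"))
LQER_ENABLED = bool(int(os.environ.get("LQER_ENABLED", "0")))
NUM_LAYERS = int(os.environ.get("NUM_LAYERS", "11"))
VOCAB_SIZE = int(os.environ.get("VOCAB_SIZE", "1024"))
```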

Lineage

Built on PR #1851 (@aquariouseworkman). SP8192 data from sproos/parameter-golf-tokenizers.

wfproc added 3 commits on March 28, 2026 at 19:41:

  • Research contribution: confirmed torch.compile constant-folds Late QAT in openai#315-derived code, tested a tensor-scale STE fix, and swept 7 untried techniques from recent papers. All negative on 1xH100. Includes anti-layer diagnostic, prune-then-quantize, and spectral SVD compression implementations as env-var toggles.
  • …val_bpb 1.07171
  • Full PR openai#1851 SOTA stack with SP8192 tokenizer (10 layers to fit the 16MB limit).
