
Record: SP8192 + BOS-Fix SmearGate + LQER Asym + Phased TTT (10L) — val_bpb 1.07171 #2072

Open
wfproc wants to merge 3 commits into openai:main from wfproc:submission/sp8192-sota-10l-ttt-1.07171

Conversation

wfproc commented May 1, 2026

Summary

Applies the full SOTA stack from PR #1851 (BOS-fixed SmearGate + LQER Asymmetric + Phased TTT + layer looping) with the SP8192 tokenizer. Uses 10 transformer layers instead of 11 to fit the larger 8192-vocab embedding table under the 16MB artifact limit with brotli compression.
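For scale, a back-of-envelope check of the embedding budget (illustrative only: the d_model and int6 packing below are assumptions, not values from this PR):

```python
# Rough budget math for the 8192-vocab embedding table. d_model and the
# int6 packing are assumptions for illustration, not values from this PR.
vocab, d_model = 8192, 768
embed_params = vocab * d_model        # 6,291,456 parameters
embed_bytes = embed_params * 6 // 8   # 4,718,592 bytes before brotli
print(f"~{embed_bytes / 2**20:.1f} MiB for the embedding table alone")
```

Even before compression, a vocabulary 8x larger than SP1024's leaves noticeably less of the 16MB budget for transformer layers, hence 10 layers instead of 11.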

val_bpb: 1.07171 | 15.37 MB | 8xH100 SXM, 596s | Seed 314
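For reference, bits-per-byte is the summed validation cross-entropy converted from nats to bits and normalized by the raw byte count. A minimal sketch of the standard definition (not code from this repo):

```python
import math

def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    # Cross-entropy summed over all validation tokens, in nats,
    # converted nats -> bits and normalized per UTF-8 byte.
    return total_nats / (math.log(2) * total_bytes)
```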

Results

Metric              Value
Pre-quant val_bpb   1.07399
Post-GPTQ val_bpb   1.08251
Post-TTT val_bpb    1.07171
Artifact size       15,373,365 bytes
Training steps      5,218
Training time       596s
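The GPTQ int6 + LQER Asymmetric step pairs quantization with a low-rank reconstruction of the quantization error, which is what keeps the post-GPTQ degradation above small. A minimal sketch of the plain LQER idea, assuming a weight matrix `W` and its quantized counterpart `W_q`; the asymmetric variant used here presumably treats the error differently, so read this as a sketch of the family, not the exact method:

```python
import torch

def lqer_factors(W: torch.Tensor, W_q: torch.Tensor, rank: int = 8):
    # SVD of the quantization error E = W - W_q, truncated to `rank`.
    # The layer can then compute x @ (W_q + U_r @ V_r).T at inference,
    # storing only 2 * rank * dim extra values alongside the int6 weights.
    E = (W - W_q).float()
    U, S, Vh = torch.linalg.svd(E, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]   # absorb singular values into U
    V_r = Vh[:rank, :]
    return U_r, V_r                # E ≈ U_r @ V_r
```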

Changes vs PR #1851

  • SP8192 tokenizer instead of SP1024
  • 10 layers instead of 11 (required to fit under 16MB with 8192-vocab embedding)
  • All other settings identical: BOS-fixed SmearGate, GPTQ int6 + LQER Asymmetric, Phased TTT (1 phase, 2000 prefix docs; see the sketch after this list), layer looping, SparseAttnGate
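A minimal sketch of one TTT phase, under stated assumptions: `model` returns a scalar next-token loss, `opt` is its optimizer, and `prefix_docs` yields the tokenized validation-prefix documents (all names and the calling convention are hypothetical; the real train_gpt.py may differ):

```python
def ttt_phase(model, opt, prefix_docs):
    # One phase: a single gradient pass over the held-out prefix docs,
    # after which the model is frozen and scored on the rest of the split.
    model.train()
    for doc in prefix_docs:                      # e.g. 2000 prefix docs
        loss = model(doc[:-1], targets=doc[1:])  # next-token objective
        loss.backward()
        opt.step()
        opt.zero_grad(set_to_none=True)
    model.eval()
```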

Run Command

TORCHINDUCTOR_CACHE_DIR=/workspace/inductor_cache \
RUN_ID=sota_sp8192_10L SEED=314 VOCAB_SIZE=8192 NUM_LAYERS=10 \
SMEAR_GATE_ENABLED=1 SPARSE_ATTN_GATE_ENABLED=1 SPARSE_ATTN_GATE_SCALE=0.5 \
LQER_ENABLED=1 LQER_ASYM_ENABLED=1 \
MLP_CLIP_SIGMAS=11.5 EMBED_CLIP_SIGMAS=14.0 \
WARMDOWN_FRAC=0.85 MIN_LR=0.1 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
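For readers tracing the flags, a hypothetical sketch of how such env-var toggles are typically read inside train_gpt.py; the variable names match the command above, but the parsing code and defaults are assumptions, not quoted from the repo:

```python
import os

# Defaults below are guesses; only the env var names come from the run command.
SMEAR_GATE_ENABLED = bool(int(os.environ.get("SMEAR_GATE_ENABLED", "0")))
SPARSE_ATTN_GATE_SCALE = float(os.environ.get("SPARSE_ATTN_GATE_SCALE", "1.0"))
LQER_ENABLED = bool(int(os.environ.get("LQER_ENABLED", "0")))
NUM_LAYERS = int(os.environ.get("NUM_LAYERS", "11"))
VOCAB_SIZE = int(os.environ.get("VOCAB_SIZE", "1024"))
```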

Lineage

Built on PR #1851 (@aquariouseworkman). SP8192 data from sproos/parameter-golf-tokenizers.

wfproc added 3 commits on March 28, 2026 at 19:41:

  • Research contribution: confirmed torch.compile constant-folds Late QAT in openai#315-derived code, tested a tensor-scale STE fix, and swept 7 untried techniques from recent papers. All negative on 1xH100. Includes anti-layer diagnostic, prune-then-quantize, and spectral SVD compression implementations as env-var toggles.
  • …val_bpb 1.07171
  • Full PR openai#1851 SOTA stack with SP8192 tokenizer (10 layers to fit the 16MB limit).
