
Experiment: SmearGate BOS Fix + train-only logit calibration#1884

Open
someone114514 wants to merge 6 commits into openai:main from
someone114514:smeargate-calibration-1868

Conversation

@someone114514

Summary

An experimental variant of the SmearGate BOS Fix (#1868 / #1851) that adds a fixed post-GPTQ logit calibration pass.

The added calibration is deliberately small and train-only:

  • global logit temperature
  • coarse token-group bias buckets: byte length, starts-with-space, newline, digit, punctuation, alpha/case
  • no validation-derived fitting state
  • no frequency buckets by default
  • frozen after fitting, then applied before softmax in quantized diagnostic eval and phased score-first TTT

This is intended as a direct test of whether the post-GPTQ calibration signal observed locally transfers to the stronger #1868 stack.
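The bullets above can be sketched as a minimal train-only fit. Everything here is illustrative: the function names (`token_group`, `fit_calibration`, `apply_calibration`), the reduced four-bucket grouping, and the plain gradient loop are assumptions, not this PR's actual code; only the affine form (global temperature plus per-group bias) and the freeze-after-fit behavior follow the description.

```python
import numpy as np

NUM_GROUPS = 4  # simplified; the PR uses richer buckets (byte length, case, ...)

def token_group(tok: str) -> int:
    """Coarse bucket for a token string (illustrative assignment)."""
    if tok.startswith(" "):
        return 0
    if tok.isdigit():
        return 1
    if not tok.isalnum():
        return 2
    return 3

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fit_calibration(logits, targets, groups, lr=3e-3, l2=1e-2, steps=200):
    """Fit a global log-temperature and per-group biases on train tokens only.

    logits: (n, v) train logits; targets: (n,) token ids;
    groups: (v,) bucket id per vocab entry.
    """
    n, v = logits.shape
    log_t, bias = 0.0, np.zeros(NUM_GROUPS)
    onehot = np.zeros((n, v))
    onehot[np.arange(n), targets] = 1.0
    for _ in range(steps):
        z = logits * np.exp(-log_t) + bias[groups]   # affine correction
        err = softmax(z) - onehot                    # dCE/dz (pre-averaging)
        # dz/dlog_t = -logits * exp(-log_t)
        g_t = (err * (-logits * np.exp(-log_t))).sum() / n + 2 * l2 * log_t
        g_b = np.array([err[:, groups == g].sum()
                        for g in range(NUM_GROUPS)]) / n + 2 * l2 * bias
        log_t -= lr * g_t
        bias -= lr * g_b
    return log_t, bias                               # frozen after fitting

def apply_calibration(logits, log_t, bias, groups):
    """Fixed affine transform applied to logits before the usual softmax."""
    return logits * np.exp(-log_t) + bias[groups]
```

Because the fitted correction is frozen, validation-time application is a pure function of the logits and the (train-derived) parameters.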

Controls

Defaults added in this branch:

LOGIT_CALIB_ENABLED=1
LOGIT_CALIB_TOKENS=100000
LOGIT_CALIB_STRIDE=64
LOGIT_CALIB_BATCH_SEQS=8
LOGIT_CALIB_LR=0.003
LOGIT_CALIB_L2=0.01
LOGIT_CALIB_EPOCHS=1
LOGIT_CALIB_APPLY_TTT_UPDATE=1

Set LOGIT_CALIB_ENABLED=0 to recover the original #1868 behavior.
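For reference, the controls above could be read at startup roughly like this. The variable names and default values come from this branch; the parsing code itself is an assumed sketch, not the PR's implementation.

```python
import os

# Assumed startup parsing for the branch's env-var controls.
calib_enabled = os.environ.get("LOGIT_CALIB_ENABLED", "1") != "0"
calib_tokens  = int(os.environ.get("LOGIT_CALIB_TOKENS", "100000"))
calib_stride  = int(os.environ.get("LOGIT_CALIB_STRIDE", "64"))
calib_batch   = int(os.environ.get("LOGIT_CALIB_BATCH_SEQS", "8"))
calib_lr      = float(os.environ.get("LOGIT_CALIB_LR", "0.003"))
calib_l2      = float(os.environ.get("LOGIT_CALIB_L2", "0.01"))
calib_epochs  = int(os.environ.get("LOGIT_CALIB_EPOCHS", "1"))
apply_in_ttt  = os.environ.get("LOGIT_CALIB_APPLY_TTT_UPDATE", "1") != "0"
```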

Legality / causality

Calibration is fitted only from training-token shards after GPTQ. It does not read validation targets or build validation-time state. At validation time the correction is a fixed affine transformation of logits before normal softmax, so the distribution remains normalized.
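A quick numeric check of the normalization claim: softmax renormalizes whatever it receives, so a fixed affine correction of the logits still yields a proper distribution. The values below are arbitrary stand-ins, not fitted parameters.

```python
import numpy as np

logits = np.array([2.0, -1.0, 0.5, 0.0])   # arbitrary example logits
T = 1.3                                     # illustrative global temperature
b = np.array([0.1, -0.2, 0.0, 0.05])        # illustrative per-token group biases

z = logits / T + b                          # fixed affine correction
p = np.exp(z - z.max())
p = p / p.sum()                             # ordinary softmax

print(p.sum())                              # ~1.0: still a normalized distribution
```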

Status

No new 8xH100 score yet. This branch is prepared for a direct single-seed run against the #1868 reproduction command.

3-seed reproduction of PR openai#1851 (SmearGate BOS document boundary fix).
Code is byte-identical to openai#1851 by @aquariouseworkman.

Results (post-TTT BPB):
  Seed 42:   1.06128  (original openai#1851 author)
  Seed 314:  1.06087  (this submission)
  Seed 1234: 1.06220  (this submission)
  Mean:      1.06145 ± 0.00068

All artifacts < 16,000,000 bytes. All runs < 600s.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
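The reported mean and ± spread above are consistent with the three seed scores; the ± value is the sample (n−1) standard deviation:

```python
import statistics

scores = [1.06128, 1.06087, 1.06220]   # seeds 42, 314, 1234

mean = statistics.fmean(scores)
spread = statistics.stdev(scores)      # sample standard deviation

print(round(mean, 5))    # 1.06145
print(round(spread, 5))  # 0.00068
```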
alertcat added a commit to alertcat/parameter-golf that referenced this pull request Apr 29, 2026
…ams)

After 4 parallel research agents reviewed 30+ open PRs and
compliance issues, two new findings:

1. PR openai#1923 (AsymLogit) flagged "empirical negative" by
   sunnypatneedi 4-29 frontier-scan, BUT only on PR openai#1855 base
   with default WD=1.0. Never tested on PR openai#1908 + WD=2.0 combo.
   V19's specific stack is NOT directly invalidated.

2. PR openai#1925 simon-marcus 1.06049 (3-seed verified, vs PR openai#1855
   base 1.06108 = -0.00059 BPB). Just 2 hparam env vars:
     MATRIX_LR 0.026 -> 0.028
     PHASED_TTT_PREFIX_DOCS 2500 -> 3500
   Orthogonal axis to AsymLogit (LR/TTT prefix vs logit head).

Adds two new scout scripts:
- run_v19c_stacked_scout.sh: PR openai#1908 + AsymLogit + simon-marcus
  + WD=2.0 (full stack, recommended first scout)
- run_v19b_simonmarcus_scout.sh: PR openai#1908 + simon-marcus + WD=2.0
  (ablation if V19c wins partially)

Decision rule (CaseOps val baseline 0.97651, community floor 0.0006):
  V19c < 0.97591 -> CLEAR WIN, run 3-seed
  V19c 0.97591-0.97651 -> borderline, ablate via V19a/V19b
  V19c > 0.97651 -> abandon stack, try Lead B (PR openai#1884)

Other research findings:
- PR openai#1898 SpinQuant flagged regression vs parent openai#1851 (skip)
- PR openai#1929 SLOT banned per openai#1722 precedent
- PR openai#1911 pre-quant TTT chain banned per openai#1735 precedent
- cocohearts 4-28 PR openai#1902 confirmed PR openai#1855 as official openai#1
- regina-openai + Alex Zhao 48h zero activity
- CaseOps de-facto legal (PR openai#1855 merged into chain)
