Commit 5943edd

vaibhav-i and claude committed
Add new experiments: Polar Express NS, Causal SLOT, Streaming Log-Bias
New base: PR openai#1394 (clarkkev SP8192 + SDClip + GPTQ embeddings, 1.08563 BPB)

Experiments (all build on new_base_pr1394):
- exp_polar_express: 4-step minimax-optimal NS (arXiv:2505.16932), ~-0.002 BPB
- exp_causal_slot: per-window delta on context tokens, AdamW 16 steps, ~-0.013 BPB
- exp_log_bias: streaming online log-bias (Nacrith arXiv:2602.19626), ~-0.015 BPB

Research briefs:
- research/2026-04-04-full-scan-brief.md
- research/2026-04-05-scan-brief.md (updated: pre-quant TTT ruled illegal)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 323982b commit 5943edd

9 files changed

Lines changed: 6753 additions & 0 deletions

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
# exp_causal_slot: Causal SLOT Eval Adaptation

Base: PR #1394 (clarkkev SP8192 + SDClip, 1.08563 BPB)

## Change
Adds causal SLOT eval-time adaptation (PR #1333 approach, context-only delta optimization).
Per window: optimize a delta of shape [1,1,dim] on the context tokens only (AdamW, 16 steps), then score the stride tokens with that delta applied.
Model weights stay frozen. Delta is re-initialized for each window. Single left-to-right pass.
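
The per-window loop can be sketched in plain Python on a toy model. This is a minimal illustration under stated assumptions, not the code in `train_gpt.py`: it assumes delta is added to the final hidden states in front of a frozen linear head, and the helper names (`fit_delta`, `score_window`), the learning rate, the weight decay, and the toy data are all hypothetical. Only "AdamW, 16 steps, fit on context, score stride, re-initialize per window" comes from the notes above.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def probs(h, delta, W):
    # Frozen head applied to the shifted hidden state (h + delta).
    x = [hi + di for hi, di in zip(h, delta)]
    return softmax([sum(x[d] * W[d][v] for d in range(len(x)))
                    for v in range(len(W[0]))])

def mean_nll(hidden, targets, delta, W):
    nll = -sum(math.log(probs(h, delta, W)[t]) for h, t in zip(hidden, targets))
    return nll / len(targets)

def grad_delta(hidden, targets, delta, W):
    # d(mean NLL)/d(delta) = mean over positions of W @ (softmax - one_hot).
    g = [0.0] * len(delta)
    for h, t in zip(hidden, targets):
        p = probs(h, delta, W)
        p[t] -= 1.0
        for d in range(len(delta)):
            g[d] += sum(W[d][v] * p[v] for v in range(len(p)))
    return [gd / len(targets) for gd in g]

def fit_delta(hidden, targets, W, steps=16, lr=0.05, wd=0.01,
              b1=0.9, b2=0.999, eps=1e-8):
    # Hand-written AdamW on delta only; lr and wd are assumed values.
    dim = len(W)
    delta = [0.0] * dim          # fresh delta for every window
    m = [0.0] * dim
    v = [0.0] * dim
    for step in range(1, steps + 1):
        g = grad_delta(hidden, targets, delta, W)
        for d in range(dim):
            m[d] = b1 * m[d] + (1 - b1) * g[d]
            v[d] = b2 * v[d] + (1 - b2) * g[d] * g[d]
            m_hat = m[d] / (1 - b1 ** step)
            v_hat = v[d] / (1 - b2 ** step)
            delta[d] -= lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * delta[d])
    return delta

def score_window(hidden, targets, n_context, W, steps=16):
    # Fit on context positions only, then score the remaining (stride) positions.
    delta = fit_delta(hidden[:n_context], targets[:n_context], W, steps=steps)
    return mean_nll(hidden[n_context:], targets[n_context:], delta, W), delta

# Toy window: identity head, 4 context positions followed by 2 stride positions.
W = [[1.0 if d == v else 0.0 for v in range(4)] for d in range(4)]
hidden = [[0.1 * ((t + d) % 3 - 1) for d in range(4)] for t in range(6)]
targets = [2, 2, 2, 1, 2, 2]
baseline = mean_nll(hidden[4:], targets[4:], [0.0] * 4, W)
adapted, delta = score_window(hidden, targets, n_context=4, W=W)
print(f"stride NLL without delta: {baseline:.3f}, with delta: {adapted:.3f}")
```

Because the stride targets never enter `fit_delta`, the scored tokens cannot influence their own delta, which is what keeps the adaptation causal in this sketch.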

## Expected gain
−0.013 BPB (confirmed on SP4096 stack, PR #1333). May differ on SP8192 base.

## Run
    SLOT_ENABLED=1 SLOT_STEPS=16 SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py
    # Without SLOT (baseline):
    SLOT_ENABLED=0 SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py

experiments/exp_causal_slot/train_gpt.py

Lines changed: 1525 additions & 0 deletions
Large diffs are not rendered by default.

experiments/exp_log_bias/README.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
# exp_log_bias: Streaming Online Log-Bias (Nacrith)

Base: PR #1394 (clarkkev SP8192 + SDClip, 1.08563 BPB)

## Change
Adds streaming online log-bias correction at eval time (arXiv:2602.19626, Tacconelli 2026).
Zero artifact cost. Strictly causal. Single pass.

Mechanism: maintain b ∈ R^vocab. Before each token: score with logits + b, where logits are the raw model logits.
After each token: b += lr * (one_hot(x_t) - softmax(logits + b)).
lr=0.001, no momentum, no reset across windows.
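
The update rule above can be sketched in a few lines of plain Python. This is an illustration only, assuming the model's raw logits are available one position at a time; the function name `stream_with_log_bias` and the toy stream are hypothetical, and the toy run uses a larger lr than 0.001 only because the stream is short.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def stream_with_log_bias(logit_rows, tokens, lr=0.001):
    """Score a token stream while adapting a per-vocab bias b online."""
    vocab = len(logit_rows[0])
    b = [0.0] * vocab                 # persists across the whole stream (no reset)
    total_nll = 0.0
    for logits, x in zip(logit_rows, tokens):
        p = softmax([l + bv for l, bv in zip(logits, b)])
        total_nll -= math.log(p[x])   # score x_t before b has seen it
        for v in range(vocab):
            # One plain gradient step on log-likelihood: one_hot(x_t) - softmax.
            b[v] += lr * ((1.0 if v == x else 0.0) - p[v])
    return total_nll / len(tokens), b

# Toy stream: the "model" is uniform over 3 tokens, but the data favors token 1.
rows = [[0.0, 0.0, 0.0]] * 200
toks = [1, 1, 1, 0] * 50
plain, _ = stream_with_log_bias(rows, toks, lr=0.0)
biased, b = stream_with_log_bias(rows, toks, lr=0.05)
print(f"mean NLL without bias: {plain:.3f}, with bias: {biased:.3f}")
```

Each token is scored before its update is applied, so b only ever depends on earlier tokens. The update also sums to zero across the vocabulary, so b shifts probability mass between tokens without drifting as a whole.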

## Expected gain
−0.015 BPB (confirmed on enwik8 in Nacrith paper). Untested in competition.

## Run
    LOG_BIAS_ENABLED=1 LOG_BIAS_LR=0.001 SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py
    # Without log-bias (baseline):
    LOG_BIAS_ENABLED=0 SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py

## Ablations to try
    LOG_BIAS_LR=0.0001   # slower adaptation
    LOG_BIAS_LR=0.01     # faster adaptation (may overshoot)
    LOG_BIAS_RESET=1     # reset b per window (weaker but safer)
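
One way to expand the ablations into full command lines is sketched below. It assumes each ablation overrides a single variable on top of the default log-bias run shown under Run, and that `LOG_BIAS_RESET=1` is combined with the default learning rate; the README does not state either, and the helper name `command` is hypothetical.

```python
BASE = "torchrun --standalone --nproc_per_node=8 train_gpt.py"

def command(env):
    # Render VAR=value pairs in front of the launcher, as in the Run section.
    prefix = " ".join(f"{k}={v}" for k, v in env.items())
    return f"{prefix} {BASE}"

default = {"LOG_BIAS_ENABLED": "1", "LOG_BIAS_LR": "0.001", "SEED": "1337"}
ablations = [
    {"LOG_BIAS_LR": "0.0001"},   # slower adaptation
    {"LOG_BIAS_LR": "0.01"},     # faster adaptation
    {"LOG_BIAS_RESET": "1"},     # reset b per window (assumed to keep default lr)
]
commands = [command({**default, **a}) for a in ablations]
for c in commands:
    print(c)
```

Keeping `SEED=1337` fixed across the runs makes each ablation directly comparable to the baseline pair above.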
