vaibhav-i
diff --git a/experiments/exp_causal_slot/README.md b/experiments/exp_causal_slot/README.md
Lines changed: 16 additions & 0 deletions
diff --git a/experiments/exp_causal_slot/train_gpt.py b/experiments/exp_causal_slot/train_gpt.py
Lines changed: 1525 additions & 0 deletions
diff --git a/experiments/exp_log_bias/README.md b/experiments/exp_log_bias/README.md
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,16 @@
+# exp_causal_slot: Causal SLOT Eval Adaptation
+
+Base: PR #1394 (clarkkev SP8192 + SDClip, 1.08563 BPB)
+
+## Change
+Adds causal SLOT eval-time adaptation (the PR #1333 approach: delta optimization on context tokens only).
+Per window: optimize a delta of shape [1,1,dim] on the context tokens (AdamW, 16 steps), then score the stride tokens with that delta applied.
+Model weights stay frozen. The delta is re-initialized for each window. Evaluation is a single left-to-right pass.
+
+## Expected gain
+About −0.013 BPB (confirmed on the SP4096 stack in PR #1333). The gain may differ on the SP8192 base.
+
+## Run
+    SLOT_ENABLED=1 SLOT_STEPS=16 SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py
+    # Without SLOT (baseline):
+    SLOT_ENABLED=0 SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py
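The per-window procedure described under "Change" in the README above can be sketched in plain Python. This is a toy illustration, not the code in `train_gpt.py`: it assumes the delta is added to the final hidden states before a linear output head `W`, that the delta starts at zero for each window, and that AdamW runs with the hypothetical hyperparameters shown. Only the 16-step count and the context-only fitting come from the README; the function names are made up for this sketch.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll_and_grad(hiddens, targets, delta, W):
    """Mean cross-entropy (nats) and its gradient w.r.t. the shared delta."""
    dim = len(delta)
    grad = [0.0] * dim
    nll = 0.0
    for hidden, target in zip(hiddens, targets):
        x = [h + d for h, d in zip(hidden, delta)]  # delta is broadcast over positions
        probs = softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])
        nll -= math.log(probs[target])
        for v, row in enumerate(W):
            err = probs[v] - (1.0 if v == target else 0.0)
            for j in range(dim):
                grad[j] += err * row[j]
    n = len(targets)
    return nll / n, [g / n for g in grad]

def fit_delta(ctx_hiddens, ctx_targets, W, steps=16, lr=0.1,
              betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
    """AdamW on delta only; W (the model) is never modified."""
    dim = len(W[0])
    delta = [0.0] * dim  # assumption: zero init for every window
    m = [0.0] * dim
    v = [0.0] * dim
    b1, b2 = betas
    for t in range(1, steps + 1):
        _, grad = mean_nll_and_grad(ctx_hiddens, ctx_targets, delta, W)
        for j in range(dim):
            delta[j] *= 1.0 - lr * weight_decay  # decoupled weight decay
            m[j] = b1 * m[j] + (1 - b1) * grad[j]
            v[j] = b2 * v[j] + (1 - b2) * grad[j] ** 2
            m_hat = m[j] / (1 - b1 ** t)
            v_hat = v[j] / (1 - b2 ** t)
            delta[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return delta

def score_window(ctx_hiddens, ctx_targets, stride_hiddens, stride_targets, W, steps=16):
    """Fit delta on context tokens only, then score the stride tokens with it."""
    delta = fit_delta(ctx_hiddens, ctx_targets, W, steps=steps)
    nll, _ = mean_nll_and_grad(stride_hiddens, stride_targets, delta, W)
    return nll, delta
```

Because `fit_delta` never sees the stride targets, the score for a stride token depends only on tokens that precede it, which is the sense in which the README calls the method causal.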
@@ -0,0 +1,24 @@
+# exp_log_bias: Streaming Online Log-Bias (Nacrith)
+
+Base: PR #1394 (clarkkev SP8192 + SDClip, 1.08563 BPB)
+
+## Change
+Adds streaming online log-bias correction at eval time (arXiv:2602.19626, Tacconelli 2026).
+Zero artifact cost. Strictly causal. Single pass.
+
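+Mechanism: maintain a bias vector b ∈ R^vocab. Before scoring each token, add b to the model's raw logits.
+After observing token x_t, update b += lr * (one_hot(x_t) - softmax(logits + b)), where logits are the raw logits (b is not added twice).
+Settings: lr=0.001, no momentum, and b is not reset across windows.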
+
+## Expected gain
+About −0.015 BPB (confirmed on enwik8 in the Nacrith paper). Untested in this competition.
+
+## Run
+    LOG_BIAS_ENABLED=1 LOG_BIAS_LR=0.001 SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py
+    # Without log-bias (baseline):
+    LOG_BIAS_ENABLED=0 SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py
+
+## Ablations to try
+    LOG_BIAS_LR=0.0001   # slower adaptation
+    LOG_BIAS_LR=0.01     # faster adaptation (may overshoot)
+    LOG_BIAS_RESET=1     # reset b per window (weaker but safer)
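The update rule in the "Mechanism" paragraph above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation in `train_gpt.py`. It assumes b starts at zero and that `logits` in the update means the raw model logits, so b is added once. The function name and the `enabled` flag are hypothetical, and the score is reported in bits per token rather than BPB.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def score_stream_with_log_bias(logit_rows, targets, lr=0.001, enabled=True):
    """Score a token stream left to right with a streaming log-bias.

    logit_rows: raw model logits for each position (list of lists of floats).
    targets:    the token id observed at each position.
    Returns (mean bits per token, final bias vector).
    """
    vocab = len(logit_rows[0])
    bias = [0.0] * vocab  # assumption: b is initialized to zero
    total_bits = 0.0
    for logits, target in zip(logit_rows, targets):
        # Score first, using a bias learned only from earlier tokens.
        probs = softmax([l + b for l, b in zip(logits, bias)])
        total_bits -= math.log2(probs[target])
        if enabled:
            # Then update: b += lr * (one_hot(x_t) - softmax(logits + b)).
            for i in range(vocab):
                one_hot = 1.0 if i == target else 0.0
                bias[i] += lr * (one_hot - probs[i])
    return total_bits / len(targets), bias
```

The update is one plain SGD step on the negative log-likelihood with respect to b. Since each token is scored before b is updated with it, the prediction for x_t never depends on x_t itself.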