Commit 58ee03b

Record: SmearGate BOS Fix + PR openai#1787 Base + Smear Gate + LQER Asymmetric + Phased TTT
val_bpb = 1.06128 | ~15.95 MB | 8xH100 SXM

Key Change: SmearGate BOS Document Boundary Fix

Builds on PR openai#1797 stack (PR openai#1787 base + SmearGate + LQER Asymmetric) but fixes the SmearGate cross-document leakage bug identified by @cocohearts in PR openai#1797 audit.

The bug: SmearGate 1-token causal lookback does not mask BOS positions, so the final token of document N smears into BOS of document N+1.

Credits
@nprime06 -- PR openai#1787 base stack
@romeerp -- CaseOps transform (PR openai#1729)
@dexhunter -- SmearGate + LQER (PR openai#1797)
@cocohearts -- Identifying SmearGate BOS bug
@abaybektursun -- Score-first TTT (PR openai#549)
@clarkkev -- GPTQ SDClip + SP8192 (PR openai#1394)
1 parent 3aface5 commit 58ee03b

7 files changed

Lines changed: 5482 additions & 0 deletions

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
# Record: SmearGate BOS Fix + PR #1787 Base + Smear Gate + LQER Asymmetric + Phased TTT

**val_bpb = 1.06128** | **~15.95 MB** | 8xH100 SXM

## Result

| Seed | Pre-TTT BPB | Post-TTT BPB | Artifact (bytes) |
|------|-------------|--------------|------------------|
| 42 | 1.07406 | **1.06128** | 15,952,086 |

Merged SOTA (PR #1493): **1.0810 BPB**. Delta: **-0.0197 BPB**. Clears the 0.005-nat threshold.

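The reported delta can be re-derived from the figures in this section. The snippet below is only an arithmetic sanity check with illustrative variable names; it does not attempt the BPB-to-nat conversion behind the 0.005-nat threshold, since that depends on quantities not stated here.

```python
# Figures copied from the Result table and the merged-SOTA line.
pre_ttt_bpb = 1.07406
post_ttt_bpb = 1.06128
merged_sota_bpb = 1.0810

# Change relative to the merged SOTA (negative means lower, i.e. better, BPB).
delta_vs_sota = round(post_ttt_bpb - merged_sota_bpb, 4)

# BPB reduction between the pre-TTT and post-TTT evaluations on this seed.
ttt_gain = round(pre_ttt_bpb - post_ttt_bpb, 5)

print(f"delta vs merged SOTA: {delta_vs_sota:+.4f} BPB")
print(f"TTT gain on seed 42:  {ttt_gain:.5f} BPB")
```
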
## Key Change: SmearGate BOS Document Boundary Fix

Builds on the PR #1797 stack (PR #1787 base + SmearGate + LQER Asymmetric) but fixes the **SmearGate cross-document leakage bug** identified by @cocohearts in the PR #1797 audit.

The bug: SmearGate's 1-token causal lookback does not mask BOS positions, so the final token of document N smears into the BOS of document N+1.

The fix (applied in both `forward_logits` and `forward_ttt`):

    bos_mask = (input_ids[:, 1:] == 1).unsqueeze(-1)
    g = g.masked_fill(bos_mask, 0.0)

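To illustrate why the mask matters, here is a minimal pure-Python sketch of a one-token smear over a packed sequence. It assumes the smear has the additive form `y[t] = x[t] + g[t] * x[t-1]`, which this README does not confirm, and it uses scalars in place of embedding vectors. The names `smear` and `BOS_ID` and all toy values are hypothetical; only the BOS id of 1 is taken from the fix.

```python
BOS_ID = 1  # assumption: BOS token id, read off the `== 1` comparison in the fix


def smear(x, gate, input_ids, mask_bos=True):
    """Blend each position with its predecessor: y[t] = x[t] + g[t] * x[t-1].

    x         : list of floats, one scalar "embedding" per position (toy stand-in)
    gate      : list of floats for positions 1..T-1 (position 0 has no predecessor)
    input_ids : list of token ids, same length as x
    """
    y = [x[0]]
    for t in range(1, len(x)):
        g = gate[t - 1]
        # A BOS token starts a new document, so with masking enabled it
        # receives nothing from the last token of the previous document.
        if mask_bos and input_ids[t] == BOS_ID:
            g = 0.0
        y.append(x[t] + g * x[t - 1])
    return y


# Two packed documents: [BOS, 7, 8] and [BOS, 9].
input_ids = [1, 7, 8, 1, 9]
x = [0.5, 1.0, 2.0, 0.5, 3.0]
gate = [0.25, 0.25, 0.25, 0.25]

leaky = smear(x, gate, input_ids, mask_bos=False)
fixed = smear(x, gate, input_ids, mask_bos=True)

print("without mask:", leaky)
print("with mask:   ", fixed)
```

In this toy setup the two outputs differ only at index 3, the second BOS: without the mask it picks up a contribution from token 8 of the previous document, and with the mask it keeps its own value.
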
## Technique Stack

| Component | Origin |
|-----------|--------|
| CaseOps bijective case transform | PR #1729 / PR #1736 |
| SparseAttnGate | PR #1787 (nprime06) |
| SmearGate + BOS fix | PR #1797 + this submission |
| LQER asymmetric rank-4 | PR #1797 |
| Phased TTT (score-first, 3 phases) | PR #1394 / PR #1736 |
| PolarNS + MIN_LR=0.1 + FusedCE | PR #1787 |
| Full Hessian GPTQ + Brotli | PR #1019 / PR #1530 |

## Architecture

11L x 512d x 8H/4KV, MLP 4x, LeakyReLU(0.5)^2, Partial RoPE (16/64 dims), layerwise LN scale, tied embeddings, logit softcap=30.0. Depth recurrence: layers 3-5 looped x2 (activated at frac=0.35). Parallel residuals from layer 8. XSA on all 11 layers. SmearGate window=12.

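The shorthand above can be unpacked into a small config sketch that derives the per-head sizes. The field names, the tanh form of the softcap, and the reading of "LeakyReLU(0.5)^2" as a leaky ReLU with negative slope 0.5 followed by squaring are all assumptions made for illustration; only the numeric values come from the Architecture line.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Values copied from the Architecture line; field names are hypothetical.
    n_layers: int = 11
    d_model: int = 512
    n_heads: int = 8
    n_kv_heads: int = 4
    mlp_mult: int = 4
    rope_dims: int = 16
    logit_softcap: float = 30.0

    @property
    def head_dim(self) -> int:
        # Width of each attention head.
        return self.d_model // self.n_heads

    @property
    def queries_per_kv_head(self) -> int:
        # Grouped-query attention: query heads sharing one KV head.
        return self.n_heads // self.n_kv_heads

    @property
    def mlp_hidden(self) -> int:
        # Hidden width of the MLP block.
        return self.mlp_mult * self.d_model


def softcap(logit: float, cap: float) -> float:
    # Assumed tanh form: squashes logits smoothly into [-cap, cap].
    return cap * math.tanh(logit / cap)


def leaky_relu_squared(x: float, negative_slope: float = 0.5) -> float:
    # Assumed reading of "LeakyReLU(0.5)^2": leaky ReLU, then square.
    y = x if x >= 0 else negative_slope * x
    return y * y


cfg = ModelConfig()
print(cfg.head_dim, cfg.queries_per_kv_head, cfg.mlp_hidden)
```

Under these assumptions the derived head width is 64, which agrees with the "16/64 dims" note for partial RoPE: 16 of the 64 dimensions in each head would be rotated.
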
## Compliance

- Artifact <= 16,000,000 bytes: 15,952,086 bytes
- train_time <= 600s: 599.6s
- eval_time <= 600s: 519.5s
- Issue #1017 Conditions 1-4: All satisfied. SmearGate BOS mask ensures no cross-document leakage.

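The three numeric limits above can be checked mechanically. The following sketch uses hypothetical dictionary keys and a hypothetical `headroom` helper; the limits and measurements are copied from the list, and the Issue #1017 conditions are not modelled.

```python
# Limits and measurements copied from the Compliance list.
LIMITS = {
    "artifact_bytes": 16_000_000,
    "train_time_s": 600.0,
    "eval_time_s": 600.0,
}
MEASURED = {
    "artifact_bytes": 15_952_086,
    "train_time_s": 599.6,
    "eval_time_s": 519.5,
}


def headroom(limits, measured):
    """Return limit minus measurement for every budget (negative = violation)."""
    return {name: limits[name] - measured[name] for name in limits}


margins = headroom(LIMITS, MEASURED)
compliant = all(margin >= 0 for margin in margins.values())

for name, margin in margins.items():
    print(f"{name}: {margin:g} to spare")
print("compliant:", compliant)
```

The tightest margin is the training time, with roughly 0.4 s to spare; the artifact is 47,914 bytes under its limit.
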
## Credits

- @nprime06 -- PR #1787 base stack
- @romeerp -- CaseOps transform (PR #1729)
- @dexhunter -- SmearGate + LQER (PR #1797)
- @cocohearts -- Identifying SmearGate BOS bug
- @abaybektursun -- Score-first TTT (PR #549)
- @clarkkev -- GPTQ SDClip + SP8192 (PR #1394)
