Commit aadd830

Record: SmearGate BOS Fix 3-Seed Reproduction — val_bpb 1.06145
3-seed reproduction of PR #1851 (SmearGate BOS document boundary fix). Code is byte-identical to #1851 by @aquariouseworkman.

Results (post-TTT BPB):
- Seed 42: 1.06128 (original #1851 author)
- Seed 314: 1.06087 (this submission)
- Seed 1234: 1.06220 (this submission)
- Mean: 1.06145 ± 0.00068

All artifacts < 16,000,000 bytes. All runs < 600s.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 7427de2 commit aadd830

8 files changed

Lines changed: 13527 additions & 0 deletions

Lines changed: 117 additions & 0 deletions
# Record: SmearGate BOS Fix — 3-Seed Reproduction of PR #1851

**val_bpb = 1.06145** (3-seed mean, std 0.00068) | **~15.95 MB** | 8×H100 SXM 80GB

## Summary

This is a **pure reproduction study** of [PR #1851](https://github.com/openai/parameter-golf/pull/1851) by @aquariouseworkman. The training script is byte-identical to the code in PR #1851; no new techniques or modifications are introduced.

PR #1851 submitted a single-seed result (seed 42, val_bpb = 1.06128). We add two independent runs (seeds 314 and 1234) to form a **3-seed evaluation** (seeds 42, 314, 1234) and check that the result is reproducible across seeds.

## 3-Seed Results

| Seed | Pre-Quant BPB | Quant BPB | **Post-TTT BPB** | Artifact (bytes) | Train Time | Eval Time |
|------|---------------|-----------|-------------------|-------------------|------------|-----------|
| 42* | 1.06490240 | 1.07405660 | **1.06128183** | 15,952,086 | 599.6s | 519.5s |
| 314 | 1.06467893 | 1.07358634 | **1.06086831** | 15,952,419 | 599.6s | 525.6s |
| 1234 | 1.06593114 | 1.07503808 | **1.06220261** | 15,952,690 | 599.5s | 479.6s |
| **Mean ± Std** | | | **1.06145 ± 0.00068** | | | |

\* Seed 42 result is from the original PR #1851 author @aquariouseworkman. Seeds 314 and 1234 are independent runs by @Christopher-Lee-McClendon.

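The summary row can be checked directly from the per-seed values in the table. The short sketch below recomputes it; the helper name `summarize` is ours, and the use of the sample standard deviation (n − 1 in the denominator) is an inference from the numbers, since that choice reproduces the reported 0.00068.

```python
import statistics

# Post-TTT BPB per seed, copied from the results table.
post_ttt_bpb = {42: 1.06128183, 314: 1.06086831, 1234: 1.06220261}


def summarize(values):
    """Return (mean, sample standard deviation) of the given values."""
    vals = list(values)
    # statistics.stdev divides by n - 1 (sample standard deviation).
    return statistics.mean(vals), statistics.stdev(vals)


mean_bpb, std_bpb = summarize(post_ttt_bpb.values())
print(f"{mean_bpb:.5f} ± {std_bpb:.5f}")  # 1.06145 ± 0.00068
```

With only three seeds the choice of estimator matters: the population standard deviation of the same values is about 0.00056.
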
## Key Change: SmearGate BOS Document Boundary Fix

PR #1851 fixed a bug in how the SmearGate mechanism handles beginning-of-sequence (BOS) document boundaries. With the fix, SmearGate resets at each document boundary instead of bleeding attention across documents.

This was a targeted one-line fix on top of the PR #1787 codebase. Credit for identifying the BOS bug goes to @cocohearts; the fix implementation is by @aquariouseworkman.

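This README does not show the SmearGate code, so the following is only an illustrative sketch under stated assumptions, not the implementation from PR #1851. It assumes a smear has the form "current value plus a gated copy of the previous position's value" and models the fix as forcing the gate to zero at BOS positions. The function `smear`, its arguments, and the scalar toy values are all hypothetical.

```python
def smear(tokens, values, gates, bos_id, reset_at_bos=True):
    """Mix each position's value with its predecessor's, scaled by a gate.

    Hypothetical 1-D stand-in for a smear gate (assumption, not the PR's code).
    """
    out = []
    for t, (tok, x) in enumerate(zip(tokens, values)):
        prev = values[t - 1] if t > 0 else 0.0
        gate = gates[t]
        if reset_at_bos and tok == bos_id:
            # A BOS token starts a new document, so nothing is carried over
            # from the last token of the previous document.
            gate = 0.0
        out.append(x + gate * prev)
    return out


BOS = 1
tokens = [BOS, 5, 6, BOS, 7]          # two documents packed into one sequence
values = [1.0, 2.0, 3.0, 4.0, 5.0]
gates = [0.5] * len(tokens)

fixed = smear(tokens, values, gates, BOS, reset_at_bos=True)
leaky = smear(tokens, values, gates, BOS, reset_at_bos=False)
print(fixed)  # [1.0, 2.5, 4.0, 4.0, 7.0]
print(leaky)  # [1.0, 2.5, 4.0, 5.5, 7.0]
```

In this toy setup the two variants differ only at the second BOS position (index 3), where the unfixed version mixes in the last value of the previous document.
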
## Technique Stack

All techniques below are inherited from PR #1851 (and its lineage). No new techniques are introduced in this reproduction.

| Technique | Source | Author |
|-----------|--------|--------|
| Base architecture (11L, MLP 4x, MuonEq-R) | PR #1787 | @nprime06 |
| SmearGate attention | PR #1797 | @dexhunter |
| SmearGate BOS fix | PR #1851 | @aquariouseworkman |
| LQER Asymmetric quantization | PR #1797 | @dexhunter |
| CaseOps SP8192 | PR #1729 | @romeerp |
| GPTQ + SP8192 | PR #1394 | @clarkkev |
| Score-first TTT (3 phases) | PR #549 | @abaybektursun |
| BOS bug identification | Issue | @cocohearts |

## Architecture

Same as PR #1851 / PR #1787:
- 11 transformer layers, MLP multiplier 4x
- SmearGate attention with BOS boundary fix
- LQER asymmetric quantization
- CaseOps with SP8192 tokenization
- GPTQ post-training quantization
- Phased test-time training (3 phases)
- Embed clipping (15.0σ), MLP clipping (12.0σ)
- Embed bits: 7

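The last two bullets (σ-based clipping and a 7-bit embedding width) can be illustrated with a generic sketch. This is an assumption about what such settings typically mean, namely clipping values to ±kσ and then rounding onto a symmetric signed integer grid; it is not taken from `train_gpt.py`, and the function `clip_and_quantize` is hypothetical.

```python
import statistics


def clip_and_quantize(weights, clip_sigmas, bits):
    """Clip to +/- clip_sigmas * std, then round to signed `bits`-bit codes.

    Generic symmetric quantizer for illustration only (assumed scheme).
    Returns (integer codes, scale); code * scale approximates the clipped value.
    """
    sigma = statistics.pstdev(weights)
    limit = clip_sigmas * sigma
    qmax = 2 ** (bits - 1) - 1            # 63 for 7 bits
    if limit == 0.0:
        return [0] * len(weights), 0.0    # constant input: nothing to scale
    scale = limit / qmax
    codes = [round(max(-limit, min(limit, w)) / scale) for w in weights]
    return codes, scale


# A deliberately small clip (1 sigma) so that clipping is visible in the toy data.
codes, scale = clip_and_quantize([-2.0, -1.0, 0.0, 1.0, 2.0], clip_sigmas=1.0, bits=7)
print(codes)  # [-63, -45, 0, 45, 63]
```

Under this assumed scheme, a large threshold such as 15.0σ would rarely clip anything and would mostly set the quantization scale.
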
## Compliance

| Budget | Limit | Worst-Case (across seeds) | Status |
|--------|-------|--------------------------|--------|
| Artifact size | 16,000,000 bytes | 15,952,690 bytes | Pass |
| Training time | 600s | 599.6s | Pass |
| Eval time | 600s | 525.6s | Pass |

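The worst-case column follows from the per-seed results table. A small sketch of that check, using the table values and the strict "<" comparison stated in the commit message; the dictionary layout and the helper names `worst_case` and `check` are ours.

```python
# Budgets from the compliance table.
LIMITS = {"artifact_bytes": 16_000_000, "train_s": 600.0, "eval_s": 600.0}

# Per-seed measurements from the 3-seed results table.
RUNS = {
    42: {"artifact_bytes": 15_952_086, "train_s": 599.6, "eval_s": 519.5},
    314: {"artifact_bytes": 15_952_419, "train_s": 599.6, "eval_s": 525.6},
    1234: {"artifact_bytes": 15_952_690, "train_s": 599.5, "eval_s": 479.6},
}


def worst_case(runs):
    """Largest observed value of each budgeted quantity across seeds."""
    return {key: max(run[key] for run in runs.values()) for key in LIMITS}


def check(runs, limits):
    """Map each budget to True if the worst case is strictly under the limit."""
    worst = worst_case(runs)
    return {key: worst[key] < limits[key] for key in limits}


print(worst_case(RUNS))
print(check(RUNS, LIMITS))
```

The tightest margins are the artifact size (47,310 bytes of headroom for seed 1234) and the training time (599.6s against a 600s limit).
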
## Reproduction

The training script is byte-identical to PR #1851. To reproduce:

```bash
# 1. Install dependencies
pip install brotli python-minifier

# 2. Prepare CaseOps SP8192 data
# Option A: Download pre-tokenized CaseOps data from HuggingFace
python3 prepare_caseops_data.py  # downloads from romeerp/parameter-golf-caseops-v1
# Option B: Or use the standard data script
MATCHED_FINEWEB_REPO_ID=kevclark/parameter-golf python3 data/cached_challenge_fineweb.py --variant sp8192 --skip-manifest
# Then apply CaseOps transform:
python3 lossless_caps.py  # transforms shards with CaseOps encoding

# 3. Run training (replace SEED with 42, 314, or 1234)
SEED=42 \
CASEOPS_ENABLED=1 \
EMBED_BITS=7 \
SMEAR_GATE_ENABLED=1 \
SPARSE_ATTN_GATE_ENABLED=1 \
MIN_LR=0.1 \
EMBED_CLIP_SIGMAS=15.0 \
MLP_CLIP_SIGMAS=12.0 \
GPTQ_RESERVE_SECONDS=0.5 \
PHASED_TTT_NUM_PHASES=3 \
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

**Environment variables (all required for exact reproduction):**

| Variable | Value | Purpose |
|----------|-------|---------|
| `CASEOPS_ENABLED` | `1` | Enable CaseOps SP8192 tokenization |
| `EMBED_BITS` | `7` | Embedding quantization bits |
| `SMEAR_GATE_ENABLED` | `1` | Enable SmearGate attention |
| `SPARSE_ATTN_GATE_ENABLED` | `1` | Enable sparse attention gating |
| `MIN_LR` | `0.1` | Minimum learning rate |
| `EMBED_CLIP_SIGMAS` | `15.0` | Embedding clipping threshold (σ) |
| `MLP_CLIP_SIGMAS` | `12.0` | MLP clipping threshold (σ) |
| `GPTQ_RESERVE_SECONDS` | `0.5` | Seconds reserved for GPTQ |
| `PHASED_TTT_NUM_PHASES` | `3` | Number of TTT phases |

**Hardware:** 8×H100 SXM 80GB (RunPod)

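For running several seeds back to back, the environment and command from the reproduction section can be assembled programmatically. The wrapper below is a convenience sketch and not part of the submission; `build_env` and `run_seed` are hypothetical helpers, and only the variable names, values, and `torchrun` command come from this README.

```python
import os
import subprocess

# Fixed settings from the environment-variable table.
FIXED_ENV = {
    "CASEOPS_ENABLED": "1",
    "EMBED_BITS": "7",
    "SMEAR_GATE_ENABLED": "1",
    "SPARSE_ATTN_GATE_ENABLED": "1",
    "MIN_LR": "0.1",
    "EMBED_CLIP_SIGMAS": "15.0",
    "MLP_CLIP_SIGMAS": "12.0",
    "GPTQ_RESERVE_SECONDS": "0.5",
    "PHASED_TTT_NUM_PHASES": "3",
}
TRAIN_CMD = ["torchrun", "--standalone", "--nproc_per_node=8", "train_gpt.py"]


def build_env(seed, base_env=None):
    """Return a copy of the base environment with the run settings applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(FIXED_ENV)
    env["SEED"] = str(seed)
    return env


def run_seed(seed, dry_run=True):
    """Launch one training run; with dry_run=True only return the command."""
    env = build_env(seed)
    if dry_run:
        return TRAIN_CMD
    return subprocess.run(TRAIN_CMD, env=env, check=True)


for seed in (42, 314, 1234):
    print(seed, " ".join(run_seed(seed)))  # dry run: prints the command only
```

Passing `dry_run=False` would start the actual training run, which requires the prepared data and the 8-GPU setup described above.
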
## Credits

- **@aquariouseworkman**: PR #1851 author (SmearGate BOS fix, seed 42 result)
- **@nprime06**: PR #1787 (base architecture)
- **@romeerp**: PR #1729 (CaseOps)
- **@dexhunter**: PR #1797 (SmearGate + LQER asymmetric quantization)
- **@cocohearts**: BOS document boundary bug identification
- **@abaybektursun**: PR #549 (score-first TTT)
- **@clarkkev**: PR #1394 (GPTQ + SP8192)
