
Commit 67b952d

Octavianclaude committed
Podracing III: Cubric Lite — 0.9362 mean BPB (3-seed)

Per-order adaptive alpha scaling on legal score-first 7-gram backoff. Tracks per-order beat rate on already-scored tokens, suppresses noisy low orders (2-3 → 0.3x alpha), boosts accurate high orders (5-7 → 2.0x).

Results (seeds 2045/43/300):
- Sliding BPB (no n-gram): 1.1198 mean
- Cubric n-gram BPB: 0.9362 mean (0.9357/0.9362/0.9365)
- Artifact: 15.59 MB (int6+zstd)

0.026 BPB improvement over Podracing II (openai#753, 0.9625).

Original contribution: per-order adaptive alpha scaling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent ebda3af commit 67b952d

6 files changed

Lines changed: 2422 additions & 0 deletions

Lines changed: 47 additions & 0 deletions
# Podracing III: Cubric Lite

## Results

| Seed | Sliding BPB | Cubric N-gram BPB | Artifact |
|------|-------------|-------------------|----------|
| 2045 | 1.1193 | **0.9357** | 15.59 MB |
| 43 | 1.1200 | **0.9362** | 15.58 MB |
| 300 | 1.1202 | **0.9365** | 15.58 MB |
| **Mean** | **1.1198** | **0.9362** | |

## What Changed vs Podracing II (#753)

One eval-time improvement, no training changes:

1. **Per-order adaptive alpha scaling ("Cubric Lite")**: Track how often each n-gram order's probability beats the model's probability on already-scored tokens. Every 32 batches, adjust the per-order alpha multipliers: orders that consistently beat the model get boosted (up to 2.0x); orders that consistently lose get suppressed (down to 0.3x).
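The adaptation loop described above might look like the following sketch. The class and method names are hypothetical; the source specifies only the 32-batch update interval and the 0.3x/2.0x clamps, so the linear beat-rate-to-multiplier mapping used here is an assumption.

```python
from collections import defaultdict

# Hypothetical sketch of per-order adaptive alpha scaling ("Cubric Lite").
# Only the 32-batch interval and the [0.3, 2.0] clamps come from the source;
# the linear mapping from beat rate to multiplier is an illustrative guess.
class OrderAlphaScaler:
    def __init__(self, orders=range(2, 8), lo=0.3, hi=2.0, interval=32):
        self.mult = {o: 1.0 for o in orders}  # per-order alpha multipliers
        self.beats = defaultdict(int)         # times an order beat the model
        self.seen = defaultdict(int)          # times an order produced a match
        self.lo, self.hi, self.interval = lo, hi, interval
        self.batches = 0

    def observe(self, order, ngram_prob, model_prob):
        # Statistics come only from already-scored tokens (score-first legality).
        self.seen[order] += 1
        if ngram_prob > model_prob:
            self.beats[order] += 1

    def step(self):
        # Called once per batch; multipliers update every `interval` batches.
        self.batches += 1
        if self.batches % self.interval:
            return
        for o in self.mult:
            if self.seen[o] == 0:
                continue  # no evidence yet for this order
            rate = self.beats[o] / self.seen[o]
            # Boost orders that usually beat the model, suppress the rest,
            # clamped to [0.3x, 2.0x].
            target = self.lo + (self.hi - self.lo) * rate
            self.mult[o] = min(self.hi, max(self.lo, target))
```

Under this rule an order that always beats the model converges to the 2.0x ceiling and an order that always loses converges to the 0.3x floor, matching the converged multipliers reported below.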

**Learned multipliers (converged by step 48):**

```
o2:0.300 o3:0.300 o4:0.970 o5:2.000 o6:2.000 o7:2.000
```

Key insight: bigrams and trigrams (orders 2-3) were actively harming BPB by injecting noisy predictions at the same alpha as high-order matches. Suppressing them to 30% of base alpha and boosting orders 5-7 to 200% yields the 0.026 BPB improvement over Podracing II (0.9625 → 0.9362).
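For context, the longest-match-first lookup behind an orders-2-to-7 backoff can be sketched as follows. This is a hypothetical illustration of standard n-gram backoff; the actual cache layout in the submission may differ.

```python
def ngram_lookup(cache, context, max_order=7, min_order=2):
    """Return (order, next-token distribution) for the longest matching context.

    `cache` is assumed to map each order to a dict keyed by the
    (order - 1)-token context tuple; higher orders are tried first.
    """
    for order in range(max_order, min_order - 1, -1):
        if len(context) < order - 1:
            continue  # not enough history for this order
        key = tuple(context[-(order - 1):])
        if key in cache[order]:
            return order, cache[order][key]
    return None, None  # no order matched; fall back to the model alone
```

The returned order is what the per-order multiplier keys off: a hit at order 7 gets a 2.0x-scaled alpha while a hit at order 2 gets 0.3x.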

## Compliance

- Score-first, backward-looking: the n-gram cache is built from already-scored tokens only
- Alpha depends solely on the model's own softmax entropy — no target/label access
- Per-order multipliers use beat-rate statistics from already-scored tokens — same legality as the score-first table update
- No oracle selection, no min-NLL comparison
- GPTQ calibration runs inside the training phase (before the wallclock stop)
- Cubric multiplier adaptation runs during eval and uses no training data
## Credits

- N-gram eval cache concept: @deanbrr (PR #659)
- Multi-order backoff + adaptive alpha inspiration: @Asukabot0 (PR #727)
- Per-order adaptive alpha scaling (Cubric Lite): @newjordan (original contribution)
- Base architecture: @signalrush (PR #414)
## Reproduce

```bash
SEED=2045 bash concepts/podracer/podracer_green/run.sh
```

8xH100 SXM, 600s training + ~120s eval.
Lines changed: 11 additions & 0 deletions
{
  "author": "Frosty40",
  "github_id": "newjordan",
  "name": "Podracing III: Cubric Lite — Per-Order Adaptive Alpha",
  "blurb": "11L/512d U-Net with legal score-first 7-gram backoff (orders 2-7) + entropy-adaptive alpha + per-order adaptive alpha scaling (Cubric Lite). Orders 2-3 suppressed (0.3x), orders 5-7 boosted (2.0x). 3-seed mean val_bpb=0.9362. N-gram concept credited to @deanbrr (PR #659).",
  "date": "2026-03-25T23:30:00Z",
  "val_loss": 1.5807,
  "val_bpb": 0.9362,
  "bytes_total": 15588220,
  "bytes_code": 100286
}

0 commit comments
