@@ -0,0 +1,84 @@
# 10L Int5-MLP + BigramHash(10240) + Delayed PPM K=15

**val_bpb: 1.14174** (mean of 3 seeds, post int5/int6+zstd roundtrip, sliding-window eval stride=32, delayed outside-context-only PPM)

This record keeps the same 10-layer Int5-MLP + BigramHash(10240) base model as `2026-03-20_10L_Int5MLP_MuonWD04_SWA50`, then adds a strictly causal delayed PPM bank at inference time. The PPM bank is only allowed to contain targets from outside the model's current `2048`-token context window.

## Key Result

- **Mean baseline val_bpb:** `1.14299494`
- **Mean delayed-PPM val_bpb:** `1.14173730`
- **Mean delta (PPM minus baseline):** `-0.00125764` bpb (lower is better)
- **PPM delta std across seeds:** `0.00001979`
- **Paired one-sided t-test p-value (3 seeds):** `4.13e-05`
- **All 3 seeds improved**

## 3-Seed Results

| Seed | Baseline val_bpb | PPM val_bpb | Delta BPB | Total submission size (bytes) | Valid |
|------|------------------|-------------|-----------|-----------------------|-------|
| 42 | 1.14253746 | 1.14125711 | -0.00128035 | 15,649,761 | yes |
| 1337 | 1.14387335 | 1.14262486 | -0.00124849 | 15,673,878 | yes |
| 2024 | 1.14257402 | 1.14132993 | -0.00124409 | 15,850,793 | yes |
| **Mean** | **1.14299494** | **1.14173730** | **-0.00125764** | | |
| **Std** | **0.00076094** | **0.00076951** | **0.00001979** | | |
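
The summary statistics can be re-derived from the per-seed rows. The sketch below assumes the reported p-value comes from a paired one-sided t-test on the three deltas (2 degrees of freedom) and uses the closed-form CDF of Student's t for that case, so it needs only the standard library. The variable and function names are illustrative and do not come from `train_gpt.py`.

```python
import math
import statistics

# Per-seed (baseline, PPM) val_bpb pairs copied from the table above.
results = {
    42: (1.14253746, 1.14125711),
    1337: (1.14387335, 1.14262486),
    2024: (1.14257402, 1.14132993),
}

deltas = [ppm - base for base, ppm in results.values()]
n = len(deltas)
mean_delta = statistics.mean(deltas)
std_delta = statistics.stdev(deltas)  # sample std, divides by n - 1

# Paired t statistic for the alternative "mean delta < 0".
t_stat = mean_delta / (std_delta / math.sqrt(n))


def student_t_cdf_df2(t: float) -> float:
    """Closed-form CDF of Student's t with exactly 2 degrees of freedom."""
    return 0.5 + t / (2.0 * math.sqrt(t * t + 2.0))


# Lower-tail probability, since a negative delta is the improvement.
p_one_sided = student_t_cdf_df2(t_stat)

print(f"mean delta  = {mean_delta:.8f}")
print(f"std delta   = {std_delta:.8f}")
print(f"t statistic = {t_stat:.2f}")
print(f"p (1-sided) = {p_one_sided:.3g}")
```

With only three seeds the test has 2 degrees of freedom, so the small p-value is driven almost entirely by how tightly the per-seed deltas cluster.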

## Method

### Outside-Context-Only Delayed PPM

The PPM bank is updated with a delay of `train_seq_len = 2048` tokens. At prediction position `i`, the bank only contains targets from positions `<= i - 2048`, so it cannot exploit anything already visible to the model inside the current sliding-window context.

This preserves the intended use case:

- The transformer handles the local `2048`-token window.
- The delayed PPM bank adds only longer-range repeated-sequence signal.
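
To make the delay rule concrete, here is a minimal sketch of a single-order context bank whose updates lag the prediction position. It is a toy illustration rather than the code in `train_gpt.py` or `trie_bench.c`: the class `DelayedNgramBank`, its methods, and the helper `first_hit` are hypothetical names, and a plain dictionary stands in for the trie.

```python
from collections import defaultdict, deque


class DelayedNgramBank:
    """Toy order-k context bank whose updates lag by `delay` positions."""

    def __init__(self, k: int, delay: int):
        self.k = k
        self.delay = delay
        self.counts = defaultdict(lambda: defaultdict(int))
        self.pending = deque()  # (position, context, target), not yet visible

    def _release(self, i: int) -> None:
        # Only targets at positions <= i - delay may enter the bank.
        while self.pending and self.pending[0][0] <= i - self.delay:
            _, ctx, tgt = self.pending.popleft()
            self.counts[ctx][tgt] += 1

    def predict(self, tokens, i: int):
        """Return (best_target, confidence, count) for position i, or None."""
        self._release(i)
        if i < self.k:
            return None
        seen = self.counts.get(tuple(tokens[i - self.k:i]))
        if not seen:
            return None
        total = sum(seen.values())
        best, best_count = max(seen.items(), key=lambda kv: kv[1])
        return best, best_count / total, total

    def observe(self, tokens, i: int) -> None:
        """Queue the (context, target) pair at position i for later release."""
        if i >= self.k:
            self.pending.append((i, tuple(tokens[i - self.k:i]), tokens[i]))


def first_hit(tokens, k: int, delay: int):
    """Index of the first position where the bank returns a prediction."""
    bank = DelayedNgramBank(k, delay)
    for i in range(len(tokens)):
        hit = bank.predict(tokens, i)
        bank.observe(tokens, i)
        if hit is not None:
            return i
    return None


tokens = [5, 6, 7] * 4
print(first_hit(tokens, k=2, delay=0))  # no delay: earliest possible hit
print(first_hit(tokens, k=2, delay=4))  # delayed: hit comes later
```

In this toy sequence the delayed bank fires later than the undelayed one, because the first repeat of the context is still inside the delay window when it would otherwise match.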

### Fixed Inference Configuration

- `k_values = [16, 12, 8, 6]`
- `min_confs = [1.0, 1.0, 1.0, 0.95]`
- `min_counts = [1, 1, 1, 1]`
- `boost_k = 15`
- `delay = 2048`
- `bos_id = 1`

`boost_k = 15` (the `K=15` in the title) was selected from an initial sweep on seed 42, then reused unchanged for the validation seeds `1337` and `2024`.
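
One plausible reading of the per-level thresholds is a longest-first backoff: try the longest context length first and accept the first level whose prediction meets both its confidence and count thresholds. The sketch below illustrates that reading only. The function `select_level` and the shape of its `stats` argument are assumptions, and how `boost_k` is then applied to the model's distribution is not specified in this README, so it is left out.

```python
from typing import Dict, Optional, Tuple

K_VALUES = [16, 12, 8, 6]
MIN_CONFS = [1.0, 1.0, 1.0, 0.95]
MIN_COUNTS = [1, 1, 1, 1]


def select_level(
    stats: Dict[int, Tuple[int, float, int]]
) -> Optional[Tuple[int, int]]:
    """Pick the first (longest) level that passes its thresholds.

    `stats` maps a context length k to (best_target, confidence, count)
    for levels where the bank found a match. Returns (k, best_target),
    or None when no level qualifies. Assumed semantics, not confirmed.
    """
    for k, min_conf, min_count in zip(K_VALUES, MIN_CONFS, MIN_COUNTS):
        if k not in stats:
            continue
        target, conf, count = stats[k]
        if conf >= min_conf and count >= min_count:
            return k, target
    return None


# k=8 is unanimous, so it wins over the shorter k=6 match.
print(select_level({8: (7, 1.0, 3), 6: (9, 0.96, 50)}))
# k=8 is not unanimous, so the lookup falls back to k=6 (0.96 >= 0.95).
print(select_level({8: (7, 0.9, 10), 6: (9, 0.96, 50)}))
```

Under this reading each position is attributed to at most one level, which matches the per-level hit counts summing to the total hit count.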

## PPM Bank Stats

These phase-1 stats are identical across seeds because they depend only on the validation tokens and the delayed PPM config:

- Total hits: `631,838`
- Hit rate: `1.019%`
- Direct accuracy: `76.54%`

Per-level hit breakdown:

- `k=16`: `95,920` hits, `91.84%` direct accuracy
- `k=12`: `65,928` hits, `81.75%` direct accuracy
- `k=8`: `194,763` hits, `76.82%` direct accuracy
- `k=6`: `275,227` hits, `69.77%` direct accuracy
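
As a quick consistency check, the per-level numbers can be recombined into the totals. This is a small sketch using only the figures listed above; because the per-level accuracies are rounded to two decimals, the recombined accuracy is approximate.

```python
# (hits, direct accuracy) per context length, copied from the list above.
levels = {
    16: (95_920, 0.9184),
    12: (65_928, 0.8175),
    8: (194_763, 0.7682),
    6: (275_227, 0.6977),
}

total_hits = sum(hits for hits, _ in levels.values())

# Hit-weighted average of the per-level accuracies.
direct_acc = sum(hits * acc for hits, acc in levels.values()) / total_hits

print(f"total hits      = {total_hits:,}")
print(f"direct accuracy = {100 * direct_acc:.2f}%")
```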

## Run Command

```bash
SEED=42 \
RUN_ID=ppm_k15_seed42 \
FINAL_EVAL_PPM=1 \
PPM_K_VALUES='16,12,8,6' \
PPM_MIN_CONFS='1.0,1.0,1.0,0.95' \
PPM_MIN_COUNTS='1,1,1,1' \
PPM_BOOST_K='15' \
PPM_DELAY='2048' \
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Repeat with `SEED=1337` and `SEED=2024` for the 3-seed validation above.
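
The three runs can also be scripted. The launcher below is a minimal sketch that reuses the environment variables from the command above. The `RUN_ID` pattern for seeds `1337` and `2024` is assumed by analogy with `ppm_k15_seed42`, and the helper names are hypothetical. It defaults to a dry run so nothing is launched unless requested.

```python
import os
import subprocess

# Fixed PPM settings copied from the run command above.
PPM_ENV = {
    "FINAL_EVAL_PPM": "1",
    "PPM_K_VALUES": "16,12,8,6",
    "PPM_MIN_CONFS": "1.0,1.0,1.0,0.95",
    "PPM_MIN_COUNTS": "1,1,1,1",
    "PPM_BOOST_K": "15",
    "PPM_DELAY": "2048",
}
CMD = ["torchrun", "--standalone", "--nproc_per_node=8", "train_gpt.py"]
SEEDS = (42, 1337, 2024)


def env_for_seed(seed: int) -> dict:
    """Build the process environment for one seed."""
    env = dict(os.environ)
    env.update(PPM_ENV)
    env["SEED"] = str(seed)
    env["RUN_ID"] = f"ppm_k15_seed{seed}"  # assumed naming pattern
    return env


def run_all_seeds(dry_run: bool = True) -> list:
    """Launch one run per seed; with dry_run=True only report the plan."""
    planned = []
    for seed in SEEDS:
        env = env_for_seed(seed)
        planned.append((seed, env["RUN_ID"]))
        if not dry_run:
            subprocess.run(CMD, env=env, check=True)
    return planned


print(run_all_seeds())  # dry run: lists (seed, RUN_ID) pairs only
```

Calling `run_all_seeds(dry_run=False)` would execute the runs sequentially and stop at the first failure because of `check=True`.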

Files in this folder:

- `train_gpt.py` — self-contained delayed-PPM submission entrypoint
- `trie_bench.c` — C helper for delayed trie/PPM bank construction
- `train_seed42.log`, `train_seed1337.log`, `train_seed2024.log` — full training/eval logs
- `submission.json` — leaderboard metadata
@@ -0,0 +1,35 @@
{
  "name": "10L Int5-MLP + BigramHash(10240) + Delayed PPM K=15",
  "author": "thwu1",
  "github_id": "thwu1",
  "date": "2026-03-23",
  "val_bpb": 1.14173730,
  "val_bpb_std": 0.00076951,
  "baseline_val_bpb": 1.14299494,
  "delta_bpb": -0.00125764,
  "delta_bpb_std": 0.00001979,
  "paired_p_value_one_sided": 0.00004125,
  "bytes_total": 15850793,
  "seeds": [42, 1337, 2024],
  "seed_results": {
    "42": {
      "baseline_val_bpb": 1.14253746,
      "val_bpb": 1.14125711,
      "delta_bpb": -0.00128035,
      "bytes_total": 15649761
    },
    "1337": {
      "baseline_val_bpb": 1.14387335,
      "val_bpb": 1.14262486,
      "delta_bpb": -0.00124849,
      "bytes_total": 15673878
    },
    "2024": {
      "baseline_val_bpb": 1.14257402,
      "val_bpb": 1.14132993,
      "delta_bpb": -0.00124409,
      "bytes_total": 15850793
    }
  },
  "blurb": "Same 10L Int5-MLP + BigramHash(10240) base model, with a strictly causal delayed PPM bank added only at inference. The bank is delayed by 2048 tokens so it can only use targets outside the model's current context window. Fixed config: k_values=[16,12,8,6], min_confs=[1.0,1.0,1.0,0.95], min_counts=[1,1,1,1], boost_k=15. Mean val_bpb over 3 seeds is 0.00125764 lower than the baseline, and all seeds remain under the 16MB limit."
}