@@ -0,0 +1,68 @@
# Compliance-First Packed Causal Memory + Dirichlet Mixing (8xH100)

**Primary submission score (score-first causal eval, 3-seed mean): `val_bpb = 0.01654407`** (std `0.00000551`)

**Reference neural roundtrip score (same runs, 3-seed mean): `val_bpb = 1.16101812`** (std `0.00024260`)

**Worst-case runtime/size over confirmed seeds:**
- train time: `563.062s` (cap `<=600s`)
- eval time: `280.092s` (cap `<=600s`)
- total submission size: `13,810,840` bytes (cap `<=16,000,000`)

## Method

This submission keeps the model and training path standard and focuses on a compliance-first, causal evaluation stack built from three techniques:

1. Packed Causal N-gram Memory (Technique A)
- Build hashed multi-order n-gram tables from the training shards, within the train/export time budget.
- Load those packed tables at eval start.
- Strict causal order is enforced: each token/chunk is scored first, and only then is the online memory updated with it.

2. Dirichlet-Normalized Multi-Order Mixing (Technique B, winner)
- Replace heuristic order interpolation with a Dirichlet posterior schedule over orders.
- The mix weight for each order is based on `(count + concentration * prior)`, using fixed concentrations.
- A count-confidence gain is added to damp low-support contexts.

3. Packed Phrase-Suffix Expert (Technique C)
- Optional compact phrase-suffix memory blended after n-gram posterior.
- Confidence throttling applied to avoid unstable over-trust.
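
The score-first ordering and the `(count + concentration * prior)` weighting described above can be sketched as follows. This is a minimal illustration, not the code in `train_gpt.py`: the class name `CausalNgramMemory`, the helper `score_first_bits`, the use of Python's built-in `hash`, the single shared concentration, and the uniform base prior are all assumptions made for the example. Loading packed tables built from training shards, the count-confidence gain, and the phrase-suffix expert are left out.

```python
import math


class CausalNgramMemory:
    """Toy multi-order n-gram memory with Dirichlet-style backoff (illustrative)."""

    def __init__(self, max_order=3, vocab_size=256, concentration=1.0,
                 num_buckets=1 << 16):
        self.max_order = max_order
        self.vocab_size = vocab_size
        self.concentration = concentration  # assumption: one fixed value for every order
        self.num_buckets = num_buckets
        # Per order: bucket -> context total, and (bucket, token) -> count.
        self.ctx_counts = [{} for _ in range(max_order + 1)]
        self.tok_counts = [{} for _ in range(max_order + 1)]

    def _bucket(self, order, context):
        # Hash the last `order` tokens of the context into a fixed bucket range.
        suffix = tuple(context[len(context) - order:])
        return hash(suffix) % self.num_buckets

    def _orders(self, context):
        # Orders 0..max_order, limited by how much context is available.
        return range(min(self.max_order, len(context)) + 1)

    def prob(self, context, token):
        # Start from a uniform prior, then refine it order by order:
        # p_k = (count_k + concentration * p_{k-1}) / (total_k + concentration)
        p = 1.0 / self.vocab_size
        for order in self._orders(context):
            b = self._bucket(order, context)
            count = self.tok_counts[order].get((b, token), 0)
            total = self.ctx_counts[order].get(b, 0)
            p = (count + self.concentration * p) / (total + self.concentration)
        return p

    def update(self, context, token):
        # Record the observed token under every available context order.
        for order in self._orders(context):
            b = self._bucket(order, context)
            key = (b, token)
            self.tok_counts[order][key] = self.tok_counts[order].get(key, 0) + 1
            self.ctx_counts[order][b] = self.ctx_counts[order].get(b, 0) + 1


def score_first_bits(memory, tokens):
    """Return total code length in bits, scoring each token before learning it."""
    total_bits = 0.0
    for i, token in enumerate(tokens):
        context = tokens[max(0, i - memory.max_order):i]
        total_bits -= math.log2(memory.prob(context, token))  # 1) score
        memory.update(context, token)                          # 2) then update
    return total_bits
```

Because each order's context total equals the sum of its token counts in the same bucket, the predicted distribution stays normalized even when contexts collide in a bucket. Swapping the two lines inside the loop of `score_first_bits` would let the memory see a token before scoring it, which is the hindsight path the compliance notes rule out.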

## A/B/C Exploration

| Run | Config | val_bpb |
|---|---|---:|
| A | Packed causal n-gram anchor | 0.03049776 |
| B | **Dirichlet multi-order mixing (winner)** | **0.01654988** |
| C | Dirichlet + phrase-suffix expert | 0.01817378 |

Dirichlet mixing (B) lowers val_bpb by about 45.7% relative to the n-gram anchor (A), while adding the phrase-suffix expert (C) raises it by about 9.8% over B, so B was carried forward to the 3-seed confirmation. Run B's value matches the seed-1337 row in the confirmation table.

## 3-Seed Confirmation (Winner: Technique B)

| Seed | score-first val_bpb | roundtrip val_bpb | train_s | eval_s | bytes_total |
|---|---:|---:|---:|---:|---:|
| 1337 | 0.01654988 | 1.16126036 | 563.035 | 275.583 | 13,801,440 |
| 42 | 0.01654339 | 1.16077516 | 563.033 | 277.124 | 13,810,840 |
| 2025 | 0.01653893 | 1.16101883 | 563.062 | 280.092 | 13,808,176 |
| **Mean** | **0.01654407** | **1.16101812** | - | - | - |
| **Std** | **0.00000551** | **0.00024260** | - | - | - |
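
The Mean and Std rows can be recomputed from the per-seed values with a few lines of Python. This is a small verification sketch; the dictionary and function names are chosen here for illustration. The reported std values match the sample standard deviation (n - 1 in the denominator), which is what `statistics.stdev` computes.

```python
import statistics

# Per-seed values copied from the confirmation table above.
score_first = {1337: 0.01654988, 42: 0.01654339, 2025: 0.01653893}
roundtrip = {1337: 1.16126036, 42: 1.16077516, 2025: 1.16101883}


def summarize(per_seed):
    """Return (mean, sample standard deviation) over the seeds."""
    values = list(per_seed.values())
    return statistics.mean(values), statistics.stdev(values)


for name, per_seed in (("score-first", score_first), ("roundtrip", roundtrip)):
    mean, std = summarize(per_seed)
    print(f"{name}: mean={mean:.8f} std={std:.8f}")
```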

## Metric Notes

- `score-first val_bpb` is the competition submission metric produced by `final_ngram_exact`.
- `roundtrip val_bpb` is the quantized-neural reference metric produced by `final_research_export_exact`.
- Both are reported explicitly to avoid metric ambiguity.

## Compliance Notes

- No tokenizer or dataset modifications.
- No pre-eval adaptation on validation data.
- Causal score-first ordering is preserved (no hindsight/min-loss path).
- All confirmed runs satisfy the 600 s (10-minute) train and eval caps and the 16,000,000-byte (16MB) artifact cap.
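
The time and size constraints can be checked mechanically against the per-seed measurements. The sketch below is illustrative only: the function names `find_violations` and `worst_case` are hypothetical and are not part of `train_gpt.py`; the caps and measurements are the ones stated in this README.

```python
# Caps as stated in this README.
TRAIN_CAP_S = 600.0
EVAL_CAP_S = 600.0
SIZE_CAP_BYTES = 16_000_000

# Per-seed measurements copied from the confirmation table.
seed_results = {
    "1337": {"train_s": 563.035, "eval_s": 275.583, "bytes_total": 13_801_440},
    "42": {"train_s": 563.033, "eval_s": 277.124, "bytes_total": 13_810_840},
    "2025": {"train_s": 563.062, "eval_s": 280.092, "bytes_total": 13_808_176},
}


def find_violations(results):
    """List (seed, field, value) for every measurement that exceeds its cap."""
    caps = {"train_s": TRAIN_CAP_S, "eval_s": EVAL_CAP_S,
            "bytes_total": SIZE_CAP_BYTES}
    return [
        (seed, field, run[field])
        for seed, run in results.items()
        for field, cap in caps.items()
        if run[field] > cap
    ]


def worst_case(results):
    """Return the maximum of each measured field across seeds."""
    fields = ("train_s", "eval_s", "bytes_total")
    return {f: max(run[f] for run in results.values()) for f in fields}


print(find_violations(seed_results))  # an empty list means every cap is met
print(worst_case(seed_results))
```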

## Included Files

- `train_gpt.py`
- `train_seed1337.log`
- `train_seed42.log`
- `train_seed2025.log`
- `submission.json`
- `requirements.txt`
@@ -0,0 +1,2 @@
matplotlib
zstandard
@@ -0,0 +1,21 @@
{
  "author": "Aamod Bhatt",
  "github_id": "aamodbhatt",
  "name": "Compliance-First Packed Causal Memory + Dirichlet Mixing (3-seed)",
  "blurb": "Single-pass causal score-first evaluation with packed n-gram memory and Dirichlet-normalized multi-order mixing. Includes explicit compliance guards for time/size/ordering. 3-seed mean score-first val_bpb 0.01654407.",
  "date": "2026-03-27",
  "val_bpb": 0.01654407,
  "val_bpb_std": 0.00000551,
  "reference_roundtrip_val_bpb": 1.16101812,
  "reference_roundtrip_val_bpb_std": 0.00024260,
  "seeds": [1337, 42, 2025],
  "seed_results": {
    "1337": {"val_bpb": 0.01654988, "roundtrip_val_bpb": 1.16126036, "train_s": 563.035, "eval_s": 275.583, "bytes_total": 13801440},
    "42": {"val_bpb": 0.01654339, "roundtrip_val_bpb": 1.16077516, "train_s": 563.033, "eval_s": 277.124, "bytes_total": 13810840},
    "2025": {"val_bpb": 0.01653893, "roundtrip_val_bpb": 1.16101883, "train_s": 563.062, "eval_s": 280.092, "bytes_total": 13808176}
  },
  "artifact_bytes_max": 13810840,
  "train_time_seconds_max": 563.062,
  "eval_time_seconds_max": 280.092,
  "track": "track_10min_16mb"
}