Non Record: Add PPM heuristic for test time learning #511
AnirudhRahul wants to merge 2 commits into openai:main from
Conversation
Document the fixed K=15 outside-context-only PPM method with 3-seed validation logs, metadata, and a reproducible record-folder code snapshot. Made-with: Cursor
Fold the base script into a single self-contained record train_gpt.py and drop the extra base_train_gpt.py dependency while keeping the fixed K=15 delayed PPM config. Made-with: Cursor
Community Review - Non Record: Add PPM heuristic for test time learning

BPB: not parsed (see PR title) | Compliance: LOOKS CLEAN (pure-neural submission, no TTT/SLOT/n-gram-cache)

What I found in the code (head SHA …): static code review found no TTT adaptation function, no SLOT optimization loop, no n-gram-cache class, and no pre-quant val-token fine-tune. The eval path uses the standard sliding-window stride-64 pattern. The submission is a pure-neural architecture iteration on the standard SP1024/SP4096/SP8192 baseline.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.28s, dim=512, layers=10, vocab=1024, code=64312 B, SMOKE_TEST_PASS.

Verdict: LOOKS CLEAN. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the classification pass.

Auto-classification caveat: this review was drafted by the deterministic AST-based classifier. If there is a non-standard eval mechanism (logit postprocessing, hedge mixing, etc.) that I missed because it is factored into a helper file or a non-standard function name, please flag it and I will re-run the audit manually.

Reviewed by @MatoTeziTanka - The Agora.
Summary
Added
records/track_10min_16mb/2026-03-23_10L_Int5MLP_DelayedPPM_K15 with a self-contained snapshot:

- train_gpt.py
- trie_bench.c
- submission.json

This is a test-time-only improvement on top of the existing 10L Int5-MLP + BigramHash(10240) model, not a training-time architecture change.
Method
This submission adds delayed outside-context-only PPM at evaluation time.
PPM (Prediction by Partial Matching) is a compression-style backoff n-gram model that boosts a continuation token when the same token prefix has appeared earlier in the document. It prefers longer matches and backs off to shorter ones when confidence is insufficient.
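For concreteness, here is a minimal Python sketch of such a backoff bank. It illustrates the mechanism described above and is not the PR's actual implementation (the record ships train_gpt.py and trie_bench.c); the class name, method signatures, and internal tables are all invented, and the default thresholds mirror the fixed config listed further down.

```python
from collections import Counter, defaultdict

class PPMBank:
    """Hypothetical PPM-style backoff bank (illustrative, not the PR's code)."""

    def __init__(self, k=(16, 12, 8, 6),
                 min_confidence=(1.0, 1.0, 1.0, 0.95),
                 min_count=(1, 1, 1, 1)):
        self.k = k                            # prefix lengths, longest first
        self.min_confidence = min_confidence  # per-length confidence floor
        self.min_count = min_count            # per-length count floor
        # One prefix -> next-token Counter table per prefix length.
        self.tables = [defaultdict(Counter) for _ in k]

    def add(self, tokens, pos):
        # Record tokens[pos] as the continuation of every prefix ending at pos.
        for t, n in enumerate(self.k):
            if pos >= n:
                prefix = tuple(tokens[pos - n:pos])
                self.tables[t][prefix][tokens[pos]] += 1

    def predict(self, tokens, pos):
        # Try the longest prefix first; back off to shorter prefixes when a
        # match is missing or its confidence/count is insufficient.
        for t, n in enumerate(self.k):
            if pos < n:
                continue
            counts = self.tables[t].get(tuple(tokens[pos - n:pos]))
            if not counts:
                continue  # no match at this length: back off
            tok, c = counts.most_common(1)[0]
            if c >= self.min_count[t] and c / sum(counts.values()) >= self.min_confidence[t]:
                return tok  # confident longest match wins
            # confidence too low at this length: keep backing off
        return None
```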
The important constraint is a 2048-token delay: at position i, the PPM bank only contains targets from positions <= i - 2048. That means it cannot reuse anything still visible inside the transformer's current sliding-window context. The division of labor is therefore: the transformer covers the local context inside its sliding window, while the PPM bank covers exact repeats from earlier in the document that have already scrolled out of that window.
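A sketch of how that delay could be enforced during scoring, reusing the hypothetical PPMBank above. Reading K = 15 as an additive logit bonus is my assumption (the PR only says K is fixed at 15), and the function and argument names here are invented.

```python
def score_with_delayed_ppm(logits, tokens, bank, delay=2048, boost=15.0):
    """Apply a delayed PPM boost over one document.

    logits: (seq_len, vocab) numpy array or torch tensor of model outputs.
    tokens: sequence of token ids for the same document.
    boost:  the fixed K = 15, read here as an additive logit bonus (assumption).
    """
    next_to_insert = 0
    for i in range(len(tokens)):
        # Feed the bank lazily so it holds exactly positions <= i - delay,
        # i.e. nothing the sliding-window context can still see.
        while next_to_insert <= i - delay:
            bank.add(tokens, next_to_insert)
            next_to_insert += 1
        tok = bank.predict(tokens, i)
        if tok is not None:
            logits[i, tok] += boost
    return logits
```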
Fixed inference config
k = [16, 12, 8, 6]
min_confidence = [1.0, 1.0, 1.0, 0.95]
min_count = [1, 1, 1, 1]
K = 15
delay = 2048
bos_id = 1
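For concreteness, this is how the fixed config would drive the two sketches from the Method section; everything here uses the invented names from those sketches, not the record's train_gpt.py.

```python
import numpy as np

vocab_size = 1024  # matches the smoke-test vocab
tokens = list(np.random.randint(0, vocab_size, size=4096))  # stand-in document
logits = np.zeros((len(tokens), vocab_size), dtype=np.float32)

# A fresh bank per document; bos_id = 1 presumably marks the document
# boundaries where the bank would be reset (assumption).
bank = PPMBank(k=(16, 12, 8, 6),
               min_confidence=(1.0, 1.0, 1.0, 0.95),
               min_count=(1, 1, 1, 1))
logits = score_with_delayed_ppm(logits, tokens, bank, delay=2048, boost=15.0)
```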
Per-seed results

All 3 seeds improved, and all 3 runs remained under the 16MB submission limit.