Non Record: Add PPM heuristic for test time learning #511

Open
AnirudhRahul wants to merge 2 commits into openai:main from AnirudhRahul:record-delayed-ppm-k15

Conversation

@AnirudhRahul AnirudhRahul commented Mar 23, 2026

Summary

Added records/track_10min_16mb/2026-03-23_10L_Int5MLP_DelayedPPM_K15 with a self-contained snapshot:

  • train_gpt.py
  • trie_bench.c
  • submission.json
  • 3 seed logs

This is a test-time-only improvement on top of the existing 10L Int5-MLP + BigramHash(10240) model, not a training-time architecture change.

Method

This submission adds delayed outside-context-only PPM at evaluation time.

PPM (Prediction by Partial Matching) is a compression-style backoff n-gram model that boosts a continuation token when the same token prefix has appeared earlier in the document. It prefers longer matches and backs off to shorter ones when confidence is insufficient.
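The backoff loop can be sketched roughly like this (hypothetical names and dict-based bank; the record's actual implementation lives in the C trie in trie_bench.c):

```python
def ppm_predict(bank, context, orders=(16, 12, 8, 6), min_count=1):
    """Back off from the longest context order to shorter ones.

    `bank` maps a context tuple -> {next_token: count}. This is an
    illustrative structure, not the record's trie.
    """
    for n in orders:
        if len(context) < n:
            continue
        suffix = tuple(context[-n:])
        counts = bank.get(suffix)
        if counts and sum(counts.values()) >= min_count:
            total = sum(counts.values())
            # Return the empirical distribution at the longest matching order.
            return {tok: c / total for tok, c in counts.items()}, n
    # No order matched with enough evidence: abstain.
    return None, 0
```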

The important constraint is a 2048-token delay: at position i, the PPM bank only contains targets from positions <= i - 2048. That means it cannot reuse anything still visible inside the transformer’s current sliding-window context.

So the division of labor is:

  • transformer: local context
  • delayed PPM: longer-range repeated-sequence signal from outside the local window
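A minimal sketch of the 2048-token delay, assuming a streaming bank where a token only becomes queryable after `delay` further tokens arrive (names and structure are illustrative, not the record's code):

```python
from collections import defaultdict, deque

class DelayedPPMBank:
    """Tokens sit in a pending queue until `delay` newer tokens exist,
    so the bank never overlaps the transformer's sliding window."""

    def __init__(self, order=6, delay=2048):
        self.order = order
        self.delay = delay
        self.pending = deque()   # tokens not yet old enough to use
        self.history = []        # tokens already admitted to the bank
        self.counts = defaultdict(lambda: defaultdict(int))

    def push(self, token):
        self.pending.append(token)
        # Admit tokens once they are more than `delay` positions old.
        while len(self.pending) > self.delay:
            self._admit(self.pending.popleft())

    def _admit(self, token):
        # Register `token` as the continuation of every suffix of the
        # admitted history, up to the maximum order.
        h = self.history
        for n in range(1, self.order + 1):
            if len(h) >= n:
                self.counts[tuple(h[-n:])][token] += 1
        h.append(token)
```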

Fixed inference config

  • k = [16, 12, 8, 6]
  • min_confidence = [1.0, 1.0, 1.0, 0.95]
  • min_count = [1, 1, 1, 1]
  • K = 15
  • delay = 2048
  • bos_id = 1
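The same config written out, with a sketch of how the per-order thresholds might gate a match; the key names and the exact acceptance rule are assumptions, not taken from the record code:

```python
PPM_CONFIG = {
    "orders": [16, 12, 8, 6],                  # k: context lengths, longest first
    "min_confidence": [1.0, 1.0, 1.0, 0.95],   # required top-token share per order
    "min_count": [1, 1, 1, 1],                 # required observations per order
    "K": 15,                                   # boost strength for matched tokens
    "delay": 2048,                             # positions before a target enters the bank
    "bos_id": 1,                               # assumed document-boundary token
}

def order_accepts(counts, idx, cfg=PPM_CONFIG):
    """Accept a match at backoff index `idx` only if that order's count
    and confidence thresholds are met (illustrative rule)."""
    total = sum(counts.values())
    if total < cfg["min_count"][idx]:
        return False
    top_share = max(counts.values()) / total
    return top_share >= cfg["min_confidence"][idx]
```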

Per-seed results

| Seed | Baseline val_bpb | Delayed PPM val_bpb | Delta |
|------|------------------|---------------------|-------|
| 42   | 1.14253746 | 1.14125711 | -0.00128035 |
| 1337 | 1.14387335 | 1.14262486 | -0.00124849 |
| 2024 | 1.14257402 | 1.14132993 | -0.00124409 |
| Mean | 1.14299494 | 1.14173730 | -0.00125764 |
| Std  | 0.00076094 | 0.00076951 | 0.00001979 |

All 3 seeds improved, and all 3 runs remained under the 16MB submission limit.

Document the fixed K=15 outside-context-only PPM method with 3-seed validation logs, metadata, and a reproducible record-folder code snapshot.

Made-with: Cursor
@AnirudhRahul AnirudhRahul changed the title Add delayed PPM K=15 record for 10L Int5-MLP Add PPM model for test time learning Mar 23, 2026
@AnirudhRahul AnirudhRahul changed the title Add PPM model for test time learning Add PPM heuristic for test time learning Mar 23, 2026
Fold the base script into a single self-contained record train_gpt.py and drop the extra base_train_gpt.py dependency while keeping the fixed K=15 delayed PPM config.

Made-with: Cursor
@AnirudhRahul AnirudhRahul changed the title Add PPM heuristic for test time learning Record: Add PPM heuristic for test time learning Mar 23, 2026
@AnirudhRahul AnirudhRahul changed the title Record: Add PPM heuristic for test time learning Non Record: Add PPM heuristic for test time learning Mar 24, 2026
@MatoTeziTanka

Community Review — Non Record: Add PPM heuristic for test time learning

BPB: (not parsed — see PR title) | Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

What I found in the code (head SHA 0bfdfe4453bd, file records/track_10min_16mb/2026-03-23_10L_Int5MLP_DelayedPPM_K15/train_gpt.py):

Static code review found no TTT adaptation function, no SLOT optimization loop, no n-gram-cache class, and no pre-quant val-token fine-tune. The eval path uses the standard sliding-window stride-64 pattern. The submission is a pure-neural architecture iteration on the standard SP1024/SP4096/SP8192 baseline.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.28s, dim=512, layers=10, vocab=1024, code=64312 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the classification pass — this looks like a clean pure-neural iteration on the standard baseline.

Auto-classification caveat: this review was drafted by the AST-based classifier. If there's a non-standard eval mechanism (logit postprocessing, hedge mixing, etc.) that I missed because it's factored into a helper file or a non-standard function name, please flag it and I'll re-run the audit manually.


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test results as above. Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting — if the template misread your code, please call it out so I can iterate the classifier.
