Non-record: PrismLM v3 — DiffTransformer V2 + NorMuon + TrigramHash (val_bpb=1.1715)#418
yashverms wants to merge 1 commit into openai:main from
Conversation
…igramHash

Three novel techniques on top of PR openai#315's stack:

1. DiffTransformer V2 attention (last 2 layers) for noise-cancelled attention
2. NorMuon optimizer with per-neuron row normalization
3. TrigramHash + context-aware n-gram gating

11L/512d, XSA6, Partial RoPE, int6+zstd-22. Post-quant val_bpb=1.1715 (without sliding window eval). 8xH100, 600s, 15.59MB artifact.

Made with: Cursor
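For readers unfamiliar with the first technique: differential attention (the DiffTransformer family) computes two softmax attention maps from two halves of the query/key projection and subtracts one from the other, scaled by a learnable λ, so that common-mode attention noise cancels. A minimal single-head NumPy sketch; the shapes, the λ value, and the absence of causal masking are illustrative assumptions, not this PR's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def diff_attention(x, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.5):
    """Differential attention: subtract two attention maps to cancel noise.

    x: (T, d) token states. The two (Wq, Wk) pairs come from splitting the
    head's query/key projection in half, per the DiffTransformer recipe.
    """
    d_head = Wq1.shape[1]
    a1 = softmax((x @ Wq1) @ (x @ Wk1).T / np.sqrt(d_head))
    a2 = softmax((x @ Wq2) @ (x @ Wk2).T / np.sqrt(d_head))
    return (a1 - lam * a2) @ (x @ Wv)
```

With `lam=0` this reduces to ordinary softmax attention over the first q/k pair, which is why restricting the technique to the last 2 layers is a low-risk place to try it.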
Sibling Draft Review for PR #418

Date: 2026-04-12

PR Summary

Title: Non-record: PrismLM v3 — DiffTransformer V2 + NorMuon + TrigramHash (val_bpb=1.1715)

Marker Analysis

Architecture Changes

train_gpt.py modifications: ~1544 lines changed

Assessment

⚠ REQUIRES_REVIEW — Custom modifications detected:

Recommendation

REVIEW — Detailed code inspection needed before merge

Review Checklist

Next Steps

Generated for: Mato (@MatoTeziTanka)

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard record-track checks.

Reviewed by @MatoTeziTanka — The Agora. Classification via sibling-session agent (Haiku-backed). This review was drafted from a template and spot-checked before posting — if the template misread your code, please call it out so I can iterate the classifier.
Community Review — PrismLM v3 (DiffAttn + NorMuon + TrigramHash)

Compliance: LOOKS CLEAN — no disqualifying patterns found

Checked all five compliance vectors:

Verdict: LOOKS CLEAN. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks. Novel techniques (DiffAttn, NorMuon, TrigramHash+gate) are architecturally interesting and compliant.

Reviewed by @MatoTeziTanka — The Agora. Compliance audit via LLM agent (Sonnet) reviewing full train_gpt.py source, with manual override on classification. If this review misread your code, please call it out so I can re-audit manually.
Thanks for the thorough review @MatoTeziTanka — really appreciate the detailed compliance audit and the time you put into walking through the n-gram key construction, eval mode, and gating logic.

To confirm: the 3 "custom tokenizer patterns" flagged in the initial sweep are the BigramHash and TrigramHash embedding tables — these are learned embedding lookups keyed on context-only tokens

On the sliding window eval: the submitted

Happy to clarify anything else. Thanks again for the review and the merge recommendation.
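For context on what "keyed on context-only tokens" means concretely: a hashed n-gram embedding maps the tokens strictly preceding the current position (never the current token, so there is no label leakage) to a bucket in a fixed-size learned table, and a gate scales the looked-up feature based on the current hidden state. A minimal NumPy sketch; the hash constants, the table size (2048, matching the Gap Analysis), and the sigmoid gate form are illustrative assumptions, not the PR's exact code:

```python
import numpy as np

TABLE, DIM = 2048, 512          # table size per the Gap Analysis; width per the config
rng = np.random.default_rng(0)
trigram_table = rng.standard_normal((TABLE, DIM)).astype(np.float32) * 0.02

def trigram_bucket(t0, t1, t2, table_size=TABLE):
    """Hash the 3 tokens preceding the current position into one table bucket."""
    h = (t0 * 1_000_003 + t1) & 0xFFFFFFFF      # illustrative multiplicative mix
    h = (h * 1_000_003 + t2) & 0xFFFFFFFF
    return h % table_size

def gated_trigram_features(tokens, hidden, w_gate):
    """Per-position feature: table[hash(prev 3 tokens)] * sigmoid(hidden @ w_gate)."""
    T = len(tokens)
    feats = np.zeros((T, DIM), np.float32)
    for i in range(T):
        if i >= 3:                               # only positions with a full left trigram
            feats[i] = trigram_table[
                trigram_bucket(tokens[i - 3], tokens[i - 2], tokens[i - 1])
            ]
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w_gate)))   # context-aware scalar in (0, 1)
    return feats * gate[:, None]
```

The key property, matching the author's clarification, is that the bucket for position `i` depends only on tokens to the left of `i`, so the lookup cannot leak the label being predicted.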
Summary
Non-record submission exploring 3 novel techniques not yet attempted in any merged or open PR, built on the proven PR #315 technique stack.
Novel Contributions
Architecture
Results
Gap Analysis
The score is ~0.029 bpb behind merged SOTA (1.1428). Key factors: no sliding window eval (~0.03 bpb), a small BigramHash table (2048 vs 10240 entries), NorMuon momentum=0.95 vs the proven 0.99, and an SDPA fallback instead of Flash Attention 3. The submitted code fixes these issues (sliding window re-enabled, correct 16MB decimal limit).
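For reference, the momentum knob mentioned above lives in the Muon-style update: momentum-accumulated gradients are orthogonalized with a Newton-Schulz iteration, and the NorMuon variant additionally rescales each output-neuron row of the orthogonalized update. A NumPy sketch; the quintic coefficients follow the public Muon reference, while the per-row RMS normalization is one plausible reading of "per-neuron row normalization", not this PR's exact code:

```python
import numpy as np

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G (quintic iteration from the Muon reference)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def normuon_step(W, grad, buf, lr=0.02, momentum=0.95, eps=1e-7):
    """One NorMuon-style step: momentum -> orthogonalize -> per-row RMS normalize."""
    buf = momentum * buf + grad                          # momentum accumulation
    U = newton_schulz(buf)
    rms = np.sqrt((U * U).mean(axis=1, keepdims=True))   # per-neuron (row) RMS
    U = U / (rms + eps)                                  # each neuron updated at equal scale
    return W - lr * U, buf
```

Under this reading, the momentum=0.95 vs 0.99 gap above changes only the `momentum` constant in the accumulation line; the normalization itself is unaffected.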
Why This Is Interesting
Test plan
train_gpt.py

Made with Cursor