# Verifily: Three-Tier Token Weighting + DCLS Salience (Non-Record)

## Approach

**Pure data-quality approach — zero architectural changes.**

We layer three data-quality components on top of an SP1024 11L 512d baseline (XSA-all + GPTQ + BigramHash + Parallel Muon). All modifications live in the loss computation and the eval path: zero additional parameters, and no extra memory beyond a 4 MB bigram table.

### 1. Three-Tier Token Classification (Training)

Not all tokens deserve an equal gradient. We assign each token to one of three tiers using a GPU-resident bigram frequency table built incrementally from the training data:

| Tier | Condition | Weight | Rationale |
|------|-----------|--------|-----------|
| **Predictable** | P_bigram above the ~95th percentile | 0.10 | The bigram table already handles these; frees neural capacity |
| **Frontier** | Low P_bigram in a high-quality doc | 1.0 | Maximum gradient signal |
| **Noise** | Low P_bigram in a low-quality doc | 0.70 | Gentle gradient reduction |

Document quality is scored per batch from two GPU-vectorized signals (a sketch of both, plus the tier weighting, follows the list):
- Vocabulary richness: unique tokens / total tokens, computed via scatter into a presence mask
- Repetition: the fraction of tokens that match the token 4 positions back
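
A minimal sketch of both pieces, assuming per-token bigram probabilities have already been gathered from the GPU table; the tier weights and the two quality signals match the description above, but the helper names, the 0.5 quality cutoff, and the way the two signals are averaged are illustrative assumptions:

```python
import torch

# Tier weights from the table above.
W_PREDICTABLE, W_FRONTIER, W_NOISE = 0.10, 1.00, 0.70

def doc_quality(tokens: torch.Tensor, vocab_size: int) -> torch.Tensor:
    """Score each document in [0, 1]; tokens: (B, T) int64 ids on GPU."""
    B, T = tokens.shape
    # Vocabulary richness: unique tokens / total, via scatter into a presence mask.
    present = torch.zeros(B, vocab_size, device=tokens.device)
    present.scatter_(1, tokens, 1.0)
    richness = present.sum(dim=1) / T
    # Repetition: fraction of tokens equal to the token 4 positions back.
    repetition = (tokens[:, 4:] == tokens[:, :-4]).float().mean(dim=1)
    # Illustrative combination: average of richness and non-repetition.
    return (richness + (1.0 - repetition)) / 2

def tier_weights(bigram_p: torch.Tensor, quality: torch.Tensor,
                 p95: float, q_cut: float = 0.5) -> torch.Tensor:
    """Per-token loss weights. bigram_p: (B, T) P(curr|prev) from the bigram
    table; quality: (B,) document scores; p95: ~95th-percentile threshold."""
    w = torch.full_like(bigram_p, W_FRONTIER)                # default: Frontier
    low_q = (quality < q_cut)[:, None].expand_as(bigram_p)
    w = torch.where(low_q, torch.full_like(w, W_NOISE), w)   # Noise tier
    w = torch.where(bigram_p > p95, torch.full_like(w, W_PREDICTABLE), w)
    return w
```

The resulting weights multiply the unreduced cross-entropy, e.g. `(F.cross_entropy(logits.transpose(1, 2), targets, reduction="none") * w).mean()`, which is why the scheme adds no parameters or memory.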

### 2. DCLS Salience Batch Reweighting (Training)

Each batch's loss receives a multiplier in [0.85, 1.15] computed from surprise (|batch_loss - EMA| / EMA) and document quality; high-surprise, high-quality batches are amplified. A sketch follows.
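
A minimal sketch of the reweighter; the [0.85, 1.15] range and the surprise formula come from the description above, while the EMA decay and the exact blend of surprise with quality are illustrative assumptions:

```python
class SalienceReweighter:
    """DCLS-inspired per-batch loss multiplier in [0.85, 1.15]."""

    def __init__(self, decay: float = 0.99):
        self.decay = decay   # EMA decay (assumed value)
        self.ema = None      # running EMA of the batch loss

    def multiplier(self, batch_loss: float, quality: float) -> float:
        if self.ema is None:          # first batch: no surprise signal yet
            self.ema = batch_loss
            return 1.0
        surprise = abs(batch_loss - self.ema) / max(self.ema, 1e-8)
        self.ema = self.decay * self.ema + (1.0 - self.decay) * batch_loss
        # Amplify high-surprise, high-quality batches; damp the rest.
        signal = min(surprise, 1.0) * (2.0 * quality - 1.0)   # in [-1, 1]
        return 1.0 + 0.15 * signal                            # in [0.85, 1.15]
```

The returned factor scales the batch loss before the backward pass.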

### 3. Quality-Conditioned Bigram Mixer (Eval)

At eval time, neural predictions are interpolated with bigram statistics, with the mixing weight alpha conditioned on document quality (see the sketch after this list):
- High-quality docs: alpha_base = 0.15 (trust the neural model more)
- Low-quality docs: alpha_base = 0.30 (trust the bigram more)
- alpha is then scaled by bigram confidence
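
A minimal sketch of the mixer; the alpha values and the 0.6 quality cutoff come from the submission metadata, while using the peak bigram probability as the confidence signal is an assumption:

```python
import torch

def mix_with_bigram(neural_probs: torch.Tensor, bigram_probs: torch.Tensor,
                    quality: torch.Tensor, q_cut: float = 0.6) -> torch.Tensor:
    """Eval-time interpolation: p = (1 - alpha) * neural + alpha * bigram.
    neural_probs, bigram_probs: (B, T, V) distributions; quality: (B,)."""
    alpha_base = torch.where(quality < q_cut,
                             torch.full_like(quality, 0.30),
                             torch.full_like(quality, 0.15))   # (B,)
    # Assumed confidence scaling: peak bigram probability per position.
    confidence = bigram_probs.max(dim=-1).values               # (B, T)
    alpha = (alpha_base[:, None] * confidence)[..., None]      # (B, T, 1)
    return (1.0 - alpha) * neural_probs + alpha * bigram_probs
```

Since the mixer runs only at eval, it improves the scored predictions without touching training.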

## Results

Two-seed validation on 8xH100 SXM (the seed-999 run was lost to a pod termination):

| Seed | BPB | Loss | Steps | Artifact |
|------|-----|------|-------|----------|
| 314 | 1.13414677 | 1.91495424 | 6524 | 15,841,796 bytes |
| 42 | 1.13285851 | 1.91277908 | 6732 | 15,917,868 bytes |
| **Mean** | **1.13350264** | **1.91386666** | | |

This places the run at roughly #16 on the leaderboard. The result demonstrates that data-quality signals alone yield a measurable training improvement, but they cannot close the ~0.05 BPB gap driven by architectural advances (SP8192, depth recurrence, parallel residuals, TTT).

## Ablation Environment Variables

```bash
VERIFILY_ENABLED=0 # Disable all Verifily components
VERIFILY_SALIENCE=0 # Disable salience reweighting only
VERIFILY_MIXER=0 # Disable eval-time bigram mixer only
VERIFILY_NGRAM_WARMUP=500 # Steps before activating token weighting
```
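
One way the flags might be read inside the training script; the gating convention (any value other than `0` enables a component) is an assumption, only the names and the warmup default come from the list above:

```python
import os

def _enabled(name: str) -> bool:
    # Assumed convention: a component is on unless its flag is set to "0".
    return os.environ.get(name, "1") != "0"

VERIFILY_ENABLED = _enabled("VERIFILY_ENABLED")
USE_SALIENCE = VERIFILY_ENABLED and _enabled("VERIFILY_SALIENCE")
USE_MIXER = VERIFILY_ENABLED and _enabled("VERIFILY_MIXER")
NGRAM_WARMUP = int(os.environ.get("VERIFILY_NGRAM_WARMUP", "500"))
```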

## Base Architecture (Unchanged)

SP1024, 11 layers, 512d, 8 heads, 4 KV heads, 3x MLP, XSA-all, BigramHash(2048,128), Parallel Muon+Adam, GPTQ-int6+LZMA, sliding window eval (stride 64)
{
  "author": "Arsenis Papachristos",
  "github_id": "Areneu",
  "name": "Verifily Three-Tier Token Weighting + DCLS Salience Reweighting",
  "blurb": "Pure data-quality approach — zero architectural changes to the base GPT. Three components: (1) BigramStats-driven three-tier token weighting (Predictable=0.10, Frontier=1.0, Noise=0.70), (2) DCLS-inspired salience batch reweighting [0.85, 1.15], (3) quality-conditioned bigram mixer at eval. Demonstrates that training signal quality alone can improve BPB on an unchanged SP1024 11L 512d architecture. 2-seed mean: 1.13350264 BPB.",
  "date": "2026-04-08",
  "track": "non_record_10min_16mb",
  "val_loss": 1.91386666,
  "val_bpb": 1.13350264,
  "val_loss_std": 0.00108758,
  "val_bpb_std": 0.00064413,
  "seeds": [314, 42],
  "seed_results": {
    "314": {
      "val_loss": 1.91495424,
      "val_bpb": 1.13414677,
      "artifact_bytes": 15841796,
      "steps": 6524,
      "step_avg_ms": 92.0
    },
    "42": {
      "val_loss": 1.91277908,
      "val_bpb": 1.13285851,
      "artifact_bytes": 15917868,
      "steps": 6732,
      "step_avg_ms": 89.2
    }
  },
  "note": "Seed 999 not completed — pod terminated during run. Submitting as non-record with 2 seeds.",
  "base_stack": "SP1024, 11 layers, 512d, 8 heads, 4 KV heads, XSA-all, BigramHash(2048,128), Parallel Muon+Adam, GPTQ-int6+LZMA",
  "implementation_lineage_pr": 1402,
  "hardware": "8xH100 80GB SXM",
  "pytorch_version": "2.9.1+cu128",
  "cuda_version": "12.8",
  "flash_attn_version": "2.8.3 (FA3 Hopper kernels)",
  "verifily_components": {
    "three_tier_weighting": "BigramStats P(curr|prev) — Predictable (>p95, w=0.10), Frontier (low ngram + high quality, w=1.0), Noise (low ngram + low quality, w=0.70)",
    "salience_reweighting": "DCLS-inspired EMA loss tracking, surprise signal — per-batch multiplier [0.85, 1.15]",
    "bigram_eval_mixer": "Quality-conditioned bigram interpolation at eval (alpha=0.30 if quality<0.6, else 0.15)"
  },
  "ablation_env_vars": {
    "VERIFILY_ENABLED": "0 to disable all Verifily components",
    "VERIFILY_THREE_TIER": "0 to disable token weighting",
    "VERIFILY_SALIENCE": "0 to disable salience reweighting",
    "VERIFILY_BIGRAM_EVAL": "0 to disable bigram mixer at eval"
  },
  "technique_summary": "Three-tier token weighting + DCLS salience + quality-conditioned bigram eval mixer"
}