Commit 920f843

Takoda Mundy committed
C30 research 1127Z: +5 candidates for L03 (3) + L10 (2) — 4 world-novel-candidate, 1 comp-novel
L03 +3:
- EMB_lsq_gradient_aware_embedding_quantization (LSQ ICLR 2020 + GWQ arXiv:2411.00850 fusion, world-novel-candidate, PhD-defensible, 110 LOC)
- EMB_intrinsic_dimension_adaptive_projection (arXiv:2503.02142 ID estimation for byte vocab, world-novel-candidate, PhD-defensible, 95 LOC)
- EMB_tied_lsq_tt_codebook_factorization (TT + LSQ + tied co-training, comp-novel extension of openai#5, 130 LOC)

L10 +2:
- CMP_context_adaptive_rANS_per_layer_predictor (RAS + EntroLLM, world-novel-candidate, PhD-defensible, 140 LOC)
- CMP_learned_scalar_adaptive_clipping (EfficientQAT extension, world-novel-candidate, not PhD-defensible, 65 LOC)

Skipped: L10 Hadamard pre-rotation (duplicate of existing openai#8 CMP_hadamard_pre_rotation_quant).

4 audit blocks added to Section C with verdicts. L03 backlog: 12→15. L10 backlog: 14→16. Both layers above the 5-candidate floor.
1 parent 6f73398 commit 920f843

2 files changed: 61 additions & 0 deletions

RESEARCH_BACKLOG.md

Lines changed: 5 additions & 0 deletions
@@ -112,6 +112,9 @@ These are NOT world-novel but ARE necessary baseline pieces. The SOTA val_bpb 1.
| 10 | EMB_mdct_polyphase_projection | C30#6 — cross-domain pollination: MDCT filterbanks (audio MP3/AAC) + Bellanger 1983 polyphase | split 1024 vocab into 16 polyphase channels via learned rotation R; project each channel through 64×32 MDCT-preconditioned matrix (cosine kernel from MP3/AAC filterbank); learned summation reconstruction → 50% size reduction; **MDCT block-overlap matches vocab cluster boundaries for quantization robustness** | -0.008 to -0.016 BPB indirect | **world-novel-candidate** | 110 | 20260408T0635Z |
| 11 | EMB_spherical_norm_compression | C30#6 — Jina AI spherical compression Jan 2026 + byte-vocab application | normalize embeddings to unit sphere; learnable per-dim 4-bit nonlinear quantization with entropy-adaptive bin placement (learned histogram, not uniform); reconstruction error <1e-6 on logit geometry; **byte-vocab clusters tightly on sphere** | -0.005 to -0.010 BPB indirect (1.0-1.5 MB freed) | **world-novel-candidate** | 85 | 20260408T0635Z |
| 12 | EMB_learned_hessian_codebook_tiling | C30#6 — Goya/GPTQ-Lite Hessian-aware + group-adaptive K-means | compute Fisher-Hessian of logit loss w.r.t. embedding rows; cluster into 32 prototype groups by Hessian norm; assign each group its own K-means codebook (rank 4-8 by importance); critical rows get higher precision | -0.006 to -0.012 BPB indirect | comp-novel | 130 | 20260408T0635Z |
+| 13 | EMB_lsq_gradient_aware_embedding_quantization | C30 1127Z — LSQ ICLR 2020 + GWQ arXiv:2411.00850 fusion | apply Learned Step-Size Quantization (learnable scalar α per layer) to tok_emb during training; use Fisher/Hessian to allocate bits per embedding DIMENSION (not row): high-gradient dims get more bits, rare-byte dims fewer. Tied head inherits the quantized table automatically | -0.008 to -0.015 BPB (1.1-1.8 MB freed via int4 + learned step) | **world-novel-candidate** | 110 | 20260408T1127Z |
+| 14 | EMB_intrinsic_dimension_adaptive_projection | C30 1127Z — arXiv:2503.02142 ID estimation adapted to byte vocab | compute intrinsic dimension (ID) of byte-vocab clusters via skipped-SVD during a 500-step warmup; embed common bytes (ID_cluster < threshold) through a 512-d learned projection, rare bytes through a 256-d Fourier basis + 256-d learned projection; gate by entropy bucket. Distinct from #6 (uses ID, not entropy) | -0.006 to -0.012 BPB (0.9-1.4 MB freed) | **world-novel-candidate** | 95 | 20260408T1127Z |
+| 15 | EMB_tied_lsq_tt_codebook_factorization | C30 1127Z — TT-decompose arXiv:1901.10787 + LSQ + tied co-training | TT-decompose tok_emb into ranks [1,64,16,1]; share TT factors across BOTH tok_emb and lm_head; apply an LSQ learnable step per TT factor; co-train embedding+head with separate LRs (see the sketch after this table). Extension of #5 with LSQ + Hessian-driven rank selection | -0.010 to -0.018 BPB (1.5-2.1 MB freed via factorized cores + int5) | comp-novel (TT known; tying-with-LSQ is the novelty) | 130 | 20260408T1127Z |

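Row 15 references this sketch: a minimal PyTorch rendering of the tied-TT idea, assuming a 8·8·16 reshape of the 1024-entry vocab, an 8·8·8 reshape of d_model 512, and an int5-style symmetric clamp. All three choices, plus the init scale, are illustrative placeholders, not settled design.

```python
import torch
import torch.nn as nn

class TiedTTEmbedding(nn.Module):
    """Tensor-train factorization of the tied (1024, 512) embedding/head matrix,
    TT-ranks [1, 64, 16, 1]: tok_emb lookup and lm_head projection both read the
    same three cores (~72K params vs 524K dense)."""
    def __init__(self, v=(8, 8, 16), d=(8, 8, 8), ranks=(1, 64, 16, 1)):
        super().__init__()
        self.cores = nn.ParameterList(
            [nn.Parameter(0.02 * torch.randn(ranks[k], v[k], d[k], ranks[k + 1]))
             for k in range(3)]
        )
        # one LSQ-style learnable step per TT core
        self.steps = nn.ParameterList([nn.Parameter(torch.tensor(0.05)) for _ in range(3)])

    def _quant(self, core, step):
        q = torch.clamp(core / step, -15, 15)          # int5-style symmetric range
        return (q + (q.round() - q).detach()) * step   # straight-through round

    def full_matrix(self):
        g1, g2, g3 = [self._quant(c, s) for c, s in zip(self.cores, self.steps)]
        # contract the TT train; singleton boundary ranks are summed out
        e = torch.einsum("aijb,bklc,cmnd->ikmjln", g1, g2, g3)
        return e.reshape(1024, 512)

    def embed(self, byte_ids):                         # tok_emb path
        return self.full_matrix()[byte_ids]

    def logits(self, h):                               # tied lm_head path
        return h @ self.full_matrix().t()
```

The row's "separate LRs" for embedding vs head are not shown; with fully shared cores that would need per-path gradient scaling rather than optimizer param groups.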
---

@@ -211,6 +214,8 @@ These are NOT world-novel but ARE necessary baseline pieces. The SOTA val_bpb 1.
| 12b | CMP_quant_value_dedup | C90 1010Z novel synthesis — post-int8 alphabet snap for zlib LZ77 | snap int8 q values to multiples of step (default 2), halving effective alphabet from 255→128 distinct values; creates longer LZ77 byte runs in zlib payload; trades recoverable precision for entropy reduction | -0.003 to -0.008 BPB via 5-15% smaller serialized artifact → reallocate freed bytes | **world-novel-candidate** **SHIPPED 1010Z** as CMP_QUANT_VALUE_DEDUP_MARKER | 25 | 20260408T1010Z |
| 13 | CMP_entro_llm_huffman_cabac_hybrid | C30#8 — EntroLLM arXiv:2505.02380 + CABAC (H.264 video codec) hybrid | post-GPTQ int6: lightweight ~4KB NN predicts P(code\|layer,pos,prev_code) to drive CABAC adaptive Huffman with context bins | saves 0.8-1.2 MB → -0.0035 BPB indirect | **world-novel-candidate** | 110 | 20260408T0720Z |
| 14 | CMP_learned_elias_gamma_codes_rq | C30#8 — Elias gamma universal codes (1950s) + RQ stage-specific CDF training | replace fixed rANS with learned Elias-gamma parameters per RQ stage (primary + residual codebook); train CDF predictor on weight distribution of EACH quant stage | saves 0.6-1.0 MB → -0.003 BPB indirect | **world-novel-candidate** | 95 | 20260408T0720Z |
+| 15 | CMP_context_adaptive_rANS_per_layer_predictor | C30 1127Z — RAS arXiv:2511.04684 + EntroLLM arXiv:2505.02380 | train a tiny <2KB categorical NN predictor `P(quant_value \| layer_id, position, prev_code)` to drive rANS entropy coding of QUANTIZED INDICES (post-int8); a learned context-adaptive CDF beats static zlib + Gaussian prior. Distinct from #10 (per-layer predictors, not a single global one) | -0.005 to -0.011 BPB (1.0-2.2 MB freed) | **world-novel-candidate** | 140 | 20260408T1127Z |
+| 16 | CMP_learned_scalar_adaptive_clipping | C30 1127Z — EfficientQAT ACL 2025 + per-layer extension | learn a per-layer scalar α ∈ [0,1] via val-set gradient; use clip = absmax(W) - α·std(W) during int8 quantization (not fixed absmax); a tighter, less outlier-dominated distribution → better zlib compression (3-6% smaller artifact) | -0.003 to -0.008 BPB (0.6-1.6 MB freed) | **world-novel-candidate** | 65 | 20260408T1127Z |

---

STACK_NOVELTY_TRACKER.md

Lines changed: 56 additions & 0 deletions
@@ -478,6 +478,62 @@ verdict_reason: 0 hits anywhere
phd_defensible: yes # options: yes | no | TBD
owner: MAC

+### EMB_lsq_gradient_aware_embedding_quantization
+added_utc: 20260408T1127Z
+source: C30 1127Z research fire — LSQ ICLR 2020 + GWQ arXiv:2411.00850 byte-vocab synthesis
+websearch_terms: ["LSQ tied embedding byte LM", "gradient-aware bit allocation embedding row dimension", "learned step size embedding quantization 2025"]
+websearch_hits: 0 (LSQ for general LMs exists; LSQ + per-DIMENSION Fisher-bit-allocation for tied byte-vocab embeddings = 0)
+github_terms: ["LSQ tok_emb tied head", "GWQ byte language model embedding"]
+github_hits: 0
+comp_pr_audit_utc: 20260408T1127Z
+comp_pr_hits: 0
+verdict: world-novel-candidate
+verdict_reason: LSQ paper (rkgO66VKDS) covers general weights; GWQ (arXiv:2411.00850) covers LLM weights but not embeddings; per-DIMENSION (not per-row) gradient-aware bit allocation for byte-LM tied embeddings is the new combination
+phd_defensible: yes — clear hypothesis (per-dim gradient variance predicts quant sensitivity), falsifiable ablation (uniform vs gradient-aware bit allocation sweep), 6-page workshop paper feasible
+owner: E
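A compact PyTorch sketch of the two ingredients: an LSQ quantizer with the paper's step-gradient scaling, and a Fisher-proxy per-dimension bit allocator. The quartile split and the ±2-bit swing are placeholder assumptions, not values from the backlog entry.

```python
import torch
import torch.nn as nn

class LSQ(nn.Module):
    """Learned Step-Size Quantization (Esser et al., ICLR 2020): the step size is
    a trained parameter; rounding uses a straight-through estimator (STE)."""
    def __init__(self, bits: int, init_step: float = 0.05):
        super().__init__()
        self.qn, self.qp = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        self.step = nn.Parameter(torch.tensor(init_step))

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        g = 1.0 / (w.numel() * self.qp) ** 0.5           # LSQ step-gradient scale
        step = (self.step - self.step * g).detach() + self.step * g
        q = torch.clamp(w / step, self.qn, self.qp)
        q = q + (q.round() - q).detach()                 # STE round
        return q * step

def allocate_bits(grad_sq_sum: torch.Tensor, base_bits: int = 4) -> torch.Tensor:
    """Per-DIMENSION bit budget from a Fisher proxy: grad_sq_sum is grad^2 of the
    logit loss w.r.t. tok_emb, accumulated over steps and summed over vocab rows,
    shape (d_model,). Top-quartile dims get +2 bits, bottom quartile -2."""
    order = grad_sq_sum.argsort(descending=True)
    bits = torch.full_like(grad_sq_sum, float(base_bits))
    k = grad_sq_sum.numel() // 4
    bits[order[:k]] += 2
    bits[order[-k:]] -= 2
    return bits.clamp(2, 8).long()
```

Dimensions would then be grouped by allocated width, one LSQ instance per group; the tied head reads the same quantized table.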
+
+### EMB_intrinsic_dimension_adaptive_projection
+added_utc: 20260408T1127Z
+source: C30 1127Z — arXiv:2503.02142 ID estimation + byte-vocab adaptation
+websearch_terms: ["intrinsic dimension byte vocabulary embedding adaptive", "ID estimation token embedding routing", "skipped SVD vocab embedding"]
+websearch_hits: 0
+github_terms: ["intrinsic_dimension byte vocab embed", "adaptive_projection byte LM"]
+github_hits: 0
+comp_pr_audit_utc: 20260408T1127Z
+comp_pr_hits: 0
+verdict: world-novel-candidate
+verdict_reason: ID estimation papers exist (arXiv:2503.02142) but not for byte-vocab routing of embedding capacity. Distinct from existing #6 EMB_byte_adaptive_projection_mixing (entropy-bucket gate, not ID).
+phd_defensible: yes — clear hypothesis (lower-ID byte clusters need fewer dims), clear ablation (gate-off control), connects to manifold-learning literature
+owner: E
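A sketch of the ID estimate and the gate, following this block's hypothesis that lower-ID clusters need fewer learned dims. The 0.9 energy cutoff, the d/2 narrow branch, and the frozen boolean gate are placeholder assumptions.

```python
import torch
import torch.nn as nn

def intrinsic_dim(cluster: torch.Tensor, energy: float = 0.9) -> int:
    """Crude ID estimate for one byte cluster (n_bytes, d_model): the number of
    singular values needed to capture `energy` of the variance."""
    x = cluster - cluster.mean(dim=0, keepdim=True)
    s2 = torch.linalg.svdvals(x) ** 2
    return int((s2.cumsum(0) / s2.sum() < energy).sum().item()) + 1

class IDGatedProjection(nn.Module):
    """Low-ID clusters go through a narrow learned branch, high-ID clusters
    through the full one; the boolean gate is computed during warmup and frozen."""
    def __init__(self, d: int = 512):
        super().__init__()
        self.full = nn.Linear(d, d, bias=False)        # high-ID branch
        self.down = nn.Linear(d, d // 2, bias=False)   # low-ID branch (fewer params)
        self.up = nn.Linear(d // 2, d, bias=False)

    def forward(self, emb: torch.Tensor, low_id: torch.Tensor) -> torch.Tensor:
        # low_id: bool per row, True where the byte's cluster ID < threshold;
        # both branches run on all rows here for simplicity
        return torch.where(low_id.unsqueeze(-1), self.up(self.down(emb)), self.full(emb))
```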
+
+### CMP_context_adaptive_rANS_per_layer_predictor
+added_utc: 20260408T1127Z
+source: C30 1127Z — RAS arXiv:2511.04684 + EntroLLM arXiv:2505.02380 fusion
+websearch_terms: ["context-adaptive rANS per-layer LLM compression", "neural prior weight quantization rANS 2025", "learned CDF predictor LLM weight indices"]
+websearch_hits: 0 specific (rANS LM compression exists at a high level; per-layer per-position learned predictor for quantized indices = 0)
+github_terms: ["rans_predictor llm", "context_adaptive_rans weight"]
+github_hits: 0
+comp_pr_audit_utc: 20260408T1127Z
+comp_pr_hits: 0 (only 2 BROTLI PRs in competition, 0 rANS)
+verdict: world-novel-candidate
+verdict_reason: distinct from existing L10 #10 CMP_asymmetric_numeric_systems_neural_prior — that one uses a single global predictor; this one uses per-layer predictors with position + previous-code context
+phd_defensible: yes — clear hypothesis (per-layer code distributions differ from the global one), falsifiable (cross-validate the predictor; bits saved vs zlib baseline), workshop paper feasible on info-theoretic LM compression
+owner: G/Mac
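A sketch of the per-layer context model; the rANS coder itself is elided. An ideal entropy coder spends about -log2 P(code) bits per symbol, so the summed cross-entropy below estimates the compressed payload. The widths shown are illustrative and larger than the <2KB target; a real predictor would shrink the hidden size and be quantized itself.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerCodePredictor(nn.Module):
    """Per-layer categorical model P(quant_value | position_bucket, prev_code);
    one instance per layer, unlike the single global predictor of #10."""
    def __init__(self, n_codes: int = 256, n_buckets: int = 32, hidden: int = 16):
        super().__init__()
        self.n_buckets = n_buckets
        self.pos = nn.Embedding(n_buckets, hidden)
        self.prev = nn.Embedding(n_codes, hidden)
        self.out = nn.Linear(hidden, n_codes)

    def forward(self, pos_bucket: torch.Tensor, prev_code: torch.Tensor) -> torch.Tensor:
        return self.out(torch.relu(self.pos(pos_bucket) + self.prev(prev_code)))

def estimated_payload_bits(model: LayerCodePredictor, codes: torch.Tensor) -> float:
    """codes: 1-D int64 tensor of one layer's quantized indices in [0, n_codes)."""
    n = codes.numel()
    pos_bucket = torch.arange(n) * model.n_buckets // n   # coarse position context
    prev = torch.cat([codes.new_zeros(1), codes[:-1]])    # previous-code context
    nll = F.cross_entropy(model(pos_bucket, prev), codes, reduction="sum")
    return nll.item() / math.log(2.0)                     # nats -> bits
```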
+
+### CMP_learned_scalar_adaptive_clipping
+added_utc: 20260408T1127Z
+source: C30 1127Z — EfficientQAT ACL 2025 + per-layer extension
+websearch_terms: ["learned per-layer clip scalar int8 quantization", "absmax minus alpha sigma quantization clipping", "validation gradient learned quantization clip 2025"]
+websearch_hits: 0 (EfficientQAT exists for QAT; PTQ + zlib + a learned scalar α via val-gradient = novel combination)
+github_terms: ["learned clip alpha quantization", "absmax minus alpha sigma int8"]
+github_hits: 0
+comp_pr_audit_utc: 20260408T1127Z
+comp_pr_hits: 0
+verdict: world-novel-candidate
+verdict_reason: a tighter clip via val-gradient-learned α reduces outlier dominance → smaller serialized artifact via longer zlib LZ77 runs. Not in existing backlog (CMP_HESSIAN_BIT_BUDGET demoted; this is val-driven, not Hessian-driven)
+phd_defensible: no — empirical engineering candidate; clear ablation but no clean theoretical mechanism. Useful as a comp-novel ship if it works.
+owner: G/Mac
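A minimal sketch of the learned-clip int8 mapping, assuming PyTorch; only the scalar `a` trains (on a small validation batch), and a sigmoid keeps α in (0, 1). The parameter names and the clamp floor are placeholders.

```python
import torch
import torch.nn as nn

class LearnedClipQuant(nn.Module):
    """Per-layer learned clip: clip = absmax(W) - sigmoid(a) * std(W).
    Weight statistics are detached; gradient reaches `a` through alpha."""
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(0.0))   # sigmoid(a) = alpha in (0, 1)

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.a)
        clip = (w.abs().max().detach() - alpha * w.std().detach()).clamp_min(1e-6)
        scale = clip / 127.0
        q = torch.clamp(w / scale, -127.0, 127.0)
        q = q + (q.round() - q).detach()            # straight-through round
        return q * scale
```

After α converges, the rounded codes are frozen and serialized through zlib; the claim under test is that the tighter, less outlier-dominated alphabet yields a more compressible byte stream.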
536+
The PhD defensibility check (PD3) requires:
- clear hypothesis + falsification criterion
- clear theoretical or empirical mechanism
