RESEARCH_BACKLOG.md: 5 additions & 0 deletions
@@ -112,6 +112,9 @@ These are NOT world-novel but ARE necessary baseline pieces. The SOTA val_bpb 1.
| 10 | EMB_mdct_polyphase_projection | C30#6 — cross-domain pollination: MDCT filterbanks (audio MP3/AAC) + Bellanger 1983 polyphase | split 1024 vocab into 16 polyphase channels via learned rotation R; project each channel through 64×32 MDCT-preconditioned matrix (cosine kernel from MP3/AAC filterbank); learned summation reconstruction → 50% size reduction; **MDCT block-overlap matches vocab cluster boundaries for quantization robustness** | -0.008 to -0.016 BPB indirect | **world-novel-candidate** | 110 | 20260408T0635Z |
| 11 | EMB_spherical_norm_compression | C30#6 — Jina AI spherical compression Jan 2026 + byte-vocab application | normalize embeddings to unit sphere; learnable per-dim 4-bit nonlinear quantization with entropy-adaptive bin placement (learned histogram, not uniform); reconstruction error <1e-6 on logit geometry; **byte-vocab clusters tightly on sphere** | -0.005 to -0.010 BPB indirect (1.0-1.5 MB freed) | **world-novel-candidate** | 85 | 20260408T0635Z |
| 12 | EMB_learned_hessian_codebook_tiling | C30#6 — Goya/GPTQ-Lite Hessian-aware + group-adaptive K-means | compute Fisher-Hessian of logit loss w.r.t. embedding rows; cluster into 32 prototype groups by Hessian norm; assign each group its own K-means codebook (rank 4-8 by importance); critical rows get higher precision | -0.006 to -0.012 BPB indirect | comp-novel | 130 | 20260408T0635Z |
| 13 | EMB_lsq_gradient_aware_embedding_quantization | C30 1127Z — LSQ ICLR 2020 + GWQ arXiv:2411.00850 fusion | apply Learned Step-Size Quantization (learnable scalar α per layer) to tok_emb during training; use Fisher/Hessian to allocate bits per embedding DIMENSION (not row): high-gradient dims get more bits, rare-byte dims fewer. Tied head inherits the factorization automatically (see the LSQ sketch after this table) | -0.008 to -0.015 BPB (1.1-1.8 MB freed via int4 + learned step) | **world-novel-candidate** | 110 | 20260408T1127Z |
| 14 | EMB_intrinsic_dimension_adaptive_projection | C30 1127Z — arXiv:2503.02142 ID estimation adapted to byte vocab | compute intrinsic dimension (ID) of byte-vocab clusters via skipped-SVD during 500-step warmup; embed common bytes (ID_cluster < threshold) through 512-d learned, rare bytes through 256-d Fourier basis + 256-d learned; gate by entropy bucket. Distinct from #6 (uses ID, not entropy) | -0.006 to -0.012 BPB (0.9-1.4 MB freed) | **world-novel-candidate** | 95 | 20260408T1127Z |
| 15 | EMB_tied_lsq_tt_codebook_factorization | C30 1127Z — TT-decompose arXiv:1901.10787 + LSQ + tied co-training | TT-decompose tok_emb into ranks [1,64,16,1]; share TT factors across BOTH tok_emb and lm_head; apply LSQ learnable step per TT factor; co-train embedding+head with separate LRs. Extension of #5 with LSQ + Hessian-driven rank selection | -0.010 to -0.018 BPB (1.5-2.1 MB freed via factorized cores + int5) | comp-novel (TT known; tying-with-LSQ is the novelty) | 130 | 20260408T1127Z |
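
Row 13's core mechanism, Learned Step-Size Quantization, is sketched below for reference. This is a minimal illustration assuming a PyTorch setup; the class name `LsqQuantizer`, the int4 default, and the scalar-alpha initialization are illustrative choices, and the per-dimension Fisher bit allocation the row proposes is not shown here.

```python
import torch
import torch.nn as nn

def grad_scale(x: torch.Tensor, scale: float) -> torch.Tensor:
    # Forward: x unchanged. Backward: gradient multiplied by `scale`.
    return (x - x * scale).detach() + x * scale

def round_ste(x: torch.Tensor) -> torch.Tensor:
    # Forward: round(x). Backward: identity (straight-through estimator).
    return (x.round() - x).detach() + x

class LsqQuantizer(nn.Module):
    """Minimal Learned Step-Size Quantization (Esser et al., ICLR 2020):
    one learnable scalar step `alpha` per quantizer, trained end-to-end."""

    def __init__(self, num_bits: int = 4):
        super().__init__()
        self.qn = -(2 ** (num_bits - 1))      # int4: -8
        self.qp = 2 ** (num_bits - 1) - 1     # int4: +7
        self.alpha = nn.Parameter(torch.tensor(0.1))  # illustrative init

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # Gradient scale 1/sqrt(N * qp) keeps alpha updates stable (per the paper).
        g = 1.0 / float(w.numel() * self.qp) ** 0.5
        alpha = grad_scale(self.alpha, g)
        q = torch.clamp(w / alpha, self.qn, self.qp)
        return round_ste(q) * alpha  # dequantized weight, differentiable w.r.t. alpha
```

With a tied head, passing `tok_emb` through this module during training means `lm_head` automatically sees the same quantized matrix, which is the "tied head inherits" point in row 13.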
---
@@ -211,6 +214,8 @@ These are NOT world-novel but ARE necessary baseline pieces. The SOTA val_bpb 1.
| 12b | CMP_quant_value_dedup | C90 1010Z novel synthesis — post-int8 alphabet snap for zlib LZ77 | snap int8 q values to multiples of step (default 2), halving effective alphabet from 255→128 distinct values; creates longer LZ77 byte runs in zlib payload; trades recoverable precision for entropy reduction | -0.003 to -0.008 BPB via 5-15% smaller serialized artifact → reallocate freed bytes | **world-novel-candidate**, **SHIPPED 1010Z** as CMP_QUANT_VALUE_DEDUP_MARKER | 25 | 20260408T1010Z |
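
A minimal sketch of the snap described in row 12b, assuming numpy int8 values and zlib serialization; the synthetic Gaussian tensor is a stand-in for the real artifact, so the size delta is only indicative.

```python
import zlib
import numpy as np

def snap_int8(q: np.ndarray, step: int = 2) -> np.ndarray:
    """Snap int8 values to multiples of `step` (default 2, per the row above),
    halving the effective alphabet so zlib's LZ77 stage finds longer runs."""
    snapped = np.round(q.astype(np.int16) / step) * step  # int16 avoids overflow
    return snapped.clip(-128, 127).astype(np.int8)

rng = np.random.default_rng(0)                      # synthetic stand-in weights
q = rng.normal(0.0, 40.0, size=1 << 16).clip(-128, 127).astype(np.int8)
raw = len(zlib.compress(q.tobytes(), 9))
deduped = len(zlib.compress(snap_int8(q).tobytes(), 9))
print(f"zlib size: {raw} B -> {deduped} B")         # precision traded for entropy
```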
### EMB_lsq_gradient_aware_embedding_quantization
websearch_hits: 0 (LSQ for general LMs exists; LSQ + per-DIMENSION Fisher-bit-allocation for tied byte-vocab embeddings = 0)
github_terms: ["LSQ tok_emb tied head", "GWQ byte language model embedding"]
github_hits: 0
comp_pr_audit_utc: 20260408T1127Z
comp_pr_hits: 0
verdict: world-novel-candidate
verdict_reason: LSQ paper (rkgO66VKDS) covers general weights; GWQ (arXiv:2411.00850) covers LLM weights but not embeddings; per-DIMENSION (not per-row) gradient-aware bit allocation for byte-LM tied embeddings is the new combination
phd_defensible: yes — clear hypothesis (per-dim gradient variance predicts quant sensitivity), falsifiable ablation (uniform vs gradient-aware bit allocation sweep), 6-page workshop paper feasible
owner: E
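
A sketch of the per-dimension allocation hypothesis stated above, assuming a diagonal Fisher proxy from accumulated squared gradients; the greedy rule, the 4^-b error model, and the bit bounds are illustrative assumptions, not this entry's specification.

```python
import numpy as np

def allocate_bits_per_dim(grad_sq: np.ndarray, total_bits: int,
                          b_min: int = 2, b_max: int = 8) -> np.ndarray:
    """Greedy bit allocation across embedding DIMENSIONS: each extra bit cuts
    quantization error variance ~4x, so award bits where the Fisher proxy
    (accumulated squared gradient per dim) says the loss is most sensitive."""
    d = grad_sq.shape[0]
    bits = np.full(d, b_min, dtype=np.int64)
    benefit = grad_sq.astype(np.float64).copy()  # marginal gain of the next bit
    spent = 0
    while spent < total_bits - b_min * d:
        i = int(np.argmax(benefit))
        if not np.isfinite(benefit[i]):
            break                                # every dim saturated at b_max
        if bits[i] >= b_max:
            benefit[i] = -np.inf
            continue
        bits[i] += 1
        benefit[i] *= 0.25                       # error variance ~ 4**-bits
        spent += 1
    return bits

# Toy check: dims with larger gradient mass should end up with more bits.
g = np.concatenate([np.full(8, 100.0), np.full(504, 1.0)])
print(allocate_bits_per_dim(g, total_bits=4 * 512)[:10])
```

This is exactly the uniform-vs-gradient-aware ablation named above: setting `grad_sq` constant recovers the uniform baseline.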
### EMB_intrinsic_dimension_adaptive_projection
added_utc: 20260408T1127Z
source: C30 1127Z — arXiv:2503.02142 ID estimation + byte-vocab adaptation
verdict_reason: ID estimation papers exist (arXiv:2503.02142) but not for byte-vocab routing of embedding capacity. Distinct from existing #6 EMB_byte_adaptive_projection_mixing (entropy bucket gate, not ID).
phd_defensible: yes — clear hypothesis (lower-ID byte clusters need fewer dims), clear ablation (gate-off control), connects to manifold-learning literature
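
A minimal sketch of one standard intrinsic-dimension estimate (the participation ratio of the covariance spectrum), assuming numpy; the entry's actual estimator follows arXiv:2503.02142 and its skipped-SVD warmup, which this simplifies.

```python
import numpy as np

def participation_ratio(cluster: np.ndarray) -> float:
    """Intrinsic-dimension proxy for one byte-vocab cluster of embedding rows:
    ID ~= (sum lambda_i)^2 / sum(lambda_i^2) over covariance eigenvalues."""
    x = cluster - cluster.mean(axis=0, keepdims=True)
    s = np.linalg.svd(x, compute_uv=False)   # singular values of centered rows
    lam = s ** 2                             # proportional to covariance eigenvalues
    return float(lam.sum() ** 2 / (lam ** 2).sum())

# Routing idea: low ID -> narrow learned projection, high ID -> wide path.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 512))
print(participation_ratio(low_rank))                         # ~4: few dims suffice
print(participation_ratio(rng.standard_normal((64, 512))))   # much higher
```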
comp_pr_hits: 0 (only 2 BROTLI PRs in competition, 0 rANS)
verdict: world-novel-candidate
verdict_reason: distinct from existing L10 #10 CMP_asymmetric_numeric_systems_neural_prior — that one uses a single global predictor; this one uses per-layer predictors with position+previous-code context
phd_defensible: yes — clear hypothesis (per-layer code distributions differ from global), falsifiable (cross-validate predictor; bits saved vs zlib baseline), workshop paper feasible on info-theoretic LM compression
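
A quick falsification harness for the hypothesis above, assuming Laplace-smoothed histograms as stand-ins for the learned position+previous-code predictors; the printed figures are ideal entropy-coder costs, which rANS approaches, and per-layer table-storage overhead is ignored.

```python
import numpy as np

def ideal_bits(symbols: np.ndarray, probs: np.ndarray) -> float:
    """Ideal entropy-coding cost of `symbols` under model `probs` (bits);
    a well-implemented rANS coder lands within a small constant of this."""
    return float(-np.log2(probs[symbols]).sum())

def per_layer_vs_global(layers: list[np.ndarray], alphabet: int = 256) -> None:
    flat = np.concatenate(layers)
    g = (np.bincount(flat, minlength=alphabet) + 1) / (flat.size + alphabet)
    bits_global = sum(ideal_bits(l, g) for l in layers)
    bits_local = 0.0
    for l in layers:   # one Laplace-smoothed model per layer
        p = (np.bincount(l, minlength=alphabet) + 1) / (l.size + alphabet)
        bits_local += ideal_bits(l, p)
    print(f"global model: {bits_global / 8:,.0f} B  per-layer: {bits_local / 8:,.0f} B")

rng = np.random.default_rng(0)
layers = [rng.normal(64 * i, 20.0, 1 << 14).clip(0, 255).astype(np.int64)
          for i in range(4)]          # layers with shifted code distributions
per_layer_vs_global(layers)           # per-layer should come in well under global
```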
websearch_hits: 0 (EfficientQAT exists for QAT; PTQ + zlib + learned scalar α via val-gradient = novel combination)
github_terms: ["learned clip alpha quantization", "absmax minus alpha sigma int8"]
github_hits: 0
comp_pr_audit_utc: 20260408T1127Z
comp_pr_hits: 0
verdict: world-novel-candidate
verdict_reason: tighter clip via val-gradient learned α reduces outlier dominance → smaller serialized artifact via longer zlib LZ77 runs. Not in existing backlog (CMP_HESSIAN_BIT_BUDGET demoted; this is val-driven, not Hessian-driven)
phd_defensible: no — empirical engineering candidate; clear ablation but no clean theoretical mechanism. Useful as a comp-novel ship if it works.
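
Since the entry is flagged as empirical engineering, here is the ablation in miniature: a sweep over α in clip = absmax − α·σ, assuming synthetic Gaussian weights and zlib serialization; this entry's version learns α from val gradients rather than sweeping it.

```python
import zlib
import numpy as np

def quantize_clipped(w: np.ndarray, alpha: float):
    """Symmetric int8 quantization with clip c = absmax - alpha * sigma.
    Larger alpha trims outliers: slightly worse tails, finer-grained body,
    and a more repetitive byte stream for zlib's LZ77 stage."""
    c = np.abs(w).max() - alpha * w.std()
    scale = c / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1 << 16).astype(np.float32)   # stand-in weight tensor
for alpha in (0.0, 1.0, 2.0):
    q, scale = quantize_clipped(w, alpha)
    mse = float(((q * scale - w) ** 2).mean())
    size = len(zlib.compress(q.tobytes(), 9))
    print(f"alpha={alpha}: mse={mse:.2e}  zlib={size} B")
```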