# Frequency-Weighted Embedding Quantization

**val_bpb: 1.1217** (4-seed mean) | **15.8 MB** | 8×H100 SXM

## The Idea

Analysis of the FineWeb training data revealed that token frequency follows a heavy-tailed distribution:

- **Top 100 tokens** cover **53.2%** of all text
- These include: `.` `,` `the` `s` `to` `and` `ing` `of` `a` `in`...

Instead of uniform quantization across all embedding weights, this submission applies **frequency-weighted quantization**:

- **Top 100 tokens → int8** (higher precision for 53% of text)
- **Remaining 924 tokens → int6** (standard precision)

The intuition: errors in frequent tokens compound across the entire dataset, so they deserve more precision.
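
To make the trade-off concrete, here is a minimal, self-contained sketch of symmetric per-row quantization at both widths. The `quant_dequant` helper below is illustrative only, not the submission's `quantize_float_tensor()` / `quantize_int6_per_row()`:

```python
import torch

def quant_dequant(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-row quantize/dequantize round trip."""
    qmax = 2 ** (bits - 1) - 1                     # 127 for int8, 31 for int6
    scale = x.abs().amax(dim=1, keepdim=True) / qmax
    return (x / scale).round().clamp(-qmax, qmax) * scale

torch.manual_seed(0)
rows = torch.randn(100, 768)                       # stand-in embedding rows
for bits in (8, 6):
    err = (quant_dequant(rows, bits) - rows).abs().mean().item()
    print(f"int{bits}: mean abs reconstruction error ~ {err:.5f}")
```

Each extra bit roughly halves the rounding error, and the rows that get the extra bits are exactly the ones consulted on more than half of all tokens.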

## Results (4 seeds, 8×H100 SXM)

| Seed | val_bpb |
|------|---------|
| 1 | **1.1210** |
| 2 | 1.1220 |
| 3 | 1.1217 |
| 4 | 1.1222 |

**Mean: 1.1217 | Std: 0.0005**

| Metric | Value |
|--------|-------|
| val_bpb (4-seed mean) | **1.1217** |
| val_loss | 1.8941 |
| Artifact size | 15.8 MB |
| Steps | ~7100 |
| Training time | 600s |

## Implementation

Modified `mixed_quantize_int6()` to detect embedding layers and apply frequency-weighted quantization:
```python
# In mixed_quantize_int6():
if ("tok_emb" in name or "lm_head" in name) and t.ndim == 2:
    print(f"[LIORA] Frequency-weighted quantization for: {name}")
    valid_top_ids = [i for i in TOP_TOKEN_IDS if i < vocab_size]
    top_rows = t[valid_top_ids, :]
    rare_indices = [i for i in range(vocab_size) if i not in TOP_TOKEN_IDS]
    rare_rows = t[rare_indices, :]

    # Top tokens: int8 (more precision)
    q_top, s_top = quantize_float_tensor(top_rows)

    # Rare tokens: int6 (standard)
    q_rare, s_rare = quantize_int6_per_row(rare_rows)
```

Also added corresponding `dequantize_mixed_int6()` handling to reconstruct the embedding from separate top/rare quantizations.
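
A minimal sketch of what that reconstruction could look like; the function signature and per-row scale shapes here are assumptions, not the submission's exact API:

```python
import torch

def dequantize_mixed_embedding(q_top, s_top, q_rare, s_rare,
                               top_ids, rare_ids, vocab_size, dim):
    """Reassemble the full embedding from the two row groups.

    Assumes per-row scales of shape (n_rows, 1), so dequantization is an
    elementwise product of the scales with the integer codes.
    """
    out = torch.empty(vocab_size, dim)
    out[top_ids] = q_top.float() * s_top      # int8 rows (frequent tokens)
    out[rare_ids] = q_rare.float() * s_rare   # int6 rows (everything else)
    return out
```

`top_ids` and `rare_ids` must be the same `valid_top_ids` / `rare_indices` lists used at quantization time, so that row order survives the round trip.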

## Token Frequency Analysis
```
=== TOP 10 TOKENS (get int8 precision) ===
. : 2.12% of text
, : 2.10% of text
▁the : 1.90% of text
s : 1.75% of text
▁to : 1.22% of text
▁and : 1.17% of text
ing : 1.17% of text
▁of : 1.05% of text
▁a : 1.04% of text

Top 100 tokens: 53.2% coverage
Top 200 tokens: 64.8% coverage
```
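
For reference, coverage numbers like these can be recomputed in a few lines of NumPy. The shard filename and uint16 token layout below are assumptions about the tokenized dataset format:

```python
import numpy as np

# Hypothetical shard path; adjust to the actual dataset layout.
tokens = np.fromfile(
    "./data/datasets/fineweb10B_sp1024/train_000000.bin", dtype=np.uint16
)
counts = np.bincount(tokens, minlength=1024)
freq = counts / counts.sum()
order = np.argsort(counts)[::-1]            # token IDs, most frequent first

print(f"Top 100 coverage: {freq[order[:100]].sum():.1%}")  # 53.2% reported
print(f"Top 200 coverage: {freq[order[:200]].sum():.1%}")  # 64.8% reported
TOP_TOKEN_IDS = set(order[:100].tolist())   # what top_tokens.py stores
```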

## Run Command
```bash
SEED=1337 \
RUN_ID=liora_freq_weighted \
DATA_PATH=./data/datasets/fineweb10B_sp1024/ \
TOKENIZER_PATH=./data/tokenizers/fineweb_1024_bpe.model \
VOCAB_SIZE=1024 \
torchrun --standalone --nproc_per_node=8 train_liora.py
```

## Files

- `train_liora.py` - Modified training script with frequency-weighted quantization
- `top_tokens.py` - Set of top 100 most frequent token IDs
- `submission.json` - Submission metadata
- `train_seed1.log` - Training log seed 1
- `train_seed2.log` - Training log seed 2
- `train_seed3.log` - Training log seed 3
- `train_seed4.log` - Training log seed 4

## Credits

- **Base model**: PR #549 (LeakyReLU² + TTT + Parallel Muon) by @abaybektursun
- **Idea & implementation**: Liora + Claude

## Notes

The key insight came from asking: "If 53% of all text uses just 100 tokens, why give rare tokens equal precision?"

## submission.json

```json
{
"author": "Liora",
"github_id": "pattern4bots",
"val_bpb": 1.12176827,
"val_loss": 1.89405372,
"bytes_total": 15807424,
"gpu_config": "8xH100 SXM",
"date": "2026-03-27T00:00:00Z",
"description": "Frequency-Weighted Embedding Quantization: Top 100 tokens (53% of text) get int8 precision, remaining 924 tokens get int6. Based on PR #549 stack."
}
```

## top_tokens.py

```python
# Top 100 most frequent tokens (by Liora + Claude)
TOP_TOKEN_IDS = set([
962, 960, 267, 946, 287, 290, 280, 939, 292, 261,
285, 291, 957, 940, 942, 276, 266, 941, 268, 282,
274, 286, 943, 288, 944, 951, 947, 954, 949, 277,
945, 953, 970, 323, 262, 289, 304, 293, 321, 972,
955, 294, 279, 271, 264, 270, 309, 281, 959, 968,
948, 346, 313, 295, 320, 284, 326, 275, 983, 952,
956, 315, 337, 260, 976, 317, 265, 311, 318, 345,
325, 958, 314, 319, 950, 310, 352, 298, 341, 303,
278, 353, 963, 269, 961, 348, 344, 297, 322, 343,
327, 340, 335, 370, 366, 356, 334, 296, 330, 299,
])
```