fix(data): normalize cached FineWeb paths #7

Closed

RolanH wants to merge 1 commit into openai:main from RolanH:codex/fix-cached-fineweb-paths

Conversation


RolanH commented Mar 18, 2026

Summary

  • normalize MATCHED_FINEWEB_REMOTE_ROOT_PREFIX handling for manifest, docs, dataset, and tokenizer downloads in data/cached_challenge_fineweb.py
  • strip full multi-segment remote prefixes before mapping files into local data/datasets and data/tokenizers (see the sketch after this list)
  • add regression coverage for nested remote prefixes and empty-prefix manifest resolution
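
A minimal sketch of the intended normalization, assuming a helper along these lines; `MATCHED_FINEWEB_REMOTE_ROOT_PREFIX` is named in the diff, but its value here and the `_strip_remote_prefix` helper are illustrative, not the actual implementation:

```python
from pathlib import PurePosixPath

# Name taken from the diff; the value and helper below are illustrative only
MATCHED_FINEWEB_REMOTE_ROOT_PREFIX = "bucket/fineweb/v1"

def _strip_remote_prefix(remote_path: str,
                         prefix: str = MATCHED_FINEWEB_REMOTE_ROOT_PREFIX) -> str:
    """Strip the full multi-segment prefix, not just its first segment."""
    if not prefix:
        # Empty prefix: manifest paths resolve unchanged
        return remote_path
    parts = PurePosixPath(remote_path).parts
    head = PurePosixPath(prefix).parts
    if parts[:len(head)] == head:
        parts = parts[len(head):]
    return "/".join(parts)

# "bucket/fineweb/v1/datasets/shard-00.bin" -> "datasets/shard-00.bin",
# which then lands under local data/datasets (tokenizer files under data/tokenizers)
```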

Test Plan

  • python3 -m unittest discover -s tests -v
  • python3 -m py_compile $(rg --files -g '*.py')
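
For illustration, the regression coverage described in the summary could look like the test below; the import path and helper name are assumptions matching the sketch above, not the repo's actual test code:

```python
import unittest

# Hypothetical import; the real helper lives in data/cached_challenge_fineweb.py
from data.cached_challenge_fineweb import _strip_remote_prefix

class TestRemotePrefixNormalization(unittest.TestCase):
    def test_nested_prefix_stripped_whole(self):
        # Multi-segment prefixes must be removed in one piece
        self.assertEqual(
            _strip_remote_prefix("bucket/fineweb/v1/datasets/shard-00.bin",
                                 prefix="bucket/fineweb/v1"),
            "datasets/shard-00.bin")

    def test_empty_prefix_resolves_unchanged(self):
        self.assertEqual(_strip_remote_prefix("manifest.json", prefix=""),
                         "manifest.json")
```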

Handle MATCHED_FINEWEB_REMOTE_ROOT_PREFIX consistently for
manifest, docs, datasets, and tokenizer artifacts.

Add regression tests for nested prefixes and empty-prefix
manifest resolution.
RolanH closed this Mar 18, 2026
RolanH deleted the codex/fix-cached-fineweb-paths branch March 18, 2026 18:56
dhruvjatkar pushed a commit to dhruvjatkar/parameter-golf that referenced this pull request Mar 25, 2026
PR openai#672 maxes TTT at 30 epochs (590s/600s eval budget), so all future
improvements must be orthogonal to TTT. This update:
- Sets 1.0781 BPB (PR openai#672) as the new target to beat
- Reorders Top 8 directions: XSA-all confirmed at #1, Full GPTQ #2,
  SwiGLU #3, Muon-VS #4, aggressive quant #5, MASA #6,
  depth recurrence #7 with int6 risk warning, AdEMAMix #8
- Deprioritizes TTT-related directions already exploited by PR openai#672
- Collapses ~1000 lines of stale Round 0-3.9 session logs into a
  concise historical summary
- Removes resolved blockers (flash_attn, SSH hangs, local runtime)
- Adds fresh Round 1 section with 5 submitted experiments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jzmyres pushed a commit to jzmyres/parameter-golf that referenced this pull request Mar 31, 2026
…mization goal

Key clarifications:
1. Warm-start x is the soft embedding: initially one-hot, iteratively
   updated by CTP/NTP predictions across DEQ iterations
2. Soft Dense Routing: sparsity encouraged (L1 per-token) not required,
   balance enforced globally — applies to ALL expert groups (MLP + MoS);
   see the sketch after this list
3. Optimization goal: throughput × convergence rate within time budget
4. Include paper/repo references when proposing improvements
5. Fixed constraint numbering (DDP is now #7)
6. Added paper refs: DeepSeek-V2, GatedAttn repo, FSQ paper
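
As a rough illustration of point 2, assuming nonnegative, unnormalized gate activations; names and shapes are assumptions, not the repo's code:

```python
import torch

def soft_dense_routing_losses(gates: torch.Tensor):
    """gates: [num_tokens, num_experts], nonnegative soft routing weights."""
    # Per-token L1 penalty: encourages sparse routing but never forces it
    l1_per_token = gates.sum(dim=-1).mean()
    # Global balance: average load per expert should be uniform over the batch
    load = gates.mean(dim=0)
    balance = ((load - load.mean()) ** 2).sum()
    return l1_per_token, balance
```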

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
deborahnelson8788726 pushed a commit to deborahnelson8788726/parameter-golf that referenced this pull request Apr 2, 2026
- Fix #1: ternary roundtrip eval on ALL ranks with dist.broadcast
  (was: only rank 0 loaded weights → invalid eval results)
- Fix #2: pass pre-computed scales to export (avoids double-quantization)
- Fix #3: keep scales as float32 (was: lossy float16 cast)
- Fix #4: import returns float32 (was: lossy bfloat16 cast)
- Fix #5: lower z_loss from 1e-4 to 1e-5 (prevents loss explosion)
- Fix #6: add dist.broadcast after int8 roundtrip load too
- Fix #7: add weights_only=False to suppress FutureWarning

Ternary roundtrip is now LOSSLESS (max error = 0.0).
The previous val_bpb=0.9650 was an artifact of bug #1.
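
A minimal sketch of the Fix #1/#6 pattern, assuming an initialized torch.distributed process group; the function and checkpoint names are illustrative:

```python
import torch
import torch.distributed as dist

def load_roundtrip_weights_on_all_ranks(model, ckpt_path):
    # Rank 0 loads the quantized-roundtrip checkpoint...
    if dist.get_rank() == 0:
        state = torch.load(ckpt_path, map_location="cpu", weights_only=False)  # Fix #7
        model.load_state_dict(state)
    # ...then every parameter is broadcast so ALL ranks evaluate identical
    # weights (previously only rank 0 had them, invalidating the eval)
    for p in model.parameters():
        dist.broadcast(p.data, src=0)
```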

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
EL2 cycle-2 = 3.2742 (only +0.0008 above champion 3.2734) reversed
the audit fire #1 verdict that EngramLite was falsified. Adding 4 new
EL multi-seed experiments to confirm:
  - EL3 (seed 1337), EL4 (seed 999), EL5 (seed 7)
  - EL6 with L5 weights (0.15/0.20/0.15) — new combination

Removed 15 dead/falsified configs that wasted cycle 2 compute:
EA*, BG*, NG*, TH*, MEGA, MTP0/2/3, MTP1_seed999, PR2/3, EL0.

Also captured EMA(0.997) canonical spec from 6 merged records
(openai#287, openai#315, openai#414, openai#1019, openai#1099) — DEFERRED actual Patch 17 ship
because EMA only affects final val_bpb (not loop train_loss) and
training-loop anchoring is risky without reading train_gpt.py.
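
For reference, the deferred EMA(0.997) would amount to a shadow-weight update along these lines (a hedged sketch, not the canonical spec from the merged records):

```python
import torch

@torch.no_grad()
def ema_step(ema_params, model_params, decay=0.997):
    # Shadow weights drift toward the live weights each step; only the final
    # val_bpb (not the in-loop train_loss) would see the averaged parameters
    for ema_p, p in zip(ema_params, model_params):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)
```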

Queue now cycles in ~100 min (vs 185 min) leaving more compute
for the EL family expansion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
…ntified as top missing technique

Patches 15/16/21 + NEW Patch 20 USE_COPRIME_STRIDE all uncontested
in 150+ open + 20 closed PRs (7 consecutive audits for the original
3, first confirmation for Patch 20 just shipped 3h ago).

CRITICAL FINDING: XSA (Cross-Sequence Attention) is in 4+ MERGED
records (PR openai#1019, openai#287, openai#315, openai#265, latest openai#1099) and we have ZERO
attention-mask variants. Most-validated missing technique. ~200 LOC
moderate port — too big for a single research fire but worth a focused
30-45 min investigation if we can find a minimal variant.

SLOT (Score-First TTT) is the #2 missing technique (PR openai#549, ~100 LOC),
but it's eval-time, so it joins the H100 escalation bundle category.

H100 escalation candidate updated:
  NEW: CHAMP_L4 + COPRIME_STRIDE + EL + (EMA + Tilt + INT6 GPTQ)
  OLD: CHAMP_L4 + EL + (EMA + Tilt + INT6 GPTQ)

Need CS2 cycle 2+3 for n=3 mean confirmation before escalating.

PR openai#1430 still OPEN, 0 comments, no comp owner activity for 16h+.

Spend ~$4.00/$36 (11.1%). Pod healthy at 7h 50min uptime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jzmyres pushed a commit to jzmyres/parameter-golf that referenced this pull request May 3, 2026
… coef sweeps

Per user directive 2026-04-30 (feedback_throughput_priority.md):
throughput-bearing iters (Triton kernels, sparse dispatch, sparse
attention) take queue priority over coef-sweep follow-ups for
iter 112's Gram penalty. Throughput compounds research velocity —
faster step rate = more iters per unit time.

Tier 1 reordered:
  #1 (DROPPED) iter 113
  #2 iter 112 — IN FLIGHT
  #3 iter 117b-2 — Triton entmax (THROUGHPUT)
  #4 iter 117b-3 — Sparse MoE dispatch (THROUGHPUT, biggest win)
  #5 iter 117b-3b — Sparse-Q attention (THROUGHPUT, promoted from Tier 2)
  #6 iter 120 — RRAttention (THROUGHPUT, promoted from Tier 2)
  #7 iter 108 — k_eval=10 throughput
  #8 iter 110 — refinement re-enable (last)

Deferred coef sweeps (post-throughput): iter 112b/c/d. These remain
conditional on iter 112 promotion AND will only run after the
throughput iters are exhausted. Anti-pattern explicitly avoided:
chasing diminishing val_bpb gains via hyperparameter tuning while a
1.5-4x wallclock improvement sits unmerged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jzmyres pushed a commit to jzmyres/parameter-golf that referenced this pull request May 3, 2026
…2048

Per user challenge 2026-04-30: "Would RRAttention hurt throughput as
the optimized SDPA is replaced?" — answered yes.

RRAttention is the same SDPA-replacement class as iter 106 NSA which
was DROPPED 2026-04-29 because NSA = 0.42x FlashAttention at T=2048
(official fla-org Triton benchmark). At T=2048 we're in the
FA-fusion + tensorcore-saturation regime; manual sparse attention
loses on memory traffic, kernel launch overhead, and tensorcore
utilization simultaneously.

The component file's "8/8 PASS, tau=1.0 bit-identical" claim is a
correctness check, NOT a throughput check. Pure-PyTorch component
cannot compete with F.scaled_dot_product_attention at T=2048.

Re-queue paths:
- flex_attention (PyTorch 2.5+) with score_mod/block_mask (sketched below)
- Custom Triton kernel with selection inside FA tile
- Defer until T-scaling phase (T=4096+)
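
A hedged sketch of the first re-queue path, using PyTorch 2.5+'s flex_attention; the local-window rule below is a stand-in for whatever selection RRAttention actually needs:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

def causal_local(b, h, q_idx, kv_idx):
    # Illustrative sparsity pattern: causal attention within a 256-token window
    return (q_idx >= kv_idx) & (q_idx - kv_idx < 256)

q = k = v = torch.randn(1, 8, 2048, 64, device="cuda", dtype=torch.bfloat16)
# The mask compiles into a fused kernel path, unlike a pure-PyTorch
# gather/scatter that loses to F.scaled_dot_product_attention at T=2048
mask = create_block_mask(causal_local, B=None, H=None, Q_LEN=2048, KV_LEN=2048)
out = flex_attention(q, k, v, block_mask=mask)
```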

Tier 1 reordered:
  #1 (DROPPED) iter 113
  #2 iter 112 — IN FLIGHT
  #3 iter 117b-2 — Triton entmax (kernel-only, doesn't replace SDPA)
  #4 iter 117b-3 — Sparse MoE dispatch (replaces MLP path, not SDPA)
  #5 iter 117b-3b — Sparse-Q attention (smaller-Q gather; SDPA call preserved)
  #6 iter 108 — k_eval=10 (one-line config)
  #7 iter 110 — refinement re-enable

DEMOTED to Deferred: iter 120 (RRAttention).

New durable rule: feedback_sdpa_replacement_at_T2048.md — never queue
sparse-attention iters that REPLACE F.scaled_dot_product_attention
at T=2048 without a fused implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>