Non-Record: PR #1901 base + LQER Asymmetric + Brotli/Byte-Shuffle Compression #1927
squ11z1 wants to merge 5 commits into openai:main
Conversation
Thanks for the submission and for building on top of PR #1901. The ideas (LQER asym + Brotli/byte-shuffle) look interesting, but since there is no completed run or validated val_bpb yet, it's hard to evaluate the real impact; theoretical estimates alone aren't enough for merging. Before this can be considered, we would need fully measured and validated results against the baseline.

Also, since this is marked as non-record and the runs were preempted, I'd treat this as an experimental direction for now rather than something ready to merge. Happy to take another look once you have full results.
Thanks for the careful review @Karen042009; I completely agree that a theoretical estimate doesn't merit merging. A few notes on where things stand:

Validation is the binding constraint. The patch itself is small and round-trips cleanly through the LZMA-base85 wrapper.

Compute access: I filed two compute-grant requests through the official form, first on 2026-04-27 (development tier, $500) and a follow-up on 2026-04-29 after this PR was published; neither returned a decision before the deadline. With $1.49 of personal balance left as I write this, a clean 3-seed 8×H100 SXM run isn't feasible from my side this week.

What I'll do when compute does become available: run 3 seeds against your PR #1901 baseline on identical hardware, post per-seed val_bpb and artifact sizes here, and run the LQER_TOP_K × LQER_RANK sweep from the test plan.

Happy to keep this open as an "experimental direction" in the meantime. If anyone reading wants to test the LQER-on-Sigma-Delta combination independently, the patched source (train_gpt_unwrapped.py) is included in the Files list.
Thanks for the detailed clarification. The situation is clear, especially regarding compute constraints, and I appreciate that you're being transparent about the validation status without overclaiming results. The plan for proper 3-seed runs and a direct comparison against the PR #1901 baseline makes sense and would provide a solid basis for evaluation. For now, I agree this should remain an experimental direction. However, as mentioned before, a merge cannot be considered without fully measured and validated results. I'll wait for an update once you have the full validation data available.
Summary
Non-record submission proposing two orthogonal additions to PR #1901's stack (DualHash + AdaMuon + MoE + SDClip, val_bpb 0.83353 pending):

1. LQER asymmetric low-rank quantization-error reconstruction.
2. Brotli compression with a byte-shuffle pre-filter (see the sketch below).
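For readers unfamiliar with the second addition, here is a minimal self-contained sketch of byte-shuffle + Brotli, assuming it is applied to float tensor bytes; the helper names and the actual target in the PR are illustrative assumptions. Shuffling regroups the bytes of each element into byte planes (all first bytes, then all second bytes, ...), which Brotli compresses far better than interleaved float bytes.

```python
# Minimal sketch of byte-shuffle + Brotli, assuming float tensor data.
# shuffle_bytes/unshuffle_bytes are illustrative names, not the PR's API.
import brotli
import numpy as np

def shuffle_bytes(arr: np.ndarray) -> bytes:
    # Regroup bytes into planes: byte 0 of every element, then byte 1, ...
    raw = np.ascontiguousarray(arr).view(np.uint8).reshape(-1, arr.dtype.itemsize)
    return raw.T.tobytes()

def unshuffle_bytes(buf: bytes, dtype, count: int) -> np.ndarray:
    itemsize = np.dtype(dtype).itemsize
    planes = np.frombuffer(buf, np.uint8).reshape(itemsize, count)
    return planes.T.copy().view(dtype).ravel()

w = np.random.randn(1024, 1024).astype(np.float16)
packed = brotli.compress(shuffle_bytes(w), quality=11)
restored = unshuffle_bytes(brotli.decompress(packed), w.dtype, w.size).reshape(w.shape)
assert np.array_equal(w, restored)  # lossless round-trip
```

The shuffle itself saves nothing; it only reorders bytes so the low-entropy high-order bytes of neighbouring floats sit together, which is where the compressor gains.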
Patched train_gpt.py is LZMA-base85-wrapped (18,204 bytes vs PR #1901's 53,443 raw, a 65.9% code-byte saving). Status: non-record.
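The wrapping itself follows the standard-library pattern below; this is a sketch of the described scheme, not necessarily the exact wrapper in the PR, and the assertion is the round-trip check referred to in the conversation above.

```python
# Sketch of LZMA-base85 source wrapping using only the standard library.
import base64
import lzma

def wrap(source: str) -> str:
    return base64.b85encode(lzma.compress(source.encode("utf-8"))).decode("ascii")

def unwrap(blob: str) -> str:
    return lzma.decompress(base64.b85decode(blob)).decode("utf-8")

src = open("train_gpt_unwrapped.py").read()
blob = wrap(src)
assert unwrap(blob) == src  # round-trips cleanly
print(f"{len(src.encode('utf-8'))} raw bytes -> {len(blob)} wrapped bytes")
```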
A $25 starter grant + remaining personal balance funded two single-seed bid attempts on 8×H100 SXM. Both were preempted before producing an artifact:
- partial_run_2026-04-26.log (HF dataset).
- partial_run_2026-04-29.log.

A $500 development grant filed 2026-04-27 did not return a decision before the deadline. Submitting as a non-record discussion: implementation plus a theoretical δ-BPB estimate, no measured val_bpb.
Theoretical δ-BPB estimate
Projected on the PR #1901 base of 0.83353: 0.823–0.829 BPB.
The LQER δ is conservatively taken to be smaller than PR #1797's measured −0.009 BPB (obtained on Hessian-GPTQ), because Sigma-Delta error diffusion already compensates within-row quantization error: the residual variance entering LQER is smaller, so there is less error left for the low-rank term to recover.
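To make the mechanism concrete, here is a minimal sketch of low-rank quantization-error reconstruction, assuming plain int8 symmetric quantization and an unweighted SVD of the residual; the asymmetric variant in this PR differs in how the residual is formed and weighted before factorizing, and all names below are illustrative.

```python
# Minimal LQER sketch: W ~= Q(W) + A @ B, where A @ B is a rank-k
# approximation of the quantization error E = W - Q(W).
import numpy as np

def quantize_int8(w: np.ndarray) -> np.ndarray:
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale  # dequantized approximation Q(W)

def lqer_correction(w: np.ndarray, rank: int = 4):
    resid = w - quantize_int8(w)                      # E = W - Q(W)
    u, s, vt = np.linalg.svd(resid, full_matrices=False)
    a = u[:, :rank] * s[:rank]                        # A @ B ~= E, rank k
    b = vt[:rank]
    return a, b

w = np.random.randn(256, 256).astype(np.float32)
a, b = lqer_correction(w, rank=4)
w_hat = quantize_int8(w) + a @ b                      # corrected weight
# fraction of quantization error remaining after the low-rank correction
print(np.linalg.norm(w - w_hat) / np.linalg.norm(w - quantize_int8(w)))
```

The claim in the paragraph above is that Sigma-Delta error diffusion shrinks the residual before the SVD ever runs, so its top singular values carry less recoverable mass than in PR #1797's Hessian-GPTQ setting.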
Test plan
- Re-run the preempted configuration against the PR #1901 baseline on identical hardware (data prefetch verified up to preemption in partial_run_2026-04-29.log).
- Sweep LQER_TOP_K ∈ {1, 2, 3} × LQER_RANK ∈ {2, 4, 8}; a hypothetical driver is sketched after this list.

If validated post-deadline, I commit to providing 3-seed logs as a follow-up update.
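A hypothetical driver for that 3×3 grid, assuming the patched train_gpt.py reads LQER_TOP_K / LQER_RANK from the environment (the real interface may differ):

```python
# Illustrative sweep driver; the env-var interface is an assumption.
import itertools
import os
import subprocess

for top_k, rank in itertools.product((1, 2, 3), (2, 4, 8)):
    env = {**os.environ, "LQER_TOP_K": str(top_k), "LQER_RANK": str(rank)}
    subprocess.run(
        ["torchrun", "--nproc_per_node=8", "train_gpt.py"],
        env=env, check=True,
    )
```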
Attribution
Compliance (verifiable from code)
Files
- README.md – submission documentation
- submission.json – metadata with the theoretical δ estimate (val_bpb fields null pending validation)
- train_gpt.py – LZMA-wrapped patched code (18,204 bytes)
- train_gpt_unwrapped.py – raw patched source for review
- partial_run_2026-04-29.log – data-prefetch log up to preemption