
Non-Record: PR #1901 base + LQER Asymmetric + Brotli/Byte-Shuffle Compression #1927

Open

squ11z1 wants to merge 5 commits into openai:main from squ11z1:non-record-pr1901-lqer-brotli

Conversation


@squ11z1 commented Apr 29, 2026

Summary

A non-record submission proposing two orthogonal additions to PR #1901's stack (DualHash + AdaMuon + MoE + SDClip, val_bpb 0.83353 pending):

  1. LQER asymmetric rank-4 post-quantization correction, ported from PR #1797 ("Record: PR #1787 base + Smear Gate + LQER Asym — val_bpb 1.06157") by @dexhunter; the first application to a Sigma-Delta-quantized stack.
  2. Brotli-11 + stride-2 byte-shuffle replacing LZMA, adapted from PR #1855 ("Record: SP8192 + LQER + Sparse Attn Gate + BOS-Fixed SmearGate + 9-Hparam Greedy Stack — val_bpb 1.06108, 3-seed mean").

The patched train_gpt.py is shipped LZMA-base85-wrapped (18,204 bytes vs. PR #1901's 53,443 raw bytes, a 65.9% code-byte saving).
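For reference, the wrapping scheme round-trips as in the sketch below (hypothetical helper names; the submission's actual wrapper may differ in LZMA preset and encoding details):

```python
import base64
import lzma

def wrap(source: str) -> str:
    """Compress Python source with LZMA, then encode as ASCII base85."""
    return base64.b85encode(lzma.compress(source.encode("utf-8"))).decode("ascii")

def unwrap(blob: str) -> str:
    """Inverse of wrap(); the result should be byte-identical to the input source."""
    return lzma.decompress(base64.b85decode(blob.encode("ascii"))).decode("utf-8")
```

The test-plan item "LZMA-base85 wrapper round-trips" then reduces to checking `unwrap(wrap(src)) == src` and that the unwrapped source still passes `compile(src, "train_gpt.py", "exec")`.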

Status: non-record

A $25 starter grant plus my remaining personal balance funded two single-seed bid attempts on 8×H100 SXM; both were preempted before producing an artifact.

A $500 development-grant request filed 2026-04-27 did not return a decision before the deadline. I am therefore submitting this as a non-record discussion: implementation plus a theoretical δ-BPB estimate, with no measured val_bpb.

Theoretical δ-BPB estimate

| Contribution | Mechanism | Estimated δ |
| --- | --- | --- |
| LQER asym rank-4, top-K=2 | INT2 A + INT4 B per-group-64 SVD factors recover the Sigma-Delta residual | −0.002 to −0.005 BPB |
| Brotli-11 + byte-shuffle | ~150–280 KB compression saving → larger model | −0.002 to −0.005 BPB |
| Combined | | −0.005 to −0.010 BPB |
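The stride-2 byte-shuffle is a simple lossless transform: it groups same-significance bytes of consecutive 16-bit values together so the entropy coder sees longer low-entropy runs. A minimal sketch with hypothetical helper names (the shuffled stream would then be handed to the Brotli-11 compressor; that step is omitted here since brotli is a third-party dependency):

```python
def byte_shuffle(data: bytes, stride: int = 2) -> bytes:
    """Concatenate all bytes at offset 0 mod stride, then 1 mod stride, etc.
    For fp16/bf16 tensors with stride=2 this separates low and high bytes."""
    return b"".join(data[i::stride] for i in range(stride))

def byte_unshuffle(data: bytes, stride: int = 2) -> bytes:
    """Exact inverse of byte_shuffle, handling lengths not divisible by stride."""
    out = bytearray(len(data))
    q, r = divmod(len(data), stride)
    pos = 0
    for i in range(stride):
        n = q + (1 if i < r else 0)   # earlier lanes absorb the remainder bytes
        out[i::stride] = data[pos:pos + n]
        pos += n
    return bytes(out)
```

The transform itself adds no information loss; the claimed ~150–280 KB saving would come entirely from Brotli compressing the reordered stream better.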

Projected on PR #1901 base 0.83353: 0.823–0.829 BPB.

The LQER δ is estimated conservatively below PR #1797's measured −0.009 BPB on Hessian-GPTQ, because Sigma-Delta error diffusion already compensates within-row quantization error: the residual variance is smaller, so there is less left for LQER to recover.
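The mechanism can be illustrated with a plain float32 sketch. Note the assumptions: a toy uniform quantizer stands in for Sigma-Delta, and the A/B factors stay float32 rather than INT2/INT4 per-group-64 as in the PR. The rank-k SVD of the residual W − W_q yields factors A, B such that W_q + AB is strictly closer to W in Frobenius norm:

```python
import numpy as np

def lqer_correction(w: np.ndarray, w_q: np.ndarray, rank: int = 4):
    """Rank-`rank` SVD factors of the quantization residual W - W_q.

    In the PR, A would be INT2-quantized and B INT4-quantized per group of 64;
    both stay float32 here to keep the sketch minimal."""
    resid = w - w_q
    u, s, vt = np.linalg.svd(resid, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # left factor, scaled by singular values
    b = vt[:rank, :]             # right factor
    return a, b

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
w_q = np.round(w / 0.25) * 0.25          # toy uniform quantizer (Sigma-Delta stand-in)
a, b = lqer_correction(w, w_q, rank=4)
err_plain = float(np.linalg.norm(w - w_q))
err_lqer = float(np.linalg.norm(w - (w_q + a @ b)))
```

Because truncated SVD is the optimal rank-k approximation of the residual, `err_lqer < err_plain` whenever the residual has rank above k; how much of that gap survives INT2/INT4 factor quantization is exactly what the pending ablation would measure.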

Test plan

  • Patch applies cleanly to PR #1901 ("Record: 0.8335 BPB — DualHash + AdaMuon + MoE + SDClip, 3-seed mean"); function-level replacement, syntax-checked
  • LZMA-base85 wrapper round-trips (compile + decompress identity verified)
  • Patched code launches on 8×H100 SXM (verified up to data prefetch in partial_run_2026-04-29.log)
  • Pending: 3-seed val_bpb on 8×H100 SXM with full 600s training cap
  • Pending: artifact size verification under 16 MB
  • Pending: ablation LQER_TOP_K ∈ {1, 2, 3} × LQER_RANK ∈ {2, 4, 8}
  • Pending: Brotli vs LZMA artifact size A/B on identical model

If validated post-deadline, I commit to providing 3-seed logs as a follow-up update.

Attribution

Compliance (verifiable from code)

Files

  • README.md — submission documentation
  • submission.json — metadata with theoretical δ estimate (val_bpb fields null pending validation)
  • train_gpt.py — LZMA-wrapped patched code (18,204 bytes)
  • train_gpt_unwrapped.py — raw patched source for review
  • partial_run_2026-04-29.log — data-prefetch log up to preemption

@Karen042009

Thanks for the submission and for building on top of PR #1901.

The ideas (LQER asym + Brotli/byte-shuffle) look interesting, but since there is no completed run or validated val_bpb yet, it’s hard to evaluate the real impact. Theoretical estimates alone aren’t enough for merging.

Before this can be considered, we would need fully measured and validated results.

Also, since this is marked as non-record and the runs were preempted, I’d treat this as an experimental direction for now rather than something ready to merge.

Happy to take another look once you have full results.

@squ11z1
Author

squ11z1 commented May 1, 2026

Thanks for the careful review @Karen042009 — completely agree that a theoretical estimate doesn't merit merging.

A few notes on where things stand:

Validation is the binding constraint. The patch itself is small and round-trips cleanly through compile(); it reaches the warmup phase on 8×H100 SXM (verified in partial_run_2026-04-29.log). Both bid-pod attempts preempted before producing an artifact — the 2026-04-29 run died at 50% of HF data prefetch, well before training started. Without an artifact I can't honestly compare against the PR #1901 baseline, and I won't pretend otherwise.

Compute access: I filed two compute-grant requests through the official form — first on 2026-04-27 (development tier, $500), and a follow-up on 2026-04-29 after this PR was published — neither returned a decision before the deadline. With $1.49 of personal balance left as I'm writing this, a clean 3-seed 8×H100 SXM run isn't feasible from my side this week.

What I'll do when compute does become available: run 3 seeds against your PR #1901 baseline on identical hardware, post per-seed val_bpb + artifact sizes here, and run the LQER_TOP_K × LQER_RANK ablation. If the projected −0.005 to −0.010 BPB doesn't materialize, I'll close this PR — same standard you applied. The integration risk lives entirely on the validation step.

Happy to keep this open as "experimental direction" in the meantime — if anyone reading wants to test the LQER-on-Sigma-Delta combination independently, the patched train_gpt.py is self-contained and reproducible. Will ping you here when measurements land.

@Karen042009

Thanks for the detailed clarification.

The situation is clear, especially regarding compute constraints, and I appreciate that you’re being transparent about the validation status without overclaiming results.

The plan for proper 3-seed runs and direct comparison against the PR #1901 baseline makes sense and would provide a solid basis for evaluation.

For now, I agree this should remain an experimental direction. However, as mentioned before, a merge cannot be considered without fully measured and validated results.

I’ll wait for an update once you have the full validation data available.
