
Non-Record: PR #1901 base + LQER Asymmetric + Brotli/Byte-Shuffle Compression #1927

Open

squ11z1 wants to merge 5 commits into openai:main from squ11z1:non-record-pr1901-lqer-brotli

Conversation


@squ11z1 commented Apr 29, 2026

Summary

A non-record submission proposing two orthogonal additions to PR #1901's stack (DualHash + AdaMuon + MoE + SDClip, val_bpb 0.83353 pending):

  1. LQER asymmetric rank-4 post-quantization correction, ported from PR #1797 ("Record: PR #1787 base + Smear Gate + LQER Asym — val_bpb 1.06157") by @dexhunter; the first application to a Sigma-Delta-quantized stack.
  2. Brotli-11 + stride-2 byte-shuffle replacing LZMA, adapted from PR #1855 ("Record: SP8192 + LQER + Sparse Attn Gate + BOS-Fixed SmearGate + 9-Hparam Greedy Stack — val_bpb 1.06108, 3-seed mean").

The patched train_gpt.py is shipped LZMA-base85-wrapped (18,204 bytes vs. PR #1901's 53,443 raw bytes, a 65.9% code-byte saving).
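For reference, the wrapping scheme round-trips as in the sketch below (hypothetical helper names; the submission's actual wrapper may differ in LZMA preset and encoding details):

```python
import base64
import lzma

def wrap(source: str) -> str:
    """Compress Python source with LZMA, then encode as ASCII base85."""
    return base64.b85encode(lzma.compress(source.encode("utf-8"))).decode("ascii")

def unwrap(blob: str) -> str:
    """Inverse of wrap(); the result should be byte-identical to the input source."""
    return lzma.decompress(base64.b85decode(blob.encode("ascii"))).decode("utf-8")
```

The test-plan item "LZMA-base85 wrapper round-trips" then reduces to checking `unwrap(wrap(src)) == src` and that the unwrapped source still passes `compile(src, "train_gpt.py", "exec")`.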

Status: non-record

A $25 starter grant plus my remaining personal balance funded two single-seed bid attempts on 8×H100 SXM; both were preempted before producing an artifact.

A $500 development-grant request filed 2026-04-27 did not return a decision before the deadline. I am therefore submitting this as a non-record discussion: implementation plus a theoretical δ-BPB estimate, with no measured val_bpb.

Theoretical δ-BPB estimate

| Contribution | Mechanism | Estimated δ |
| --- | --- | --- |
| LQER asym rank-4, top-K=2 | INT2 A + INT4 B per-group-64 SVD factors recover the Sigma-Delta residual | −0.002 to −0.005 BPB |
| Brotli-11 + byte-shuffle | ~150–280 KB compression saving → larger model | −0.002 to −0.005 BPB |
| Combined | | −0.005 to −0.010 BPB |
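The stride-2 byte-shuffle is a simple lossless transform: it groups same-significance bytes of consecutive 16-bit values together so the entropy coder sees longer low-entropy runs. A minimal sketch with hypothetical helper names (the shuffled stream would then be handed to the Brotli-11 compressor; that step is omitted here since brotli is a third-party dependency):

```python
def byte_shuffle(data: bytes, stride: int = 2) -> bytes:
    """Concatenate all bytes at offset 0 mod stride, then 1 mod stride, etc.
    For fp16/bf16 tensors with stride=2 this separates low and high bytes."""
    return b"".join(data[i::stride] for i in range(stride))

def byte_unshuffle(data: bytes, stride: int = 2) -> bytes:
    """Exact inverse of byte_shuffle, handling lengths not divisible by stride."""
    out = bytearray(len(data))
    q, r = divmod(len(data), stride)
    pos = 0
    for i in range(stride):
        n = q + (1 if i < r else 0)   # earlier lanes absorb the remainder bytes
        out[i::stride] = data[pos:pos + n]
        pos += n
    return bytes(out)
```

The transform itself adds no information loss; the claimed ~150–280 KB saving would come entirely from Brotli compressing the reordered stream better.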

Projected on PR #1901 base 0.83353: 0.823–0.829 BPB.

The LQER δ is estimated conservatively below PR #1797's measured −0.009 BPB on Hessian-GPTQ, because Sigma-Delta error diffusion already compensates within-row quantization error: the residual variance is smaller, so there is less left for LQER to recover.
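The mechanism can be illustrated with a plain float32 sketch. Note the assumptions: a toy uniform quantizer stands in for Sigma-Delta, and the A/B factors stay float32 rather than INT2/INT4 per-group-64 as in the PR. The rank-k SVD of the residual W − W_q yields factors A, B such that W_q + AB is strictly closer to W in Frobenius norm:

```python
import numpy as np

def lqer_correction(w: np.ndarray, w_q: np.ndarray, rank: int = 4):
    """Rank-`rank` SVD factors of the quantization residual W - W_q.

    In the PR, A would be INT2-quantized and B INT4-quantized per group of 64;
    both stay float32 here to keep the sketch minimal."""
    resid = w - w_q
    u, s, vt = np.linalg.svd(resid, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # left factor, scaled by singular values
    b = vt[:rank, :]             # right factor
    return a, b

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
w_q = np.round(w / 0.25) * 0.25          # toy uniform quantizer (Sigma-Delta stand-in)
a, b = lqer_correction(w, w_q, rank=4)
err_plain = float(np.linalg.norm(w - w_q))
err_lqer = float(np.linalg.norm(w - (w_q + a @ b)))
```

Because truncated SVD is the optimal rank-k approximation of the residual, `err_lqer < err_plain` whenever the residual has rank above k; how much of that gap survives INT2/INT4 factor quantization is exactly what the pending ablation would measure.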

Test plan

  • Patch applies cleanly to PR #1901 ("Record: 0.8335 BPB — DualHash + AdaMuon + MoE + SDClip, 3-seed mean"); function-level replacement, syntax-checked
  • LZMA-base85 wrapper round-trips (compile + decompress identity verified)
  • Patched code launches on 8×H100 SXM (verified up to data prefetch in partial_run_2026-04-29.log)
  • Pending: 3-seed val_bpb on 8×H100 SXM with full 600s training cap
  • Pending: artifact size verification under 16 MB
  • Pending: ablation LQER_TOP_K ∈ {1, 2, 3} × LQER_RANK ∈ {2, 4, 8}
  • Pending: Brotli vs LZMA artifact size A/B on identical model

If validated post-deadline, I commit to providing 3-seed logs as a follow-up update.

Attribution

Compliance (verifiable from code)

Files

  • README.md — submission documentation
  • submission.json — metadata with theoretical δ estimate (val_bpb fields null pending validation)
  • train_gpt.py — LZMA-wrapped patched code (18,204 bytes)
  • train_gpt_unwrapped.py — raw patched source for review
  • partial_run_2026-04-29.log — data-prefetch log up to preemption

@Karen042009

Thanks for the submission and for building on top of PR #1901.

The ideas (LQER asym + Brotli/byte-shuffle) look interesting, but since there is no completed run or validated val_bpb yet, it’s hard to evaluate the real impact. Theoretical estimates alone aren’t enough for merging.

Before this can be considered, we would need fully measured and validated results.

Also, since this is marked as non-record and the runs were preempted, I’d treat this as an experimental direction for now rather than something ready to merge.

Happy to take another look once you have full results.

@squ11z1
Author

squ11z1 commented May 1, 2026

Thanks for the careful review @Karen042009 — completely agree that a theoretical estimate doesn't merit merging.

A few notes on where things stand:

Validation is the binding constraint. The patch itself is small and round-trips cleanly through compile(); it reaches the warmup phase on 8×H100 SXM (verified in partial_run_2026-04-29.log). Both bid-pod attempts preempted before producing an artifact — the 2026-04-29 run died at 50% of HF data prefetch, well before training started. Without an artifact I can't honestly compare against the PR #1901 baseline, and I won't pretend otherwise.

Compute access: I filed two compute-grant requests through the official form — first on 2026-04-27 (development tier, $500), and a follow-up on 2026-04-29 after this PR was published — neither returned a decision before the deadline. With $1.49 of personal balance left as I'm writing this, a clean 3-seed 8×H100 SXM run isn't feasible from my side this week.

What I'll do when compute does become available: run 3 seeds against your PR #1901 baseline on identical hardware, post per-seed val_bpb + artifact sizes here, and run the LQER_TOP_K × LQER_RANK ablation. If the projected −0.005 to −0.010 BPB doesn't materialize, I'll close this PR — same standard you applied. The integration risk lives entirely on the validation step.

Happy to keep this open as "experimental direction" in the meantime — if anyone reading wants to test the LQER-on-Sigma-Delta combination independently, the patched train_gpt.py is self-contained and reproducible. Will ping you here when measurements land.

@Karen042009

Thanks for the detailed clarification.

The situation is clear, especially regarding compute constraints, and I appreciate that you’re being transparent about the validation status without overclaiming results.

The plan for proper 3-seed runs and direct comparison against the PR #1901 baseline makes sense and would provide a solid basis for evaluation.

For now, I agree this should remain an experimental direction. However, as mentioned before, a merge cannot be considered without fully measured and validated results.

I’ll wait for an update once you have the full validation data available.
