RECORD: 1855 base + AWQ-lite mixed-precision GPTQ — val_bpb 1.06086 (3-seed mean)#1918
Closed
aquariouseworkman wants to merge 3 commits into openai:main from
Conversation
…symmetric + Phased TTT

val_bpb = 1.06128 | ~15.95 MB | 8xH100 SXM

**Key Change: SmearGate BOS Document Boundary Fix**

Builds on the PR openai#1797 stack (PR openai#1787 base + SmearGate + LQER Asymmetric) but fixes the SmearGate cross-document leakage bug identified by @cocohearts in the PR openai#1797 audit. The bug: SmearGate's 1-token causal lookback does not mask BOS positions, so the final token of document N smears into the BOS of document N+1.

Credits:
- @nprime06 -- PR openai#1787 base stack
- @romeerp -- CaseOps transform (PR openai#1729)
- @dexhunter -- SmearGate + LQER (PR openai#1797)
- @cocohearts -- Identifying the SmearGate BOS bug
- @abaybektursun -- Score-first TTT (PR openai#549)
- @clarkkev -- GPTQ SDClip + SP8192 (PR openai#1394)
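The fix described above can be sketched as follows. This is a hypothetical reconstruction, not the repo's code: the function and tensor names are invented, and the gate shape is assumed. The idea is a 1-token causal lookback whose gate is forced to zero wherever the current position is a BOS token, so the last token of document N cannot smear into the BOS of document N+1.

```python
import torch

def smear_with_bos_mask(x, gate, bos_mask):
    """Sketch of a SmearGate-style 1-token causal lookback with a BOS fix.

    x:        (B, T, D) token activations
    gate:     (B, T, 1) smear gate values in [0, 1]
    bos_mask: (B, T) bool, True where position t is a BOS token

    Each position mixes in its predecessor, except BOS positions,
    which must not receive the final token of the previous document.
    """
    prev = torch.roll(x, shifts=1, dims=1)  # predecessor of each position
    prev[:, 0] = 0.0                        # position 0 has no predecessor
    gate = gate * (~bos_mask).unsqueeze(-1).float()  # zero the gate at BOS
    return x + gate * prev
```

Without the `bos_mask` multiplication, the `torch.roll` line alone reproduces the leakage: a BOS position would blend in the last token of the preceding document.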
…d mean)

Applies activation-aware mixed-precision GPTQ (from PR openai#1908 / romeerp) on top of the codemath3000 PR openai#1855 stack.

## Results

| Seed | val_bpb (post-TTT) | artifact bytes | steps | eval time |
|------|--------------------|----------------|-------|-----------|
| 42 | 1.06118 | 15,978,503 | 4989 | 392.8s |
| 314 | 1.06005 | 15,976,469 | 4986 | 395.8s |
| 1234 | 1.06135 | 15,976,673 | 4977 | 395.5s |
| **mean** | **1.06086** | — | — | — |

3-seed std: 0.00069. Beats codemath3000 PR openai#1855 (1.06108) by 0.00022 BPB.

## Technique

Training is identical to PR openai#1855. The only change is post-training quantization:

**AWQ-lite (activation-aware GPTQ):**

1. Collect per-input-channel activation RMS during GPTQ calibration
2. Score column groups: `saliency = act_rms * mean(abs(weight))`
3. Select the top-1 most salient 64-column group per matrix
4. Quantize that group at int8 inside the same full-tensor GPTQ solve (the rest stays int6)

Env vars: `AWQ_LITE_ENABLED=1 AWQ_LITE_BITS=8 AWQ_LITE_GROUP_TOP_K=1 AWQ_LITE_GROUP_SIZE=64`

## Setup

1. `pip install -r requirements.txt`
2. `apt-get install -y lrzip`
3. Install FA3: `pip install --no-deps flash_attn_3 --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291/`
4. Run `prepare_caseops_data.py` to build the dataset
5. `AWQ_LITE_ENABLED=1 AWQ_LITE_BITS=8 AWQ_LITE_GROUP_TOP_K=1 AWQ_LITE_GROUP_SIZE=64 torchrun --standalone --nproc_per_node=8 train_gpt.py`

## Environment

- 8xH100 80GB SXM (RunPod)
- PyTorch 2.9.1+cu128
- FlashAttention 3.0.0
- Triton 3.5.1
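The group-selection steps above can be sketched roughly like this. It is a minimal NumPy illustration with hypothetical names, not the PR's implementation: the real method quantizes the selected group inside the full-tensor GPTQ solve, whereas this stand-in uses plain round-to-nearest per bit-width.

```python
import numpy as np

def awq_lite_bits(weight, act_rms, group_size=64, top_k=1, hi_bits=8, lo_bits=6):
    """Assign per-column bit-widths: the top_k most salient
    group_size-column groups get hi_bits, the rest lo_bits.

    weight:  (out, in) weight matrix
    act_rms: (in,) per-input-channel activation RMS from calibration
    """
    saliency = act_rms * np.abs(weight).mean(axis=0)   # per-column score
    n_groups = weight.shape[1] // group_size
    group_score = saliency[: n_groups * group_size].reshape(n_groups, group_size).mean(axis=1)
    top = np.argsort(group_score)[::-1][:top_k]        # most salient groups
    bits = np.full(weight.shape[1], lo_bits)
    for g in top:
        bits[g * group_size : (g + 1) * group_size] = hi_bits
    return bits

def fake_quant(weight, bits):
    """Round-to-nearest stand-in for the GPTQ solve, per bit-width."""
    q = np.empty_like(weight)
    for b in np.unique(bits):
        cols = bits == b
        scale = np.abs(weight[:, cols]).max() / (2 ** (int(b) - 1) - 1)
        q[:, cols] = np.round(weight[:, cols] / scale) * scale
    return q
```

Columns whose input channels carry large activations thus keep two extra bits of weight precision, which is where quantization error hurts the output most.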
You need to outperform by 0.005 BPB: https://github.com/openai/parameter-golf#submission-process
Contributor
Author
The code is byte-for-byte identical to #1908. I was only able to validate it because I could get GPU resources, which romeerp could not. Since his only limitation was acquiring GPUs, not writing working code, the record from this merge should go to romeerp.
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request on Apr 29, 2026:
- spec 060N: compound AWQ-lite (PR openai#1908) + 4 TTT phases + 3000 prefix + 2 global-SGD epochs, eval-only on 060A's final_model.pt. Single-shot compound to use openai#1918's ~205s eval-time slack; safe fallback drops GLOBAL_TTT_EPOCHS if wallclock blows.
- new idea 1925-matrix-lr-ttt-prefix-tune (PR openai#1925, hyperparam-only on openai#1855: MATRIX_LR=0.028 + PHASED_TTT_PREFIX_DOCS=3500 → 1.06109).
- new idea 1915-per-doc-lora-ttt (PR openai#1915, per-doc-only LoRA TTT discipline; parked as a fallback if the global-SGD class is ruled out).
- frontier scan: 21 new PRs (openai#1906-openai#1931). Headline: PRs openai#1908 + openai#1918 independently confirm the AWQ-lite mixed-bit GPTQ pattern at ~1.0608 on the openai#1855 base; openai#1925 hyperparam-only at 1.06109; openai#1923 Asymmetric Logit Rescale = empirical negative; openai#1929 banned SLOT+prequant-TTT.
- frontier-state.json: 21 PRs added; total 200.
- diary/2026-04-29-frontier-scan.md: full scan report.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
Author
Sent from my iPhone. On Apr 29, 2026, at 10:58, aquariouseworkman ***@***.***> wrote:
Commit Summary:

- 58ee03b Record: SmearGate BOS Fix + PR #1787 Base + Smear Gate + LQER Asymmetric + Phased TTT
- b346431 Record: base + AWQ-lite mixed-precision GPTQ — val_bpb 1.06086 (3-seed mean)
- d5328fe Merge branch 'openai:main' into main
File Changes (17 files):

- A records/track_10min_16mb/2020-04-29_AWQ_lite_mixedprecision_GPTQ/README.md (39)
- A records/track_10min_16mb/2020-04-29_AWQ_lite_mixedprecision_GPTQ/lossless_caps.py (833)
- A records/track_10min_16mb/2020-04-29_AWQ_lite_mixedprecision_GPTQ/prepare_caseops_data.py (177)
- A records/track_10min_16mb/2020-04-29_AWQ_lite_mixedprecision_GPTQ/requirements.txt (13)
- A records/track_10min_16mb/2020-04-29_AWQ_lite_mixedprecision_GPTQ/submission.json (25)
- A records/track_10min_16mb/2020-04-29_AWQ_lite_mixedprecision_GPTQ/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model (0)
- A records/track_10min_16mb/2020-04-29_AWQ_lite_mixedprecision_GPTQ/train_gpt.py (3998)
- A records/track_10min_16mb/2020-04-29_AWQ_lite_mixedprecision_GPTQ/train_seed1234.log (947)
- A records/track_10min_16mb/2020-04-29_AWQ_lite_mixedprecision_GPTQ/train_seed314.log (951)
- A records/track_10min_16mb/2020-04-29_AWQ_lite_mixedprecision_GPTQ/train_seed42.log (950)
- A records/track_10min_16mb/2026-04-27_SmearGateBOSFix_PR1787Base_LQERAsym_PhasedTTT/README.md (54)
- A records/track_10min_16mb/2026-04-27_SmearGateBOSFix_PR1787Base_LQERAsym_PhasedTTT/lossless_caps.py (833)
- A records/track_10min_16mb/2026-04-27_SmearGateBOSFix_PR1787Base_LQERAsym_PhasedTTT/prepare_caseops_data.py (177)
- A records/track_10min_16mb/2026-04-27_SmearGateBOSFix_PR1787Base_LQERAsym_PhasedTTT/submission.json (30)
- A records/track_10min_16mb/2026-04-27_SmearGateBOSFix_PR1787Base_LQERAsym_PhasedTTT/tokenizers/fineweb_8192_bpe_lossless_caps_caseops_v1_reserved.model (0)
- A records/track_10min_16mb/2026-04-27_SmearGateBOSFix_PR1787Base_LQERAsym_PhasedTTT/train_gpt.py (3555)
- A records/track_10min_16mb/2026-04-27_SmearGateBOSFix_PR1787Base_LQERAsym_PhasedTTT/train_seed42.log (833)
Patch Links:
https://github.com/openai/parameter-golf/pull/1918.patch
https://github.com/openai/parameter-golf/pull/1918.diff
This was referenced May 1, 2026