WIP: FreqGPTQ + GatedDeltaNet + Adaptive Quantization #1721

Closed
OlesStankevych wants to merge 1 commit into openai:main from OlesStankevych:freqgptq-gateddeltanet

Conversation

@OlesStankevych

Summary

  • Built on PR #1698 (GatedDeltaNet (FLA) + Legal Score-First TTT, val_bpb 1.00995, 3-seed mean)
  • FreqGPTQ: frequency-weighted Hessian calibration — top-100 tokens get 2x weight in GPTQ
  • PassthroughQuant: int8 for control tensors instead of fp16 (~40KB savings)
  • Sandwich quantization: int8 for final block to protect LM head signal
  • Adaptive embedding precision: int8 for top-100 frequent tokens, intN for rest
  • Configurable Int5/6 GPTQ with synced Late QAT clip range
  • LZMA self-extracting wrapper: ~73KB savings for model budget
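The frequency-weighting idea behind FreqGPTQ can be sketched as follows. This is a minimal illustration with NumPy, not the PR's actual code: the function name `freq_weighted_hessian` and its signature are hypothetical, and only the stated scheme (top-100 tokens contribute 2x to the GPTQ calibration Hessian) is taken from the summary.

```python
import numpy as np

def freq_weighted_hessian(acts, token_ids, top_ids, boost=2.0):
    """Accumulate a GPTQ-style calibration Hessian H = X^T diag(w) X,
    where activation rows produced by the most frequent tokens get
    `boost` (2x in the PR) weight, all others weight 1.

    acts:      (n_tokens, d_in) calibration activations feeding a layer
    token_ids: (n_tokens,) token id that produced each activation row
    top_ids:   ids of the top-K most frequent tokens (e.g. top 100)
    """
    acts = np.asarray(acts, dtype=np.float64)
    token_ids = np.asarray(token_ids)
    w = np.where(np.isin(token_ids, list(top_ids)), boost, 1.0)
    # Weighted outer-product accumulation; normalizing by the total
    # weight keeps H comparable to the unweighted Hessian scale.
    H = (acts * w[:, None]).T @ acts
    return H / w.sum()
```

Downstream, this H would replace the uniform Hessian in the usual GPTQ column-by-column quantization loop, so reconstruction error is penalized more heavily on directions excited by frequent tokens.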
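The adaptive embedding precision bullet (int8 for the top-100 frequent tokens, lower-bit intN for the rest) can likewise be sketched with simple per-row symmetric quantization. Function names, the rank-based interface, and the per-row scale scheme are assumptions for illustration, not the PR's implementation.

```python
import numpy as np

def quantize_embeddings(emb, token_freq_rank, top_k=100, low_bits=4):
    """Per-row symmetric quantization of an embedding matrix:
    int8 for the top_k most frequent tokens, `low_bits` for the rest.

    emb:             (vocab, dim) float embedding matrix
    token_freq_rank: (vocab,) frequency rank per token (0 = most frequent)
    Returns int8 codes (low-bit rows simply use a smaller code range)
    and per-row float scales for dequantization.
    """
    vocab, dim = emb.shape
    codes = np.empty((vocab, dim), dtype=np.int8)
    scales = np.empty(vocab, dtype=np.float32)
    for i in range(vocab):
        bits = 8 if token_freq_rank[i] < top_k else low_bits
        qmax = 2 ** (bits - 1) - 1            # 127 for int8, 7 for int4
        m = float(np.abs(emb[i]).max())
        s = m / qmax if m > 0 else 1.0        # avoid div-by-zero on zero rows
        scales[i] = s
        codes[i] = np.clip(np.round(emb[i] / s), -qmax, qmax)
    return codes, scales

def dequantize(codes, scales):
    return codes.astype(np.float32) * scales[:, None]
```

In a real size budget the low-bit rows would additionally be bit-packed (two int4 codes per byte); the sketch keeps them in int8 storage for clarity.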

Status

WIP — code complete, pending GPU validation. Will update with BPB results and 3-seed logs once compute is available.

Test plan

Built on PR openai#1698 (GatedDeltaNet + Legal TTT). Adds:
- FreqGPTQ: frequency-weighted Hessian calibration for GPTQ
- PassthroughQuant: int8 for control tensors (saves ~40KB)
- Sandwich quantization: int8 for final block
- Adaptive embedding precision: int8 top-100 / intN rest
- Configurable Int5/6 GPTQ with synced QAT
- LZMA wrapper saves ~73KB

Pending GPU validation for BPB results.
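The LZMA self-extracting wrapper can be illustrated with a minimal Python sketch: compress the model bytes and emit a tiny stub that decompresses them at import time. The function name `make_self_extracting` and the base85 payload encoding are assumptions; the ~73KB figure above is the PR's claim for its actual model, not something this sketch reproduces.

```python
import base64
import lzma

def make_self_extracting(model_bytes: bytes) -> str:
    """Wrap raw model bytes in a small Python stub that decompresses
    them when executed. The size that counts against the budget is the
    stub's, so savings come from LZMA's ratio on the weight data.
    """
    payload = base64.b85encode(lzma.compress(model_bytes, preset=9)).decode()
    stub = (
        "import base64, lzma\n"
        f"MODEL_BYTES = lzma.decompress(base64.b85decode({payload!r}))\n"
    )
    return stub
```

Executing the returned stub (or importing it as a module) rebinds `MODEL_BYTES` to the original bytes, so downstream loading code is unchanged apart from reading from that variable instead of a file.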

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@OlesStankevych
Author

Closing — pushed from wrong Git user. Will re-submit from correct account.
