
Bias quantization for prequantized checkpoints #1821


Closed
jackzhxng opened this issue Mar 3, 2025 · 0 comments · Fixed by #1845
Assignees: andrewor14
Labels: enhancement (New feature or request), quantize

Comments

@jackzhxng

Following up from a chat with @jainapurva.

For private internal model enablement, we would like to request support for bias quantization when loading prequantized checkpoints. At the moment we do a manual source transformation after loading the prequantized checkpoint (https://github.com/pytorch/executorch/blob/main/examples/models/llama/source_transformation/pre_quantization.py#L40) into your deprecated `Int8DynActInt4WeightLinear`, which does not support bias quantization.
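For context, the workaround amounts to a module swap over the loaded model. The sketch below is a generic version of that pattern, assuming a caller-supplied `make_quantized_linear` factory (hypothetical, not executorch's actual helper); the linked executorch code does the same kind of swap with torchao's deprecated `Int8DynActInt4WeightLinear` as the replacement.

```python
# Minimal sketch of the manual source transformation described above.
# `make_quantized_linear` is a hypothetical factory that builds a quantized
# replacement (e.g. Int8DynActInt4WeightLinear) from the original layer;
# its real constructor arguments are not shown here.
import torch.nn as nn

def swap_linears(module: nn.Module, make_quantized_linear):
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            # Bias handling is exactly the gap this issue asks about: the
            # deprecated Int8DynActInt4WeightLinear does not take a bias.
            setattr(module, name, make_quantized_linear(child))
        else:
            swap_linears(child, make_quantized_linear)
```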

jainapurva added the enhancement (New feature or request) and quantize labels on Mar 5, 2025
andrewor14 added a commit that referenced this issue Mar 5, 2025
**Summary:** Previously, when we saw a linear with bias, we simply did not
swap it to `Int8DynActInt4WeightLinear` and left it as is. Now we do swap it,
but the bias is not quantized and is passed to `F.linear` in full precision.

Fixes #1821

**Test Plan:**
python test/quantization/test_quant_api.py -k test_8da4w_quantizer_linear_bias
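To make the resulting behavior concrete, here is a simplified forward pass under the same idea: the weight is dequantized (per-channel here for brevity; the real 8da4w scheme is groupwise int4 with dynamically quantized int8 activations), while the bias stays in full precision and goes straight into `F.linear`. The function name and dequantization details are illustrative, not torchao's actual kernel path.

```python
from typing import Optional

import torch
import torch.nn.functional as F

def linear_forward_fp_bias(x: torch.Tensor,
                           q_weight: torch.Tensor,
                           scales: torch.Tensor,
                           bias: Optional[torch.Tensor]) -> torch.Tensor:
    # Simplified per-output-channel dequantization of the integer weight.
    w = q_weight.to(x.dtype) * scales.unsqueeze(-1)
    # The bias is left unquantized and applied in full precision by F.linear,
    # mirroring the commit summary above.
    return F.linear(x, w, bias)
```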
andrewor14 added further commits that referenced this issue on Mar 5–6, 2025 (same summary and test plan as above).
andrewor14 self-assigned this on Mar 6, 2025