
Add bias support for Int8DynActInt4WeightLinear #1845


Merged · 1 commit merged into main on Mar 10, 2025

Conversation

andrewor14 (Contributor)
Summary: Previously, when we saw a linear with a bias, we simply did not swap it to Int8DynActInt4WeightLinear and left it as is. Now we do swap it, but the bias is not quantized; it is passed to F.linear in full precision.

Fixes #1821

Test Plan:
python test/quantization/test_quant_api.py -k test_8da4w_quantizer_linear_bias
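
For illustration, here is a minimal runnable sketch of the behavior described above, assuming a simplified per-channel int4 weight scheme. The class name and quantization details are placeholders, not torchao's actual `Int8DynActInt4WeightLinear` (which uses group-wise int4 weights plus int8 dynamic activation quantization); the point is the bias handling, which stays in full precision.

```python
import torch
import torch.nn.functional as F

class Int4WeightLinearSketch(torch.nn.Module):
    """Illustrative stand-in for Int8DynActInt4WeightLinear (simplified)."""

    def __init__(self, weight, bias=None):
        super().__init__()
        # Fake per-channel int4 quantization of the weight: values in [-8, 7].
        scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
        self.register_buffer(
            "w_int4", torch.clamp(torch.round(weight / scale), -8, 7).to(torch.int8)
        )
        self.register_buffer("scale", scale)
        # The bias is stored as-is: it is never quantized.
        self.bias = bias

    def forward(self, x):
        # Dequantize the weight for the matmul (dynamic activation quantization elided).
        w_dq = self.w_int4.to(x.dtype) * self.scale
        # The full-precision bias is passed straight to F.linear, as in this PR.
        return F.linear(x, w_dq, self.bias)

lin = torch.nn.Linear(16, 8)  # has a bias, so previously it would not be swapped
q = Int4WeightLinearSketch(lin.weight.detach(), lin.bias.detach())
y = q(torch.randn(2, 16))     # bias applied in full precision
```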

pytorch-bot bot commented Mar 5, 2025

🔗 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1845

✅ No failures as of commit 1379cce with merge base ffb4350.

@facebook-github-bot added the CLA Signed label on Mar 5, 2025
@andrewor14 added the topic: improvement label on Mar 5, 2025

@jainapurva (Contributor) commented:

@andrewor14 By bias quantization, do you mean the bias itself is quantized, or just that bias is supported during quantization?

@andrewor14 (Contributor, Author) replied:

> @andrewor14 By bias quantization, do you mean the bias itself is quantized, or just that bias is supported during quantization?

Right now this doesn't support quantized bias, just loading a quantized model where the linear has a bias that is not quantized. I think we can add actual bias quantization later. In general, quantizing the bias doesn't seem to be very common, as the benefits are not significant.
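
To make the before/after concrete, here is a hedged sketch of the swap logic, reusing `Int4WeightLinearSketch` from the sketch above; the function name is illustrative, not the exact torchao internals.

```python
import torch

def maybe_swap_linear(module):
    # Before this PR (conceptually), linears with a bias were skipped:
    #     if module.bias is not None:
    #         return module
    # Now a biased linear is swapped too, carrying its full-precision bias.
    if isinstance(module, torch.nn.Linear):
        bias = None if module.bias is None else module.bias.detach()
        return Int4WeightLinearSketch(module.weight.detach(), bias)
    return module
```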

@jackzhxng left a comment:

So with this PR, we will be able to represent unquantized bias with Int8DynActInt4WeightLinear?

@andrewor14 (Contributor, Author) replied:

> So with this PR, we will be able to represent unquantized bias with Int8DynActInt4WeightLinear?

Yep
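
As a usage sketch (assuming torchao's `quant_api` around the time of this PR; verify names like `Int8DynActInt4WeightQuantizer` and `groupsize` against your installed version), a model whose linears have biases can now go through the 8da4w quantizer:

```python
import torch
from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer

model = torch.nn.Sequential(torch.nn.Linear(64, 64, bias=True))
quantizer = Int8DynActInt4WeightQuantizer(groupsize=32)
model = quantizer.quantize(model)  # the biased linear is swapped, not skipped
out = model(torch.randn(1, 64))    # bias still applied in full precision
```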

@andrewor14 merged commit f64d5a1 into main on Mar 10, 2025. 18 checks passed.
Labels: CLA Signed · topic: improvement
Closes: Bias quantization for prequantized checkpoints (#1821)
4 participants