Add bias support for Int8DynActInt4WeightLinear #1845
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1845
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 1379cce with merge base ffb4350.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
**Summary:** Previously, when we encountered a linear with a bias, we simply did not swap it to `Int8DynActInt4WeightLinear` and left it as is. Now we do swap it, but the bias is not quantized and is passed to `F.linear` in full precision. Fixes #1821

**Test Plan:**
python test/quantization/test_quant_api.py -k test_8da4w_quantizer_linear_bias
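To illustrate the behavior this PR enables, here is a minimal sketch of the idea (hypothetical names, not torchao's actual implementation; the dynamic int8 activation quantization is omitted for brevity): the weights are quantized groupwise to int4 range, while the bias is kept in full precision and passed straight to `F.linear`.

```python
import torch
import torch.nn.functional as F

class Int8DynActInt4WeightLinearSketch(torch.nn.Module):
    """Illustrative sketch only: groupwise int4-range weights plus an
    *unquantized* bias. Dynamic int8 activation quantization is omitted."""

    def __init__(self, weight, bias=None, group_size=32):
        super().__init__()
        out_f, in_f = weight.shape
        w = weight.reshape(out_f, in_f // group_size, group_size)
        # Per-group symmetric scales for the 4-bit range [-8, 7].
        scales = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6) / 7.0
        q = torch.clamp(torch.round(w / scales), -8, 7).to(torch.int8)
        self.register_buffer("qweight", q)
        self.register_buffer("scales", scales)
        self.bias = bias  # kept in full precision, never quantized
        self.out_in = (out_f, in_f)

    def forward(self, x):
        # Dequantize the weight, then let F.linear apply the bias as-is.
        w = (self.qweight.float() * self.scales).reshape(self.out_in).to(x.dtype)
        return F.linear(x, w, self.bias)
```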
@andrewor14 By bias quantization, do you mean that the bias itself is quantized, or just support for a bias during quantization?
Right now this doesn't support quantized bias, just loading a quantized model where the linear has an unquantized bias. I think we can add actual bias quantization later. In general, quantizing the bias doesn't seem to be very common, since the benefits are not significant.
So with this PR, we will be able to represent unquantized bias with Int8DynActInt4WeightLinear?
Yep
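For reference, a usage sketch along the lines of the test plan. The `Int8DynActInt4WeightQuantizer` API lives in torchao's quantization package; the exact constructor arguments here should be treated as illustrative.

```python
import torch
from torchao.quantization import Int8DynActInt4WeightQuantizer

# A model with a biased linear: before this PR it was left unswapped,
# after it is swapped to Int8DynActInt4WeightLinear as well.
model = torch.nn.Sequential(torch.nn.Linear(256, 128, bias=True))
model = Int8DynActInt4WeightQuantizer(groupsize=32).quantize(model)

# The bias is applied in full precision inside the swapped module's forward.
out = model(torch.randn(2, 256))
```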