Bias quantization for prequantized checkpoints #1821
andrewor14 added commits referencing this issue (Mar 5–6, 2025) with the following message:

**Summary:** Previously, when we saw a linear with bias, we simply did not swap it to `Int8DynActInt4WeightLinear` and left it as is. Now we do swap it, but the bias is not quantized and is instead passed to F.linear in full precision. Fixes #1821

**Test Plan:** python test/quantization/test_quant_api.py -k test_8da4w_quantizer_linear_bias
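The sketch below illustrates the behavior described in that summary, under stated assumptions: it is not torchao's actual implementation. `QuantizedLinearWithFp32Bias` is a hypothetical name, and the symmetric per-channel int8 weight scheme stands in for the real int8-dynamic-activation/int4-weight scheme. The point it shows is that the weight is stored quantized while the bias is kept unquantized and simply forwarded to `F.linear`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuantizedLinearWithFp32Bias(nn.Module):
    """Illustrative swap target: quantized weight, full-precision bias."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.detach()
        # Symmetric per-channel int8 weight quantization (a stand-in for the
        # real 8da4w scheme used by Int8DynActInt4WeightLinear).
        self.scale = w.abs().amax(dim=1, keepdim=True) / 127.0
        self.q_weight = torch.clamp((w / self.scale).round(), -128, 127).to(torch.int8)
        # The bias is kept unquantized and used as-is in the forward pass.
        self.bias = linear.bias

    def forward(self, x):
        # Dequantize the weight, then call F.linear with the full-precision bias.
        w = self.q_weight.to(x.dtype) * self.scale.to(x.dtype)
        return F.linear(x, w, self.bias)


# Example: a linear with bias now gets swapped instead of being left as-is.
lin = nn.Linear(16, 8, bias=True)
swapped = QuantizedLinearWithFp32Bias(lin)
out = swapped(torch.randn(2, 16))
```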
Following up from a chat with @jainapurva:

For private internal model enablement purposes, we would like to request support for bias quantization when loading prequantized checkpoints. At the moment, we are doing a manual source transformation after loading the prequantized checkpoint (https://github.com/pytorch/executorch/blob/main/examples/models/llama/source_transformation/pre_quantization.py#L40) into your deprecated `Int8DynActInt4WeightLinear`, which doesn't support bias quantization.
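For context, below is a minimal sketch of the kind of manual module-swap source transformation the request refers to. The real transform lives in the linked executorch file; `swap_linear_modules` and the `make_quantized_linear` factory here are hypothetical placeholders for constructing the replacement module (e.g. torchao's `Int8DynActInt4WeightLinear`), and the bias handling noted in the comment is exactly what this issue asks to support.

```python
import torch.nn as nn


def swap_linear_modules(module: nn.Module, make_quantized_linear) -> nn.Module:
    """Recursively replace nn.Linear children with quantized replacements."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            # The request in this issue: the replacement should also handle
            # (and quantize) child.bias instead of skipping such layers or
            # only supporting bias in full precision.
            setattr(module, name, make_quantized_linear(child))
        else:
            swap_linear_modules(child, make_quantized_linear)
    return module
```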