Expose zero_point_domain as arguments #1401
Conversation
🔗 Helpful Links · 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1401
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @airMeng! Thank you for your pull request and welcome to our community.

Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with `CLA signed`. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
torchao/quantization/quant_api.py (Outdated)

```diff
@@ -630,7 +630,8 @@ def int8_dynamic_activation_int4_weight(

 def int4_weight_only(
-    group_size=128, layout=TensorCoreTiledLayout(inner_k_tiles=8), use_hqq=False
+    group_size=128, layout=TensorCoreTiledLayout(inner_k_tiles=8), use_hqq=False,
+    zero_point_domain=ZeroPointDomain.FLOAT
```
do we need to expose this? if each layout is tied to certain settings, maybe we can do something similar to:
`torchao/quantization/quant_api.py`, line 675 at 039cef4: `if isinstance(layout, MarlinSparseLayout):`
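The pattern being referenced looks roughly like this (a paraphrased sketch of the linked line; the exact settings applied for the Marlin sparse layout are an assumption):

```python
# Sketch of the referenced pattern: picking a layout implies the other
# quantization settings, so the user only has to choose the layout.
# The specific values below are assumptions based on the linked line.
if isinstance(layout, MarlinSparseLayout):
    mapping_type = MappingType.SYMMETRIC
    preserve_zero = True
    zero_point_domain = ZeroPointDomain.INT
```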
we are also brainstorming how to best structure these settings; we might reduce some of the options here in the future
I think `layout` is for the weight, while `zero_point_domain` is for the zero points, so it would be better to decouple them.
But I agree we can structure these things better: align with the most common recipes while maintaining the capability for users to extend them.
yeah I understand that they are separate. it's more that we are using layout as a tool to select kernels right now, e.g. in the sparse layout example that I linked, users don't need to worry about making all the settings correct, they just need to specify the sparse layout and the rest is set automatically. It is a bit confusing, but we can think about how to make it clearer in the future; one alternative I can think of is to just expose a top-level API for each kernel.
looks good overall, please add some asserts in `int4_weight_only` to make it clearer what is supported.
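A minimal sketch of such an assert, assuming tinygemm's `TensorCoreTiledLayout` only supports floating-point zero points (per the squashed commits below; the check that actually landed may differ):

```python
from torchao.dtypes import TensorCoreTiledLayout
from torchao.quantization.quant_primitives import ZeroPointDomain

def _assert_supported(layout, zero_point_domain):
    # Assumption: tinygemm's tensor-core-tiled packing expects floating-point
    # zero points, so reject integer-domain zero points for that layout.
    if isinstance(layout, TensorCoreTiledLayout):
        assert zero_point_domain in (None, ZeroPointDomain.FLOAT), (
            f"TensorCoreTiledLayout does not support {zero_point_domain}"
        )
```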
another thing I want to mention here is that we are actually thinking of separating out the `preserve_zero=False` and `zero_point_domain=FLOAT` code path from tinygemm (see `torchao/quantization/quant_primitives.py`, lines 913–914 at 039cef4).
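For context, the two zero-point conventions under discussion differ roughly as follows (a simplified sketch based on `quant_primitives.py`; per-group reshaping and dtype handling are omitted):

```python
import torch

def dequant_int_domain(w_q, scale, zero_point):
    # Classic affine quantization: the zero point lives in the integer domain.
    return (w_q - zero_point) * scale

def dequant_float_domain(w_q, scale, zero_point, quant_min=0, quant_max=15):
    # tinygemm convention: the zero point lives in the floating-point domain
    # and the quantized values are re-centered around the range midpoint.
    mid_point = (quant_max + quant_min + 1) / 2
    return (w_q - mid_point) * scale + zero_point
```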
@jerryzh168 assert added, could you give a review? |
requesting changes to the default value of `zero_point_domain`
looks good, thanks!
@facebook-github-bot label "topic: improvement"
torchao/quantization/quant_api.py

```diff
@@ -630,7 +643,8 @@ def int8_dynamic_activation_int4_weight(

 def int4_weight_only(
-    group_size=128, layout=TensorCoreTiledLayout(inner_k_tiles=8), use_hqq=False
+    group_size=128, layout=TensorCoreTiledLayout(inner_k_tiles=8), use_hqq=False,
+    zero_point_domain=None
 ):
     """
     Applies uint4 weight-only asymmetric per-group quantization to linear layers, using
```
nit: please update docs for `zero_point_domain` before landing
updated in 51a4505
torchao/quantization/README.md (Outdated)

```diff
@@ -202,6 +202,17 @@ We also have a unified quantized tensor subclass that implements how to get a qu
 #### Layouts
 We extended the `layout` concept to represent different packing formats for a tensor. `AffineQuantizedTensor` supports `plain` and `tensor_core_tiled` layout. `plain` layout is used for `int8_weight_only` and `int8_dynamic_activation_int8_weight` and also as a default layout. `tensor_core_tiled` layout is used for `int4_weight_only` quantization and is packing the weights in a format that is compatible with tinygemm [int4mm](https://github.com/pytorch/pytorch/blob/39357ba06f48cda7d293a4995aa5eba2a46598b5/aten/src/ATen/native/native_functions.yaml#L4138) kernels.

+### Zero Point Domains
+`ZeroPointDomain` is used to control the data types of zero points. `None` represents symmetric quantization, while `ZeroPointDomain.FLOAT` and `ZeroPointDomain.INT` indicate asymmetric quantization. For detailed implementation of different zero point data types, refer to [the reference implementation](../../test/quantization/test_quant_primitives.py).
```
we have `MappingType` to mean symmetric or asymmetric quant; `zero_point_domain` refers to the zero point: `None` means the zero point is `None`, `FLOAT` means the zero point is in the floating-point domain, and `INT` means the integer domain. please see: `class ZeroPointDomain(Enum):`
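For reference, the enum in `torchao/quantization/quant_primitives.py` looks approximately like this (paraphrased; docstring and member order may differ from the source):

```python
from enum import Enum, auto

class ZeroPointDomain(Enum):
    INT = auto()    # zero point stored as an integer (classic affine quant)
    FLOAT = auto()  # zero point stored in floating point (tinygemm convention)
    NONE = auto()   # no zero point, e.g. symmetric quantization
```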
torchao/quantization/quant_api.py (Outdated)

```diff
@@ -650,6 +664,7 @@ def int4_weight_only(
     size is more fine grained, choices are [256, 128, 64, 32]
     `layout`: layout type for quantized tensor, default is `TensorCoreTiledLayout(inner_k_tiles=8)`
     `use_hqq`: whether to use hqq or default quantization mode, default is False
+    `zero_point_domain`: data type of zero points, choices are [None(default), ZeroPointDomain.FLOAT, ZeroPointDomain.INT, ZeroPointDomain.NONE]
```
nit: probably add that `None` means we'll set `zero_point_domain` based on the layout
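A minimal sketch of that `None`-default behavior (the dict contents are assumptions; the squashed commits below mention maintaining `layout` and `zero_point_domain` in a dict like this):

```python
from torchao.dtypes import MarlinSparseLayout, TensorCoreTiledLayout
from torchao.quantization.quant_primitives import ZeroPointDomain

# Assumed mapping: the first entry per layout is used as the default when the
# user passes zero_point_domain=None; the remaining entries are also accepted.
LAYOUT_TO_ZERO_POINT_DOMAIN = {
    TensorCoreTiledLayout: [ZeroPointDomain.FLOAT],
    MarlinSparseLayout: [ZeroPointDomain.INT],
}

def resolve_zero_point_domain(layout, zero_point_domain=None):
    supported = LAYOUT_TO_ZERO_POINT_DOMAIN[type(layout)]
    if zero_point_domain is None:
        return supported[0]
    assert zero_point_domain in supported, (
        f"{type(layout).__name__} does not support {zero_point_domain}"
    )
    return zero_point_domain
```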
@jerryzh168 anything more before merging?
No, we can merge now
* export zero_point_domain as arguments
* assert for combination of TensorCoreTiledLayout and integer zero points
* change the default zero_point_domain to None
* maintain layout and zero_point_domain in a dict
* nit
* fix key errors
* nit
* add zero_point_domain arguments in documents
* update documents
* Apply automatic Ruff fixes

---------

Co-authored-by: Ruff Auto-fixes <[email protected]>
As discussed in #1264, this exposes `zero_point_domain` to users.
However, to run the model end-to-end with integer zero points, we need to separate the scales and zero-points tensors. The first PR to enable this is pytorch/pytorch#137566.
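A hypothetical usage sketch once this lands (the argument name follows the diffs above; the model, sizes, and device are placeholders):

```python
import torch
from torchao.quantization import int4_weight_only, quantize_
from torchao.quantization.quant_primitives import ZeroPointDomain

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()

# Request integer-domain zero points instead of the default tinygemm
# floating-point convention.
quantize_(model, int4_weight_only(group_size=128,
                                  zero_point_domain=ZeroPointDomain.INT))
```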
@jgong5 @mingfeima @leslie-fang-intel @liangan1