Expose zero_point_domain as arguments #1401
Conversation
🔗 Helpful Links · 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1401
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @airMeng! Thank you for your pull request and welcome to our community.

Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with `CLA signed`. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
torchao/quantization/quant_api.py (Outdated)

```diff
@@ -630,7 +630,8 @@ def int8_dynamic_activation_int4_weight(

 def int4_weight_only(
-    group_size=128, layout=TensorCoreTiledLayout(inner_k_tiles=8), use_hqq=False
+    group_size=128, layout=TensorCoreTiledLayout(inner_k_tiles=8), use_hqq=False,
+    zero_point_domain=ZeroPointDomain.FLOAT
```
do we need to expose this? if each layout is tied to certain settings, maybe we can do something similar to:
`torchao/quantization/quant_api.py`, line 675 at 039cef4: `if isinstance(layout, MarlinSparseLayout):`
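The pattern being referenced looks roughly like this (a paraphrased sketch of the linked line; the exact settings applied for the Marlin sparse layout are an assumption):

```python
# Sketch of the referenced pattern: picking a layout implies the other
# quantization settings, so the user only has to choose the layout.
# The specific values below are assumptions based on the linked line.
if isinstance(layout, MarlinSparseLayout):
    mapping_type = MappingType.SYMMETRIC
    preserve_zero = True
    zero_point_domain = ZeroPointDomain.INT
```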
we are also brainstorming how to best structure these settings; we might reduce some of the options here in the future
I think `layout` is for the weight, while `zero_point_domain` is for the zero points, so it would be better to decouple them.
But I agree we can structure these things better: align with the most common recipes while maintaining the capability for users to extend them.
yeah I understand that they are separate. it's more that we are using layout as a tool to select kernels right now, e.g. in the sparse layout example that I linked, users don't need to worry about making all the settings correct, they just need to specify the sparse layout and the rest is set automatically. It is a bit confusing, but we can think about how to make it clearer in the future; one alternative I can think of is to just expose a top-level API for each kernel.
looks good overall, please add some asserts in `int4_weight_only` to make it clearer what is supported.
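A minimal sketch of such an assert, assuming tinygemm's `TensorCoreTiledLayout` only supports floating-point zero points (per the squashed commits below; the check that actually landed may differ):

```python
from torchao.dtypes import TensorCoreTiledLayout
from torchao.quantization.quant_primitives import ZeroPointDomain

def _assert_supported(layout, zero_point_domain):
    # Assumption: tinygemm's tensor-core-tiled packing expects floating-point
    # zero points, so reject integer-domain zero points for that layout.
    if isinstance(layout, TensorCoreTiledLayout):
        assert zero_point_domain in (None, ZeroPointDomain.FLOAT), (
            f"TensorCoreTiledLayout does not support {zero_point_domain}"
        )
```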
another thing I want to mention here is that we are actually thinking of separating out the `preserve_zero=False` and `zero_point_domain=FLOAT` code path from tinygemm (see `torchao/quantization/quant_primitives.py`, lines 913–914 at 039cef4).
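For context, the two zero-point conventions under discussion differ roughly as follows (a simplified sketch based on `quant_primitives.py`; per-group reshaping and dtype handling are omitted):

```python
import torch

def dequant_int_domain(w_q, scale, zero_point):
    # Classic affine quantization: the zero point lives in the integer domain.
    return (w_q - zero_point) * scale

def dequant_float_domain(w_q, scale, zero_point, quant_min=0, quant_max=15):
    # tinygemm convention: the zero point lives in the floating-point domain
    # and the quantized values are re-centered around the range midpoint.
    mid_point = (quant_max + quant_min + 1) / 2
    return (w_q - mid_point) * scale + zero_point
```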
@jerryzh168 assert added, could you give a review? |
requesting changes to the default value of `zero_point_domain`
looks good, thanks!
@facebook-github-bot label "topic: improvement"
torchao/quantization/quant_api.py

```diff
@@ -630,7 +643,8 @@ def int8_dynamic_activation_int4_weight(

 def int4_weight_only(
-    group_size=128, layout=TensorCoreTiledLayout(inner_k_tiles=8), use_hqq=False
+    group_size=128, layout=TensorCoreTiledLayout(inner_k_tiles=8), use_hqq=False,
+    zero_point_domain=None
 ):
     """
     Applies uint4 weight-only asymmetric per-group quantization to linear layers, using
```
nit: please update docs for `zero_point_domain` before landing
updated in 51a4505
torchao/quantization/README.md (Outdated)

```diff
@@ -202,6 +202,17 @@ We also have a unified quantized tensor subclass that implements how to get a qu
 #### Layouts
 We extended the `layout` concept to represent different packing formats for a tensor. `AffineQuantizedTensor` supports `plain` and `tensor_core_tiled` layout. `plain` layout is used for `int8_weight_only` and `int8_dynamic_activation_int8_weight` and also as a default layout. `tensor_core_tiled` layout is used for `int4_weight_only` quantization and is packing the weights in a format that is compatible with tinygemm [int4mm](https://github.com/pytorch/pytorch/blob/39357ba06f48cda7d293a4995aa5eba2a46598b5/aten/src/ATen/native/native_functions.yaml#L4138) kernels.

+### Zero Point Domains
+`ZeroPointDomain` is used to control the data types of zero points. `None` represents symmetric quantization, while `ZeroPointDomain.FLOAT` and `ZeroPointDomain.INT` indicate asymmetric quantization. For detailed implementation of different zero point data types, refer to [the reference implementation](../../test/quantization/test_quant_primitives.py).
```
we have `MappingType` to mean symmetric or asymmetric quant; `zero_point_domain` refers to the zero point: `None` means the zero point is `None`, `FLOAT` means the zero point is in the floating-point domain, and `INT` means the integer domain. please see: `class ZeroPointDomain(Enum):`
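For reference, the enum in `torchao/quantization/quant_primitives.py` looks approximately like this (paraphrased; docstring and member order may differ from the source):

```python
from enum import Enum, auto

class ZeroPointDomain(Enum):
    INT = auto()    # zero point stored as an integer (classic affine quant)
    FLOAT = auto()  # zero point stored in floating point (tinygemm convention)
    NONE = auto()   # no zero point, e.g. symmetric quantization
```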
torchao/quantization/quant_api.py (Outdated)

```diff
@@ -650,6 +664,7 @@ def int4_weight_only(
     size is more fine grained, choices are [256, 128, 64, 32]
     `layout`: layout type for quantized tensor, default is `TensorCoreTiledLayout(inner_k_tiles=8)`
     `use_hqq`: whether to use hqq or default quantization mode, default is False
+    `zero_point_domain`: data type of zero points, choices are [None(default), ZeroPointDomain.FLOAT, ZeroPointDomain.INT, ZeroPointDomain.NONE]
```
nit: probably add that `None` means we'll set `zero_point_domain` based on the layout
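A minimal sketch of that `None`-default behavior (the dict contents are assumptions; the squashed commits below mention maintaining `layout` and `zero_point_domain` in a dict like this):

```python
from torchao.dtypes import MarlinSparseLayout, TensorCoreTiledLayout
from torchao.quantization.quant_primitives import ZeroPointDomain

# Assumed mapping: the first entry per layout is used as the default when the
# user passes zero_point_domain=None; the remaining entries are also accepted.
LAYOUT_TO_ZERO_POINT_DOMAIN = {
    TensorCoreTiledLayout: [ZeroPointDomain.FLOAT],
    MarlinSparseLayout: [ZeroPointDomain.INT],
}

def resolve_zero_point_domain(layout, zero_point_domain=None):
    supported = LAYOUT_TO_ZERO_POINT_DOMAIN[type(layout)]
    if zero_point_domain is None:
        return supported[0]
    assert zero_point_domain in supported, (
        f"{type(layout).__name__} does not support {zero_point_domain}"
    )
    return zero_point_domain
```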
@jerryzh168 anything more before merging?
No, we can merge now
* export zero_point_domain as arguments
* assert for combination of TensorCoreTiledLayout and integer zero points
* change the default zero_point_domain to None
* maintain layout and zero_point_domain in a dict
* nit
* fix key errors
* nit
* add zero_point_domain arguments in documents
* update documents
* Apply automatic Ruff fixes

---------

Co-authored-by: Ruff Auto-fixes <[email protected]>
As discussed in #1264, this exposes `zero_point_domain` to users.
However, to run the model end-to-end with integer zero points, we need to separate the scales and zero-points tensors. The first PR to enable this is pytorch/pytorch#137566.
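A hypothetical usage sketch once this lands (the argument name follows the diffs above; the model, sizes, and device are placeholders):

```python
import torch
from torchao.quantization import int4_weight_only, quantize_
from torchao.quantization.quant_primitives import ZeroPointDomain

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()

# Request integer-domain zero points instead of the default tinygemm
# floating-point convention.
quantize_(model, int4_weight_only(group_size=128,
                                  zero_point_domain=ZeroPointDomain.INT))
```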
@jgong5 @mingfeima @leslie-fang-intel @liangan1