Expose zero_point_domain as arguments #1401

Merged · 10 commits merged into pytorch:main on Dec 17, 2024

Conversation

airMeng (Collaborator) commented Dec 11, 2024

As discussed in #1264, expose zero_point_domain to the users.

from torchao.quantization.quant_primitives import ZeroPointDomain
from torchao.quantization.quant_api import (
    quantize_,
    int4_weight_only,
)
quantize_(m, int4_weight_only(zero_point_domain=...))
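
A minimal end-to-end sketch of the exposed argument (the toy model, dtype, and group size below are illustrative and not part of the PR):

import torch
from torchao.quantization.quant_primitives import ZeroPointDomain
from torchao.quantization.quant_api import (
    quantize_,
    int4_weight_only,
)

# toy model; int4 weight-only quantization targets nn.Linear weights
m = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()

# choose the zero-point domain explicitly; FLOAT matches the existing tinygemm default,
# and ZeroPointDomain.INT can be selected where the layout supports it
quantize_(m, int4_weight_only(group_size=128, zero_point_domain=ZeroPointDomain.FLOAT))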

However, to run the model E2E with integer zero points, we need to separate the scale and zero-point tensors. The first PR to enable this is pytorch/pytorch#137566

@jgong5 @mingfeima @leslie-fang-intel @liangan1


pytorch-bot bot commented Dec 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1401

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

airMeng marked this pull request as draft on December 11, 2024 07:12
facebook-github-bot (Contributor) commented

Hi @airMeng!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

facebook-github-bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Dec 11, 2024
facebook-github-bot (Contributor) commented

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

airMeng marked this pull request as ready for review on December 12, 2024 00:57
@@ -630,7 +630,8 @@ def int8_dynamic_activation_int4_weight(
 
 
 def int4_weight_only(
-    group_size=128, layout=TensorCoreTiledLayout(inner_k_tiles=8), use_hqq=False
+    group_size=128, layout=TensorCoreTiledLayout(inner_k_tiles=8), use_hqq=False,
+    zero_point_domain=ZeroPointDomain.FLOAT
jerryzh168 (Contributor) commented Dec 12, 2024

do we need to expose this? if each layout is tied to certain settings, maybe we can do something similar to:

if isinstance(layout, MarlinSparseLayout):
for now?
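
For context, the pattern being referenced looks roughly like the following inside int4_weight_only (reproduced from memory of torchao at the time, so treat the exact assignments as an assumption rather than the actual quant_api code):

from torchao.dtypes import MarlinSparseLayout
from torchao.quantization.quant_primitives import MappingType, ZeroPointDomain

if isinstance(layout, MarlinSparseLayout):
    # the Marlin sparse kernel needs integer zero points in a symmetric scheme,
    # so these settings are pinned by the layout rather than exposed to the user
    mapping_type = MappingType.SYMMETRIC
    preserve_zero = True
    zero_point_domain = ZeroPointDomain.INT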

jerryzh168 (Contributor) replied:

We are also brainstorming how to best structure these settings; we might reduce some of the options here in the future.

airMeng (Collaborator, Author) replied:

I think layout is for the weight, while zero_point_domain is for the zero points, so it would be better to keep them decoupled.
But I agree we can structure these things; it is better to align with the most common recipes while keeping the ability for users to extend them.

jerryzh168 (Contributor) replied:

Yeah, I understand that they are separate. It's more that we are using layout as a tool to select kernels right now, e.g. in the sparse layout example I linked: users don't need to worry about getting every setting right, they just specify the sparse layout and the rest is set automatically. It is a bit confusing, but we can think about how to make it clearer in the future; one alternative is to expose a top-level API for each kernel.

jerryzh168 (Contributor) left a comment

Looks good overall; please add some asserts in int4_weight_only to make it clearer what is supported.
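
One possible shape of the requested check (illustrative only; the merged PR ended up tracking the supported combinations in a dict, so the exact assert differs):

from torchao.dtypes import TensorCoreTiledLayout
from torchao.quantization.quant_primitives import ZeroPointDomain

if isinstance(layout, TensorCoreTiledLayout):
    # the tinygemm tensor-core-tiled path only supports float-domain zero points
    assert zero_point_domain in (None, ZeroPointDomain.FLOAT), (
        "TensorCoreTiledLayout does not support integer zero points"
    )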

jerryzh168 (Contributor) commented

Another thing I want to mention here is that we are actually thinking of separating out the preserve_zero=False and zero_point_domain=FLOAT code path from tinygemm (zero_point = min_val_neg + scale * mid_point) into a different op, since it complicates the arguments everywhere and it is specific to tinygemm. For example, the mid_point = (quant_max + quant_min + 1) / 2 term exists because the tinygemm kernel does a dtype conversion inside the kernel. After that, which top-level argument we use for the FLOAT zero-point domain is still TBD.
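
To make the tinygemm-specific derivation above concrete, here is a small worked sketch (variable names follow the quoted lines; the weight range and uint4 bounds are illustrative):

quant_min, quant_max = 0, 15                      # uint4 range
mid_point = (quant_max + quant_min + 1) / 2       # 8.0, kept as a float on purpose
min_val_neg, max_val_pos = -1.0, 1.0              # observed weight range (example values)
scale = (max_val_pos - min_val_neg) / (quant_max - quant_min)
zero_point = min_val_neg + scale * mid_point      # float-domain zero point used by tinygemm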

airMeng (Collaborator, Author) commented Dec 12, 2024

@jerryzh168 asserts added, could you give it a review?

jerryzh168 (Contributor) left a comment

requesting changes to the default value of zero_point_domain

jerryzh168 (Contributor) left a comment

looks good, thanks!

airMeng (Collaborator, Author) commented Dec 13, 2024

@facebook-github-bot label "topic: improvement"

@@ -630,7 +643,8 @@ def int8_dynamic_activation_int4_weight(
 
 
 def int4_weight_only(
-    group_size=128, layout=TensorCoreTiledLayout(inner_k_tiles=8), use_hqq=False
+    group_size=128, layout=TensorCoreTiledLayout(inner_k_tiles=8), use_hqq=False,
+    zero_point_domain=None
 ):
     """
     Applies uint4 weight-only asymmetric per-group quantization to linear layers, using
jerryzh168 (Contributor) commented:

nit: please update docs for zero_point_domain before landing

airMeng (Collaborator, Author) replied:

Updated in 51a4505.

@@ -202,6 +202,17 @@ We also have a unified quantized tensor subclass that implements how to get a qu
#### Layouts
We extended the `layout` concept to represent different packing formats for a tensor. `AffineQuantizedTensor` supports `plain` and `tensor_core_tiled` layout. `plain` layout is used for `int8_weight_only` and `int8_dynamic_activation_int8_weight` and also as a default layout. `tensor_core_tiled` layout is used for `int4_weight_only` quantization and is packing the weights in a format that is compatible with tinygemm [int4mm](https://github.com/pytorch/pytorch/blob/39357ba06f48cda7d293a4995aa5eba2a46598b5/aten/src/ATen/native/native_functions.yaml#L4138) kernels.

### Zero Point Domains
```ZeroPointDomain``` is used to control the data types of zero points. ```None``` represents symmetric quantization, while ```ZeroPointDomain.FLOAT``` and ```ZeroPointDomain.INT``` indicate asymmetric quantization. For detailed implementation of different zero point data types, refer to [the reference implementation]((../../test/quantization/test_quant_primitives.py)).
jerryzh168 (Contributor) commented:

We have MappingType to mean symmetric or asymmetric quant; zero_point_domain refers to the zero_point itself: None means the zero_point is None, FLOAT means the zero_point is in the floating-point domain, and INT means the integer domain. Please see:

class ZeroPointDomain(Enum):
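
For reference, the enum he points to looks roughly like this in torchao.quantization.quant_primitives (reproduced from memory, so treat the exact members as an assumption):

from enum import Enum, auto

class ZeroPointDomain(Enum):
    INT = auto()    # zero point lives in the integer (quantized) domain
    FLOAT = auto()  # zero point lives in the floating-point domain (tinygemm style)
    NONE = auto()   # no zero point at all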


@@ -650,6 +664,7 @@ def int4_weight_only(
size is more fine grained, choices are [256, 128, 64, 32]
`layout`: layout type for quantized tensor, default is `TensorCoreTiledLayout(inner_k_tiles=8)`
`use_hqq`: whether to use hqq or default quantization mode, default is False
`zero_point_domain`: data type of zeros points, choices are [None(default), ZeroPointDomain.FLOAT, ZeroPointDomain.INT, ZeroPointDomain.NONE]
jerryzh168 (Contributor) commented:

nit: probably worth adding that None means we'll set zero_point_domain based on the layout
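
A hedged sketch of what that None-means-layout-default behavior could look like (the merged PR keeps the supported combinations in a dict; the mapping name and contents below are assumptions for illustration):

from torchao.dtypes import TensorCoreTiledLayout, MarlinSparseLayout
from torchao.quantization.quant_primitives import ZeroPointDomain

# hypothetical mapping from layout type to its default zero-point domain
LAYOUT_TO_ZERO_POINT_DOMAIN = {
    TensorCoreTiledLayout: ZeroPointDomain.FLOAT,  # tinygemm kernels expect float zero points
    MarlinSparseLayout: ZeroPointDomain.INT,       # Marlin sparse kernel uses integer zero points
}

if zero_point_domain is None:
    zero_point_domain = LAYOUT_TO_ZERO_POINT_DOMAIN[type(layout)]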


airMeng (Collaborator, Author) commented Dec 17, 2024

@jerryzh168 anything more needed before merging?

jerryzh168 (Contributor) commented

No, we can merge now.

jerryzh168 merged commit ace7219 into pytorch:main on Dec 17, 2024
3 checks passed
airMeng deleted the int_zp branch on December 17, 2024 01:14
amdfaa pushed a commit that referenced this pull request Jan 10, 2025
* export zero_point_domain as arguments

* assert for combination of TensorCoreTiledLayout and integer zero points

* change the default zero_point_domain to None

* maintain layout and zero_point_domain in a dict

* nit

* fix key errors

* nit

* add zero_point_domian arguments in documents

* update documemts

* Apply automatic Ruff fixes

---------

Co-authored-by: Ruff Auto-fixes <[email protected]>