
Commit 51a4505

add zero_point_domain arguments in documents

1 parent 836d508 commit 51a4505

File tree

2 files changed: +15 −1 lines changed

torchao/quantization/README.md

Lines changed: 13 additions & 0 deletions
@@ -202,6 +202,17 @@ We also have a unified quantized tensor subclass that implements how to get a qu
 #### Layouts
 We extended the `layout` concept to represent different packing formats for a tensor. `AffineQuantizedTensor` supports `plain` and `tensor_core_tiled` layout. `plain` layout is used for `int8_weight_only` and `int8_dynamic_activation_int8_weight` and also as a default layout. `tensor_core_tiled` layout is used for `int4_weight_only` quantization and is packing the weights in a format that is compatible with tinygemm [int4mm](https://github.com/pytorch/pytorch/blob/39357ba06f48cda7d293a4995aa5eba2a46598b5/aten/src/ATen/native/native_functions.yaml#L4138) kernels.
 
+### Zero Point Domains
+`ZeroPointDomain` controls the data type of zero points. `ZeroPointDomain.NONE` selects symmetric quantization (no zero point is stored), while `ZeroPointDomain.FLOAT` and `ZeroPointDomain.INT` select asymmetric quantization with float or int zero points; passing `None` selects the layout's default domain. For the detailed semantics of each zero point data type, refer to [the reference implementation](../../test/quantization/test_quant_primitives.py).
+The following support matrix shows which zero point domains each layout supports; it may be updated as backends change:
+
+| Layout | NONE (symmetric) | FLOAT | INT |
+|--------|------------------|-------|-----|
+| TensorCoreTiledLayout | Yes | Yes (default) | No |
+| Int4CPULayout | Yes | Yes (default) | No |
+| MarlinSparseLayout | No | No | Yes (default) |
+
+
 ### Full Affine Quantization Flow Example
 Let's use int4 weight only quantization that's targeting tinygemm int4 weight only quantized matmul
 as an example:
@@ -239,6 +250,8 @@ m_bf16 = torch.compile(m_bf16, mode='max-autotune')
 group_size = 32
 # only works for torch 2.4+
 quantize_(m, int4_weight_only(group_size=group_size))
+## If a different zero_point_domain is needed
+# quantize_(m, int4_weight_only(group_size=group_size, zero_point_domain=ZeroPointDomain.FLOAT))
 
 # temporary workaround for tensor subclass + torch.compile
 # NOTE: this is only needed for torch version < 2.5
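Taken together, the README changes above suggest the following end-to-end usage. This is a minimal sketch, not part of the commit: it assumes `quantize_` and `int4_weight_only` are importable from `torchao.quantization`, `ZeroPointDomain` from `torchao.quantization.quant_primitives`, and a CUDA device with bfloat16 support for the tinygemm path; exact import paths may vary across torchao versions.

```python
# Minimal sketch (not part of this commit). Import paths are assumptions and
# may differ between torchao versions.
import torch
from torchao.quantization import quantize_, int4_weight_only
from torchao.quantization.quant_primitives import ZeroPointDomain

# Toy module; int4 weight-only quantization applies to nn.Linear weights.
# The tinygemm int4 kernels expect bfloat16 weights on a CUDA device.
m = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()

# Passing no zero_point_domain (i.e. None) lets the layout pick its default:
# FLOAT for the default TensorCoreTiledLayout, per the support matrix above.
quantize_(m, int4_weight_only(group_size=32))

# The domain can also be pinned explicitly; per the matrix,
# TensorCoreTiledLayout accepts FLOAT and NONE (symmetric) but not INT,
# while MarlinSparseLayout would instead require ZeroPointDomain.INT.
m2 = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).to(torch.bfloat16).cuda()
quantize_(m2, int4_weight_only(group_size=32, zero_point_domain=ZeroPointDomain.NONE))
```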

torchao/quantization/quant_api.py

Lines changed: 2 additions & 1 deletion
@@ -664,6 +664,7 @@ def int4_weight_only(
             size is more fine grained, choices are [256, 128, 64, 32]
         `layout`: layout type for quantized tensor, default is `TensorCoreTiledLayout(inner_k_tiles=8)`
         `use_hqq`: whether to use hqq or default quantization mode, default is False
+        `zero_point_domain`: data type of zero points, choices are [None (default), ZeroPointDomain.FLOAT, ZeroPointDomain.INT, ZeroPointDomain.NONE]
     """
 
     def apply_int4_weight_only_quant(weight):
@@ -679,14 +680,14 @@ def apply_int4_weight_only_quant(weight):
         quant_min = 0
         quant_max = 15
         eps = 1e-6
+        preserve_zero = LAYOUT_TO_PRESERVE_ZEROS[type(layout)]
         zero_point_dtype = torch.bfloat16
 
         nonlocal zero_point_domain
         assert type(layout) in LAYOUT_TO_ZERO_POINT_DOMAIN.keys(), f"Only support layout: {LAYOUT_TO_ZERO_POINT_DOMAIN.keys()}"
         if zero_point_domain is None:
             # the first value is the default one
             zero_point_domain = LAYOUT_TO_ZERO_POINT_DOMAIN[type(layout)][0]
-            preserve_zero = LAYOUT_TO_PRESERVE_ZEROS[type(layout)]
         else:
             assert zero_point_domain in LAYOUT_TO_ZERO_POINT_DOMAIN[type(layout)], f"Layout only supports {LAYOUT_TO_ZERO_POINT_DOMAIN[type(layout)]}"

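The `LAYOUT_TO_ZERO_POINT_DOMAIN` and `LAYOUT_TO_PRESERVE_ZEROS` tables referenced above are defined outside this diff. As a reading aid, here is a plausible reconstruction of their shape from the README's support matrix and the "first value is the default one" comment; the actual definitions in `quant_api.py` may differ.

```python
# Illustrative reconstruction (not from this diff) of the lookup tables used
# by apply_int4_weight_only_quant; the real definitions may differ.
from torchao.dtypes import TensorCoreTiledLayout, Int4CPULayout, MarlinSparseLayout
from torchao.quantization.quant_primitives import ZeroPointDomain

# The first entry of each list is the layout's default zero point domain,
# matching `LAYOUT_TO_ZERO_POINT_DOMAIN[type(layout)][0]` above.
LAYOUT_TO_ZERO_POINT_DOMAIN = {
    TensorCoreTiledLayout: [ZeroPointDomain.FLOAT, ZeroPointDomain.NONE],
    Int4CPULayout: [ZeroPointDomain.FLOAT, ZeroPointDomain.NONE],
    MarlinSparseLayout: [ZeroPointDomain.INT],
}

# Whether zero must be exactly representable after quantization; the Marlin
# sparse kernel uses integer zero points and preserves zero exactly (assumed).
LAYOUT_TO_PRESERVE_ZEROS = {
    TensorCoreTiledLayout: False,
    Int4CPULayout: False,
    MarlinSparseLayout: True,
}
```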