
Commit 9d80b62

Update quantization-overview.md
1 parent 0b48f47 commit 9d80b62

1 file changed (+11 -2 lines)

docs/source/quantization-overview.md

@@ -17,6 +17,15 @@ For an example quantization flow with ``XNNPACKQuantizer``, more documentation an
 
 ## Source Quantization: Int8DynActInt4WeightQuantizer
 
-In addition to export based quantization (described above), ExecuTorch also provides source-based quantization. This is accomplished via [torchao](https://github.com/pytorch/ao). One specific example is `Int8DynActInt4WeightQuantizer`.
+In addition to export-based quantization (described above), ExecuTorch also supports source-based quantization, accomplished via [torchao](https://github.com/pytorch/ao). Unlike export-based quantization, source-based quantization directly modifies the model prior to export. One specific example is `Int8DynActInt4WeightQuantizer`.
+
+This scheme represents 4-bit weight quantization with 8-bit dynamic quantization of activations during inference.
+
+Imported with ``from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer``, this class uses a quantizer instance constructed with a specified dtype precision and groupsize to mutate a provided ``nn.Module``.
+
+```
+from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer
+
+model = Int8DynActInt4WeightQuantizer(precision=torch_dtype, groupsize=group_size).quantize(model)
+```
 
-Imported with ``from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer``, this class uses a quantization object constructed with the specified precicion and groupsize, to mutate a provided ``nn.Module``.
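For context, here is a minimal runnable sketch of the API the new documentation describes. The `torch_dtype`, `group_size`, and `model` names in the diff are placeholders; this sketch substitutes illustrative values (`torch.float32`, a groupsize of 32, and a small two-layer `nn.Sequential`) and assumes a torchao version that still exposes `Int8DynActInt4WeightQuantizer`:

```
import torch
import torch.nn as nn
from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer

# Illustrative model; any nn.Module containing nn.Linear layers works.
# The in-features (256) must be divisible by the chosen groupsize.
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))

# precision and groupsize are example values standing in for the
# doc's torch_dtype / group_size placeholders.
quantizer = Int8DynActInt4WeightQuantizer(precision=torch.float32, groupsize=32)

# quantize() mutates the provided module: Linear weights are quantized
# to 4-bit (group-wise), and activations are dynamically quantized to
# 8-bit at inference time.
model = quantizer.quantize(model)
```

The quantized `model` can then be exported as usual, since this source-based flow rewrites the module before export rather than annotating the exported graph.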
