
Commit f48f392

Jack-Khuu authored and facebook-github-bot committed
Update quant overview for 021 (#3845)
Summary: Pull Request resolved: #3845 Reviewed By: Gasoonjia Differential Revision: D58176137 Pulled By: Jack-Khuu fbshipit-source-id: bdaf01a8fb66ba3333c3b6d7802c3bb02b20c4a5
1 parent 8009114 commit f48f392

File tree

1 file changed: +22 −0 lines changed


docs/source/quantization-overview.md

@@ -14,3 +14,25 @@ Backend developers will need to implement their own ``Quantizer`` to express how
Modeling users will use the ``Quantizer`` specific to their target backend to quantize their model, e.g. ``XNNPACKQuantizer``.

For an example quantization flow with ``XNNPACKQuantizer``, more documentation, and tutorials, please see the ``Performing Quantization`` section in the [ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial).
## Source Quantization: Int8DynActInt4WeightQuantizer

In addition to export-based quantization (described above), ExecuTorch wants to highlight source-based quantization, accomplished via [torchao](https://github.com/pytorch/ao). Unlike export-based quantization, source-based quantization directly modifies the model prior to export. One specific example is `Int8DynActInt4WeightQuantizer`.

This scheme represents 4-bit weight quantization with 8-bit dynamic quantization of activations during inference.
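As a hypothetical illustration of the idea (not torchao's actual implementation, which uses optimized kernels), groupwise symmetric 4-bit weight quantization can be sketched in plain Python: each group of weights shares one scale, and each weight is rounded to an integer in `[-8, 7]`. The function names and the example group below are invented for illustration.

```python
# Illustrative sketch only; torchao's real int4 quantization differs in detail.
def quantize_group(weights, qmin=-8, qmax=7):
    """Quantize one group of float weights to int4 values with a shared scale."""
    scale = max(abs(w) for w in weights) / qmax
    scale = scale if scale > 0 else 1.0  # avoid division by zero for all-zero groups
    q = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from int4 values and the shared scale."""
    return [v * scale for v in q]

# One small group of weights; in practice `groupsize` is larger, e.g. 32 or 256.
group = [0.12, -0.5, 0.33, 0.07]
q, scale = quantize_group(group)       # q == [2, -7, 5, 1]
recovered = dequantize_group(q, scale)
```

A smaller `groupsize` stores more scales but lowers quantization error per group; the "Int8DynAct" half of the name refers to activations being separately quantized to 8 bits on the fly at inference time.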
Imported with ``from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer``, this class uses a quantization instance constructed with a specified dtype precision and groupsize to mutate a provided ``nn.Module``.
```python
# Source Quant
from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer

# torch_dtype and group_size hold the desired dtype precision and groupsize
model = Int8DynActInt4WeightQuantizer(precision=torch_dtype, groupsize=group_size).quantize(model)

# Export to ExecuTorch
from executorch.exir import to_edge
from torch.export import export

exported_model = export(model, ...)
et_program = to_edge(exported_model, ...).to_executorch(...)
```
