Commit f09a85f

Spelling / comments
1 parent e8e94d3 commit f09a85f

File tree

2 files changed: +19 −6 lines changed


en-wordlist.txt

+11

@@ -698,3 +698,14 @@ TorchServe
 Inductor’s
 onwards
 recompilations
+BiasCorrection
+ELU
+GELU
+NNCF
+OpenVINO
+OpenVINOQuantizer
+PReLU
+Quantizer
+SmoothQuant
+quantizer
+quantizers

prototype_source/openvino_quantizer.rst

+8 −6

@@ -11,13 +11,15 @@ Prerequisites
 Introduction
 --------------
 
-**This is an experimental feature, the quantization API is subject to change.**
+.. note::
+
+   This is an experimental feature, the quantization API is subject to change.
 
 This tutorial demonstrates how to use `OpenVINOQuantizer` from `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf/tree/develop>`_ in PyTorch 2 Export Quantization flow to generate a quantized model customized for the `OpenVINO torch.compile backend <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ and explains how to lower the quantized model into the `OpenVINO <https://docs.openvino.ai/2024/index.html>`_ representation.
 `OpenVINOQuantizer` unlocks the full potential of low-precision OpenVINO kernels due to the placement of quantizers designed specifically for the OpenVINO.
 
 The PyTorch 2 export quantization flow uses the torch.export to capture the model into a graph and performs quantization transformations on top of the ATen graph.
-This approach is expected to have significantly higher model coverage, better programmability, and a simplified UX.
+This approach is expected to have significantly higher model coverage, improved flexibility, and a simplified UX.
 OpenVINO backend compiles the FX Graph generated by TorchDynamo into an optimized OpenVINO model.
 
 The quantization flow mainly includes four steps:
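The quantization transformations this hunk refers to insert fake-quantize operations into the captured ATen graph. As background, 8-bit affine fake quantization can be sketched in plain Python (a deliberate simplification for illustration; NNCF's actual observers and kernels are more involved):

```python
# Sketch of affine fake quantization (not NNCF's implementation):
# q = clamp(round(x / scale) + zero_point, qmin, qmax), then dequantize,
# so downstream ops see the value the int8 kernel would produce.

def fake_quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Quantize a float to the int8 grid and dequantize it back."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))          # clamp to the integer range
    return (q - zero_point) * scale      # dequantized value

# With scale=0.1 and zero_point=0, values snap to multiples of ~0.1,
# and anything beyond the representable range saturates near 12.7.
print(fake_quantize(0.34, scale=0.1, zero_point=0))
print(fake_quantize(100.0, scale=0.1, zero_point=0))
```

The `scale` and `zero_point` here are placeholders; in the real flow they are chosen by the observers that `prepare_pt2e` inserts, based on calibration data.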
@@ -134,7 +136,7 @@ Below is the list of essential parameters and their description:
 
     OpenVINOQuantizer(preset=nncf.QuantizationPreset.MIXED)
 
-* ``model_type`` - used to specify quantization scheme required for specific type of the model. Transformer is the only supported special quantization scheme to preserve accuracy after quantization of Transformer models (BERT, DistilBERT, etc.). None is default, i.e. no specific scheme is defined.
+* ``model_type`` - used to specify quantization scheme required for specific type of the model. Transformer is the only supported special quantization scheme to preserve accuracy after quantization of Transformer models (BERT, Llama, etc.). None is default, i.e. no specific scheme is defined.
 
 .. code-block:: python
 
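For context on the ``preset`` parameter in this hunk: as I understand NNCF's presets, ``MIXED`` pairs symmetric quantization for weights with asymmetric quantization for activations. The difference in how the two schemes derive scale and zero-point from an observed range can be sketched in plain Python (an illustration, not NNCF code):

```python
# Sketch (not NNCF code): deriving int8 scale/zero-point from an
# observed tensor range [lo, hi] under the two schemes.

def symmetric_params(lo, hi, qmax=127):
    """Symmetric: zero_point fixed at 0, grid centered on zero."""
    scale = max(abs(lo), abs(hi)) / qmax
    return scale, 0

def asymmetric_params(lo, hi, qmin=-128, qmax=127):
    """Asymmetric: the full [lo, hi] range maps onto [qmin, qmax]."""
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

# A ReLU-style activation range [0, 6] wastes half the symmetric grid,
# so the asymmetric scheme gets a finer scale for the same range:
s_sym, zp_sym = symmetric_params(0.0, 6.0)
s_asym, zp_asym = asymmetric_params(0.0, 6.0)
```

This is why asymmetric quantization is commonly preferred for post-ReLU activations, while symmetric quantization keeps weight kernels simple (no zero-point term in the matmul).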
@@ -169,7 +171,7 @@ Below is the list of essential parameters and their description:
 
     OpenVINOQuantizer(target_device=nncf.TargetDevice.CPU)
 
-For futher details on `OpenVINOQuantizer` please see the `documentation <https://openvinotoolkit.github.io/nncf/autoapi/nncf/experimental/torch/fx/index.html#nncf.experimental.torch.fx.OpenVINOQuantizer>`_.
+For further details on `OpenVINOQuantizer` please see the `documentation <https://openvinotoolkit.github.io/nncf/autoapi/nncf/experimental/torch/fx/index.html#nncf.experimental.torch.fx.OpenVINOQuantizer>`_.
 
 After we import the backend-specific Quantizer, we will prepare the model for post-training quantization.
 ``prepare_pt2e`` folds BatchNorm operators into preceding Conv2d operators, and inserts observers in appropriate places in the model.
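The BatchNorm folding mentioned in this hunk amounts to rescaling the convolution weights and bias so the normalization disappears as a separate op. A scalar sketch of that arithmetic (an illustration only, not the actual ``prepare_pt2e`` pass, which operates on full tensors in the FX graph):

```python
import math

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * ((w*x + b) - mean) / sqrt(var + eps) + beta
    into a single affine op y = w_f * x + b_f (scalar illustration)."""
    std = math.sqrt(var + eps)
    w_f = w * gamma / std
    b_f = (b - mean) * gamma / std + beta
    return w_f, b_f

# The folded op matches conv followed by batch norm for any input x:
w_f, b_f = fold_batchnorm(w=2.0, b=1.0, gamma=0.5, beta=0.1, mean=0.3, var=4.0)
x = 3.0
conv_then_bn = 0.5 * ((2.0 * x + 1.0) - 0.3) / math.sqrt(4.0 + 1e-5) + 0.1
assert abs((w_f * x + b_f) - conv_then_bn) < 1e-12
```

Folding before inserting observers matters for quantization: the quantization parameters are then calibrated against the weights the deployed (folded) model will actually use.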
@@ -215,8 +217,8 @@ This should significantly speed up inference time in comparison with the eager m
 4. Optional: Improve quantized model metrics
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-NNCF implements advanced quantization algorithms like SmoothQuant and BiasCorrection, which help
-improve the quantized model metrics while minimizing the output discrepancies between the original and compressed models.
+NNCF implements advanced quantization algorithms like `SmoothQuant <https://arxiv.org/abs/2211.10438>`_ and `BiasCorrection <https://arxiv.org/abs/1906.04721>`_, which help
+to improve the quantized model metrics while minimizing the output discrepancies between the original and compressed models.
 These advanced NNCF algorithms can be accessed via the NNCF `quantize_pt2e` API:
 
 .. code-block:: python
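SmoothQuant, referenced in this hunk, migrates activation outliers into the weights through a per-channel scale, relying on the identity X·W = (X/s)·(s·W). A plain-Python sketch of the idea (not NNCF's implementation; the `alpha` exponent, from the SmoothQuant paper, balances how much of the outlier moves into the weights):

```python
# Sketch of the SmoothQuant scale migration (illustration, not NNCF code).

def smooth_scales(x_absmax, w_absmax, alpha=0.5):
    """Per-input-channel scale s_j = max|X_j|^alpha / max|W_j|^(1-alpha)."""
    return [xa ** alpha / wa ** (1 - alpha) for xa, wa in zip(x_absmax, w_absmax)]

# One activation row and one weight column; channel 0 is an outlier:
x = [120.0, 0.5, 1.2]
w = [0.01, 0.8, 0.3]

s = smooth_scales([abs(v) for v in x], [abs(v) for v in w])
x_smooth = [xv / sv for xv, sv in zip(x, s)]   # activations become tamer
w_smooth = [wv * sv for wv, sv in zip(w, s)]   # weights absorb the scale

# The product is mathematically unchanged, but the activation outlier
# shrank, so per-tensor activation quantization loses far less precision:
y = sum(xv * wv for xv, wv in zip(x, w))
y_smooth = sum(xv * wv for xv, wv in zip(x_smooth, w_smooth))
assert abs(y - y_smooth) < 1e-9
assert max(map(abs, x_smooth)) < max(map(abs, x))
```

In the real `quantize_pt2e` flow the scales are estimated from calibration statistics and folded into the preceding layer, so no extra runtime op is needed.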
