Commit f09a85f

Spelling / comments
1 parent e8e94d3 commit f09a85f

File tree

2 files changed: +19 −6 lines changed


en-wordlist.txt

+11

@@ -698,3 +698,14 @@ TorchServe
 Inductor’s
 onwards
 recompilations
+BiasCorrection
+ELU
+GELU
+NNCF
+OpenVINO
+OpenVINOQuantizer
+PReLU
+Quantizer
+SmoothQuant
+quantizer
+quantizers

prototype_source/openvino_quantizer.rst

+8 −6

@@ -11,13 +11,15 @@ Prerequisites
 Introduction
 --------------
 
-**This is an experimental feature, the quantization API is subject to change.**
+.. note::
+
+   This is an experimental feature, the quantization API is subject to change.
 
 This tutorial demonstrates how to use `OpenVINOQuantizer` from `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf/tree/develop>`_ in PyTorch 2 Export Quantization flow to generate a quantized model customized for the `OpenVINO torch.compile backend <https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html>`_ and explains how to lower the quantized model into the `OpenVINO <https://docs.openvino.ai/2024/index.html>`_ representation.
 `OpenVINOQuantizer` unlocks the full potential of low-precision OpenVINO kernels due to the placement of quantizers designed specifically for the OpenVINO.
 
 The PyTorch 2 export quantization flow uses the torch.export to capture the model into a graph and performs quantization transformations on top of the ATen graph.
-This approach is expected to have significantly higher model coverage, better programmability, and a simplified UX.
+This approach is expected to have significantly higher model coverage, improved flexibility, and a simplified UX.
 OpenVINO backend compiles the FX Graph generated by TorchDynamo into an optimized OpenVINO model.
 
 The quantization flow mainly includes four steps:
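The quantization transformations this hunk refers to insert fake-quantize operations into the captured ATen graph. As background, 8-bit affine fake quantization can be sketched in plain Python (a deliberate simplification for illustration; NNCF's actual observers and kernels are more involved):

```python
# Sketch of affine fake quantization (not NNCF's implementation):
# q = clamp(round(x / scale) + zero_point, qmin, qmax), then dequantize,
# so downstream ops see the value the int8 kernel would produce.

def fake_quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Quantize a float to the int8 grid and dequantize it back."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))          # clamp to the integer range
    return (q - zero_point) * scale      # dequantized value

# With scale=0.1 and zero_point=0, values snap to multiples of ~0.1,
# and anything beyond the representable range saturates near 12.7.
print(fake_quantize(0.34, scale=0.1, zero_point=0))
print(fake_quantize(100.0, scale=0.1, zero_point=0))
```

The `scale` and `zero_point` here are placeholders; in the real flow they are chosen by the observers that `prepare_pt2e` inserts, based on calibration data.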
@@ -134,7 +136,7 @@ Below is the list of essential parameters and their description:
 
     OpenVINOQuantizer(preset=nncf.QuantizationPreset.MIXED)
 
-* ``model_type`` - used to specify quantization scheme required for specific type of the model. Transformer is the only supported special quantization scheme to preserve accuracy after quantization of Transformer models (BERT, DistilBERT, etc.). None is default, i.e. no specific scheme is defined.
+* ``model_type`` - used to specify quantization scheme required for specific type of the model. Transformer is the only supported special quantization scheme to preserve accuracy after quantization of Transformer models (BERT, Llama, etc.). None is default, i.e. no specific scheme is defined.
 
 .. code-block:: python
 
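For context on the ``preset`` parameter in this hunk: as I understand NNCF's presets, ``MIXED`` pairs symmetric quantization for weights with asymmetric quantization for activations. The difference in how the two schemes derive scale and zero-point from an observed range can be sketched in plain Python (an illustration, not NNCF code):

```python
# Sketch (not NNCF code): deriving int8 scale/zero-point from an
# observed tensor range [lo, hi] under the two schemes.

def symmetric_params(lo, hi, qmax=127):
    """Symmetric: zero_point fixed at 0, grid centered on zero."""
    scale = max(abs(lo), abs(hi)) / qmax
    return scale, 0

def asymmetric_params(lo, hi, qmin=-128, qmax=127):
    """Asymmetric: the full [lo, hi] range maps onto [qmin, qmax]."""
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

# A ReLU-style activation range [0, 6] wastes half the symmetric grid,
# so the asymmetric scheme gets a finer scale for the same range:
s_sym, zp_sym = symmetric_params(0.0, 6.0)
s_asym, zp_asym = asymmetric_params(0.0, 6.0)
```

This is why asymmetric quantization is commonly preferred for post-ReLU activations, while symmetric quantization keeps weight kernels simple (no zero-point term in the matmul).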
@@ -169,7 +171,7 @@ Below is the list of essential parameters and their description:
 
     OpenVINOQuantizer(target_device=nncf.TargetDevice.CPU)
 
-For futher details on `OpenVINOQuantizer` please see the `documentation <https://openvinotoolkit.github.io/nncf/autoapi/nncf/experimental/torch/fx/index.html#nncf.experimental.torch.fx.OpenVINOQuantizer>`_.
+For further details on `OpenVINOQuantizer` please see the `documentation <https://openvinotoolkit.github.io/nncf/autoapi/nncf/experimental/torch/fx/index.html#nncf.experimental.torch.fx.OpenVINOQuantizer>`_.
 
 After we import the backend-specific Quantizer, we will prepare the model for post-training quantization.
 ``prepare_pt2e`` folds BatchNorm operators into preceding Conv2d operators, and inserts observers in appropriate places in the model.
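The BatchNorm folding mentioned in this hunk amounts to rescaling the convolution weights and bias so the normalization disappears as a separate op. A scalar sketch of that arithmetic (an illustration only, not the actual ``prepare_pt2e`` pass, which operates on full tensors in the FX graph):

```python
import math

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * ((w*x + b) - mean) / sqrt(var + eps) + beta
    into a single affine op y = w_f * x + b_f (scalar illustration)."""
    std = math.sqrt(var + eps)
    w_f = w * gamma / std
    b_f = (b - mean) * gamma / std + beta
    return w_f, b_f

# The folded op matches conv followed by batch norm for any input x:
w_f, b_f = fold_batchnorm(w=2.0, b=1.0, gamma=0.5, beta=0.1, mean=0.3, var=4.0)
x = 3.0
conv_then_bn = 0.5 * ((2.0 * x + 1.0) - 0.3) / math.sqrt(4.0 + 1e-5) + 0.1
assert abs((w_f * x + b_f) - conv_then_bn) < 1e-12
```

Folding before inserting observers matters for quantization: the quantization parameters are then calibrated against the weights the deployed (folded) model will actually use.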
@@ -215,8 +217,8 @@ This should significantly speed up inference time in comparison with the eager m
 4. Optional: Improve quantized model metrics
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-NNCF implements advanced quantization algorithms like SmoothQuant and BiasCorrection, which help
-improve the quantized model metrics while minimizing the output discrepancies between the original and compressed models.
+NNCF implements advanced quantization algorithms like `SmoothQuant <https://arxiv.org/abs/2211.10438>`_ and `BiasCorrection <https://arxiv.org/abs/1906.04721>`_, which help
+to improve the quantized model metrics while minimizing the output discrepancies between the original and compressed models.
 These advanced NNCF algorithms can be accessed via the NNCF `quantize_pt2e` API:
 
 .. code-block:: python
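SmoothQuant, referenced in this hunk, migrates activation outliers into the weights through a per-channel scale, relying on the identity X·W = (X/s)·(s·W). A plain-Python sketch of the idea (not NNCF's implementation; the `alpha` exponent, from the SmoothQuant paper, balances how much of the outlier moves into the weights):

```python
# Sketch of the SmoothQuant scale migration (illustration, not NNCF code).

def smooth_scales(x_absmax, w_absmax, alpha=0.5):
    """Per-input-channel scale s_j = max|X_j|^alpha / max|W_j|^(1-alpha)."""
    return [xa ** alpha / wa ** (1 - alpha) for xa, wa in zip(x_absmax, w_absmax)]

# One activation row and one weight column; channel 0 is an outlier:
x = [120.0, 0.5, 1.2]
w = [0.01, 0.8, 0.3]

s = smooth_scales([abs(v) for v in x], [abs(v) for v in w])
x_smooth = [xv / sv for xv, sv in zip(x, s)]   # activations become tamer
w_smooth = [wv * sv for wv, sv in zip(w, s)]   # weights absorb the scale

# The product is mathematically unchanged, but the activation outlier
# shrank, so per-tensor activation quantization loses far less precision:
y = sum(xv * wv for xv, wv in zip(x, w))
y_smooth = sum(xv * wv for xv, wv in zip(x_smooth, w_smooth))
assert abs(y - y_smooth) < 1e-9
assert max(map(abs, x_smooth)) < max(map(abs, x))
```

In the real `quantize_pt2e` flow the scales are estimated from calibration statistics and folded into the preceding layer, so no extra runtime op is needed.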
