
Commit 99a3970

Add OpenVINO weights compression to docs (#435)
* Add weights compression to docs
* Update optimization_ov.mdx
1 parent: 9562235

1 file changed: +22 −1 lines

docs/source/optimization_ov.mdx

@@ -62,6 +62,27 @@ tokenizer.save_pretrained(save_dir)
The `quantize()` method applies post-training static quantization and exports the resulting quantized model to the OpenVINO Intermediate Representation (IR). The graph is saved as two files: an XML file describing the network topology and a binary file containing the weights. The resulting model can be run on any target Intel device.
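As a quick sanity check after export, the save directory should contain a matched topology/weights pair. The helper below is a minimal sketch, not part of the Optimum API; the function name is an assumption, and it only relies on the IR convention that the `.xml` and `.bin` files share a stem:

```python
from pathlib import Path


def find_ir_pairs(save_dir):
    """Return (topology, weights) file-name pairs found in an IR directory.

    An OpenVINO IR model is stored as an .xml file (network topology) plus
    a .bin file with the same stem (weights); both are needed to load it.
    """
    save_dir = Path(save_dir)
    pairs = []
    for xml_file in sorted(save_dir.glob("*.xml")):
        bin_file = xml_file.with_suffix(".bin")
        if bin_file.exists():  # skip orphaned topology files
            pairs.append((xml_file.name, bin_file.name))
    return pairs
```

An empty result means the directory does not hold a loadable IR model.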
### Weights compression
For large language models (LLMs), it is often beneficial to quantize only the weights and keep activations in floating-point precision. This method does not require a calibration dataset. To enable weights compression, set the `weights_only` parameter when calling the `quantize()` method of `OVQuantizer`:
```python
from optimum.intel.openvino import OVQuantizer, OVModelForCausalLM
from transformers import AutoModelForCausalLM

save_dir = "int8_weights_compressed_model"
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")
quantizer = OVQuantizer.from_pretrained(model, task="text-generation")
quantizer.quantize(save_directory=save_dir, weights_only=True)
```
To load the optimized model for inference:
```python
optimized_model = OVModelForCausalLM.from_pretrained(save_dir)
```
Weights compression is supported for both PyTorch and OpenVINO models: the starting model can be either an `AutoModelForCausalLM` or an `OVModelForCausalLM` instance.
## Training-time optimization

@@ -221,4 +242,4 @@ text = "He's a dreadful magician."
outputs = cls_pipe(text)

[{'label': 'NEGATIVE', 'score': 0.9840195178985596}]
```
