Quantization

Quantization techniques reduce memory and computational costs by representing weights and activations with lower-precision data types like 8-bit integers (int8). This makes it possible to load larger models that wouldn't otherwise fit into memory, and it can speed up inference. Diffusers supports 8-bit and 4-bit quantization with bitsandbytes.
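To make the memory trade-off concrete, here is a minimal, dependency-free sketch of symmetric "absmax" quantization with a single scale per tensor. It is a conceptual illustration only, not how bitsandbytes quantizes weights; bitsandbytes uses more elaborate schemes, such as block-wise scales and the NF4 data type for 4-bit. The function names `quantize_absmax`, `dequantize`, and `storage_bytes` and the sample weights are hypothetical.

```python
import math
from typing import List, Tuple


def quantize_absmax(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Hypothetical helper: symmetric absmax quantization to signed `bits`-bit ints."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    absmax = max(abs(w) for w in weights)
    # A single float scale maps the largest magnitude onto qmax.
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round to the nearest integer and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map the integers back to approximate float values."""
    return [v * scale for v in q]


def storage_bytes(num_values: int, bits: int) -> int:
    """Bytes needed to pack `num_values` values at `bits` bits each (scale not counted)."""
    return math.ceil(num_values * bits / 8)


weights = [0.12, -0.5, 0.33, 0.97, -1.27, 0.0, 0.64, -0.08]

for bits in (8, 4):
    q, scale = quantize_absmax(weights, bits)
    restored = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(
        f"{bits}-bit: q={q} scale={scale:.4f} max_err={max_err:.4f} "
        f"bytes={storage_bytes(len(weights), bits)} "
        f"(fp32 would need {storage_bytes(len(weights), 32)})"
    )
```

Going from 32-bit floats to 8-bit integers cuts storage by 4x, and 4-bit halves it again. The cost is a larger rounding error, which is bounded by half the scale, because fewer integer levels are available to cover the same range.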

Quantization techniques that aren't supported in Transformers can be added with the [DiffusersQuantizer] class.

Learn how to quantize models in the Quantization guide.

BitsAndBytesConfig

[[autodoc]] BitsAndBytesConfig

DiffusersQuantizer

[[autodoc]] quantizers.base.DiffusersQuantizer