Releases · huggingface/optimum

23 Dec 15:30

fxmarty

v1.6.0

06cdbc5

v1.6.0: Optimum CLI, Stable Diffusion ONNX export, BetterTransformer & ONNX support for more architectures

Optimum CLI

The Optimum command line interface is introduced, and is now the official entrypoint for the ONNX export. Example commands:

optimum-cli --help
optimum-cli export onnx --help
optimum-cli export onnx --model bert-base-uncased --task sequence-classification bert_onnx/
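
For scripted use, the same export can be launched from Python. The sketch below only assembles the argument list shown above and passes it to `subprocess.run`. The helper names `build_export_command` and `run_export` are illustrative and are not part of Optimum.

```python
import subprocess
from typing import List, Optional


def build_export_command(model: str, output_dir: str, task: Optional[str] = None) -> List[str]:
    """Assemble the `optimum-cli export onnx` argument list (hypothetical helper)."""
    command = ["optimum-cli", "export", "onnx", "--model", model]
    if task is not None:
        command += ["--task", task]
    command.append(output_dir)
    return command


def run_export(command: List[str]) -> None:
    """Run the export; raises CalledProcessError if the CLI exits with an error."""
    subprocess.run(command, check=True)


command = build_export_command("bert-base-uncased", "bert_onnx/", task="sequence-classification")
print(" ".join(command))
# Call run_export(command) to perform the export (requires optimum to be installed).
```

Keeping the command as a list rather than a single string avoids shell quoting issues when model names or paths contain spaces.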

Add Optimum CLI backbone by @fxmarty in #593

Stable Diffusion ONNX export

Optimum now supports the ONNX export of stable diffusion models from the diffusers library:

optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 sd_v15_onnx/

Add Stable Diffusion ONNX export by @echarlaix in #570

BetterTransformer support for more architectures

BetterTransformer integration includes new models in this release: CLIP, RemBERT, mBART, ViLT, FSMT

The complete list of supported models is available in the documentation.

[BT] Add Bettertransformer support for FSMT by @Sumanth077 in #494
[BT] add BetterTransformer support for ViLT architecture by @ka00ri in #508
Add MBart support for BetterTransformer by @ravenouse in #516
Add CLIP BetterTransformer by @fxmarty in #534
Add BetterTransformer support for RemBERT by @hchings in #545

ONNX export for more architectures

The ONNX export now supports Swin, MobileNet-v1, MobileNet-v2.

Add Swin support in exporters.onnx by @fxmarty in #528
[ONNX] add mobilenet support by @younesbelkada in #633

Extended ONNX export for encoder-decoder and decoder models

Encoder-decoder and decoder-only models that normally rely on the generate() method in transformers can now be exported as several files using the --for-ort argument:

optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_small_onnx

yielding:

.
└── t5_small_onnx
    ├── config.json
    ├── decoder_model.onnx
    ├── decoder_with_past_model.onnx
    ├── encoder_model.onnx
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer_config.json
    └── tokenizer.json

Models exported with --for-ort are expected to be loadable directly into ORTModel.
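
As a minimal sketch of that round trip, the helper below first checks that the three ONNX files from the listing above are present, then loads the directory. `missing_onnx_files` and `load_exported_t5` are illustrative names. The load step assumes optimum with ONNX Runtime is installed and that `ORTModelForSeq2SeqLM` is the matching class for a T5 export.

```python
from pathlib import Path
from typing import List

# File names taken from the export listing above.
EXPECTED_ONNX_FILES = (
    "encoder_model.onnx",
    "decoder_model.onnx",
    "decoder_with_past_model.onnx",
)


def missing_onnx_files(export_dir: str) -> List[str]:
    """Return the expected ONNX files that are absent from export_dir."""
    root = Path(export_dir)
    return [name for name in EXPECTED_ONNX_FILES if not (root / name).is_file()]


def load_exported_t5(export_dir: str):
    """Load the exported directory, failing early if a file is missing."""
    missing = missing_onnx_files(export_dir)
    if missing:
        raise FileNotFoundError(f"Missing ONNX files in {export_dir}: {missing}")
    # Imported lazily so the check above works without optimum installed.
    from optimum.onnxruntime import ORTModelForSeq2SeqLM

    return ORTModelForSeq2SeqLM.from_pretrained(export_dir)
```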

Add ort export in exporters for encoder-decoder models by @mht-sharma in #497
Support decoder generated with --for-ort from optimum.exporters.onnx in ORTDecoder by @fxmarty in #554

Support for ONNX models with external data at export, optimization, quantization

The ONNX export from PyTorch normally creates external data in case the exported model is larger than 2 GB. This release introduces better support for the export and use of large models, writing all external data into a .onnx_data file if necessary.
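
The 2 GB threshold matches the size limit of a single serialized protobuf message. As a rough sketch, whether a model is likely to cross it can be estimated from its parameter count. The function name and the parameter counts below are illustrative assumptions, and the estimate ignores the graph structure stored alongside the weights.

```python
PROTOBUF_LIMIT_BYTES = 2 * 1024**3  # 2 GiB


def weights_exceed_limit(num_parameters: int, bytes_per_parameter: int = 4) -> bool:
    """Estimate whether the raw weights alone are larger than the protobuf limit."""
    return num_parameters * bytes_per_parameter > PROTOBUF_LIMIT_BYTES


# Hypothetical parameter counts, stored as fp32 (4 bytes per parameter).
print(weights_exceed_limit(124_000_000))    # False: about 0.5 GB
print(weights_exceed_limit(1_500_000_000))  # True: about 6 GB
```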

Handling ONNX models with external data by @NouamaneTazi in #586
Improve the compatibility dealing with large ONNX proto in ORTOptimizer and ORTQuantizer by @JingyaHuang in #332

ONNX Runtime API improvement

Various improvements to allow for a better user experience in the ONNX Runtime integration:

ORTModel, ORTModelDecoder and ORTModelForConditionalGeneration can now load any ONNX model files regardless of their names, so optimized and quantized models can be loaded without specifying a file name argument.
ORTModel.from_pretrained() with from_transformers=True now downloads and loads the model in a temporary directory instead of the cache, which was not the right place to store it.
ORTQuantizer.save_pretrained() now saves the model configuration and the preprocessor, making the exported directory usable end-to-end.
ORTOptimizer.save_pretrained() now saves the preprocessor, making the exported directory usable end-to-end.
ONNX Runtime integration API improvement by @michaelbenayoun in #515

Custom shapes support at ONNX export

The shape of the example input to provide for the export to ONNX can be overridden in case the validity of the ONNX model is sensitive to the shape used during the export.

Read more: optimum-cli export onnx --help

Support custom shapes for dummy inputs by @fxmarty in #522
Support for custom input shapes in exporters onnx by @fxmarty in #575

Enable `use_cache=True` for ORTModelForCausalLM

Reusing past key values for models using ORTModelForCausalLM (e.g. gpt2) is now possible using use_cache=True, which avoids recomputing them at each iteration of the decoding:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = ORTModelForCausalLM.from_pretrained("gpt2", from_transformers=True, use_cache=True)

inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")

gen_tokens = model.generate(**inputs)
tokenizer.batch_decode(gen_tokens)

Enable past_key_values for ORTModelForCausalLM by @echarlaix in #326
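
To give a feel for what the cache saves, the sketch below counts how many token positions are run through the decoder during generation, with and without reuse of past key values. It is a simplified counting model under stated assumptions (a hypothetical prompt of 8 tokens, 20 generated tokens), not a measurement of ORTModelForCausalLM. Attention over the cached keys and values still grows with sequence length.

```python
def tokens_processed(prompt_length: int, new_tokens: int, use_cache: bool) -> int:
    """Count the token positions fed to the decoder over a whole generation."""
    total = 0
    for step in range(new_tokens):
        sequence_length = prompt_length + step
        if use_cache and step > 0:
            total += 1  # only the newest token; earlier keys/values are reused
        else:
            total += sequence_length  # the whole sequence is run again
    return total


print(tokens_processed(prompt_length=8, new_tokens=20, use_cache=False))  # 350
print(tokens_processed(prompt_length=8, new_tokens=20, use_cache=True))   # 27
```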

IO binding support for ORTModelForCustomTasks

ORTModelForCustomTasks now supports IO Binding when using CUDAExecutionProvider.

Add IO binding support for custom ORTModel by @JingyaHuang in #447

Experimental support to merge ONNX decoder with/without past key values

When --for-ort is passed together with --task causal-lm-with-past, --task seq2seq-lm-with-past or --task speech2seq-lm-with-past, the ONNX export produces two decoder models: one that does not use the previously computed keys/values, and one that does.

Experimental support is introduced to merge the two models into one. Example:

optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_onnx/

import onnx
from optimum.onnx import merge_decoders

decoder = onnx.load("t5_onnx/decoder_model.onnx")
decoder_with_past = onnx.load("t5_onnx/decoder_with_past_model.onnx")

merged_model = merge_decoders(decoder, decoder_with_past)
onnx.save(merged_model, "t5_onnx/decoder_merged_model.onnx")

Merge ONNX decoder models by @JingyaHuang in #587

Major bugs fixed

Fix BetterTransformer with padding="max_length" by @fxmarty in #543
Fix non-nesting bug in BetterTransformer integration by @younesbelkada in #637

Other changes, bugfixes and improvements

Fix doc-builder premission error by @mishig25 in #482
Fix doc build pr premissions by @mishig25 in #484
Re-order the task manager doc by @michaelbenayoun in #483
Fix whisper device for gpu test by @fxmarty in #486
Fix tensorflow CI by @fxmarty in #489
Fix PR doc generation by @regisss in #495
Fix broken links in the doc by @fxmarty in #499
Update iobinding ORT encoder whisper by @mht-sharma in #498
fix NormalizedConfig init error message by @PaulQbFeng in #500
Change import structure for ORTModel by @fxmarty in #456
[BT] Fix failing CI tests by @younesbelkada in #501
Remove redundant condition statement in ORTDecoder(Seq2seq) by @JingyaHuang in #504
[BT] put decorator on the correct place by @younesbelkada in #509
[BT] clearer error message for norm_first by @younesbelkada in #510
Deprecate PyTorch 1.12. for BetterTransformer by @fxmarty in #513
Fix ORTModelForSeq2SeqLM test by @fxmarty in #455
Clearer error messages when initilizing the requested ONNX Runtime execution provider fails by @fxmarty in #514
[BT] Fix doc bugs by @younesbelkada in #517
Replace sklearn by scikit-learn by @lesteve in #502
ORTModel uses optimum.exporters.onnx by @michaelbenayoun in #490
Cleanup deprecated ONNX Runtime training docker files by @JingyaHuang in #523
Added support for Tapas Model by @juheon...

Contributors

lesteve, fxmarty, and 18 other contributors

19 Dec 16:26

fxmarty

v1.5.2

b392ea3

v1.5.2: Patch release

Temporarily constrain numpy<1.24.0 (#614)

24 Nov 14:36

fxmarty

v1.5.1

10d0fec

v1.5.1: Patch release

Deprecate PyTorch 1.12. for BetterTransformer with better error message (#513)

17 Nov 16:40

michaelbenayoun

v1.5.0

06d62c3

v1.5.0: BetterTransformer Integration, IOBinding, Optimum Exporters, and Whisper with ONNX Runtime

BetterTransformer

Convert your model into its PyTorch BetterTransformer format using a one-liner with the new BetterTransformer integration for faster inference on CPU and GPU!

from optimum.bettertransformer import BetterTransformer

model = BetterTransformer.transform(model)

Check the full list of supported models in the documentation, and check out the Google Colab demo.

Contributions

BetterTransformer integration (#423)
ViT and Wav2Vec2 support (#470)

ONNX Runtime IOBinding support

ORT models (except for ORTModelForCustomTasks) now support IOBinding to avoid data copying overheads between the host and device. This brings a significant inference speedup during the decoding process on GPU.

By default, use_io_binding is set to True when using CUDA. You can turn off the IOBinding in case of any memory issue:

from optimum.onnxruntime import ORTModelForSeq2SeqLM

model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small", use_io_binding=False)

Contributions

Add IOBinding support to ONNX Runtime module (#421)

Optimum Exporters

optimum.exporters is a new module that handles the export of PyTorch and TensorFlow models to several backends. Only ONNX is supported for now, and more than 50 architectures can already be exported, among which BERT, GPT-Neo, Bloom, T5, ViT, Whisper, CLIP.

The export can be done via the CLI:

python -m optimum.exporters.onnx --model openai/whisper-tiny.en whisper_onnx/

For more information, check the documentation.

Contributions

optimum.exporters creation (#403)
Automatic task detection (#445)

Whisper

Whisper can be exported to ONNX using optimum.exporters.
Whisper can also be exported and run using optimum.onnxruntime; IO binding is also supported.

Note: For now, the export from optimum.exporters is not usable by ORTModelForSpeechSeq2Seq. To run inference, export Whisper directly using ORTModelForSpeechSeq2Seq. This will be solved in the next release.

Contributions

Whisper support with optimum.onnxruntime and optimum.exporters (#420)

Other contributions

ONNX Runtime training now supports ORT 1.13.1 and transformers 4.23.1 (#434)
ORTModel can load models from subfolders in a similar fashion as in transformers (#443)
ORTOptimizer has been refactored, and a factory class has been added to create common OptimizationConfigs (#457)
Fixes and updates in the documentation (#411, #432, #437, #441)
Fixes IOBinding (#454, #461)

26 Oct 08:00

echarlaix

v1.4.1

f5040c2

v1.4.1: Patch release

Add inference with ORTModel to ORTTrainer and ORTSeq2SeqTrainer #189
Add InferenceSession options and provider to ORTModel #271
Add mT5 (#341) and Marian (#393) support to ORTOptimizer
Add batchnorm folding torch.fx transformations #348
The torch.fx transformations now use the marking methods mark_as_transformed, mark_as_restored, get_transformed_nodes #385
Update BaseConfig for transformers 4.22.0 release #386
Update ORTTrainer for transformers 4.22.1 release #388
Add extra ONNX Runtime quantization options #398
Add possibility to pass provider_options to ORTModel #401
Add support to pass a specific device for ORTModel, as transformers does for pipelines #427
Fixes to support onnxruntime 1.13.1 #430

08 Sep 17:56

echarlaix

v1.4.0

1b08fb5

v1.4.0: ORTQuantizer and ORTOptimizer refactorization

ONNX Runtime

Refactorization of ORTQuantizer (#270) and ORTOptimizer (#294)
Add ONNX Runtime fused Adam Optimizer (#295)
Add ORTModelForCustomTasks allowing ONNX Runtime inference support for custom tasks (#303)
Add ORTModelForMultipleChoice allowing ONNX Runtime inference for models with multiple choice classification head (#358)

Torch FX

Add FuseBiasInLinear, a transformation that fuses the weight and the bias of linear modules (#253)

Improvements and bugfixes

Enable the possibility to disregard the precomputed past_key_values during ONNX Runtime inference of Seq2Seq models (#241)
Enable node exclusion from quantization for benchmark suite (#284)
Enable possibility to use a token authentication when loading a calibration dataset (#289)
Fix optimum pipeline when no model is given (#301)

12 Jul 12:32

echarlaix

v1.3.0

5713f84

v1.3.0: Torch FX transformations, ORTModelForSeq2SeqLM and ORTModelForImageClassification

Torch FX

The optimum.fx.optimization module (#232) provides a set of torch.fx graph transformations, along with classes and functions to write your own transformations and compose them.

The Transformation and ReversibleTransformation classes represent non-reversible and reversible transformations respectively; custom transformations can be written by inheriting from them
The compose utility function enables transformation composition
Two reversible transformations were added:
- MergeLinears: merges linear layers that have the same input
- ChangeTrueDivToMulByInverse: changes a division by a static value into a multiplication by its inverse
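
The idea behind these two transformations can be illustrated without torch.fx. The sketch below uses plain Python lists and hypothetical helper names (`linear`, `merge_linears`). It shows why the rewrites preserve the result; it is not how optimum.fx.optimization implements them on a traced graph.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def linear(weight: Matrix, bias: Vector, x: Vector) -> Vector:
    """Apply one linear layer (weight @ x + bias) to a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def merge_linears(w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Tuple[Matrix, Vector]:
    """Stack the rows of two layers that read the same input into one layer."""
    return w1 + w2, b1 + b2


x = [1.0, 2.0]
w_a, b_a = [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]
w_b, b_b = [[2.0, 1.0]], [0.0]

# One matrix product replaces two; the output is split back afterwards.
w_merged, b_merged = merge_linears(w_a, b_a, w_b, b_b)
merged_out = linear(w_merged, b_merged, x)
out_a, out_b = merged_out[: len(w_a)], merged_out[len(w_a):]

# Division by a static value rewritten as multiplication by its precomputed inverse.
scale = 8.0
inverse_scale = 1.0 / scale
divided = [v / scale for v in x]
multiplied = [v * inverse_scale for v in x]
```

Both rewrites are reversible because the original layers and the original divisor can be recovered from the merged form.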

ORTModelForSeq2SeqLM

ORTModelForSeq2SeqLM (#199) allows ONNX export and ONNX Runtime inference for Seq2Seq models.

When exported, Seq2Seq models are decomposed into three parts: the encoder, the decoder (actually consisting of the decoder with the language modeling head), and the decoder with pre-computed key/values as additional inputs.
This specific export comes from the fact that during the first pass, the decoder has no pre-computed key/values hidden-states, while during the rest of the generation past key/values will be used to speed up sequential decoding.
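
The order in which the three parts are used during generation can be sketched as a simple schedule. The function name and the part labels below are illustrative; they model the description above rather than the actual ORTModelForSeq2SeqLM code.

```python
from typing import List


def decoding_schedule(num_new_tokens: int) -> List[str]:
    """List which exported part runs at each step of a generation."""
    schedule = ["encoder"]  # the input sequence is encoded once
    if num_new_tokens > 0:
        # First token: no key/values have been computed yet.
        schedule.append("decoder")
        # Remaining tokens: reuse the key/values from earlier steps.
        schedule.extend(["decoder_with_past"] * (num_new_tokens - 1))
    return schedule


print(decoding_schedule(4))
# ['encoder', 'decoder', 'decoder_with_past', 'decoder_with_past', 'decoder_with_past']
```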

Below is an example that downloads a T5 model from the Hugging Face Hub, exports it to the ONNX format and saves it:

from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Hypothetical output directory; any local path works
output_dir = "t5_small_onnx"

# Load model from hub and export it to the ONNX format
model = ORTModelForSeq2SeqLM.from_pretrained("t5-small", from_transformers=True)

# Save the exported model in the given directory
model.save_pretrained(output_dir)

ORTModelForImageClassification

ORTModelForImageClassification (#226) allows ONNX Runtime inference for models with an image classification head.

Below is an example that downloads a ViT model from the Hugging Face Hub, exports it to the ONNX format and saves it:

from optimum.onnxruntime import ORTModelForImageClassification

# Hypothetical output directory; any local path works
output_dir = "vit_onnx"

# Load model from hub and export it to the ONNX format
model = ORTModelForImageClassification.from_pretrained("google/vit-base-patch16-224", from_transformers=True)

# Save the exported model in the given directory
model.save_pretrained(output_dir)

ORTOptimizer

Adds support for converting model weights from fp32 to fp16 by adding a new optimization parameter (fp16) to OptimizationConfig (#273).
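
What the conversion means for a single weight can be shown with the standard library alone. The sketch below packs one value as fp32 and as fp16 to compare storage size and rounding error. It illustrates the number formats only and does not call ORTOptimizer.

```python
import struct

value = 0.1

# "<f" is IEEE 754 single precision, "<e" is half precision.
as_fp32 = struct.pack("<f", value)
as_fp16 = struct.pack("<e", value)

roundtrip_fp32 = struct.unpack("<f", as_fp32)[0]
roundtrip_fp16 = struct.unpack("<e", as_fp16)[0]

print(len(as_fp32), len(as_fp16))  # 4 2
print(abs(roundtrip_fp32 - value), abs(roundtrip_fp16 - value))
```

Halving the bytes per weight roughly halves the size of the stored weights, at the cost of a larger rounding error per value.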

Pipelines

Additional pipeline tasks are now supported. The supported tasks and the default model for each are:

Image Classification (ViT)
Text-to-Text Generation (T5 small)
Summarization (T5 base)
Translation (T5 base)

Below is an example that downloads a T5 small model from the Hub and loads it with the transformers pipeline for translation:

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")
onnx_translation = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)

text = "What a beautiful day !"
pred = onnx_translation(text)
# [{'translation_text': "C'est une belle journée !"}]

Breaking change

The ORTModelForXXX execution provider default value is now set to CPUExecutionProvider (#203). Before, if no execution provider was provided, it was set to CUDAExecutionProvider if a GPU was detected, or to CPUExecutionProvider otherwise.
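
The change in default can be summarized as two small rules. The functions below are a hypothetical model of the behavior described above, written for illustration; they are not taken from the library.

```python
def default_provider_before(gpu_detected: bool) -> str:
    """Old rule: pick CUDA automatically when a GPU is present."""
    return "CUDAExecutionProvider" if gpu_detected else "CPUExecutionProvider"


def default_provider_after(gpu_detected: bool) -> str:
    """New rule: always default to CPU, whether or not a GPU is present."""
    return "CPUExecutionProvider"


for gpu_detected in (False, True):
    print(gpu_detected, default_provider_before(gpu_detected), default_provider_after(gpu_detected))
```

In practice this means that code relying on automatic GPU selection now has to request the CUDA provider explicitly.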

15 Jun 12:36

echarlaix

v1.2.3

5a0106d

v1.2.3: Patch release

Remove intel sub-package, migrating to optimum-intel (#212)
Fix the loading and saving of ORTModel optimized and quantized models (#214)

02 Jun 13:27

echarlaix

v1.2.2

0107c1c

v1.2.2: Patch release

Extend QuantizationPreprocessor to dynamic quantization (#196)
Introduce unified approach to create transformers vs optimized models benchmark (#194)
Bump huggingface_hub version and protobuf fix (#205)

13 May 10:04

echarlaix

v1.2.1

2709e60

v1.2.1: Patch release

Add support to Python version 3.7 (#176)
