Release v1.5.0: BetterTransformer Integration, IOBinding, Optimum Exporters, and Whisper with ONNX Runtime · huggingface/optimum

BetterTransformer

Convert your model to its PyTorch BetterTransformer format with a one-liner, using the new BetterTransformer integration for faster inference on CPU and GPU!

from optimum.bettertransformer import BetterTransformer

# `model` is an already-loaded Transformers model with a supported architecture
model = BetterTransformer.transform(model)

Check the full list of supported models in the documentation, and check out the Google Colab demo.

Contributions

BetterTransformer integration (#423)
ViT and Wav2Vec2 support (#470)

ONNX Runtime IOBinding support

ORT models (except for ORTModelForCustomTasks) now support IOBinding to avoid data copying overhead between the host and the device. This yields a significant inference speedup during decoding on GPU.

By default, use_io_binding is set to True when using CUDA. You can turn IOBinding off if you run into memory issues:

from optimum.onnxruntime import ORTModelForSeq2SeqLM

model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small", use_io_binding=False)
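
To see whether IOBinding pays off on a given setup, one option is to time the same generation call with the flag on and off. The following is a minimal sketch, not part of Optimum: `average_latency` is a hypothetical helper that measures wall-clock time only, and the commented usage assumes the models run on the CUDA execution provider with tokenized `inputs` prepared elsewhere.

```python
import time
from typing import Callable


def average_latency(fn: Callable[[], object], n_runs: int = 10, warmup: int = 2) -> float:
    """Return the mean wall-clock time in seconds of calling fn, after warmup calls."""
    if n_runs <= 0:
        raise ValueError("n_runs must be positive")
    for _ in range(warmup):
        fn()  # warmup calls are executed but not timed
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs


# Usage sketch (assumes optimum with ONNX Runtime GPU support is installed,
# and that `inputs` holds tokenized inputs for the model):
# from optimum.onnxruntime import ORTModelForSeq2SeqLM
# model_on = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")
# model_off = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small", use_io_binding=False)
# print(average_latency(lambda: model_on.generate(**inputs)))
# print(average_latency(lambda: model_off.generate(**inputs)))
```

The warmup calls keep one-time initialization costs out of the measurement, so the two numbers compare steady-state decoding only.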

Contributions

Add IOBinding support to ONNX Runtime module (#421)

Optimum Exporters

optimum.exporters is a new module that handles the export of PyTorch and TensorFlow models to several backends. Only ONNX is supported for now, and more than 50 architectures can already be exported, including BERT, GPT-Neo, Bloom, T5, ViT, Whisper, and CLIP.

The export can be done via the CLI:

python -m optimum.exporters.onnx --model openai/whisper-tiny.en whisper_onnx/
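
The same CLI call can also be driven from a Python script, for example inside a build step. This is a minimal sketch that only wraps the command shown above in a subprocess; `build_export_command` and `export_to_onnx` are hypothetical helper names, not part of optimum.exporters.

```python
import subprocess
import sys
from typing import List


def build_export_command(model_id: str, output_dir: str) -> List[str]:
    """Return the argv list for the ONNX export CLI, using the current interpreter."""
    return [sys.executable, "-m", "optimum.exporters.onnx", "--model", model_id, output_dir]


def export_to_onnx(model_id: str, output_dir: str) -> None:
    """Run the export in a subprocess; raises CalledProcessError if the CLI fails."""
    subprocess.run(build_export_command(model_id, output_dir), check=True)


# Usage sketch (requires optimum to be installed in the current environment):
# export_to_onnx("openai/whisper-tiny.en", "whisper_onnx/")
```

Using `sys.executable` rather than a bare `python` keeps the export in the same environment as the calling script.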

For more information, check the documentation.

Contributions

optimum.exporters creation (#403)
Automatic task detection (#445)

Whisper

Whisper can be exported to ONNX using optimum.exporters.
Whisper can also be exported and run using optimum.onnxruntime; IOBinding is supported as well.

Note: For now, the export from optimum.exporters is not usable by ORTModelForSpeechSeq2Seq. To run inference, export Whisper directly using ORTModelForSpeechSeq2Seq. This will be solved in the next release.
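
Exporting directly through ORTModelForSpeechSeq2Seq might look like the sketch below. It assumes that `from_transformers=True` is the `from_pretrained` argument that triggers the on-the-fly ONNX export in this release (the argument name may differ in other versions); `export_whisper_for_ort` is a hypothetical helper name.

```python
def export_whisper_for_ort(model_id: str, save_dir: str):
    """Export a Whisper checkpoint to ONNX via ORTModelForSpeechSeq2Seq and save it."""
    # Imported lazily so this module can be loaded without optimum installed.
    from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

    # Assumption: `from_transformers=True` converts the PyTorch checkpoint to ONNX on load.
    model = ORTModelForSpeechSeq2Seq.from_pretrained(model_id, from_transformers=True)
    # Write the exported ONNX files and config so they can be reloaded later.
    model.save_pretrained(save_dir)
    return model


# Usage sketch:
# model = export_whisper_for_ort("openai/whisper-tiny.en", "whisper_onnx/")
```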

Contributions

Whisper support with optimum.onnxruntime and optimum.exporters (#420)

Other contributions

ONNX Runtime training now supports ORT 1.13.1 and transformers 4.23.1 (#434)
ORTModel can load models from subfolders in a similar fashion as in transformers (#443)
ORTOptimizer has been refactored, and a factory class has been added to create common OptimizationConfigs (#457)
Fixes and updates in the documentation (#411, #432, #437, #441)
IOBinding fixes (#454, #461)
