v1.5.0: BetterTransformer Integration, IOBinding, Optimum Exporters, and Whisper with ONNX Runtime

@michaelbenayoun michaelbenayoun released this 17 Nov 16:40

BetterTransformer

Convert your model to its PyTorch BetterTransformer format with a one-liner, using the new BetterTransformer integration, for faster inference on CPU and GPU!

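from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

# Example checkpoint only (an assumption); any supported architecture works.
model = AutoModel.from_pretrained("bert-base-uncased")
model = BetterTransformer.transform(model)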

Check the full list of supported models in the documentation, and check out the Google Colab demo.

Contributions

  • BetterTransformer integration (#423)
  • ViT and Wav2Vec2 support (#470)

ONNX Runtime IOBinding support

ORT models (except for ORTModelForCustomTasks) now support IOBinding to avoid data-copying overhead between the host and the device. This brings a significant inference speedup during decoding on GPU.

By default, use_io_binding is set to True when using CUDA. You can turn IOBinding off if you run into memory issues:

from optimum.onnxruntime import ORTModelForSeq2SeqLM

model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small", use_io_binding=False)

Contributions

  • Add IOBinding support to ONNX Runtime module (#421)

Optimum Exporters

optimum.exporters is a new module that handles the export of PyTorch and TensorFlow models to several backends. Only ONNX is supported for now, and more than 50 architectures can already be exported, including BERT, GPT-Neo, Bloom, T5, ViT, Whisper, and CLIP.

The export can be done via the CLI:

python -m optimum.exporters.onnx --model openai/whisper-tiny.en whisper_onnx/

For more information, check the documentation.
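If the export needs to be triggered from a Python script rather than a shell, the CLI call above can be wrapped with the standard library. This is a minimal sketch: the helper names build_export_command and export_to_onnx are hypothetical and not part of optimum; only the command line itself comes from these notes.

```python
import subprocess
import sys
from typing import List


def build_export_command(model_id: str, output_dir: str) -> List[str]:
    """Build the argument list for the ONNX export CLI shown in these notes.

    Hypothetical helper, not an optimum API.
    """
    # sys.executable keeps the export inside the current interpreter's environment.
    return [sys.executable, "-m", "optimum.exporters.onnx", "--model", model_id, output_dir]


def export_to_onnx(model_id: str, output_dir: str) -> None:
    """Run the export CLI; raises CalledProcessError if it exits with a non-zero code."""
    subprocess.run(build_export_command(model_id, output_dir), check=True)


# Usage (requires optimum to be installed in the current environment):
# export_to_onnx("openai/whisper-tiny.en", "whisper_onnx/")
```

Passing the arguments as a list avoids shell quoting issues with model names or output paths.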

Contributions

  • optimum.exporters creation (#403)
  • Automatic task detection (#445)

Whisper

  • Whisper can be exported to ONNX using optimum.exporters.
  • Whisper can also be exported and run using optimum.onnxruntime; IOBinding is supported as well.

Note: For now, the export produced by optimum.exporters cannot be used by ORTModelForSpeechSeq2Seq. To run inference, export Whisper directly using ORTModelForSpeechSeq2Seq. This will be solved in the next release.
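A sketch of the direct export described in the note, under stated assumptions: it assumes that from_pretrained accepts from_transformers=True to convert the PyTorch checkpoint on the fly (later releases may name this flag differently), and that use_io_binding is accepted here as it is for ORTModelForSeq2SeqLM. The helper names are hypothetical.

```python
from typing import Any, Dict, Optional


def build_load_kwargs(use_io_binding: Optional[bool] = None) -> Dict[str, Any]:
    """Collect keyword arguments for from_pretrained (hypothetical helper)."""
    # Assumption: from_transformers=True triggers the ONNX export at load time.
    kwargs: Dict[str, Any] = {"from_transformers": True}
    if use_io_binding is not None:
        # Only override the library default when explicitly requested.
        kwargs["use_io_binding"] = use_io_binding
    return kwargs


def load_whisper_for_inference(
    model_id: str = "openai/whisper-tiny.en",
    use_io_binding: Optional[bool] = None,
):
    """Export Whisper to ONNX and load it through ORTModelForSpeechSeq2Seq."""
    # Imported lazily so this module can be imported without optimum installed.
    from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

    return ORTModelForSpeechSeq2Seq.from_pretrained(
        model_id, **build_load_kwargs(use_io_binding)
    )
```

Calling load_whisper_for_inference(use_io_binding=False) would disable IOBinding, mirroring the ORTModelForSeq2SeqLM example earlier in these notes.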

Contributions

  • Whisper support with optimum.onnxruntime and optimum.exporters (#420)

Other contributions

  • ONNX Runtime training now supports ORT 1.13.1 and transformers 4.23.1 (#434)
  • ORTModel can load models from subfolders, in the same way as transformers (#443)
  • ORTOptimizer has been refactored, and a factory class has been added to create common OptimizationConfigs (#457)
  • Fixes and updates in the documentation (#411, #432, #437, #441)
  • IOBinding fixes (#454, #461)