Skip to content

Complete onnx/onnxruntime decoder models integration and testing#2278

Merged
echarlaix merged 27 commits into
mainfrom
distribute-tests
May 28, 2025
Merged

Complete onnx/onnxruntime decoder models integration and testing#2278
echarlaix merged 27 commits into
mainfrom
distribute-tests

Conversation

@IlyasMoutawwakil
Copy link
Copy Markdown
Member

@IlyasMoutawwakil IlyasMoutawwakil commented May 23, 2025

What does this PR do?

Decoder models are probably the most important topic of research right now, transformers itself decided to break many things in its rule book to keep up with the very fast changing field of LLMs. I think it should be easier to maintain and add new models with this PR by reducing the amount of special cases we handle or at least make the inference logic more straight forward.

This PR:

  • Enables io binding testing on cpu.
  • Enables and fixes more models that were never tested for inference (phi, olmo, olmo2).
  • Enables exhaustive testing of decoder models, including comparing pkv values in forward passes.
  • Enable using encoder-decoder models as decoders which was not supported before (bart, marian, blenderbot, ..).
  • Enables using models that support pkv cache to not it if configuret to, similar to transformers api (based on generation_config.use_cache)

Before, if you ran all tests in parallel some would interfere with each other and only pass in sequential model. Now you can run all 250 tests on a multi-core machine (dgx for example) in 2 minutes.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@IlyasMoutawwakil IlyasMoutawwakil changed the title Distribute and complete onnxruntime tests Distribute and complete onnxruntime tests (decoder models) May 23, 2025
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR expands and stabilizes ONNX Runtime testing for decoder models by fixing model mappings, adding new models, updating exporter logic, and including decoder tests in CI.

  • Updates testing utilities: fixes model name mappings, adds new models (olmo, olmo2, phi), and refactors the setup flow
  • Extends NormalizedConfigManager and ONNX exporter configs to support new architectures and bumps minimum Transformers versions
  • Enhances exporter task filtering by Transformers version and updates CI matrix to include test_decoder.py

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/onnxruntime/testing_utils.py Refactor _setup, adjust model IDs, remove unsupported case skips
optimum/utils/normalized_config.py Add mappings for olmo, olmo2, phi; update bloom args
optimum/exporters/tasks.py Filter supported model types by Transformers version
optimum/exporters/onnx/utils.py Bump version check threshold for position IDs requirement
optimum/exporters/onnx/model_configs.py Update MIN_TRANSFORMERS_VERSION for PhiOnnxConfig & BloomOnnxConfig, remove legacy override
optimum/exporters/onnx/base.py Simplify decoder merge logic using constants
.github/workflows/test_onnxruntime.yml Add test_decoder.py to CI test matrix
Comments suppressed due to low confidence (3)

optimum/utils/normalized_config.py:278

  • The mapping for "phi3small" was removed, which will break configuration support for phi3small models. Consider re-adding this entry or verifying that dropping phi3small support is intentional.
"phi3small": NormalizedTextConfigWithGQA,

tests/onnxruntime/testing_utils.py:176

  • The previous skipTest logic for unsupported export cases was removed, so tests may now error instead of being skipped. Please reintroduce a mechanism to skip unsupported configurations.
if model_args.get("use_cache", False):

.github/workflows/test_onnxruntime.yml:31

  • The YAML array for test_file uses bracket notation with commas on separate lines, which may not be valid. Consider using a standard YAML list (one - item per line) to ensure CI parses it correctly.
[

@IlyasMoutawwakil
Copy link
Copy Markdown
Member Author

IlyasMoutawwakil commented May 25, 2025

Added qwen3 model support as proof that new models inference integration should, in most cases, "just work".
Contrary to #2252 where logits and generations didn't match (even when export and inference ran without failure).
@Abdennacer-Badaoui

Copy link
Copy Markdown
Collaborator

@echarlaix echarlaix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @IlyasMoutawwakil 🔥 thanks!

Comment thread optimum/exporters/tasks.py Outdated
Comment on lines +1500 to +1507
and is_transformers_version(
">=",
str(
TasksManager.get_exporter_config_constructor(
exporter, task=task, model_type=model_type
).func.MIN_TRANSFORMERS_VERSION
),
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking it might be easier to raise an error instead before export when the transformers version is not compatible and to keep all supported architectures so that users know that the architecture export is supported but that transformers needs to be upgraded, what do you think @IlyasMoutawwakil ?

f"{config.MIN_TRANSFORMERS_VERSION}, got: {transformers.__version__}"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact I only added this here to be able to use it in test_find_untested_architectures
I can move the version checks there and keep this method as is.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done ! I simply remove them the unsupported models (because version) using CONFIG_MAPPING_NAMES

        supported_transformers_models = set(CONFIG_MAPPING_NAMES.keys())
        supported_export_models = set(TasksManager.get_supported_model_type_for_task(task=self.TASK, exporter="onnx"))
        supported_export_models = supported_export_models & supported_transformers_models
        untested_models = supported_export_models - tested_models

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the raising of version error during export, I thin that's already the case

self.normalized_config = NormalizedConfigManager.get_normalized_config_class(config.model_type)(config)
self.key_value_input_names = [key for key in self.input_names if (".key" in key) or (".value" in key)]
self.key_value_output_names = [key for key in self.output_names if (".key" in key) or (".value" in key)]
self.can_use_cache = len(self.key_value_input_names) > 0 and len(self.key_value_output_names) > 0
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question : is there any case where len(self.key_value_input_names) > 0 and len(self.key_value_output_names) == 0 (before we only used self.key_value_input_names to determine self.use_cache so wondering)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the case of CausalLMs I don't think so, even legacy merged decoders have both.

Comment thread optimum/onnxruntime/modeling_decoder.py Outdated
@IlyasMoutawwakil
Copy link
Copy Markdown
Member Author

@echarlaix all good to merge !

@echarlaix
Copy link
Copy Markdown
Collaborator

great, thanks a lot @IlyasMoutawwakil

@echarlaix echarlaix merged commit 85376e3 into main May 28, 2025
34 checks passed
@echarlaix echarlaix deleted the distribute-tests branch May 28, 2025 08:51
@echarlaix echarlaix mentioned this pull request Jun 4, 2025
3 tasks
@IlyasMoutawwakil IlyasMoutawwakil changed the title Distribute and complete onnxruntime tests (decoder models) Complete onnx/onnxruntime decoder models integration and testing Aug 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants