[fix] Allow ORTQuantizer over models with subfolder ONNX files #2094

Merged: echarlaix merged 3 commits into huggingface:main from tomaarsen:fix/ort_quantizer_subfolder on Nov 18, 2024
@tomaarsen tomaarsen commented Nov 11, 2024

Hello!

Pull Request overview

  • Allow ORTQuantizer over models with subfolder ONNX files

Details

Currently, if you call ORTQuantizer over a model that was loaded with a subfolder, then it'll break:

from optimum.onnxruntime import ORTModelForFeatureExtraction, ORTQuantizer

model = ORTModelForFeatureExtraction.from_pretrained(
    "sentence-transformers-testing/all-MiniLM-L6-v2",
    subfolder="onnx",
    file_name="model.onnx",
)
quantizer = ORTQuantizer.from_pretrained(model)
print(quantizer)
Traceback (most recent call last):
  File "...\optimum\demo_ort_quantizer.py", line 8, in <module>
    quantizer = ORTQuantizer.from_pretrained(model)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\optimum\onnxruntime\quantization.py", line 156, in from_pretrained
    return cls(path)
           ^^^^^^^^^
  File "...\optimum\onnxruntime\quantization.py", line 102, in __init__
    self.config = AutoConfig.from_pretrained(self.onnx_model_path.parent)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\site-packages\transformers\models\auto\configuration_auto.py", line 1049, in from_pretrained
    raise ValueError(
ValueError: Unrecognized model in ...\.cache\huggingface\hub\models--sentence-transformers--all-MiniLM-L6-v2\snapshots\fa97f6e7cb1a59073dff9e6b13e2715cf7475ac9\onnx. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, audio-spectrogram-transformer, autoformer, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deformable_detr, deit, depth_anything, deta, detr, dinat, dinov2, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, git, glm, glpn, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, graphormer, grounding-dino, groupvit, hiera, hubert, ibert, idefics, idefics2, idefics3, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llava, llava_next, llava_next_video, llava_onevision, longformer, longt5, luke, lxmert, m2m_100, mamba, mamba2, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mimi, mistral, mixtral, mllama, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, moshi, mpnet, mpt, mra, mt5, musicgen, musicgen_melody, mvp, nat, nemotron, nezha, nllb-moe, nougat, nystromformer, olmo, olmoe, omdet-turbo, oneformer, open-llama, openai-gpt, opt, owlv2, owlvit, 
paligemma, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, persimmon, phi, phi3, phimoe, pix2struct, pixtral, plbart, poolformer, pop2piano, prophetnet, pvt, pvt_v2, qdqbert, qwen2, qwen2_audio, qwen2_audio_encoder, qwen2_moe, qwen2_vl, rag, realm, recurrent_gemma, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rt_detr, rt_detr_resnet, rwkv, sam, seamless_m4t, seamless_m4t_v2, segformer, seggpt, sew, sew-d, siglip, siglip_vision_model, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, stablelm, starcoder2, superpoint, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, time_series_transformer, timesformer, timm_backbone, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, udop, umt5, unispeech, unispeech-sat, univnet, upernet, van, video_llava, videomae, vilt, vipllava, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vits, vivit, wav2vec2, wav2vec2-bert, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso, zamba, zoedepth, onnx_model

There are 2 issues at play:

  1. The config, although already known to the ORT model, is not passed on to the ORTQuantizer by from_pretrained.
  2. When AutoConfig.from_pretrained fails, it often raises a ValueError rather than an OSError.
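
To make the two points concrete, here is a minimal, self-contained sketch of the general idea: accept a config that the caller already has, and treat both ValueError and OSError as "no config found" when falling back to the model's folder. The names `QuantizerSketch` and `strict_loader` are hypothetical stand-ins written for illustration; this is not the optimum source code, and the loader only imitates the failure seen in the traceback above.

```python
from pathlib import Path
from typing import Callable, Optional


def strict_loader(directory: Path) -> dict:
    """Hypothetical stand-in for a config loader that cannot recognize the folder."""
    raise ValueError(f"Unrecognized model in {directory}")


class QuantizerSketch:
    """Illustrative only; not the optimum ORTQuantizer class."""

    def __init__(
        self,
        onnx_model_path: Path,
        config: Optional[dict] = None,
        loader: Callable[[Path], dict] = strict_loader,
    ):
        self.onnx_model_path = Path(onnx_model_path)
        # Issue 1: prefer a config handed over by the caller.
        self.config = config
        if self.config is None:
            try:
                # Fall back to the folder that contains the model file.
                self.config = loader(self.onnx_model_path.parent)
            except (OSError, ValueError):
                # Issue 2: either error type means "no usable config here".
                self.config = None


known_config = {"model_type": "bert"}  # placeholder for the model's own config
with_config = QuantizerSketch(Path("onnx/model.onnx"), config=known_config)
without_config = QuantizerSketch(Path("onnx/model.onnx"))
print(with_config.config, without_config.config)
```

Passing the config explicitly avoids the directory lookup altogether, so the fallback only matters when the caller has a bare path and nothing else.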

The underlying cause is that #2044 added/strengthened the "ONNX in subfolders" support by allowing the config to be in the root while the model is in a subfolder, but the ORTQuantizer wasn't updated to reflect that config.json isn't necessarily adjacent to model.onnx.
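
The layout problem can be reproduced without any model download. The sketch below builds a temporary directory that mimics the assumed structure (config.json in the root, model.onnx under onnx/) and checks where the config actually lives. The helper name `build_repo_layout` and the file contents are placeholders chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def build_repo_layout(root: Path) -> Path:
    """Create a hypothetical snapshot: config.json in root, model.onnx in onnx/."""
    (root / "config.json").write_text(json.dumps({"model_type": "bert"}))
    onnx_dir = root / "onnx"
    onnx_dir.mkdir()
    model_path = onnx_dir / "model.onnx"
    model_path.write_bytes(b"")  # empty placeholder, not a real ONNX graph
    return model_path


with tempfile.TemporaryDirectory() as tmp:
    model_path = build_repo_layout(Path(tmp))
    # Looking next to the model file, as onnx_model_path.parent does, finds nothing.
    adjacent = (model_path.parent / "config.json").is_file()
    # The config sits one level up, in the snapshot root.
    in_root = (model_path.parent.parent / "config.json").is_file()
    print(adjacent, in_root)
```

Any lookup that assumes the config is a sibling of the model file will miss it in this layout, which is the situation shown in the traceback.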

FYI, this is breaking https://sbert.net/docs/package_reference/util.html#sentence_transformers.backend.export_dynamic_quantized_onnx_model in some cases.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@echarlaix as you added/strengthened the "ONNX in subfolders" support & that's how I encountered this.

- Tom Aarsen

@echarlaix echarlaix left a comment

LGTM, thanks for the fix @tomaarsen!

@echarlaix echarlaix merged commit 400bb82 into huggingface:main Nov 18, 2024