[OpenVINO] Support Qwen3-next by rkazants · Pull Request #1523 · huggingface/optimum-intel

rkazants · 2025-11-16T13:10:30Z

What does this PR do?

Example conversion command line for Qwen/Qwen3-Next-80B-A3B-Instruct:

optimum-cli export openvino -m Qwen/Qwen3-Next-80B-A3B-Instruct Qwen3-Next-80B-A3B-Instruct
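
For scripted use, the conversion command above can be assembled from Python. This is a minimal sketch, not part of this PR: the helper names `build_export_command` and `run_export` are hypothetical, and it assumes `optimum-cli` is on the PATH. The optional `--weight-format` flag (for example `int4`) is an existing option of `optimum-cli export openvino`; leaving it out reproduces the command shown above.

```python
import shlex
import subprocess
from typing import List, Optional


def build_export_command(
    model_id: str, output_dir: str, weight_format: Optional[str] = None
) -> List[str]:
    """Build the argv list for `optimum-cli export openvino` (hypothetical helper)."""
    cmd = ["optimum-cli", "export", "openvino", "-m", model_id]
    if weight_format is not None:
        # Optional weight compression, e.g. "int4" or "int8".
        cmd += ["--weight-format", weight_format]
    cmd.append(output_dir)
    return cmd


def run_export(
    model_id: str,
    output_dir: str,
    weight_format: Optional[str] = None,
    dry_run: bool = True,
) -> str:
    """Return the shell-quoted command; execute it only when dry_run is False."""
    cmd = build_export_command(model_id, output_dir, weight_format)
    if not dry_run:
        # Raises CalledProcessError if the export fails.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Dry run: only prints the command that would be executed.
    print(run_export("Qwen/Qwen3-Next-80B-A3B-Instruct", "Qwen3-Next-80B-A3B-Instruct"))
```

With `dry_run=True` (the default in this sketch) nothing is downloaded or converted, which makes it easy to check the command before starting a long export.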

Example of inference for Qwen/Qwen3-Next-80B-A3B-Instruct using the OpenVINO backend:

from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

model_path = "./Qwen3-Next-80B-A3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = OVModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

# change input text as desired
input_text = "The capital of France is"
# tokenize the text
input_tokens = tokenizer(input_text, return_tensors="pt")
# generate output tokens
output = model.generate(**input_tokens, max_length=10)
# decode output tokens into text
output = tokenizer.batch_decode(output)
print(output[0])
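
Note that `max_length` in `generate` counts the prompt tokens as well, so `max_length=10` leaves room for only a few new tokens. A possible variant, sketched here under the assumption that `model` and `tokenizer` are loaded as in the example above, uses `max_new_tokens` and decodes only the continuation. The helper name `generate_text` is hypothetical and not part of this PR.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=10):
    """Generate a continuation for `prompt` and return only the new text.

    `model` and `tokenizer` are expected to behave like the
    OVModelForCausalLM and AutoTokenizer objects from the example above.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    # max_new_tokens limits generated tokens only, independent of prompt length.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # A decoder-only model returns prompt + continuation; drop the prompt part.
    prompt_len = inputs["input_ids"].shape[1]
    new_tokens = output_ids[:, prompt_len:]
    return tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0]
```

Usage would look like `print(generate_text(model, tokenizer, "The capital of France is"))`.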

Before submitting

[N/A] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

HuggingFaceDocBuilderDev · 2025-11-16T13:12:39Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

optimum/exporters/openvino/model_patcher.py

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

optimum/exporters/openvino/model_configs.py

optimum/exporters/openvino/model_configs.py

docs/source/openvino/models.mdx

optimum/exporters/openvino/model_patcher.py

IlyasMoutawwakil

LGTM !

MaximProshin · 2026-03-09T12:10:15Z

FYI: after the fixes in OpenVINO for group-wise quantization, I was able to measure accuracy for the default int4 configuration and got a WWB similarity of 0.960114 on CPU.

savvadesogle · 2026-03-09T20:21:28Z

Thank you 🙏

savvadesogle · 2026-03-11T06:43:41Z

Hello, Roman. @rkazants

Can we convert the AutoRound models from the Intel repository to OpenVINO?
And how do we do it properly?

https://huggingface.co/Intel/Qwen3-Coder-Next-int4-AutoRound

rkazants · 2026-03-12T07:01:04Z

Hi @savvadesogle,

Can you try this command for conversion:

optimum-cli export openvino -m Intel/Qwen3-Coder-Next-int4-AutoRound Qwen3-Coder-Next-int4-AutoRound

@ljaljushkin, @MaximProshin, @mvafin, do you see any problems with converting this quantized model from the NNCF or PyTorch FE perspective?

Best regards,
Roman

[OpenVINO] Support Qwen3-next

61d0f3e

rkazants marked this pull request as draft November 16, 2025 13:10

rkazants added 16 commits November 16, 2025 20:58

Fix config and add base patching

ea6b4b3

Extend patching

7e37aae

Initial patching for linear attention

8bc1c5a

Patch recurrent gated delta rule

26a4b65

Use module extension for conversion of chunked_attention_cell

a0e8d3c

Implement conversion extension for chunked gated delta rule cell

486a4f8

Patch sparse moe block

f623e57

Use core_attn_out

e76f243

Fix use of mask

b191d59

Correct shape for recurrent_state in config file

0b1bb21

Re-write patch for MoE

6a3d22f

9df28e3

Merge remote-tracking branch 'upstream/main' into support_qwen3_next

9ddaad9

Apply code-formatting

6384b9f

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Fix previous commit with main merge

f66862a

Re-patch sparse MoE

f4af348

zhangYiIntel reviewed Jan 21, 2026

optimum/exporters/openvino/model_patcher.py (outdated, resolved)

rkazants added 10 commits January 30, 2026 10:04

Merge remote-tracking branch 'upstream/main' into support_qwen3_next

500810f

Merge remote-tracking branch 'upstream/main' into support_qwen3_next

aee20f4

Fix code formatting

92ec0e5

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Add tests for qwen3 next

3e1c66f

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Unify representation for CausalConv1d

5f45761

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Apply code-formatting

d49c7cd

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Leave only one GatedDeltaNet representation

9874d8c

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Fix support for other models

f1dd676

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Fix test_decoder.py

2665dc9

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Use chunk size equal to one

162bb72

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>

Copilot started reviewing on behalf of rkazants March 4, 2026 17:54

Copilot AI reviewed Mar 4, 2026

optimum/exporters/openvino/model_configs.py (outdated, resolved)

Handle bf16 weights

33cb551

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>