
[Qwen3.5] Enable MTP spec_v2 and add test for nvidia/Qwen3.5-397B-A17B-NVFP4 #19391

Merged: Fridge003 merged 2 commits into sgl-project:main from hlu1:qwen35_test on Mar 4, 2026

Conversation

@hlu1 (Collaborator) commented Feb 26, 2026

Motivation

  • Make MTP_v2 work for Qwen3.5 by passing mm_input_embeds to the MTP head.
  • Add MTP_v1/v2 and non-MTP accuracy tests for nvidia/Qwen3.5-397B-A17B-NVFP4, and check the acceptance length in the MTP tests. Note that the tests use the eval harness (`from sglang.test.run_eval import run_eval`), which applies the chat_template; without the chat_template, accuracy is very poor. The sampling parameters follow the official recommendation at https://huggingface.co/Qwen/Qwen3.5-397B-A17B-FP8
  • Remove the two extra_buffer and mtp_v2 checks. They were incorrect because mtp_v2 only requires extra_buffer when radix cache is on. To make the behavior more user-friendly, the previous default of silently turning off the radix cache when spec decoding, no_buffer, and radix cache are all enabled now raises an exception instead, in case the user wants to enable spec decoding (v1 or v2) together with the radix cache but forgot to enable extra_buffer.
  • Remove duplicated code from server_args.py
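The mm_input_embeds change in the first bullet can be sketched as follows. This is an illustrative toy, not sglang's actual module layout: the point is that the MTP draft head consumes the target model's precomputed input embeddings (which already contain multimodal features) instead of re-embedding raw token ids.

```python
# Hypothetical sketch of passing mm_input_embeds to the MTP head. The
# function names, shapes, and the elementwise fuse are stand-ins for the
# real draft head's concat + projection; see sglang's Qwen3.5 MTP code
# for the actual implementation.

def fuse(hidden_state, input_embed):
    """Combine one position's hidden state and input embedding
    (stand-in for the draft head's concat + linear projection)."""
    return [h + e for h, e in zip(hidden_state, input_embed)]

def mtp_draft_forward(hidden_states, input_embeds):
    # One fused vector per position; a real head would then run a small
    # transformer layer and an lm_head over these.
    return [fuse(h, e) for h, e in zip(hidden_states, input_embeds)]

hs = [[0.1, 0.2], [0.3, 0.4]]       # target hidden states, 2 positions
emb = [[1.0, 1.0], [2.0, 2.0]]      # stand-in for mm_input_embeds
print(mtp_draft_forward(hs, emb))   # [[1.1, 1.2], [2.3, 2.4]]
```

If the draft head re-embedded token ids instead, the multimodal features injected into `emb` would be lost, which is why the fix forwards the embeddings directly.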

co-author: @vincentzed #18906
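The stricter check described in the third bullet can be sketched as below. Flag names here are illustrative; the real options live in python/sglang/srt/server_args.py.

```python
# Sketch of the new behavior: instead of silently disabling the radix
# cache, raise when spec decoding and the radix cache are enabled but
# extra_buffer is not. All argument names are assumptions for this sketch.

def check_spec_decoding_buffers(spec_enabled: bool,
                                radix_cache_enabled: bool,
                                extra_buffer_enabled: bool) -> None:
    if spec_enabled and radix_cache_enabled and not extra_buffer_enabled:
        raise ValueError(
            "Speculative decoding (v1 or v2) with the radix cache requires "
            "extra_buffer. Enable extra_buffer, or disable the radix cache."
        )

check_spec_decoding_buffers(True, False, False)   # ok: radix cache off
check_spec_decoding_buffers(True, True, True)     # ok: extra_buffer on
try:
    check_spec_decoding_buffers(True, True, False)
except ValueError as e:
    print("raised:", e)
```

Raising instead of silently flipping a flag makes the misconfiguration visible to users who intended to run spec decoding together with the radix cache.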

Accuracy

Without radix cache

gpqa:
Repeat: 8, mean: 0.866
Scores: ['0.859', '0.869', '0.874', '0.869', '0.884', '0.848', '0.869', '0.859']

With radix cache

gpqa:
Repeat: 8, mean: 0.861
Scores: ['0.848', '0.843', '0.874', '0.854', '0.864', '0.859', '0.894', '0.854']
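As a quick sanity check, the reported means can be recomputed from the per-repeat scores above:

```python
# Recompute the reported GPQA means from the per-repeat scores.
scores_no_radix = [0.859, 0.869, 0.874, 0.869, 0.884, 0.848, 0.869, 0.859]
scores_radix    = [0.848, 0.843, 0.874, 0.854, 0.864, 0.859, 0.894, 0.854]

def mean3(xs):
    return round(sum(xs) / len(xs), 3)

print(mean3(scores_no_radix))  # 0.866
print(mean3(scores_radix))     # 0.861
```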

Benchmark

radix cache off

(Figure: tps_vs_throughput_v1_vs_v2, tokens/s vs. throughput for spec_v1 and spec_v2)



@hlu1 (Collaborator, Author) commented Feb 26, 2026

/tag-and-rerun-ci

@hlu1 hlu1 changed the title [Qwen3.5] Add test for nvidia/Qwen3.5-397B-A17B-NVFP4 [Qwen3.5] Enable MTP_v2 and add test for nvidia/Qwen3.5-397B-A17B-NVFP4 Feb 28, 2026
@hlu1 hlu1 requested a review from hanming-lu February 28, 2026 02:15
@hlu1 (Collaborator, Author) commented Feb 28, 2026

/tag-and-rerun-ci

Comment thread python/sglang/srt/server_args.py
@vincentzed (Contributor) commented Mar 1, 2026

I tested trtllm_mha under this as well:

| Backend | Latency (s) | Tokens | Acc Length | Speed (token/s) |
|---|---|---|---|---|
| trtllm_mha | 3.088 | 512 | 3.413 | 165.82 |
| triton | 2.177 | 512 | 3.303 | 235.23 |
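As a rough cross-check of the benchmark numbers above, the reported speed should be close to tokens divided by latency:

```python
# Cross-check: speed (token/s) ~ tokens / latency for each backend row.
rows = {
    "trtllm_mha": {"latency_s": 3.088, "tokens": 512, "reported_tps": 165.82},
    "triton":     {"latency_s": 2.177, "tokens": 512, "reported_tps": 235.23},
}
for name, r in rows.items():
    derived = r["tokens"] / r["latency_s"]
    # Small deviation is expected; the reported figure is measured end to
    # end, not derived from the rounded latency.
    print(f"{name}: derived {derived:.2f} vs reported {r['reported_tps']}")
```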

@hlu1 hlu1 changed the title [Qwen3.5] Enable MTP_v2 and add test for nvidia/Qwen3.5-397B-A17B-NVFP4 [Qwen3.5] Enable MTP spec_v2 and add test for nvidia/Qwen3.5-397B-A17B-NVFP4 Mar 2, 2026
Comment thread python/sglang/srt/disaggregation/decode.py
Comment thread python/sglang/srt/mem_cache/memory_pool.py
Comment thread test/registered/4-gpu-models/test_qwen35_models.py
@ShangmingCai (Collaborator) left a comment:
LGTM as long as the CI passes. CC: @yizhang2077 Please double check.

@hzh0425 (Collaborator) left a comment:
LGTM

@hlu1 (Collaborator, Author) commented Mar 2, 2026

The gb200 CI is temporarily disabled. I ran the tests locally and they all pass.

Co-authored-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
Co-authored-by: lzy <tomlzy213@gmail.com>
@hlu1 (Collaborator, Author) commented Mar 3, 2026

/rerun-failed-ci

@hlu1 (Collaborator, Author) commented Mar 4, 2026

/rerun-failed-ci

@hlu1 (Collaborator, Author) commented Mar 4, 2026

/rerun-failed-ci


@hlu1 (Collaborator, Author) commented Mar 4, 2026

Both test/registered/4-gpu-models/test_qwen3_next_models_mtp.py and test/registered/4-gpu-models/test_qwen35_models.py have passed in the latest CI run.

@Fridge003 Fridge003 merged commit 9457c04 into sgl-project:main Mar 4, 2026
175 of 203 checks passed
qeternity pushed a commit to qeternity/sglang that referenced this pull request Mar 6, 2026
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Mar 6, 2026
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026