[Qwen3.5] Enable MTP spec_v2 and add test for nvidia/Qwen3.5-397B-A17B-NVFP4 #19391
|
/tag-and-rerun-ci |
|
I tested trtllm_mha under this as well: Triton |
ShangmingCai
left a comment
LGTM as long as the CI passes. CC: @yizhang2077 Please double check.
|
The gb200 CI is temporarily disabled. I ran the tests locally and they all pass. |
Co-authored-by: vincentzed <207368749+vincentzed@users.noreply.github.com>
Co-authored-by: lzy <tomlzy213@gmail.com>
|
/rerun-failed-ci |
|
Both |
Motivation
mm_input_embeds to the MTP head.
nvidia/Qwen3.5-397B-A17B-NVFP4 and check acceptance length in the MTP tests. Note that it uses the eval harness from sglang.test.run_eval import run_eval, which applies the chat_template. Without the chat_template, the accuracy is very poor. The sampling parameters are based on the official recommendation from https://huggingface.co/Qwen/Qwen3.5-397B-A17B-FP8
co-author: @vincentzed #18906
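For readers unfamiliar with the acceptance-length check mentioned above, here is a minimal sketch of the idea, not the test added in this PR. It assumes acceptance length means the average number of tokens emitted per target-model verification step. The function name, the sample step counts, and the threshold are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def average_accept_length(tokens_per_verify_step: Sequence[int]) -> float:
    """Mean number of tokens emitted per verification step.

    Each entry is assumed to count the accepted draft tokens plus the one
    token the target model always contributes, so every entry is >= 1.
    """
    if not tokens_per_verify_step:
        raise ValueError("need at least one verification step")
    if any(n < 1 for n in tokens_per_verify_step):
        raise ValueError("each step must emit at least one token")
    return sum(tokens_per_verify_step) / len(tokens_per_verify_step)


# Made-up per-step counts, purely illustrative (not measured data).
steps = [3, 2, 4, 1, 3, 3]

# Hypothetical threshold; a real test would pick a value per model and config.
MIN_ACCEPT_LENGTH = 2.0

avg = average_accept_length(steps)
assert avg >= MIN_ACCEPT_LENGTH, f"acceptance length too low: {avg:.2f}"
print(f"average acceptance length: {avg:.2f}")
```

A threshold assertion like this fails loudly if a change to the draft path (for example, dropping inputs the MTP head relies on) makes the draft tokens much less likely to be accepted.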
Accuracy
Without radix cache
With radix cache
Benchmark
radix cache off