
[FixBug] online serving fails for high-resolution videos #198

Merged
Gaohan123 merged 3 commits into vllm-project:main from princepride:fix-qwen3-omni-high-resolution
Dec 5, 2025
Conversation

@princepride (Collaborator) commented Dec 4, 2025

Purpose

Fix #128

This PR resolves a critical bug in the Qwen3-Omni model's deepstack feature that caused crashes when processing high-resolution videos. The root cause was incorrect tensor dimension indexing when retrieving the sequence length from input_ids, which led to shape mismatches between visual embeddings and hidden states during the deepstack processing.

Root Cause:

  • input_ids has shape [batch_size, seq_len], e.g., [1, 8192]
  • The code incorrectly used input_ids.size(0) to get sequence length, which returned the batch size (1) instead of the actual sequence length
  • This caused only 1 token's worth of deepstack embeddings to be retrieved from the buffer, while the model expected the full sequence length (e.g., 8192 or 3701 tokens)

Fix:
Changed input_ids.size(0) to input_ids.size(1) in two locations so the code retrieves the sequence length (second dimension) instead of the batch size (first dimension).
Additionally, testing on two H200 GPUs hit an OOM error, so I adjusted the Qwen3-Omni deployment file.
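The indexing bug can be illustrated with a minimal, self-contained PyTorch sketch (hypothetical variable names; this is not the actual vLLM-Omni code, just a demonstration of why `size(0)` vs `size(1)` matters for a `[batch_size, seq_len]` tensor):

```python
import torch

# input_ids has shape [batch_size, seq_len], e.g. [1, 8192].
batch_size, seq_len = 1, 8192
input_ids = torch.zeros(batch_size, seq_len, dtype=torch.long)

# Buggy: size(0) is the batch size (1), not the sequence length.
wrong_len = input_ids.size(0)
# Fixed: size(1) is the actual sequence length (8192).
right_len = input_ids.size(1)

# With the buggy length, only one token's worth of deepstack embeddings
# would be sliced from the buffer, while the model expects embeddings
# matching the full sequence length -- a shape mismatch at merge time.
hidden_dim = 4  # illustrative; real hidden sizes are much larger
deepstack_buffer = torch.randn(seq_len, hidden_dim)
buggy_slice = deepstack_buffer[:wrong_len]    # shape [1, hidden_dim]
fixed_slice = deepstack_buffer[:right_len]    # shape [8192, hidden_dim]
```

The fix in this PR is exactly the `size(0)` → `size(1)` change, applied in the two places where the deepstack code derives the sequence length from `input_ids`.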

Test Plan

  1. Setup: Start vLLM-Omni server with Qwen3-Omni-30B-A3B-Instruct model

    vllm serve /path/to/Qwen3-Omni-30B-A3B-Instruct --omni --port 8091
  2. Test with high-resolution video: Run the multimodal generation client with a large video file

    python examples/online_serving/qwen3_omni/openai_chat_completion_client_for_multimodal_generation.py \
        --query-type use_video \
        --video-path sample_demo_2.mp4 \
        --prompt "explain this video" \
        --model /path/to/Qwen3-Omni-30B-A3B-Instruct

Test Result

(Attached: screenshot of the serving output and the generated audio_0.wav)

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

Signed-off-by: princepride <wangzhipeng628@gmail.com>

@SamitHuang (Collaborator) left a comment

nice bugfix!

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
@princepride (Collaborator, Author) commented Dec 4, 2025

@SamitHuang I have already reverted the YAML change. Can you help merge it? Thank you! 😊

@Gaohan123 (Collaborator) left a comment

LGTM. Great!

@Gaohan123 Gaohan123 enabled auto-merge (squash) December 5, 2025 02:05
@Gaohan123 Gaohan123 merged commit f3c69df into vllm-project:main Dec 5, 2025
4 checks passed
@david6666666 (Collaborator)

nice catch!

LawJarp-A pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Dec 12, 2025
…#198)

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
faaany pushed a commit to faaany/vllm-omni that referenced this pull request Dec 19, 2025
…#198)

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
princepride added a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026
…#198)

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>


Development

Successfully merging this pull request may close these issues.

[Bug][Qwen3-Omni]: online serving fails for high-resolution videos

4 participants