Error when deploying a Qwen2.5-Omni server with vLLM

### System Info

```bash
INFO 07-01 03:29:45 [__init__.py:244] Automatically detected platform cuda.
Collecting environment information...
==============================
        System Info
==============================
OS                           : Ubuntu 22.04.4 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version                : Could not collect
CMake version                : version 3.30.2
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.7.1+cu126
Is debug build               : False
CUDA used to build PyTorch   : 12.6
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-5.4.0-169-generic-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.6.20
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration : GPU 0: NVIDIA A100-SXM4-80GB
Nvidia driver version        : 535.216.03
cuDNN version                : Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.3.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.3.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.3.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.3.0
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.3.0
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.3.0
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.3.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.3.0
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      43 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             128
On-line CPU(s) list:                0-127
Vendor ID:                          AuthenticAMD
Model name:                         AMD EPYC 7763 64-Core Processor
CPU family:                         25
Model:                              1
Thread(s) per core:                 1
Core(s) per socket:                 64
Socket(s):                          2
Stepping:                           1
Frequency boost:                    enabled
CPU max MHz:                        2450.0000
CPU min MHz:                        1500.0000
BogoMIPS:                           4890.68
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca sme sev sev_es
Virtualization:                     AMD-V
L1d cache:                          4 MiB (128 instances)
L1i cache:                          4 MiB (128 instances)
L2 cache:                           64 MiB (128 instances)
L3 cache:                           512 MiB (16 instances)
NUMA node(s):                       4
NUMA node0 CPU(s):                  0-31
NUMA node1 CPU(s):                  32-63
NUMA node2 CPU(s):                  64-95
NUMA node3 CPU(s):                  96-127
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec store bypass:    Vulnerable
Vulnerability Spectre v1:           Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:           Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

==============================
Versions of relevant libraries
==============================
[pip3] flash_attn==2.8.0.post2
[pip3] flake8==7.1.1
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cudnn-cu12==9.5.1.17
[pip3] nvidia-cudnn-frontend==1.5.2
[pip3] nvidia-cufft-cu12==11.3.0.4
[pip3] nvidia-cufile-cu12==1.11.1.6
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-dali-cuda120==1.40.0
[pip3] nvidia-ml-py==12.575.51
[pip3] nvidia-ml-py3==7.352.0
[pip3] nvidia-modelopt==0.15.0
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvimgcodec-cu12==0.3.0.5
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] nvidia-pyindex==1.0.9
[pip3] nvidia-smi==0.1.3
[pip3] onnx==1.16.1
[pip3] onnxruntime-gpu==1.17.1
[pip3] onnxsim==0.4.36
[pip3] open-clip-torch==2.24.0
[pip3] optree==0.13.0
[pip3] pynvml==12.0.0
[pip3] pytorch-lightning==2.2.4
[pip3] pytorch-triton==3.0.0+dedb7bdf3
[pip3] pyzmq==26.2.0
[pip3] sentence-transformers==4.1.0
[pip3] torch==2.7.1
[pip3] torchaudio==2.7.0
[pip3] torchmetrics==1.4.0.post0
[pip3] torchpack==0.3.1
[pip3] torchprofile==0.0.4
[pip3] torchvision==0.22.1
[pip3] transformers==4.52.4
[pip3] transformers-stream-generator==0.0.5
[pip3] triton==3.3.1
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
Neuron SDK Version           : N/A
vLLM Version                 : 0.9.1
vLLM Build Flags:
  CUDA Archs: 5.2 6.0 6.1 7.0 7.2 7.5 8.0 8.6 8.7 9.0+PTX; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    NIC0    NIC1    NIC2    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      SYS     SYS     SYS     64-95   2               N/A
NIC0    SYS      X      SYS     SYS
NIC1    SYS     SYS      X      SYS
NIC2    SYS     SYS     SYS      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2

==============================
     Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=GPU-84d7af4f-bb6d-9c62-0358-bcf0488cbbe5
CUBLAS_VERSION=12.6.0.22
NVIDIA_REQUIRE_CUDA=cuda>=9.0
CUDA_CACHE_DISABLE=1
TORCH_CUDA_ARCH_LIST=5.2 6.0 6.1 7.0 7.2 7.5 8.0 8.6 8.7 9.0+PTX
NCCL_VERSION=2.22.3
NVIDIA_DRIVER_CAPABILITIES=video,compute,utility,graphics
NVIDIA_PRODUCT_NAME=PyTorch
CUDA_VERSION=12.6.0.022
PYTORCH_VERSION=2.5.0a0+872d972
PYTORCH_BUILD_NUMBER=0
CUDNN_FRONTEND_VERSION=1.5.2
CUDNN_VERSION=9.3.0.75
PYTORCH_HOME=/opt/pytorch/pytorch
LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/torch/lib:/usr/local/lib/python3.10/dist-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
NVIDIA_BUILD_ID=107063150
CUDA_DRIVER_VERSION=560.35.03
PYTORCH_BUILD_VERSION=2.5.0a0+872d972
CUDA_HOME=/usr/local/cuda
CUDA_HOME=/usr/local/cuda
CUDA_MODULE_LOADING=LAZY
NVIDIA_REQUIRE_JETPACK_HOST_MOUNTS=
NVIDIA_PYTORCH_VERSION=24.08
TORCH_ALLOW_TF32_CUBLAS_OVERRIDE=1
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
```

### Who can help?

_No response_

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction


When I use the following command to deploy the vLLM service:
```bash
VLLM_USE_V1=0 \
vllm serve "Qwen/Qwen2.5-Omni-3B" \
    --port "8080" \
    --dtype bfloat16 \
    --allowed-local-media-path / \
    --served-model-name "Qwen2.5-Omni-3B" \
    --limit-mm-per-prompt "image=12"
```

With transformers==4.53.0 plus https://github.com/huggingface/transformers/pull/39125, the server reports this error during startup:
```log
ERROR 07-01 03:20:42 [engine.py:458] cu_seqlens_q must have shape (batch_size + 1)
ERROR 07-01 03:20:42 [engine.py:458] Traceback (most recent call last):
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 446, in run_mp_engine
ERROR 07-01 03:20:42 [engine.py:458]     engine = MQLLMEngine.from_vllm_config(
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 133, in from_vllm_config
ERROR 07-01 03:20:42 [engine.py:458]     return cls(
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 87, in __init__
ERROR 07-01 03:20:42 [engine.py:458]     self.engine = LLMEngine(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 268, in __init__
ERROR 07-01 03:20:42 [engine.py:458]     self._initialize_kv_caches()
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 413, in _initialize_kv_caches
ERROR 07-01 03:20:42 [engine.py:458]     self.model_executor.determine_num_available_blocks())
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 104, in determine_num_available_blocks
ERROR 07-01 03:20:42 [engine.py:458]     results = self.collective_rpc("determine_num_available_blocks")
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
ERROR 07-01 03:20:42 [engine.py:458]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/utils.py", line 2671, in run_method
ERROR 07-01 03:20:42 [engine.py:458]     return func(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 07-01 03:20:42 [engine.py:458]     return func(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/worker/worker.py", line 256, in determine_num_available_blocks
ERROR 07-01 03:20:42 [engine.py:458]     self.model_runner.profile_run()
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 07-01 03:20:42 [engine.py:458]     return func(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1300, in profile_run
ERROR 07-01 03:20:42 [engine.py:458]     self._dummy_run(max_num_batched_tokens, max_num_seqs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1426, in _dummy_run
ERROR 07-01 03:20:42 [engine.py:458]     self.execute_model(model_input, kv_caches, intermediate_tensors)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 07-01 03:20:42 [engine.py:458]     return func(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1844, in execute_model
ERROR 07-01 03:20:42 [engine.py:458]     hidden_or_intermediate_states = model_executable(
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 07-01 03:20:42 [engine.py:458]     return self._call_impl(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 07-01 03:20:42 [engine.py:458]     return forward_call(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_5_omni_thinker.py", line 875, in forward
ERROR 07-01 03:20:42 [engine.py:458]     multimodal_embeddings = self.get_multimodal_embeddings_v0(**kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_5_omni_thinker.py", line 831, in get_multimodal_embeddings_v0
ERROR 07-01 03:20:42 [engine.py:458]     audio_embeds = self._process_audio_input(audio_input)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_5_omni_thinker.py", line 652, in _process_audio_input
ERROR 07-01 03:20:42 [engine.py:458]     audio_outputs = self.audio_tower(
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 07-01 03:20:42 [engine.py:458]     return self._call_impl(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 07-01 03:20:42 [engine.py:458]     return forward_call(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py", line 838, in forward
ERROR 07-01 03:20:42 [engine.py:458]     layer_outputs = encoder_layer(hidden_states, cu_seqlens, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/transformers/modeling_layers.py", line 83, in __call__
ERROR 07-01 03:20:42 [engine.py:458]     return super().__call__(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 07-01 03:20:42 [engine.py:458]     return self._call_impl(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 07-01 03:20:42 [engine.py:458]     return forward_call(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py", line 704, in forward
ERROR 07-01 03:20:42 [engine.py:458]     hidden_states = self.self_attn(
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 07-01 03:20:42 [engine.py:458]     return self._call_impl(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 07-01 03:20:42 [engine.py:458]     return forward_call(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/transformers/models/qwen2_5_omni/modeling_qwen2_5_omni.py", line 650, in forward
ERROR 07-01 03:20:42 [engine.py:458]     attn_output, _ = attention_interface(
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/transformers/integrations/flash_attention.py", line 65, in flash_attention_forward
ERROR 07-01 03:20:42 [engine.py:458]     attn_output = _flash_attention_forward(
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/transformers/modeling_flash_attention_utils.py", line 520, in _flash_attention_forward
ERROR 07-01 03:20:42 [engine.py:458]     attn_output_unpad = _flash_attn_varlen_func(
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 1443, in flash_attn_varlen_func
ERROR 07-01 03:20:42 [engine.py:458]     return FlashAttnVarlenFunc.apply(
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 575, in apply
ERROR 07-01 03:20:42 [engine.py:458]     return super().apply(*args, **kwargs)  # type: ignore[misc]
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 925, in forward
ERROR 07-01 03:20:42 [engine.py:458]     out_padded, softmax_lse, S_dmask, rng_state = _wrapped_flash_attn_varlen_forward(
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/_ops.py", line 1158, in __call__
ERROR 07-01 03:20:42 [engine.py:458]     return self._op(*args, **(kwargs or {}))
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/_library/custom_ops.py", line 335, in backend_impl
ERROR 07-01 03:20:42 [engine.py:458]     result = self._backend_fns[device_type](*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/_compile.py", line 51, in inner
ERROR 07-01 03:20:42 [engine.py:458]     return disable_fn(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
ERROR 07-01 03:20:42 [engine.py:458]     return fn(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/torch/_library/custom_ops.py", line 367, in wrapped_fn
ERROR 07-01 03:20:42 [engine.py:458]     return fn(*args, **kwargs)
ERROR 07-01 03:20:42 [engine.py:458]   File "/home/jun.zhou10/.local/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 165, in _flash_attn_varlen_forward
ERROR 07-01 03:20:42 [engine.py:458]     out, softmax_lse, S_dmask, rng_state = flash_attn_gpu.varlen_fwd(
ERROR 07-01 03:20:42 [engine.py:458] RuntimeError: cu_seqlens_q must have shape (batch_size + 1)
```
The full error log is in [log.txt](https://github.com/user-attachments/files/20990247/log.txt).

The same command runs normally under transformers==4.52.4.
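
For readers unfamiliar with the message in the traceback: FlashAttention's variable-length kernel takes `cu_seqlens_q` as cumulative sequence lengths, so it must have one more element than there are sequences in the batch. The sketch below only illustrates that shape rule in plain Python. `build_cu_seqlens` and `check_cu_seqlens` are hypothetical helper names, the lengths are made up, and this is not a diagnosis of where the audio tower's `cu_seqlens` goes wrong.

```python
from itertools import accumulate


def build_cu_seqlens(seq_lens):
    """Return cumulative sequence lengths: [0, l0, l0 + l1, ...]."""
    return [0, *accumulate(seq_lens)]


def check_cu_seqlens(cu_seqlens, batch_size):
    """Hypothetical check mirroring the shape rule named in the error message."""
    if len(cu_seqlens) != batch_size + 1:
        raise ValueError("cu_seqlens_q must have shape (batch_size + 1)")


# Illustrative lengths only; real values come from the audio feature lengths.
seq_lens = [3, 5, 2]
cu_seqlens = build_cu_seqlens(seq_lens)

# Passes: 3 sequences, 4 boundaries.
check_cu_seqlens(cu_seqlens, batch_size=len(seq_lens))
print(cu_seqlens)  # [0, 3, 8, 10]
```

If the batch size seen by the kernel and the length of `cu_seqlens_q` disagree, a check of this kind fails, which matches the `RuntimeError` above.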

### Expected behavior

The server starts successfully.
