
[Bug] Qwen2.5-VL-72B image input not working in SGLang, works fine in vLLM #4645

@qWaitCrypto

Description


Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose; otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

🧾 Description:

When deploying qwen2.5-vl-72b-awq with SGLang, image inputs passed as image_url content parts are not handled: the model replies as if no image had been attached. The same request works as expected in vLLM, where the model successfully describes the image.

Reproduction

✅ Reproduction Steps:

✅ SGLang Launch Command:

python -m sglang.launch_server \
  --model-path qwen-vl-72b \
  --port 30000 \
  --trust-remote-code \
  --host 0.0.0.0 \
  --mem-fraction-static 0.8 \
  --tp 4 \
  --tool-call-parser qwen25

✅ OpenAI-Compatible API Call (cURL):

curl -X POST "http://0.0.0.0:30000/v1/chat/completions" \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-vl",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "describe this picture"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
            }
          }
        ]
      }
    ],
    "top_p": 0.8
  }'
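For anyone reproducing this from Python, the same request can be sent with a standard-library-only client. This is a minimal sketch, assuming the server launched above is reachable at localhost:30000. `build_payload` and `send` are hypothetical helper names. The Authorization header is attached only when DASHSCOPE_API_KEY is set, on the assumption that a local server started without an API key option does not need it.

```python
import json
import os
import urllib.request

# Endpoint and image URL mirror the cURL request above; localhost:30000 is an assumption.
BASE_URL = "http://localhost:30000/v1/chat/completions"
IMAGE_URL = "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"


def build_payload(prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": "qwen2.5-vl",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "top_p": 0.8,
    }


def send(payload: dict, url: str = BASE_URL) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    api_key = os.environ.get("DASHSCOPE_API_KEY")
    if api_key:  # only sent when set; assumed unnecessary for a local server
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `reply = send(build_payload("describe this picture", IMAGE_URL))`, then inspect `reply["choices"][0]["message"]["content"]` and `reply["usage"]["prompt_tokens"]`. Pointing `url` at the vLLM deployment allows the two servers to be compared with an identical payload.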

🧾 SGLang Response:

{
  "id": "803d3c01743b4429b61c0a83d60eda5b",
  "object": "chat.completion",
  "created": 1742528000,
  "model": "qwen2.5-vl",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm sorry, but I cannot see any picture attached to your message. Could you please provide more information or upload the picture again? I'll do my best to describe it for you."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 22,
    "completion_tokens": 41,
    "total_tokens": 63
  }
}
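The usage block in this response offers a quick diagnostic. One way to test whether the image reached the tokenizer is to send the same request twice, once as-is and once with the image part removed, and compare `prompt_tokens`. If the counts match, the image contributed no tokens. The helpers below (`strip_image_parts`, `image_token_delta`, `image_was_tokenized`) are hypothetical names for this sketch and operate only on plain request and response dicts.

```python
import copy


def strip_image_parts(payload: dict) -> dict:
    """Return a deep copy of a chat payload with every image_url part removed."""
    stripped = copy.deepcopy(payload)
    for message in stripped["messages"]:
        content = message.get("content")
        if isinstance(content, list):  # string content carries no image parts
            message["content"] = [p for p in content if p.get("type") != "image_url"]
    return stripped


def image_token_delta(with_image: dict, text_only: dict) -> int:
    """Extra prompt tokens attributable to the image part of the request."""
    return with_image["usage"]["prompt_tokens"] - text_only["usage"]["prompt_tokens"]


def image_was_tokenized(with_image: dict, text_only: dict) -> bool:
    """True only if adding the image increased the prompt length at all."""
    return image_token_delta(with_image, text_only) > 0
```

If `image_was_tokenized` returns False against SGLang but True against vLLM for the same payload, the image is being dropped before tokenization rather than misread by the model.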

✅ Comparison with vLLM:

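Using the exact same model and cURL request, a vLLM deployment successfully describes the image. This indicates that the problem lies not with the prompt or the model but with how SGLang handles image_url content parts in the message payload. The usage block points the same way: 22 prompt tokens is very small for a request that should also carry an image, which Qwen2.5-VL encodes as a sequence of visual tokens. The image therefore appears never to have reached the model.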


📌 Expected Behavior:

SGLang should support OpenAI-compatible image inputs by correctly parsing messages[].content[].image_url.url and feeding the image into the model’s visual encoder.
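The expected behavior amounts to walking every content part, picking out image_url entries, and resolving each URL to image bytes before preprocessing. The following is an illustrative sketch only and is not SGLang's implementation. `iter_image_urls` and `load_image_bytes` are hypothetical names. The sketch assumes images arrive either as http(s) URLs, like the one in the request above, or as base64 data: URLs, which the OpenAI Chat Completions format also accepts.

```python
import base64
import urllib.request
from typing import Iterator


def iter_image_urls(messages: list) -> Iterator[str]:
    """Yield every messages[].content[].image_url.url, in request order."""
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):  # plain-string content has no images
            continue
        for part in content:
            if part.get("type") == "image_url":
                yield part["image_url"]["url"]


def load_image_bytes(url: str) -> bytes:
    """Return raw image bytes for a base64 data: URL or an http(s) URL."""
    if url.startswith("data:"):
        # Layout: data:<mediatype>;base64,<payload>
        header, _, data = url.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("only base64 data URLs are handled in this sketch")
        return base64.b64decode(data)
    if url.startswith(("http://", "https://")):
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.read()
    raise ValueError(f"unsupported image URL scheme: {url[:16]}")
```

Sending the same image as a data: URL is also a cheap way to tell whether a failure is in fetching the remote URL or in handling the image afterwards.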

Environment

🧪 Environment:

  • Model: qwen2.5-vl-72b-awq
  • Deployment: SGLang 0.4.4.post1
  • API Protocol: OpenAI-compatible Chat Completions API
  • vLLM Behavior: ✅ Working as expected
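The checklist asks for environment details, and the list above gives the SGLang version but not the vLLM version or the supporting libraries. A small standard-library script can report these in one pass and can be run in both the SGLang and vLLM environments. This is a sketch. The package list is an assumption and can be extended, and packages that are not installed are reported as None.

```python
import platform
from importlib import metadata

# Assumed set of packages relevant to this report; extend as needed.
PACKAGES = ("sglang", "vllm", "torch", "transformers")


def collect_versions(packages=PACKAGES) -> dict:
    """Map the Python version and each package name to its installed version (or None)."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None
    return info
```

Usage: `for name, version in collect_versions().items(): print(f"{name}: {version}")`.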
