Description
Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Motivation
According to #2653, the Deepseek-vl2 models are supported, but in my usage not all models in the series actually work. The Deepseek-vl2 series consists of three variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small and DeepSeek-VL2, with 1.0B, 2.8B and 4.5B activated parameters respectively. The tiny model has a different structure from the small and normal models: MLA (Multi-Head Latent Attention) is disabled. If `DeepseekV2ForCausalLM` is used as the language model, `qk_head_dim` (the sum of `qk_nope_head_dim` and `qk_rope_head_dim`) is 0, so the scaling calculation `self.scaling = self.qk_head_dim**-0.5` raises a ZeroDivisionError.
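A minimal sketch of the failure mode and one possible guard (the fallback and the illustrative config values are assumptions about how a fix could look, not sglang's actual implementation):

```python
# Minimal sketch of the failure, assuming the tiny config disables MLA and
# therefore reports both qk head dims as 0 (or omits them entirely).
hidden_size = 1280            # illustrative values, not the real tiny config
num_attention_heads = 10

qk_nope_head_dim = 0          # MLA disabled -> both dims end up 0 (assumption)
qk_rope_head_dim = 0
qk_head_dim = qk_nope_head_dim + qk_rope_head_dim

try:
    scaling = qk_head_dim ** -0.5          # the expression sglang evaluates today
except ZeroDivisionError:
    # One possible fix: fall back to the plain per-head dim when MLA is off.
    scaling = (hidden_size // num_attention_heads) ** -0.5

print(scaling)  # 1 / sqrt(128) with the fallback
```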
Besides, the chat template in deepseek-vl2 is also not aligned with vllm: an `<image>` token is appended to the end of the prompt automatically during chat conversation generation. If the user passes `<image>` in their prompt to denote the image, there will be a mismatch between the real image count and the number of `<image>` tokens in the prompt.
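A hedged sketch of the alignment check described above (the helper name and its prepend-when-missing policy are illustrative choices, not sglang's or vllm's actual API):

```python
IMAGE_TOKEN = "<image>"

def ensure_image_tokens(prompt: str, num_images: int) -> str:
    """Keep the number of <image> placeholders equal to the number of images.

    If the user already wrote the placeholders, the prompt is returned
    unchanged instead of getting an extra token appended; if none were
    written, placeholders are added for them.
    """
    found = prompt.count(IMAGE_TOKEN)
    if found == num_images:
        return prompt
    if found == 0:
        return IMAGE_TOKEN * num_images + "\n" + prompt
    raise ValueError(
        f"prompt has {found} {IMAGE_TOKEN} tokens but {num_images} images were given"
    )

# The user-supplied placeholder is respected; nothing extra is appended.
print(ensure_image_tokens("<image>\nDescribe this picture.", 1))
```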
I have already validated the deepseek-vl2-tiny model locally, ensuring that the output results are consistent with those from vllm. Additionally, I've observed some performance improvements, with speeds 5% to 20% faster depending on the number of decoding steps (thanks to the excellent sglang backend). I'm wondering if I can contribute this feature; it only needs a little more work to reorganize the code cleanly and add some tests. I'm really looking forward to feedback from the community. Thanks!
Related resources
Differences between the tiny and normal-size models (the snippet after these links prints the relevant fields):
[deepseek-vl2] https://huggingface.co/deepseek-ai/deepseek-vl2/blob/main/config.json
[deepseek-vl2-tiny] https://huggingface.co/deepseek-ai/deepseek-vl2-tiny/blob/main/config.json
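A quick way to see the structural difference is to diff the attention-related fields of the two configs. A sketch (the `language_config` key and the field list are assumptions based on the usual DeepseekV2 config layout; whichever fields turn out absent or zero in the tiny config is exactly what this prints):

```python
import json
from urllib.request import urlopen

CONFIGS = {
    "deepseek-vl2": "https://huggingface.co/deepseek-ai/deepseek-vl2/raw/main/config.json",
    "deepseek-vl2-tiny": "https://huggingface.co/deepseek-ai/deepseek-vl2-tiny/raw/main/config.json",
}

for name, url in CONFIGS.items():
    lang = json.load(urlopen(url)).get("language_config", {})
    for key in ("use_mla", "qk_nope_head_dim", "qk_rope_head_dim",
                "kv_lora_rank", "q_lora_rank"):
        print(f"{name:20s} {key:18s} {lang.get(key, '<absent>')}")
```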
Special token maps:
https://huggingface.co/deepseek-ai/deepseek-vl2-small/blob/main/special_tokens_map.json
Deepseek-vl2 chat examples (with `<image>` input):
https://github.com/deepseek-ai/DeepSeek-VL2?tab=readme-ov-file#simple-inference-example-with-one-image
Some code suggestions and examples from vllm:
https://github.com/vllm-project/vllm/blob/686623c5e7a0ee0c7679c052ced565dd83055709/vllm/model_executor/models/deepseek_vl2.py#L355
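For reference, the linked vllm file selects the language backbone from the text config. A paraphrased sketch of that pattern (the attribute and class names are recalled from the link and should be double-checked there):

```python
# Sketch of the dispatch in the linked vllm file; `use_mla` is an assumption.
def select_language_architecture(text_config) -> str:
    if not getattr(text_config, "use_mla", True):
        # deepseek-vl2-tiny: MLA disabled, plain DeepSeek attention.
        return "DeepseekForCausalLM"
    # deepseek-vl2-small / deepseek-vl2: MLA enabled.
    return "DeepseekV2ForCausalLM"
```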