
[Feature] Support Deepseek-vl2-tiny model, in which mla is disabled #5537

@bppps


Motivation

According to #2653, the Deepseek-vl2 models are supported, but I found that not all models in the series work. The Deepseek-vl2 series consists of three variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small and DeepSeek-VL2, with 1.0B, 2.8B and 4.5B activated parameters respectively. The tiny model has a different structure from the small and normal models: MLA (Multi-Head Latent Attention) is disabled. If DeepseekV2ForCausalLM is used as the language model, qk_nope_head_dim and qk_rope_head_dim are summed into qk_head_dim, which later causes a ZeroDivisionError in the scaling calculation because of the **-0.5 operation (

self.scaling = self.qk_head_dim**-0.5

). There is a standard solution in vllm, which replaces DeepseekV2ForCausalLM with DeepseekForCausalLM.
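To make the failure mode concrete, here is a minimal self-contained sketch (not actual sglang or vllm code; the config values and helper names are placeholders). It shows why an MLA-style scaling computation divides by zero for the tiny config, and the kind of architecture routing used to fall back to the non-MLA DeepseekForCausalLM implementation:

```python
# Minimal sketch, not actual sglang/vllm code. Config values are placeholders.

class TinyLanguageConfig:
    # MLA is disabled in deepseek-vl2-tiny, so the latent-attention fields
    # are zero/absent; the plain MHA fields below are illustrative numbers.
    qk_nope_head_dim = 0
    qk_rope_head_dim = 0
    kv_lora_rank = None
    hidden_size = 1024
    num_attention_heads = 8


def mla_scaling(cfg) -> float:
    # DeepseekV2-style attention scales by qk_head_dim ** -0.5.
    qk_head_dim = cfg.qk_nope_head_dim + cfg.qk_rope_head_dim
    return qk_head_dim ** -0.5  # raises ZeroDivisionError when qk_head_dim == 0


def mha_scaling(cfg) -> float:
    # Plain multi-head attention derives the head dim from hidden_size,
    # so the scaling stays well-defined for the tiny variant.
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    return head_dim ** -0.5


def pick_language_arch(cfg) -> str:
    # Hypothetical routing helper: fall back to the non-MLA implementation
    # when the latent-attention rank is missing, mirroring the vLLM approach.
    return "DeepseekV2ForCausalLM" if cfg.kv_lora_rank else "DeepseekForCausalLM"


cfg = TinyLanguageConfig()
print(pick_language_arch(cfg))  # -> DeepseekForCausalLM
print(mha_scaling(cfg))         # well-defined
# mla_scaling(cfg)              # would raise ZeroDivisionError: 0 ** -0.5
```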

Besides, the chat template in deepseek-vl2 is also not aligned with vllm. An <image> token is automatically appended to the end of the prompt during chat conversation generation. If users pass <image> in their prompt to denote the image, there will be a mismatch between the real image count and the number of <image> tokens in the prompt.
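For the chat-template issue, a short sketch of the intended behavior (the function name and handling below are illustrative, not an existing sglang API): keep exactly one <image> placeholder per image instead of unconditionally appending another one.

```python
# Illustrative sketch, not actual sglang code: keep the number of <image>
# placeholders consistent with the number of images supplied by the user.

IMAGE_TOKEN = "<image>"


def normalize_image_placeholders(prompt: str, num_images: int) -> str:
    """Return a prompt that carries exactly one <image> token per image.

    If the user already wrote the placeholders, leave the prompt unchanged
    instead of appending an extra token the way the reference template does.
    """
    present = prompt.count(IMAGE_TOKEN)
    if present == num_images:
        return prompt
    if present == 0:
        # User gave no placeholders: prepend one per image.
        return IMAGE_TOKEN * num_images + "\n" + prompt
    raise ValueError(
        f"Prompt has {present} {IMAGE_TOKEN!r} tokens but {num_images} images were given."
    )


# The user supplied the placeholder explicitly, so nothing extra is added.
print(normalize_image_placeholders("<image>\nDescribe this picture.", 1))
```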

I have already validated the deepseek-vl2-tiny model locally, ensuring that the output results are consistent with those from vllm. Additionally, I've observed some performance improvements, with speeds 5% to 20% faster depending on the number of decoding steps (thanks to the excellent sglang backend). I'm wondering if I can contribute this feature; it only needs a little more work to reorganize the code cleanly and add some tests. I'm really looking forward to feedback from the community. Thanks!

Related resources

Differences between the tiny and normal-size models:
[deepseek-vl2] https://huggingface.co/deepseek-ai/deepseek-vl2/blob/main/config.json
[deepseek-vl2-tiny] https://huggingface.co/deepseek-ai/deepseek-vl2-tiny/blob/main/config.json

Special token maps:
https://huggingface.co/deepseek-ai/deepseek-vl2-small/blob/main/special_tokens_map.json

Deepseek-vl2 chat examples (with <image> input):
https://github.com/deepseek-ai/DeepSeek-VL2?tab=readme-ov-file#simple-inference-example-with-one-image

Some code suggestions and examples from vllm:
https://github.com/vllm-project/vllm/blob/686623c5e7a0ee0c7679c052ced565dd83055709/vllm/model_executor/models/deepseek_vl2.py#L355
