vLLM fails to start QwQ-32B with sliding window enabled #20

@SmallBlueE

Tool: vLLM 0.8.4
Model: QwQ-32B
Config:
{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27648,
  "max_position_embeddings": 40960,
  "max_window_layers": 64,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": 40960,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.1",
  "use_cache": true,
  "use_sliding_window": true,
  "vocab_size": 152064
}
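
As a quick sanity check on the config above, here is a minimal sketch that compares the sliding-window fields with the context length. The function name `sliding_window_summary` and the keys it returns are hypothetical, chosen only for illustration. It reads fields that appear in the config and does not call vLLM.

```python
import json

# Sliding-window-related fields copied from the config above.
CONFIG_JSON = """
{
  "max_position_embeddings": 40960,
  "max_window_layers": 64,
  "num_hidden_layers": 64,
  "sliding_window": 40960,
  "use_sliding_window": true
}
"""


def sliding_window_summary(config: dict, max_model_len: int) -> dict:
    """Hypothetical helper: relate the configured window to the context length."""
    window = config.get("sliding_window")
    enabled = bool(config.get("use_sliding_window")) and window is not None
    return {
        "enabled": enabled,
        "window": window,
        # A window at least as long as the longest sequence never drops a token.
        "window_covers_full_context": enabled and window >= max_model_len,
    }


config = json.loads(CONFIG_JSON)
# 40960 is the --max-model-len value used in the launch command.
summary = sliding_window_summary(config, max_model_len=40960)
print(summary)
```

With these values the window (40960) equals `--max-model-len` (40960), so a sliding window of this size should behave like full attention for every request the server accepts.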

Launch command:
python -m vllm.entrypoints.openai.api_server --model /home/user/Models/QwQ-32B --host "::" --port 8600 --tensor-parallel-size 8 --gpu-memory-utilization 0.95 --max-model-len 40960 --dtype bfloat16 --max-num-seqs 16 --served-model-name qwq32b --swap-space 10 --enable_prefix_caching --enable-chunked-prefill --use-v2-block-manager --enforce-eager --disable-custom-all-reduce --trust-remote-code

Error message:

(VllmWorker rank=1 pid=3457224) raise ValueError("Sliding window for some but all layers is not "
(VllmWorker rank=1 pid=3457224) ValueError: Sliding window for some but all layers is not supported. This model uses sliding window but max_window_layers = 64 is less than num_hidden_layers = 64. Please open an issue to discuss this feature.
CRITICAL 04-28 19:54:55 [core_client.py:359] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
(VllmWorker rank=1 pid=3457224) Exception ignored in atexit callback: <function shutdown at 0x7227b809c9d0>
(VllmWorker rank=1 pid=3457224) Traceback (most recent call last):
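
The error text says max_window_layers "is less than" num_hidden_layers, yet both values in the message are 64. The sketch below pulls the two numbers out of the logged line and evaluates that comparison literally. The regex and the function name `parse_layer_counts` are hypothetical and written only for this log format. It does not reproduce vLLM's internal check, which is not shown in this issue.

```python
import re

# The ValueError text from the log above, without the worker prefix.
LOG_LINE = (
    "ValueError: Sliding window for some but all layers is not supported. "
    "This model uses sliding window but max_window_layers = 64 is less than "
    "num_hidden_layers = 64. Please open an issue to discuss this feature."
)


def parse_layer_counts(line: str) -> tuple[int, int]:
    """Hypothetical helper: extract (max_window_layers, num_hidden_layers)."""
    match = re.search(
        r"max_window_layers\s*=\s*(\d+).*?num_hidden_layers\s*=\s*(\d+)", line
    )
    if match is None:
        raise ValueError("line does not contain both layer counts")
    return int(match.group(1)), int(match.group(2))


max_window_layers, num_hidden_layers = parse_layer_counts(LOG_LINE)
# Evaluate the comparison exactly as the message words it.
literally_less_than = max_window_layers < num_hidden_layers
print(max_window_layers, num_hidden_layers, literally_less_than)
```

The comparison is false for 64 and 64, so the wording of the message does not describe these values. One possible reading, inferred only from the log, is that the error fires whenever sliding window is enabled for this model in this vLLM version, regardless of the two layer counts.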
