[BugFix] fix ep=1 etp=16 #985
Signed-off-by: ttanzhiqiang <[email protected]>
The latest branch is running smoothly, vllm-ascend commit 6eddbd2.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
#1012 does it.
What this PR does / why we need it?
Fixes the ep=1, etp=16 bug (#971); refer to PR #863.
Does this PR introduce any user-facing change?
Added an etp logic branch in deepseekv2 and fused_moe.
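As a hedged illustration (not the PR's actual code), the toy script below sketches the idea behind the etp path for a single MoE expert: the expert's MLP weights are split across the etp group and the per-shard partial outputs are summed, standing in for the all-reduce a real etp group would perform. All names and sizes here are illustrative; see the diff for the real deepseekv2/fused_moe changes.

# Toy sketch of expert tensor parallelism (etp) for one expert; illustrative only.
import torch

torch.manual_seed(0)
hidden, inter, etp_size = 16, 64, 4          # toy sizes; etp_size plays the role of etp=16
x = torch.randn(8, hidden)                   # 8 tokens routed to this expert
w_up = torch.randn(hidden, inter)            # expert up-projection
w_down = torch.randn(inter, hidden)          # expert down-projection

# Reference: the whole expert on one device (the ep=1, etp=1 path).
ref = torch.relu(x @ w_up) @ w_down

# etp path: each "rank" holds a slice of the intermediate dimension.
w_up_shards = w_up.chunk(etp_size, dim=1)
w_down_shards = w_down.chunk(etp_size, dim=0)
partials = [torch.relu(x @ wu) @ wd for wu, wd in zip(w_up_shards, w_down_shards)]
out = torch.stack(partials).sum(dim=0)       # stands in for the etp-group all-reduce

assert torch.allclose(ref, out, atol=1e-4)
print("etp-sharded expert matches the single-device expert")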
How was this patch tested?
nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
  --trust-remote-code \
  --distributed-executor-backend=mp \
  -tp=16 \
  -dp=1 \
  --port 8006 \
  --max-num-seqs 24 \
  --max-model-len 32768 \
  --max-num-batched-tokens 32768 \
  --block-size 128 \
  --enable-expert-parallel \
  --compilation_config 0 \
  --gpu-memory-utilization 0.96 \
  --additional-config '{"expert_tensor_parallel_size":1, "ascend_scheduler_config":{}}' &> run.log &
nohup python -m vllm.entrypoints.openai.api_server --model=/mnt/deepseek/DeepSeek-R1-W8A8-VLLM \
  --trust-remote-code \
  --distributed-executor-backend=mp \
  -tp=16 \
  -dp=1 \
  --port 8006 \
  --max-num-seqs 24 \
  --max-model-len 32768 \
  --max-num-batched-tokens 32768 \
  --block-size 128 \
  --enable-expert-parallel \
  --compilation_config 0 \
  --gpu-memory-utilization 0.96 \
  --additional-config '{"expert_tensor_parallel_size":16, "ascend_scheduler_config":{}}' &> run.log &