fix pp for deepseek #6152


Closed
wants to merge 1 commit

Conversation


zhjc1124 commented May 9, 2025

Motivation

Run DeepSeek with pipeline parallelism (pp).
#5724 #5925

$ python3 -m sglang.bench_one_batch_server --model /data/modelscope/DeepSeek-Coder-V2-Lite-Instruct/ --batch-size 1 --trust-remote-code --base-gpu-id 4 --port 38884 --pp 2 --tp 2
batch size: 16
latency: 2.50 s
output throughput: 102.25 token/s
(input + output) throughput: 6645.98 token/s
[2025-05-09 17:37:07 TP0 PP0] Prefill batch. #new-seq: 1, #new-token: 1024, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0
[2025-05-09 17:37:07 TP0 PP1] Prefill batch. #new-seq: 1, #new-token: 1024, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0
[2025-05-09 17:37:07 TP0 PP0] Decode batch. #running-req: 1, #token: 1027, token usage: 0.00, gen throughput (token/s): 20.20, #queue-req: 0
[2025-05-09 17:37:07 TP0 PP1] Decode batch. #running-req: 1, #token: 1027, token usage: 0.00, gen throughput (token/s): 20.19, #queue-req: 0
[2025-05-09 17:37:07] INFO: 127.0.0.1:59888 - "POST /generate HTTP/1.1" 200 OK
batch size: 1
latency: 0.17 s
output throughput: 95.08 token/s
(input + output) throughput: 6180.45 token/s
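For context, the throughput figures above are consistent with simple token-count arithmetic. A hedged sketch (the function name `bench_metrics` is illustrative, not sglang's actual code; the formulas are inferred from the printed metric names):

```python
# Hedged sketch of how bench_one_batch_server-style metrics can be
# derived from raw token counts. Formulas are assumed from the metric
# names in the output above, not read from sglang's source.

def bench_metrics(input_tokens: int, output_tokens: int, latency_s: float):
    """Return throughput metrics in tokens per second."""
    return {
        # "output throughput": only generated tokens count
        "output_throughput": output_tokens / latency_s,
        # "(input + output) throughput": all processed tokens count
        "total_throughput": (input_tokens + output_tokens) / latency_s,
    }
```

For example, 16 requests of 1024 input tokens and 16 output tokens each (16384 + 256 tokens) over 2.50 s gives 256 / 2.50 ≈ 102 tok/s output and 16640 / 2.50 ≈ 6656 tok/s total, matching the reported 102.25 and 6645.98 up to rounding of the latency.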

Modifications

Modify deepseek_v2.py to support pipeline parallelism.
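The typical change for pp support is partitioning the layer stack across stages so each rank builds and runs only its own slice. A minimal sketch of that pattern (names like `get_pp_layer_range` and `ToyStage` are hypothetical illustrations, not sglang's actual API):

```python
# Hedged sketch: the usual shape of adapting a model file for pipeline
# parallelism (pp). All names here are illustrative, not sglang's API.

def get_pp_layer_range(num_layers: int, pp_rank: int, pp_size: int):
    """Evenly split num_layers across pp_size stages; earlier stages
    absorb the remainder. Returns the [start, end) slice for this rank."""
    base, rem = divmod(num_layers, pp_size)
    start = pp_rank * base + min(pp_rank, rem)
    end = start + base + (1 if pp_rank < rem else 0)
    return start, end


class ToyStage:
    """One pipeline stage: owns only its slice of the layer stack.
    Stage 0 would also own the embedding; the last stage, the head."""

    def __init__(self, num_layers: int, pp_rank: int, pp_size: int):
        self.is_first = pp_rank == 0
        self.is_last = pp_rank == pp_size - 1
        self.start, self.end = get_pp_layer_range(num_layers, pp_rank, pp_size)
        # Only local layers are instantiated; non-local indices are
        # skipped entirely (analogous to a "missing layer" placeholder
        # so checkpoint loading can ignore them).
        self.layers = {i: f"layer_{i}" for i in range(self.start, self.end)}

    def forward(self, hidden):
        # A real model would transform `hidden` through each local layer;
        # here we just record which layers ran to show the control flow.
        for i in range(self.start, self.end):
            hidden.append(self.layers[i])
        return hidden  # handed to the next stage (or the head if last)
```

For example, a 27-layer model split over pp=2 gives stage 0 layers 0-13 and stage 1 layers 14-26, with activations forwarded between stages at the boundary.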

Checklist

Edenzzzz (Contributor) commented May 14, 2025

@zhjc1124 Can I ask why you closed it? Is it because deepseek should use EP instead?

zhjc1124 (Author) commented May 15, 2025

@zhjc1124 Can I ask why you closed it? Is it because deepseek should use EP instead?

I saw your comment too.
At first I failed to run DeepSeek-R1 across three nodes with tp=8 pp=3, so I wondered whether some compatibility issues with the MLA backend needed to be handled.
It turned out there was a problem with my machine's NCCL configuration. Now I run DeepSeek-R1 with tp=8 and pp=3 successfully, but I am still not sure whether compatibility with the MLA backend needs further handling.
