
[Feature] add --served-model-name arg for bench_serving#4141

Closed
gujingit wants to merge 4 commits into sgl-project:main from gujingit:feature/add-served-model-name

Conversation


@gujingit gujingit commented Mar 6, 2025

Motivation

When a server is launched with --served-model-name, the model name it accepts in API requests differs from the local model path used for tokenization. bench_serving.py currently sends the --model value as the request model name, so such servers cannot be benchmarked directly.

Modifications

Add a --served-model-name argument to bench_serving.py; when set, it is used as the model name in request payloads, falling back to --model otherwise. Example invocation:

python3 -m sglang.bench_serving --backend vllm --model /models/qwq-32b --served-model-name qwq-32b --host qwq-32b-v1 --port 8000 --dataset-name random --random-input 256 --random-output 10 --num-prompts=10 --random-range-ratio 1.0 --dataset-path /root/ShareGPT_V3_unfiltered_cleaned_split.json --disable-stream

Output:

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
benchmark_args=Namespace(backend='vllm', base_url=None, host='qwq-32b-v1', port=8000, dataset_name='random', dataset_path='/root/ShareGPT_V3_unfiltered_cleaned_split.json', model='/models/qwq-32b', served_model_name='qwq-32b', tokenizer=None, num_prompts=10, sharegpt_output_len=None, sharegpt_context_len=None, random_input_len=256, random_output_len=10, random_range_ratio=1.0, request_rate=inf, max_concurrency=None, output_file=None, disable_tqdm=False, disable_stream=True, return_logprob=False, seed=1, disable_ignore_eos=False, extra_request_body=None, apply_chat_template=False, profile=False, lora_name=None, prompt_suffix='', pd_seperated=False, gsp_num_groups=64, gsp_prompts_per_group=16, gsp_system_prompt_len=2048, gsp_question_len=128, gsp_output_len=256)

#Input tokens: 2560
#Output tokens: 100
Starting initial single prompt test run...
Initial test run completed. Starting main benchmark run...
100%|███████████████████████████████████| 10/10 [00:01<00:00,  7.69it/s]

============ Serving Benchmark Result ============
Backend:                                 vllm
Traffic request rate:                    inf
Max request concurrency:                 not set
Successful requests:                     10
Benchmark duration (s):                  1.30
Total input tokens:                      2560
Total generated tokens:                  100
Total generated tokens (retokenized):    100
Request throughput (req/s):              7.69
Input token throughput (tok/s):          1969.34
Output token throughput (tok/s):         76.93
Total token throughput (tok/s):          2046.27
Concurrency:                             9.99
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   1298.26
Median E2E Latency (ms):                 1298.23
---------------Time to First Token----------------
Mean TTFT (ms):                          1298.26
Median TTFT (ms):                        1298.24
P99 TTFT (ms):                           1298.93
---------------Inter-Token Latency----------------
Mean ITL (ms):                           0.00
Median ITL (ms):                         0.00
P95 ITL (ms):                            0.00
P99 ITL (ms):                            0.00
Max ITL (ms):                            0.00
==================================================
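For context, here is a minimal sketch of how such a flag can be wired into an argparse-based benchmark script. The names and defaults below are illustrative, not the exact diff in this PR:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Serving benchmark (sketch).")
    parser.add_argument(
        "--model",
        type=str,
        required=True,
        help="Local model path or HF ID; also used to load the tokenizer.",
    )
    parser.add_argument(
        "--served-model-name",
        type=str,
        default=None,
        help="Model name to send in API requests. Defaults to --model. "
        "Useful when the server was started with its own "
        "--served-model-name and does not recognize the local path.",
    )
    return parser


def request_model_name(args: argparse.Namespace) -> str:
    # The payload's "model" field uses the served name when provided,
    # while tokenization still reads from the local --model path.
    return args.served_model_name or args.model


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(request_model_name(args))
```

With this split, an invocation like the one above (--model /models/qwq-32b --served-model-name qwq-32b) tokenizes from the local path while requests carry the name the server actually registered.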

Checklist

Signed-off-by: zibai <zibai.gj@alibaba-inc.com>
@gujingit gujingit (Author) commented Mar 6, 2025

vllm pr: vllm-project/vllm#12109

@gujingit gujingit (Author) commented:

/assign @zhyncs

@gujingit gujingit changed the title from "add --served-model-name arg for bench_serving" to "[Feature] add --served-model-name arg for bench_serving" on Apr 2, 2025
@hnyls2002 hnyls2002 (Collaborator) commented:

@gujingit why is this needed?

@hnyls2002 hnyls2002 self-assigned this Sep 15, 2025
@hnyls2002 hnyls2002 (Collaborator) commented:

Please resolve the conflicts and explain why this argument should be introduced in bench_serving; then we can reopen it.

@hnyls2002 hnyls2002 closed this Sep 15, 2025