
[Feature] add --served-model-name arg for bench_serving#4141

Closed
gujingit wants to merge 4 commits into sgl-project:main from gujingit:feature/add-served-model-name

Conversation


@gujingit gujingit commented Mar 6, 2025

Motivation

When a server is launched with --served-model-name, the model name it accepts in API requests differs from the local model path used for tokenization. bench_serving.py currently sends the --model value as the request model name, so such servers cannot be benchmarked directly.

Modifications

Add a --served-model-name argument to bench_serving.py; when set, it is used as the model name in request payloads, falling back to --model otherwise. Example invocation:

python3 -m sglang.bench_serving --backend vllm --model /models/qwq-32b --served-model-name qwq-32b --host qwq-32b-v1 --port 8000 --dataset-name random --random-input 256 --random-output 10 --num-prompts=10 --random-range-ratio 1.0 --dataset-path /root/ShareGPT_V3_unfiltered_cleaned_split.json --disable-stream

Output:

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
benchmark_args=Namespace(backend='vllm', base_url=None, host='qwq-32b-v1', port=8000, dataset_name='random', dataset_path='/root/ShareGPT_V3_unfiltered_cleaned_split.json', model='/models/qwq-32b', served_model_name='qwq-32b', tokenizer=None, num_prompts=10, sharegpt_output_len=None, sharegpt_context_len=None, random_input_len=256, random_output_len=10, random_range_ratio=1.0, request_rate=inf, max_concurrency=None, output_file=None, disable_tqdm=False, disable_stream=True, return_logprob=False, seed=1, disable_ignore_eos=False, extra_request_body=None, apply_chat_template=False, profile=False, lora_name=None, prompt_suffix='', pd_seperated=False, gsp_num_groups=64, gsp_prompts_per_group=16, gsp_system_prompt_len=2048, gsp_question_len=128, gsp_output_len=256)

#Input tokens: 2560
#Output tokens: 100
Starting initial single prompt test run...
Initial test run completed. Starting main benchmark run...
100%|███████████████████████████████████| 10/10 [00:01<00:00,  7.69it/s]

============ Serving Benchmark Result ============
Backend:                                 vllm
Traffic request rate:                    inf
Max request concurrency:                 not set
Successful requests:                     10
Benchmark duration (s):                  1.30
Total input tokens:                      2560
Total generated tokens:                  100
Total generated tokens (retokenized):    100
Request throughput (req/s):              7.69
Input token throughput (tok/s):          1969.34
Output token throughput (tok/s):         76.93
Total token throughput (tok/s):          2046.27
Concurrency:                             9.99
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   1298.26
Median E2E Latency (ms):                 1298.23
---------------Time to First Token----------------
Mean TTFT (ms):                          1298.26
Median TTFT (ms):                        1298.24
P99 TTFT (ms):                           1298.93
---------------Inter-Token Latency----------------
Mean ITL (ms):                           0.00
Median ITL (ms):                         0.00
P95 ITL (ms):                            0.00
P99 ITL (ms):                            0.00
Max ITL (ms):                            0.00
==================================================
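For context, here is a minimal sketch of how such a flag can be wired into an argparse-based benchmark script. The names and defaults below are illustrative, not the exact diff in this PR:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Serving benchmark (sketch).")
    parser.add_argument(
        "--model",
        type=str,
        required=True,
        help="Local model path or HF ID; also used to load the tokenizer.",
    )
    parser.add_argument(
        "--served-model-name",
        type=str,
        default=None,
        help="Model name to send in API requests. Defaults to --model. "
        "Useful when the server was started with its own "
        "--served-model-name and does not recognize the local path.",
    )
    return parser


def request_model_name(args: argparse.Namespace) -> str:
    # The payload's "model" field uses the served name when provided,
    # while tokenization still reads from the local --model path.
    return args.served_model_name or args.model


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(request_model_name(args))
```

With this split, an invocation like the one above (--model /models/qwq-32b --served-model-name qwq-32b) tokenizes from the local path while requests carry the name the server actually registered.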

Checklist

Signed-off-by: zibai <zibai.gj@alibaba-inc.com>
@gujingit gujingit (Author) commented Mar 6, 2025

vllm pr: vllm-project/vllm#12109

@gujingit gujingit (Author) commented:

/assign @zhyncs

@gujingit gujingit changed the title from "add --served-model-name arg for bench_serving" to "[Feature] add --served-model-name arg for bench_serving" on Apr 2, 2025
@hnyls2002 hnyls2002 (Collaborator) commented:

@gujingit why is this needed?

@hnyls2002 hnyls2002 self-assigned this Sep 15, 2025
@hnyls2002 hnyls2002 (Collaborator) commented:

Please resolve the conflicts and explain why this argument should be introduced in bench_serving; then we can reopen it.

@hnyls2002 hnyls2002 closed this Sep 15, 2025