
Commit 0651687

clean up chat-template for VLMs
Signed-off-by: Xinyuan Tong <[email protected]>
1 parent 79961af commit 0651687

14 files changed (+16, -50 lines)

benchmark/mmmu/README.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 Host the VLM:
 
 ```
-python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --chat-template qwen2-vl --port 30000
+python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --port 30000
 ```
 
 It's recommended to reduce the memory usage by appending something like `--mem-fraction-static 0.6` to the command above.
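A quick way to confirm the hosted VLM is ready before running the MMMU benchmark is to poll the server's info and health routes. A minimal sketch, assuming the default port 30000 from the command above and sglang's `/get_model_info` and `/health` endpoints:

```python
import requests

BASE_URL = "http://localhost:30000"  # port from the launch command above

# /get_model_info reports which model the server actually loaded.
info = requests.get(f"{BASE_URL}/get_model_info", timeout=10).json()
print(info)

# /health returns HTTP 200 once the server can accept requests.
assert requests.get(f"{BASE_URL}/health", timeout=10).status_code == 200
```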

benchmark/mmmu/bench_sglang.py

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 Bench the sglang-hosted vLM with benchmark MMMU
 
 Usage:
-    Host the VLM: python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --chat-template qwen2-vl --port 30000
+    Host the VLM: python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --port 30000
 
     Benchmark: python benchmark/mmmu/bench_sglang.py --port 30000 --concurrency 16
 

docs/backend/openai_api_vision.ipynb

Lines changed: 4 additions & 9 deletions
@@ -27,11 +27,7 @@
 "source": [
 "## Launch A Server\n",
 "\n",
-"Launch the server in your terminal and wait for it to initialize.\n",
-"\n",
-"**Remember to add** `--chat-template` **for example** `--chat-template=qwen2-vl` **to specify the [vision chat template](https://docs.sglang.ai/backend/openai_api_vision.html#Chat-Template), otherwise, the server will only support text (images won’t be passed in), which can lead to degraded performance.**\n",
-"\n",
-"We need to specify `--chat-template` for vision language models because the chat template provided in Hugging Face tokenizer only supports text."
+"Launch the server in your terminal and wait for it to initialize."
 ]
 },
 {
@@ -51,8 +47,7 @@
 "\n",
 "vision_process, port = launch_server_cmd(\n",
 " \"\"\"\n",
-"python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-7B-Instruct \\\n",
-" --chat-template=qwen2-vl\n",
+"python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-VL-7B-Instruct\n",
 "\"\"\"\n",
 ")\n",
 "\n",
@@ -255,9 +250,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Chat Template\n",
+"## Chat Template (for sglang version < 0.4.6.post2)\n",
 "\n",
-"As mentioned before, if you do not specify a vision model's `--chat-template`, the server uses Hugging Face's default template, which only supports text.\n",
+"If you do not specify a vision model's `--chat-template`, the server uses Hugging Face's default template, which only supports text, and may lead to degraded performance.\n",
 "\n",
 "We list popular vision models with their chat templates:\n",
 "\n",

docs/backend/sampling_params.md

Lines changed: 1 addition & 1 deletion
@@ -135,7 +135,7 @@ Detailed example in [openai compatible api](https://docs.sglang.ai/backend/opena
 Launch a server:
 
 ```bash
-python3 -m sglang.launch_server --model-path lmms-lab/llava-onevision-qwen2-7b-ov --chat-template chatml-llava
+python3 -m sglang.launch_server --model-path lmms-lab/llava-onevision-qwen2-7b-ov
 ```
 
 Download an image:
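The surrounding section of sampling_params.md exercises the native `/generate` route with per-request sampling parameters. A hedged sketch of such a request against the server above on port 30000; the `<image>` placeholder and exact prompt format are model-specific assumptions, not taken from this diff:

```python
import requests

response = requests.post(
    "http://localhost:30000/generate",
    json={
        # Prompt format is model-specific; "<image>" is an assumed placeholder.
        "text": "<image>\nDescribe this image in one sentence.",
        "image_data": "example_image.png",  # path or URL of the downloaded image
        "sampling_params": {"temperature": 0.2, "max_new_tokens": 64},
    },
)
print(response.json()["text"])
```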

docs/supported_models/embedding_models.md

Lines changed: 1 addition & 2 deletions
@@ -3,7 +3,7 @@
 SGLang provides robust support for embedding models by integrating efficient serving mechanisms with its flexible programming interface. This integration allows for streamlined handling of embedding tasks, facilitating faster and more accurate retrieval and semantic search operations. SGLang's architecture enables better resource utilization and reduced latency in embedding model deployment.
 
 ```{important}
-They are executed with `--is-embedding` and some may require `--trust-remote-code` and/or `--chat-template`
+They are executed with `--is-embedding` and some may require `--trust-remote-code`
 ```
 
 ## Example launch Command
@@ -13,7 +13,6 @@ python3 -m sglang.launch_server \
   --model-path Alibaba-NLP/gme-Qwen2-VL-2B-Instruct \ # example HF/local path
   --is-embedding \
   --host 0.0.0.0 \
-  --chat-template gme-qwen2-vl \ # set chat template
   --port 30000 \
 ```
 
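To show what the launched embedding server accepts, here is a minimal sketch against the OpenAI-compatible `/v1/embeddings` route. Text-only input is shown because the multimodal payload shape is not part of this diff:

```python
import requests

response = requests.post(
    "http://localhost:30000/v1/embeddings",
    json={
        "model": "Alibaba-NLP/gme-Qwen2-VL-2B-Instruct",
        "input": "a sample sentence to embed",
    },
)
vector = response.json()["data"][0]["embedding"]
print(len(vector))  # dimensionality of the embedding
```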

docs/supported_models/vision_language_models.md

Lines changed: 0 additions & 5 deletions
@@ -2,16 +2,11 @@
 
 These models accept multi-modal inputs (e.g., images and text) and generate text output. They augment language models with visual encoders and require a specific chat template for handling vision prompts.
 
-```{important}
-We need to specify `--chat-template` for VLMs because the chat template provided in HuggingFace tokenizer only supports text. If you do not specify a vision model’s `--chat-template`, the server uses HuggingFace’s default template, which only supports text and the images won’t be passed in.
-```
-
 ## Example launch Command
 
 ```shell
 python3 -m sglang.launch_server \
   --model-path meta-llama/Llama-3.2-11B-Vision-Instruct \ # example HF/local path
-  --chat-template llama_3_vision \ # required chat template
   --host 0.0.0.0 \
   --port 30000 \
 ```

examples/runtime/engine/offline_batch_inference_vlm.py

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 """
 Usage:
-python offline_batch_inference_vlm.py --model-path Qwen/Qwen2-VL-7B-Instruct --chat-template=qwen2-vl
+python offline_batch_inference_vlm.py --model-path Qwen/Qwen2-VL-7B-Instruct
 """
 
 import argparse
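For context, this script drives sglang's offline engine rather than an HTTP server. A rough sketch of that flow, with argument values assumed for illustration rather than taken from the diff:

```python
import sglang as sgl

# model_path is normally supplied via argparse; hard-coded here for illustration.
llm = sgl.Engine(model_path="Qwen/Qwen2-VL-7B-Instruct")

prompts = ["Describe this image in one sentence."]
# "example_image.png" is a placeholder; image_data takes paths or URLs.
outputs = llm.generate(prompts, image_data=["example_image.png"])
for out in outputs:
    print(out["text"])

llm.shutdown()
```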

examples/runtime/llava_onevision/http_llava_onevision_test.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 """
 Usage:
 
-python3 -m sglang.launch_server --model-path lmms-lab/llava-onevision-qwen2-72b-ov --port=30000 --tp-size=8 --chat-template=chatml-llava
+python3 -m sglang.launch_server --model-path lmms-lab/llava-onevision-qwen2-72b-ov --port=30000 --tp-size=8
 
 python3 http_llava_onevision_test.py
 """

examples/runtime/multimodal_embedding.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # launch server
-# python -m sglang.launch_server --model-path Alibaba-NLP/gme-Qwen2-VL-2B-Instruct --is-embedding --chat-template gme-qwen2-vl
+# python -m sglang.launch_server --model-path Alibaba-NLP/gme-Qwen2-VL-2B-Instruct --is-embedding
 
 import requests
 

test/srt/models/test_vlm_models.py

Lines changed: 4 additions & 16 deletions
@@ -19,17 +19,12 @@
 
 # VLM models for testing
 MODELS = [
-    SimpleNamespace(
-        model="google/gemma-3-27b-it", chat_template="gemma-it", mmmu_accuracy=0.45
-    ),
+    SimpleNamespace(model="google/gemma-3-27b-it", mmmu_accuracy=0.45),
     SimpleNamespace(
         model="Qwen/Qwen2.5-VL-3B-Instruct",
-        chat_template="qwen2-vl",
         mmmu_accuracy=0.4,
     ),
-    SimpleNamespace(
-        model="openbmb/MiniCPM-V-2_6", chat_template="minicpmv", mmmu_accuracy=0.4
-    ),
+    SimpleNamespace(model="openbmb/MiniCPM-V-2_6", mmmu_accuracy=0.4),
 ]
 
 
@@ -50,7 +45,6 @@ def setUpClass(cls):
     def run_mmmu_eval(
         self,
         model_version: str,
-        chat_template: str,
         output_path: str,
         *,
         env: dict | None = None,
@@ -69,11 +63,7 @@ def run_mmmu_eval(
         os.makedirs(output_path, exist_ok=True)
 
         # -------- compose --model_args --------
-        model_args = (
-            f'model_version="{model_version}",'
-            f'chat_template="{chat_template}",'
-            f"tp={tp}"
-        )
+        model_args = f'model_version="{model_version}",' f"tp={tp}"
 
         # -------- build command list --------
         cmd = [
@@ -122,8 +112,6 @@ def test_vlm_mmmu_benchmark(self):
             timeout=self.time_out,
             api_key=self.api_key,
             other_args=[
-                "--chat-template",
-                model.chat_template,
                 "--trust-remote-code",
                 "--cuda-graph-max-bs",
                 "32",
@@ -134,7 +122,7 @@
         )
 
         # Run evaluation
-        self.run_mmmu_eval(model.model, model.chat_template, "./logs")
+        self.run_mmmu_eval(model.model, "./logs")
 
         # Get the result file
         result_file_path = glob.glob("./logs/*.json")[0]
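One note on the new one-liner in `run_mmmu_eval`: it relies on Python's implicit concatenation of adjacent string literals, so the two f-strings fuse into a single comma-separated `--model_args` value. A quick illustration with made-up values:

```python
model_version = "Qwen/Qwen2.5-VL-3B-Instruct"  # made-up example value
tp = 1

# Adjacent (f-)string literals concatenate at compile time.
model_args = f'model_version="{model_version}",' f"tp={tp}"
print(model_args)  # -> model_version="Qwen/Qwen2.5-VL-3B-Instruct",tp=1
```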
