Reasoning parser #3859

ShaoZhang0115 · 2025-02-25T18:27:06Z

Motivation

Rewrite #3202

Modifications

Add --enable-reasoning and --reasoning-parser options for DeepSeek R1 series models.
Return reasoning_content as in the official API (ref: https://api-docs.deepseek.com/zh-cn/guides/reasoning_model) in both streaming and non-streaming chat completions.
Example:

python -m sglang.launch_server --host 0.0.0.0 \
--model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
--tp 1 --enable-reasoning --reasoning-parser deepseek-r1

curl --location --request POST 'http://localhost:30000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data '{
    "model": "default",
    "messages": [
        {
            "role": "user",
            "content": "Calculate 1 + 3"
        }
    ],
    "stream": false
}'

Get response:

{
    "id": "53de20f7f1244195826e7b52011c37a4",
    "object": "chat.completion",
    "created": 1740507802,
    "model": "default",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\n**Solution:**\n\nTo calculate \\(1 + 3\\), follow these easy steps:\n\n1. **Identify the numbers to add:**  \n   You have the number **1** and the number **3**.\n\n2. **Add the numbers together:**  \n   \\[\n   1 + 3 = 4\n   \\]\n\n3. **Final Answer:**  \n   \\[\n   \\boxed{4}\n   \\]",
                "reasoning_content": "To calculate the sum of 1 and 3, I will begin by identifying the two numbers involved in the addition. The first number is 1, and the second number is 3.\n\nNext, I will add these two numbers together. Adding 1 and 3 gives me a total of 4.\n\nTherefore, the result of 1 plus 3 is 4.\n",
                "tool_calls": null
            },
            "logprobs": null,
            "finish_reason": "stop",
            "matched_stop": 151643
        }
    ],
    "usage": {
        "prompt_tokens": 11,
        "total_tokens": 179,
        "completion_tokens": 168,
        "prompt_tokens_details": null
    }
}
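
For reference, a client could pull the two fields out of a non-streaming response roughly as follows. This is a minimal sketch that operates on an already-decoded JSON body; the helper name `split_message` is hypothetical and not part of sglang, and the sample body is an abbreviated stand-in for the response above.

```python
from typing import Optional, Tuple


def split_message(response: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning_content, content) from the first choice of a chat completion body."""
    message = response["choices"][0]["message"]
    # reasoning_content may be missing when the server runs without a reasoning parser.
    return message.get("reasoning_content"), message.get("content")


# Abbreviated, illustrative body in the same shape as the response above.
body = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\n**Solution:** 1 + 3 = 4",
                "reasoning_content": "Adding 1 and 3 gives me a total of 4.\n",
            },
        }
    ]
}
reasoning, answer = split_message(body)
```

Using `dict.get` keeps the helper usable against servers that do not return `reasoning_content` at all.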

Docs will be updated as soon as possible.

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

shuaills · 2025-02-25T18:59:10Z

python/sglang/srt/reasoning_parser.py

+        self.think_start_token = "<think>"
+        self.think_end_token = "</think>"


Can we extend this to all reasoning models? Not just dpsk R1. There might be different thinking tokens.

I think different reasoning models need different parsers, and I have added docs for this.
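
One way to picture "different parsers for different models" is a small registry keyed by the value passed to --reasoning-parser. The sketch below is only an illustration: `ThinkTokens`, `REASONING_TOKENS`, and `get_think_tokens` are hypothetical names, and only the deepseek-r1 entry comes from this PR.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ThinkTokens:
    """Start/end markers that delimit a model's reasoning section."""

    start: str
    end: str


# Hypothetical registry keyed by the --reasoning-parser value.
REASONING_TOKENS: Dict[str, ThinkTokens] = {
    "deepseek-r1": ThinkTokens(start="<think>", end="</think>"),
}


def get_think_tokens(parser_name: str) -> ThinkTokens:
    """Look up the markers for a parser name, failing loudly on unknown names."""
    try:
        return REASONING_TOKENS[parser_name]
    except KeyError:
        raise ValueError(f"Unsupported reasoning parser: {parser_name!r}") from None
```

Supporting another reasoning model would then mean adding one entry with its own tokens, assuming that model also uses a simple start/end marker pair.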

xihuai18 · 2025-02-26T07:12:30Z

Add Docs
Test with streaming and non-streaming cases, with truncated or non-truncated max-tokens for reasoning.

xihuai18 · 2025-02-26T07:13:27Z

However, I can not pass my tests with --enable-torch-compile, which is confusing.

xihuai18 · 2025-02-27T05:12:11Z

However, I can not pass my tests with --enable-torch-compile, which is confusing.

Possibly related issue: #3730 (comment)

tot0 · 2025-02-27T19:24:19Z

python/sglang/srt/reasoning_parser.py

+        self.think_start_token = think_start_token
+        self.think_end_token = think_end_token
+        self.pattern = re.compile(
+            rf"{self.think_start_token}(.*?){self.think_end_token}", re.DOTALL


The most recent tokenizer hardcodes the opening <think> tag: https://huggingface.co/deepseek-ai/DeepSeek-R1/commit/8a58a132790c9935686eb97f042afa8013451c9f

This means the text coming back from inference won't include <think>. That is why I updated #3202 to assume the model is reasoning until </think> is seen; it also strips out <think> to handle the old chat template.

@tot0

The PR added the start token if it is missing:

# Add the start token to the beginning of the text.
text = self.think_start_token + text

You can see it in detect_and_parse
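
The two approaches discussed in this thread can be reconciled in a parse that tolerates both chat templates. The following is a sketch under that assumption, not the code from either PR; `split_reasoning` is a hypothetical helper name.

```python
from typing import Tuple

THINK_START = "<think>"
THINK_END = "</think>"


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, content), treating output as reasoning until THINK_END appears."""
    # Old chat template: the model emits <think> itself, so drop one leading copy.
    stripped = text.lstrip()
    if stripped.startswith(THINK_START):
        text = stripped[len(THINK_START):]
    if THINK_END not in text:
        # Truncated generation (for example max_tokens was hit): all of it is reasoning.
        return text, ""
    reasoning, content = text.split(THINK_END, 1)
    return reasoning, content
```

Because the start token is stripped rather than prepended, the same function handles output with or without a leading `<think>`.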

tot0 · 2025-02-27T19:25:51Z

docs/backend/reasoning_parser.md

+```bash
+python -m sglang.launch_server --host 0.0.0.0 \
+--model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
+--enable-reasoning --reasoning-parser deepseek-r1


Appreciate the docs I was too lazy to add!

Would you consider also supporting the separate_reasoning contract? For my use case we want inference users to be able to control whether reasoning_content is separated, rather than set it as default behavior on sglang launch, which I understand some sglang users will want to do.

You mean adding a separate_reasoning parameter to requests?

Separating reasoning and non-reasoning outputs is super useful, and would love for that to be a toggle rather than always on or always off.

Happy to merge the great changes from this PR into #3202 to try and get the best of both worlds?
Or vice versa, @ShaoZhang0115?

Updated #3202 to combine functionality from this PR, and added some unit tests.

Microsoft has already shipped the separate_reasoning api to production and intends to keep it there, so would very much like to have it merged into main instead of maintaining a fork.

could you give a reference?

Uh, as in the functionality being deployed by Microsoft?

If you've got access to GitHub models, then this request should show the functionality (I used http-yac vscode extension):

POST https://models.inference.ai.azure.com/chat/completions
Content-Type: application/json
Authorization: Bearer {{github_pat}}
X-Auth-Provider: Github

{
    "model": "deepseek-r1",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"}
    ],
    "separate_reasoning": true,
    "stream": false
}

Expected Response:

{
    "id": "51438c1eda364aeb9d7ccecfca078165",
    "object": "chat.completion",
    "created": 1740765748,
    "model": "deepseek-r1",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The Los Angeles Dodgers won the 2020 World Series, defeating the Tampa Bay Rays in six games. The series was held at Globe Life Field in Arlington, Texas, marking the first time the World Series was played entirely at a neutral site. Corey Seager was named the World Series Most Valuable Player (MVP). This victory ended the Dodgers' 32-year championship drought, their previous title having been won in 1988.",
                "tool_calls": null,
                "reasoning_content": "Okay, the user is asking who won the World Series in 2020. Let me think. The World Series is the championship series of Major League Baseball (MLB) in the United States and Canada. Each year, the champions of the American League and the National League compete in a best-of-seven playoff. \n\nFirst, I need to recall which teams played in the 2020 World Series. The 2020 season was unique because of the COVID-19 pandemic. The season was shortened to 60 games, and the playoffs were held in a bubble format to minimize travel and reduce the risk of infection. The World Series itself was held at a neutral site, which was Globe Life Field in Arlington, Texas. That was the first time the World Series was played entirely at a single neutral site.\n\nNow, thinking about the teams. In 2020, the American League champions were the Tampa Bay Rays. They won the AL pennant by defeating the Houston Astros in the AL Championship Series. On the National League side, the Los Angeles Dodgers won the NL pennant by beating the Atlanta Braves in the NL Championship Series. So the World Series was between the Dodgers and the Rays.\n\nWait, let me confirm that. The Rays were indeed the AL champions in 2020. The Dodgers had been a strong team in recent years but hadn't won the World Series since 1988. In 2020, they finally clinched the title. The series went to six games. The Dodgers won Game 6 to take the series 4-2. Corey Seager was named the World Series MVP. \n\nSo, putting it all together, the Los Angeles Dodgers won the 2020 World Series against the Tampa Bay Rays. It's important to note the context of the pandemic affecting the season structure, which might be a follow-up question from the user. But the direct answer is the Dodgers. I should make sure there's no confusion with other years. For example, the Dodgers also won in 2020, then again in 2020? Wait, no, they won in 2020. They might have won again more recently, but the question is specifically about 2020. \n\nDouble-checking a reliable source would be good, but based on my existing knowledge, I believe the Dodgers are correct. Let me think if there's any chance I mixed up the year. No, 2020 was their first since 1988. Then they won again in 2020? Wait, no, that can't be. Wait, no, they won in 2020. Wait, no, they won in 2020. Let me confirm. Yes, the Dodgers won the 2020 World Series. The Rays haven't won a World Series yet. So the answer is the Los Angeles Dodgers. \n\nYes, that's correct. The user might be interested in the MVP or the context, but the main answer is the Dodgers.\n"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 19,
        "total_tokens": 710,
        "completion_tokens": 691,
        "prompt_tokens_details": null
    }
}
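
On the server side, a per-request toggle like separate_reasoning could be applied when the assistant message is built. The sketch below is an assumption about how that might look, not code from #3202 or this PR; `build_message` is a hypothetical helper, and it assumes the raw model text no longer carries a leading `<think>`.

```python
from typing import Dict, Optional

THINK_END = "</think>"


def build_message(raw_text: str, separate_reasoning: bool) -> Dict[str, Optional[str]]:
    """Build the assistant message, splitting out reasoning only when the request asks for it."""
    if not separate_reasoning:
        # Toggle off: hand back the model text untouched.
        return {"role": "assistant", "content": raw_text, "reasoning_content": None}
    # partition() yields (raw_text, "", "") when THINK_END is absent,
    # so an unfinished generation is treated entirely as reasoning.
    reasoning, _, content = raw_text.partition(THINK_END)
    return {
        "role": "assistant",
        "content": content.strip() or None,
        "reasoning_content": reasoning.strip() or None,
    }
```

Keeping the decision inside one function means the launch-time default and the request-level flag can both feed the same boolean.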

maximegmd · 2025-02-27T23:11:03Z

How does that work with grammars? Does the grammar kick-in only after the reasoning parser?

tot0 · 2025-02-27T23:32:54Z

How does that work with grammars? Does the grammar kick-in only after the reasoning parser?

Have a similar question about this as well, though I don't think it's specific to this PR or #3202 , the reasoning parsers (and tool parsers) operate at the level of text coming out of the underlying engine to the API layer.
As far as I can tell (after taking a look at #3298) the grammar engine choice is passed down to the underlying engine via sampling_params, and enforcement is done there. This suggests that to not enforce grammar constraints until reasoning models are done reasoning would involve exposing the knowledge the ReasoningParser has about the "end reasoning" token (</think> for R1) to the underlying engine.

cc @JC1DA and @mmoskal

maximegmd · 2025-02-27T23:55:14Z

How does that work with grammars? Does the grammar kick-in only after the reasoning parser?

Have a similar question about this as well, though I don't think it's specific to this PR or #3202 , the reasoning parsers (and tool parsers) operate at the level of text coming out of the underlying engine to the API layer. As far as I can tell (after taking a look at #3298) the grammar engine choice is passed down to the underlying engine via sampling_params, and enforcement is done there. This suggests that to not enforce grammar constraints until reasoning models are done reasoning would involve exposing the knowledge the ReasoningParser has about the "end reasoning" token (</think> for R1) to the underlying engine.

Ideally we would be able to pass a grammar for reasoning and a grammar for content, but I believe the default grammar behavior should apply only to the content.
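
To make the "grammar applies only to the content" idea concrete, the engine would need some per-request state that flips once the end-of-reasoning token is sampled. The sketch below is purely illustrative: `ContentGrammarGate` is a hypothetical class, it assumes `</think>` is a single token in the vocabulary, and the token ID used in the demo is made up.

```python
class ContentGrammarGate:
    """Track whether decoding has left the reasoning section."""

    def __init__(self, think_end_token_id: int) -> None:
        self.think_end_token_id = think_end_token_id
        self.in_content = False

    def observe(self, token_id: int) -> None:
        """Record a sampled token; the gate opens after the end-of-reasoning token."""
        if token_id == self.think_end_token_id:
            self.in_content = True

    def grammar_active(self) -> bool:
        """Grammar constraints would be enforced only while this returns True."""
        return self.in_content


# Demo with a made-up token ID for </think>.
gate = ContentGrammarGate(think_end_token_id=99)
states = []
for token_id in [5, 7, 99, 12]:
    states.append(gate.grammar_active())  # state before this token is sampled
    gate.observe(token_id)
```

In this sketch the constraint would be unmasked for every token up to and including `</think>`, and enforced from the next token onward.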

gaocegege · 2025-02-28T01:06:50Z

docs/backend/reasoning_parser.md

+for chunk in response:
+    if chunk.choices[0].delta.content:
+      content += chunk.choices[0].delta.content
+    elif chunk.choices[0].delta.reasoning_content:


Is this functioning correctly now? When I test the feature for the vllm, it triggers an error from the OpenAI Python client.

Please note that it is not compatible with the OpenAI Python client library. You can use the requests library to make streaming requests.

It is correct and tested. The variable has been changed, and the OpenAI Python client library supports custom variables.

class DeltaMessage(BaseModel):
    role: Optional[str] = None
    content: Optional[str] = None
    reasoning_content: Optional[str] = None
    tool_calls: Optional[List[ToolCall]] = Field(default=None, examples=[None])
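
For the streaming path, the parser has to decide chunk by chunk which delta field each piece of text belongs to, including when a tag is split across chunks. The following is a self-contained sketch of that idea, not the parser from this PR; `StreamingReasoningSplitter` is a hypothetical name, and it assumes the model is reasoning until `</think>` is seen.

```python
from typing import Tuple

THINK_START = "<think>"
THINK_END = "</think>"


class StreamingReasoningSplitter:
    """Incrementally split streamed text into (reasoning_delta, content_delta) pairs."""

    def __init__(self) -> None:
        self._buffer = ""
        self._in_reasoning = True  # assume reasoning until THINK_END is seen
        self._start_checked = False

    def feed(self, chunk: str) -> Tuple[str, str]:
        if not self._in_reasoning:
            return "", chunk
        self._buffer += chunk
        if not self._start_checked:
            if self._buffer.startswith(THINK_START):
                self._buffer = self._buffer[len(THINK_START):]
                self._start_checked = True
            elif THINK_START.startswith(self._buffer):
                return "", ""  # could still become <think>; wait for more text
            else:
                self._start_checked = True
        if THINK_END in self._buffer:
            reasoning, content = self._buffer.split(THINK_END, 1)
            self._buffer = ""
            self._in_reasoning = False
            return reasoning, content
        # Hold back a tail that might be the beginning of THINK_END.
        keep = 0
        for k in range(len(THINK_END) - 1, 0, -1):
            if self._buffer.endswith(THINK_END[:k]):
                keep = k
                break
        cut = len(self._buffer) - keep
        emit, self._buffer = self._buffer[:cut], self._buffer[cut:]
        return emit, ""

    def flush(self) -> str:
        """Return held-back text as reasoning when the stream ends early."""
        rest, self._buffer = self._buffer, ""
        return rest


splitter = StreamingReasoningSplitter()
reasoning, content = "", ""
for chunk in ["<thi", "nk>Let me", " add.</th", "ink>The answer", " is 4"]:
    r, c = splitter.feed(chunk)
    reasoning += r
    content += c
```

Each `(r, c)` pair maps onto `delta.reasoning_content` and `delta.content`; a server would skip sending a chunk when both are empty.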

tot0 · 2025-02-28T05:44:13Z

python/sglang/srt/reasoning_parser.py

+            return text, ""
+        else:
+            # Add the start token to the beginning of the text.
+            text = self.think_start_token + text


This isn't backwards compatible with the old chat template, which didn't include the <think> token: with that template the model emits <think> itself, so this would add it twice.

tot0 · 2025-02-28T18:06:25Z

docs/backend/reasoning_parser.md

+```bash
+python -m sglang.launch_server --host 0.0.0.0 \
+--model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
+--enable-reasoning --reasoning-parser deepseek-r1


Uh, as in the functionality being deployed by Microsoft?

Lucas Pickup and others added 21 commits February 6, 2025 16:08

Support reasoning_content in ChatCompletion choices like DeepSeek api.

ad623a0

Fix up silly mistakes in non-streaming path

1fe6654

Flip accumulate_reasoning to stream_reasoning to match the api changes.

0a0eaad

Ensure finish_reason is null by default to match OpenAI streaming response behavior.

9cc7b76

fix silly python tuple mistake.

6400800

Don't send streaming chunks for empty content.

d856124

Merge branch 'main' into lupickup/deepseek/reasoning_content

414d467

Adapt reasoning_parser to handle <think> token not being produced by model, also handle first response while separating reasoning.

99f2583

Fix silly typo

f39a256

Fix up <think> token stripping.

132e5d6

use split correctly

4a60111

Remove unused reasoning_regex.

9ec06b1

wow i really can't read, or it's late, or both.

22a0b61

Fix another case.

cde50fc

parse_result.normal_text _shouldn't_ ever be None, but lets be defensive just in case.

7330b0b

Merge branch 'main' into lupickup/deepseek/reasoning_content

aa63f5d

Make content=None if iparse_results.normal_text returns an empty string

3fdce0d

Merge branch 'main' into lupickup/deepseek/reasoning_content

cf9d440

Merge branch 'main' into lupickup/deepseek/reasoning_content

de7618b

Run pre-commit hook to format changes.

1f7daae

Merge branch 'main' into lupickup/deepseek/reasoning_content

287be31

ShaoZhang0115 requested review from merrymercy, Ying1123, hnyls2002, zhyncs, ispobock and ByronHsu as code owners February 25, 2025 18:27

shuaills suggested changes Feb 25, 2025

tot0 reviewed Feb 27, 2025

Merge branch 'main' into lupickup/deepseek/reasoning_content

210fbdc

gaocegege reviewed Feb 28, 2025

Lucas Pickup added 3 commits February 28, 2025 02:10

Merge in awesome docs from sgl-project#3859 by @ShaoZhang0115 and add unittests.

e165bf7

Adding missing format string.

5a89225

Remove local testing hacks.

fa85c96

xihuai18 mentioned this pull request Feb 28, 2025

Support for reasoning_content in API #3202

Closed

4 tasks

tot0 reviewed Feb 28, 2025

zhaochenyang20 and others added 3 commits February 28, 2025 12:24

Merge branch 'main' into lupickup/deepseek/reasoning_content

5309df7

Move reasoning_parser.md to docs/references

55acaaa

Fixup incorrect handling of request: list

2ae4fa4

This was referenced Mar 1, 2025

[v0][structured output] Support reasoning output vllm-project/vllm#12955

Merged

[Doc]: Update the reasoning output streaming example with OpenAI client vllm-project/vllm#14070

Closed

[Refactor] Update reasoning handling in ChatCompletionRequest and adjust model references

94bee72

xihuai18 force-pushed the reasoning-parser branch from c8429d4 to 94bee72 Compare March 2, 2025 13:09

xihuai18 added 5 commits March 2, 2025 21:15

revert dockerfile changes

411473b

add more testcases

ce6c485

add main for unit tests

022590a

revert some typos

98be910

fix(reasoning content): 🐛 fix typos

9ff2a19

ShaoZhang0115 closed this Mar 2, 2025

xihuai18 mentioned this pull request Mar 2, 2025

Reasoning parser #4000

Merged

6 tasks
