Reasoning parser #3859

Closed
wants to merge 34 commits into from

Conversation

ShaoZhang0115

Motivation

Rewrite #3202

Modifications

  1. add --enable-reasoning and --reasoning-parser options for deepseek r1 series models.
  2. return reasoning_content as in official api, ref: https://api-docs.deepseek.com/zh-cn/guides/reasoning_model, in both streaming and non-streaming chat completions.
    Example:
python -m sglang.launch_server --host 0.0.0.0 \
--model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
--tp 1 --enable-reasoning --reasoning-parser deepseek-r1 
curl --location --request POST 'http://localhost:30000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data '{
    "model": "default",
    "messages": [
        {
            "role": "user",
            "content": "Calculate 1 + 3"
        }
    ],
    "stream": false
}'

Get response:

{
    "id": "53de20f7f1244195826e7b52011c37a4",
    "object": "chat.completion",
    "created": 1740507802,
    "model": "default",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\n**Solution:**\n\nTo calculate \\(1 + 3\\), follow these easy steps:\n\n1. **Identify the numbers to add:**  \n   You have the number **1** and the number **3**.\n\n2. **Add the numbers together:**  \n   \\[\n   1 + 3 = 4\n   \\]\n\n3. **Final Answer:**  \n   \\[\n   \\boxed{4}\n   \\]",
                "reasoning_content": "To calculate the sum of 1 and 3, I will begin by identifying the two numbers involved in the addition. The first number is 1, and the second number is 3.\n\nNext, I will add these two numbers together. Adding 1 and 3 gives me a total of 4.\n\nTherefore, the result of 1 plus 3 is 4.\n",
                "tool_calls": null
            },
            "logprobs": null,
            "finish_reason": "stop",
            "matched_stop": 151643
        }
    ],
    "usage": {
        "prompt_tokens": 11,
        "total_tokens": 179,
        "completion_tokens": 168,
        "prompt_tokens_details": null
    }
}
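
A minimal client-side sketch, assuming the response shape shown above. The helper name `split_message` is hypothetical and not part of sglang; it only reads the `content` and `reasoning_content` fields of the first choice and tolerates responses where `reasoning_content` is missing or null.

```python
import json


def split_message(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a chat.completion response dict."""
    message = response["choices"][0]["message"]
    # reasoning_content may be absent or null when no reasoning parser is enabled.
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    return reasoning, answer


# Abbreviated stand-in for the response above, for illustration only.
raw = """
{"choices": [{"index": 0, "message": {"role": "assistant",
  "content": "4", "reasoning_content": "1 plus 3 is 4."}}]}
"""
reasoning, answer = split_message(json.loads(raw))
print("reasoning:", reasoning)
print("answer:", answer)
```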

Docs will be updated as soon as possible.

Checklist

Comment on lines 32 to 33
self.think_start_token = "<think>"
self.think_end_token = "</think>"
Collaborator

Can we extend this to all reasoning models? Not just dpsk R1. There might be different thinking tokens.

Contributor

I think different reasoning models need different parsers, and I have added docs for this.
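
One way to picture "different parsers for different models" is a small registry keyed by the `--reasoning-parser` value. This is only a sketch: `ThinkTokens`, `REASONING_TOKENS`, and `get_think_tokens` are hypothetical names, and only the `deepseek-r1` entry with its `<think>`/`</think>` tokens comes from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThinkTokens:
    start: str
    end: str


# Hypothetical registry keyed by the --reasoning-parser value.
# Only "deepseek-r1" comes from this PR; other models would add their own entry.
REASONING_TOKENS = {
    "deepseek-r1": ThinkTokens(start="<think>", end="</think>"),
}


def get_think_tokens(parser_name: str) -> ThinkTokens:
    """Look up the thinking tokens for a parser name, failing loudly if unknown."""
    try:
        return REASONING_TOKENS[parser_name]
    except KeyError:
        known = ", ".join(sorted(REASONING_TOKENS))
        raise ValueError(
            f"Unknown reasoning parser {parser_name!r}; known: {known}"
        ) from None


print(get_think_tokens("deepseek-r1"))
```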

@xihuai18
Contributor

  • Add Docs
  • Test with streaming and non-streaming cases, with truncated or non-truncated max-tokens for reasoning.
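
The streaming and truncated cases listed above can be exercised with a sketch like the following. It is not the PR's parser: `StreamingSplitter` and `run` are hypothetical names, and the design (assume reasoning until `</think>`, hold back a possible partial end token between chunks) is one assumption about how such a splitter could work.

```python
class StreamingSplitter:
    """Split streamed text into (reasoning, content) deltas at the end token."""

    def __init__(self, start: str = "<think>", end: str = "</think>"):
        self.start, self.end = start, end
        self.buffer = ""
        self.in_reasoning = True   # assume reasoning until the end token appears
        self.start_checked = False

    def feed(self, chunk: str) -> tuple[str, str]:
        if not self.in_reasoning:
            return "", chunk
        self.buffer += chunk
        if not self.start_checked:
            if self.buffer.startswith(self.start):
                # Drop a start token emitted by the model itself.
                self.buffer = self.buffer[len(self.start):]
                self.start_checked = True
            elif self.start.startswith(self.buffer):
                return "", ""  # could still be a partial start token; wait
            else:
                self.start_checked = True
        if self.end in self.buffer:
            reasoning, content = self.buffer.split(self.end, 1)
            self.buffer, self.in_reasoning = "", False
            return reasoning, content
        # Hold back any suffix that might be the beginning of the end token.
        keep = 0
        for n in range(min(len(self.end) - 1, len(self.buffer)), 0, -1):
            if self.buffer.endswith(self.end[:n]):
                keep = n
                break
        cut = len(self.buffer) - keep
        emit, self.buffer = self.buffer[:cut], self.buffer[cut:]
        return emit, ""

    def flush(self) -> tuple[str, str]:
        """Call at end of stream; leftover text is unfinished reasoning."""
        leftover, self.buffer = self.buffer, ""
        return (leftover, "") if self.in_reasoning else ("", "")


def run(chunks):
    """Feed all chunks through a fresh splitter and join the deltas."""
    splitter, reasoning, content = StreamingSplitter(), "", ""
    for piece in chunks:
        r, c = splitter.feed(piece)
        reasoning, content = reasoning + r, content + c
    r, c = splitter.flush()
    return reasoning + r, content + c


print(run(["<thi", "nk>a b", " c</th", "ink>ans", "wer"]))  # reasoning completes
print(run(["reason", "ing <"]))                              # truncated by max tokens
```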

@xihuai18
Contributor

However, I cannot get my tests to pass with --enable-torch-compile, which is confusing.

@xihuai18
Contributor

> However, I cannot get my tests to pass with --enable-torch-compile, which is confusing.

possible related issue: #3730 (comment)

self.think_start_token = think_start_token
self.think_end_token = think_end_token
self.pattern = re.compile(
    rf"{self.think_start_token}(.*?){self.think_end_token}", re.DOTALL
)
Copy link

The most recent tokenizer hardcodes the opening <think> tag: https://huggingface.co/deepseek-ai/DeepSeek-R1/commit/8a58a132790c9935686eb97f042afa8013451c9f

This means the text coming back from inference won't include <think>. That is why I updated #3202 to assume the model is reasoning until </think> is seen; it also strips out a leading <think> to handle the old chat template.
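
A minimal sketch of that behavior for the non-streaming case, under the stated assumption that everything before `</think>` is reasoning. `split_reasoning` is a hypothetical name, not the function in #3202.

```python
def split_reasoning(
    text: str, start: str = "<think>", end: str = "</think>"
) -> tuple[str, str]:
    """Return (reasoning, content), treating text as reasoning until `end`."""
    # Old chat template: the model emits <think> itself, so drop one leading copy.
    if text.startswith(start):
        text = text[len(start):]
    # str.partition returns (text, "", "") when the end token never appears,
    # so truncated output is reported as reasoning with empty content.
    reasoning, _, content = text.partition(end)
    return reasoning, content


print(split_reasoning("<think>old template</think>answer"))
print(split_reasoning("new template</think>answer"))
print(split_reasoning("cut off before the end token"))
```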

Contributor

@tot0

The PR added the start token if it is missing:

            # Add the start token to the beginning of the text.
            text = self.think_start_token + text

You can see it in detect_and_parse
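
For readers without the diff open, the idea can be sketched roughly as follows. This is not the PR's actual `detect_and_parse`; the function name `detect_and_parse_sketch`, the `startswith` guard, and the use of `re.escape` are assumptions added for illustration.

```python
import re

THINK_START = "<think>"
THINK_END = "</think>"
# re.escape guards against thinking tokens that contain regex metacharacters.
PATTERN = re.compile(
    rf"{re.escape(THINK_START)}(.*?){re.escape(THINK_END)}", re.DOTALL
)


def detect_and_parse_sketch(text: str) -> tuple[str, str]:
    """Return (reasoning, content) for a complete, non-streamed response."""
    if not text.startswith(THINK_START):
        # Add the start token only when it is missing from the output.
        text = THINK_START + text
    match = PATTERN.search(text)
    if match is None:
        # No end token: everything after the start token is reasoning.
        return text[len(THINK_START):], ""
    return match.group(1), text[match.end():]


print(detect_and_parse_sketch("step one\nstep two</think>final answer"))
```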

```bash
python -m sglang.launch_server --host 0.0.0.0 \
--model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
--enable-reasoning --reasoning-parser deepseek-r1
```
Copy link

Appreciate the docs I was too lazy to add!

Would you consider also supporting the separate_reasoning contract? For my use case we want inference users to be able to control whether reasoning_content is separated, rather than set it as default behavior on sglang launch, which I understand some sglang users will want to do.

Contributor

You mean adding a separate_reasoning parameter to the request?

Copy link

Separating reasoning and non-reasoning outputs is super useful, and would love for that to be a toggle rather than always on or always off.
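
A rough sketch of what such a per-request toggle could look like on the server side. Everything here is hypothetical (`build_message`, `handle_request`, and the `server_default` argument are illustrative names, not sglang code); the only thing taken from the thread is the `separate_reasoning` request field.

```python
START, END = "<think>", "</think>"


def build_message(raw_text: str, separate_reasoning: bool) -> dict:
    """Build an assistant message, splitting out reasoning only when asked to."""
    if not separate_reasoning:
        # Caller opted out: return the model output untouched.
        return {"role": "assistant", "content": raw_text, "reasoning_content": None}
    text = raw_text[len(START):] if raw_text.startswith(START) else raw_text
    reasoning, _, content = text.partition(END)
    return {"role": "assistant", "content": content, "reasoning_content": reasoning}


def handle_request(body: dict, raw_text: str, server_default: bool = False) -> dict:
    """A per-request flag, when present, overrides the launch-time default."""
    flag = body.get("separate_reasoning", server_default)
    return build_message(raw_text, bool(flag))


print(handle_request({"separate_reasoning": True}, "thinking</think>answer"))
print(handle_request({}, "thinking</think>answer"))
```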

Copy link

Happy to merge the great changes from this PR into #3202 to try to get the best of both worlds?
Or vice versa, @ShaoZhang0115?

Copy link

Updated #3202 to combine functionality from this PR, and added some unit tests.

Copy link

Microsoft has already shipped the separate_reasoning api to production and intends to keep it there, so would very much like to have it merged into main instead of maintaining a fork.

Contributor

Could you give a reference?

Copy link

Uh, as in the functionality being deployed by Microsoft?

If you've got access to GitHub Models, then this request should show the functionality (I used the httpYac VS Code extension):

POST https://models.inference.ai.azure.com/chat/completions
Content-Type: application/json
Authorization: Bearer {{github_pat}}
X-Auth-Provider: Github

{
  "model": "deepseek-r1",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"}
  ],
  "separate_reasoning": true,
  "stream": false
}

Expected Response:

{
  "id": "51438c1eda364aeb9d7ccecfca078165",
  "object": "chat.completion",
  "created": 1740765748,
  "model": "deepseek-r1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the 2020 World Series, defeating the Tampa Bay Rays in six games. The series was held at Globe Life Field in Arlington, Texas, marking the first time the World Series was played entirely at a neutral site. Corey Seager was named the World Series Most Valuable Player (MVP). This victory ended the Dodgers' 32-year championship drought, their previous title having been won in 1988.",
        "tool_calls": null,
        "reasoning_content": "Okay, the user is asking who won the World Series in 2020. Let me think. The World Series is the championship series of Major League Baseball (MLB) in the United States and Canada. Each year, the champions of the American League and the National League compete in a best-of-seven playoff. \n\nFirst, I need to recall which teams played in the 2020 World Series. The 2020 season was unique because of the COVID-19 pandemic. The season was shortened to 60 games, and the playoffs were held in a bubble format to minimize travel and reduce the risk of infection. The World Series itself was held at a neutral site, which was Globe Life Field in Arlington, Texas. That was the first time the World Series was played entirely at a single neutral site.\n\nNow, thinking about the teams. In 2020, the American League champions were the Tampa Bay Rays. They won the AL pennant by defeating the Houston Astros in the AL Championship Series. On the National League side, the Los Angeles Dodgers won the NL pennant by beating the Atlanta Braves in the NL Championship Series. So the World Series was between the Dodgers and the Rays.\n\nWait, let me confirm that. The Rays were indeed the AL champions in 2020. The Dodgers had been a strong team in recent years but hadn't won the World Series since 1988. In 2020, they finally clinched the title. The series went to six games. The Dodgers won Game 6 to take the series 4-2. Corey Seager was named the World Series MVP. \n\nSo, putting it all together, the Los Angeles Dodgers won the 2020 World Series against the Tampa Bay Rays. It's important to note the context of the pandemic affecting the season structure, which might be a follow-up question from the user. But the direct answer is the Dodgers. I should make sure there's no confusion with other years. For example, the Dodgers also won in 2020, then again in 2020? Wait, no, they won in 2020. They might have won again more recently, but the question is specifically about 2020. \n\nDouble-checking a reliable source would be good, but based on my existing knowledge, I believe the Dodgers are correct. Let me think if there's any chance I mixed up the year. No, 2020 was their first since 1988. Then they won again in 2020? Wait, no, that can't be. Wait, no, they won in 2020. Wait, no, they won in 2020. Let me confirm. Yes, the Dodgers won the 2020 World Series. The Rays haven't won a World Series yet. So the answer is the Los Angeles Dodgers. \n\nYes, that's correct. The user might be interested in the MVP or the context, but the main answer is the Dodgers.\n"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 19,
    "total_tokens": 710,
    "completion_tokens": 691,
    "prompt_tokens_details": null
  }
}

@maximegmd

How does that work with grammars? Does the grammar kick in only after the reasoning parser?

@tot0

tot0 commented Feb 27, 2025

> How does that work with grammars? Does the grammar kick in only after the reasoning parser?

I have a similar question, though I don't think it's specific to this PR or #3202. The reasoning parsers (and tool parsers) operate on the text coming out of the underlying engine to the API layer.
As far as I can tell (after taking a look at #3298), the grammar engine choice is passed down to the underlying engine via sampling_params, and enforcement is done there. This suggests that deferring grammar enforcement until the model has finished reasoning would involve exposing what the ReasoningParser knows about the "end reasoning" token (</think> for R1) to the underlying engine.

cc @JC1DA and @mmoskal

@maximegmd

> How does that work with grammars? Does the grammar kick in only after the reasoning parser?

> I have a similar question, though I don't think it's specific to this PR or #3202. The reasoning parsers (and tool parsers) operate on the text coming out of the underlying engine to the API layer. As far as I can tell (after taking a look at #3298), the grammar engine choice is passed down to the underlying engine via sampling_params, and enforcement is done there. This suggests that deferring grammar enforcement until the model has finished reasoning would involve exposing what the ReasoningParser knows about the "end reasoning" token (</think> for R1) to the underlying engine.

Ideally we would be able to pass a grammar for reasoning and a grammar for content, but I believe the default grammar behavior should apply only to the content.
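
To make "grammar applies only to the content" concrete, here is a toy sketch of a gate that a sampler could consult. It is purely illustrative: `GrammarGate` is a hypothetical class, it assumes the end-of-reasoning marker is a single token id, and it does not reflect how sglang's engine or grammar backends are actually wired.

```python
class GrammarGate:
    """Tracks whether grammar constraints should apply to the next token."""

    def __init__(self, end_token_id: int):
        self.end_token_id = end_token_id
        self.reasoning_done = False

    def observe(self, token_id: int) -> None:
        """Record a sampled token; flips the gate once reasoning ends."""
        if token_id == self.end_token_id:
            self.reasoning_done = True

    def allowed(self, vocab: set[int], grammar_allowed: set[int]) -> set[int]:
        """Token ids the sampler may pick for the next step."""
        return (vocab & grammar_allowed) if self.reasoning_done else vocab


END_ID = 7                      # hypothetical id standing in for </think>
vocab = set(range(10))
gate = GrammarGate(END_ID)
for token in [1, 2]:            # free-form reasoning tokens
    gate.observe(token)
unconstrained = gate.allowed(vocab, {3, 4})
gate.observe(END_ID)            # reasoning finished
constrained = gate.allowed(vocab, {3, 4})
print(len(unconstrained), sorted(constrained))
```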

for chunk in response:
    if chunk.choices[0].delta.content:
        content += chunk.choices[0].delta.content
    elif chunk.choices[0].delta.reasoning_content:
Contributor

Is this functioning correctly now? When I tested this feature with vLLM, it triggered an error from the OpenAI Python client.

Please note that it is not compatible with the OpenAI Python client library. You can use the requests library to make streaming requests.

Contributor

It is correct and tested. The variable has been changed, and the OpenAI Python client library supports custom variables.

class DeltaMessage(BaseModel):
    role: Optional[str] = None
    content: Optional[str] = None
    reasoning_content: Optional[str] = None
    tool_calls: Optional[List[ToolCall]] = Field(default=None, examples=[None])
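
A client-side sketch of accumulating both fields from a stream, assuming each chunk exposes `choices[0].delta` with optional `content` and `reasoning_content` attributes as in the model above. The chunks here are simulated with `SimpleNamespace` so the example runs without a server; `accumulate` and `fake_chunk` are hypothetical helpers.

```python
from types import SimpleNamespace


def accumulate(chunks) -> tuple[str, str]:
    """Collect (reasoning, content) from an iterable of streamed chunks."""
    reasoning, content = "", ""
    for chunk in chunks:
        delta = chunk.choices[0].delta
        # getattr tolerates deltas that carry no reasoning_content attribute.
        reasoning_piece = getattr(delta, "reasoning_content", None)
        content_piece = getattr(delta, "content", None)
        if reasoning_piece:
            reasoning += reasoning_piece
        if content_piece:
            content += content_piece
    return reasoning, content


def fake_chunk(**fields):
    """Build a stand-in for one streamed chunk (illustration only)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**fields))])


stream = [
    fake_chunk(role="assistant"),
    fake_chunk(reasoning_content="1 + 3 "),
    fake_chunk(reasoning_content="is 4."),
    fake_chunk(content="4"),
]
reasoning, content = accumulate(stream)
print("reasoning:", reasoning)
print("content:", content)
```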

            return text, ""
        else:
            # Add the start token to the beginning of the text.
            text = self.think_start_token + text
Copy link

This isn't backwards compatible with the old chat template, which didn't include the <think> token; it would add the token twice.

@xihuai18 mentioned this pull request on Mar 2, 2025
9 participants