
[fix] Skip special tokens for sglang when decoding token ids#210

Merged
tyler-griggs merged 1 commit into NovaSky-AI:main from CharlieFRuan:fix-0826-skip-special
Aug 26, 2025
Conversation

Collaborator

@CharlieFRuan CharlieFRuan commented Aug 26, 2025

Background

Prior PR #192 changed skip_special_token to True in sampling params for vLLM and SGLang.

Taking Qwen as an example, the implications are:

  • For string output, i.e. InferenceEngineOutput.response, <|im_end|> will be excluded
  • For token IDs output, i.e. InferenceEngineOutput.response_ids, the corresponding token will NOT be excluded, which is what we want for token-in-token-out
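To make that split concrete, here is a minimal toy decoder (purely illustrative; not the actual vLLM/SGLang tokenizer, and the vocabulary is invented) showing how skipping special tokens affects the string output while the token IDs keep the EOS token:

```python
# Toy stand-in for a tokenizer vocabulary; IDs and strings are invented.
SPECIAL_TOKENS = {"<|im_end|>"}
VOCAB = {0: "Hello", 1: " world", 2: "<|im_end|>"}

def decode(token_ids, skip_special_tokens):
    """Decode token IDs to a string, optionally dropping special tokens."""
    pieces = (VOCAB[t] for t in token_ids)
    if skip_special_tokens:
        pieces = (p for p in pieces if p not in SPECIAL_TOKENS)
    return "".join(pieces)

response_ids = [0, 1, 2]  # token-in-token-out output: EOS token kept
response = decode(response_ids, skip_special_tokens=True)
print(response)           # "Hello world" -- <|im_end|> excluded from the string
```

This mirrors the contract above: `response` drops `<|im_end|>`, while `response_ids` still ends with the EOS token ID.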

The goal is twofold:

  • In SkyRLGymGenerator, env.step(action)'s action is InferenceEngineOutput.response, in which we do not want EOS tokens. This is more consistent with most inference engines' output conventions
  • If users re-tokenize with chat template (e.g. current Qwen3 codepath), the Jinja template applies <|im_end|> after message.content, meaning that the string content does not expect to include <|im_end|>
    • Qwen has Jinja template: '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n'
    • For SkyRLGymGenerator we have the following guard, so a double EOS <|im_end|><|im_end|> wouldn't happen (but for a custom generator, this would be more subtle for users to notice).
```python
  # remove eos token from end of output if it exists, since it will be reapplied by the chat template
  if output.endswith(self.tokenizer.eos_token):
      output = output[: -len(self.tokenizer.eos_token)]
```
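A small self-contained sketch of why that guard matters, using a hand-written stand-in for Qwen's Jinja template (not the real tokenizer or template engine):

```python
EOS = "<|im_end|>"

def apply_qwen_template(role, content):
    # Mirrors the Jinja fragment quoted above:
    # '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n'
    return "<|im_start|>" + role + "\n" + content + EOS + "\n"

output = "The answer is 42." + EOS  # string output that still ends with EOS

# Without stripping, re-applying the template yields a double EOS:
assert apply_qwen_template("assistant", output).count(EOS) == 2

# The generator's guard removes the trailing EOS first:
if output.endswith(EOS):
    output = output[: -len(EOS)]
assert apply_qwen_template("assistant", output).count(EOS) == 1
```

With skip_special_token=True on the string output, the content handed to the template never carries the trailing EOS in the first place, so the guard becomes a no-op.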

This PR's fix

This PR extends the fix to SGLang. We run SGLang token-in-token-out by passing skip_tokenizer_init=True, which means SGLang does not return string output at all (unlike vLLM, which returns both token IDs and string output). As a result, sampling_params.skip_special_token has no effect on SGLang. This PR fixes that by adding skip_special_token=True when we convert SGLang's output token IDs to string output, so its behavior matches the vLLM backend.
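Conceptually the fix is a one-argument change at the decode site. A hedged sketch with a stub tokenizer (ToyTokenizer, its vocabulary, and the surrounding names are invented for illustration; the real code decodes with the model's tokenizer):

```python
class ToyTokenizer:
    """Minimal stand-in for a HuggingFace-style tokenizer's decode()."""
    SPECIAL_IDS = {2}
    VOCAB = {0: "Hello", 1: " world", 2: "<|im_end|>"}

    def decode(self, ids, skip_special_tokens=False):
        if skip_special_tokens:
            ids = [i for i in ids if i not in self.SPECIAL_IDS]
        return "".join(self.VOCAB[i] for i in ids)

tokenizer = ToyTokenizer()
response_ids = [0, 1, 2]  # raw SGLang output: token IDs only, EOS included

# Before the fix: the decoded string kept <|im_end|>, unlike vLLM.
# After the fix: special tokens are skipped when producing the string,
# while response_ids still carries the EOS token for token-in-token-out.
response = tokenizer.decode(response_ids, skip_special_tokens=True)
assert response == "Hello world"
assert response_ids[-1] in ToyTokenizer.SPECIAL_IDS
```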

Verified with test_policy_local_engines_e2e.py.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses an inconsistency in how special tokens are handled between the vLLM and SGLang inference backends. The changes ensure that when decoding token IDs from SGLang, special tokens are skipped, aligning its behavior with vLLM and preventing issues like duplicate end-of-sequence tokens. The modifications are targeted, correct, and consistent across both the local and remote SGLang engine implementations. The accompanying update to the documentation in base.py clearly reflects this new behavior. Overall, this is a solid fix that improves the consistency and correctness of the inference pipeline.

@CharlieFRuan CharlieFRuan marked this pull request as draft August 26, 2025 22:30
@CharlieFRuan CharlieFRuan marked this pull request as ready for review August 26, 2025 22:40
Member

@tyler-griggs tyler-griggs left a comment


Great, thanks!

@tyler-griggs tyler-griggs merged commit 71c954b into NovaSky-AI:main Aug 26, 2025
3 checks passed
dzorlu referenced this pull request in fleet-ai/SkyRL Feb 4, 2026
