
[chat_template]: handle system message in apply_chat_template fall…#5903

Open
khazic wants to merge 3 commits into verl-project:main from khazic:fix/chat-template-system-message

Conversation

@khazic
Contributor

@khazic khazic commented Apr 7, 2026

For models like Qwen3.5, whose chat templates require at least one user message, the existing fallback unconditionally prepends a dummy user message before the real messages:

apply_template([dummy_user] + messages)

This works for user/assistant messages, but fails when the single message being processed is a system message, because Qwen3.5's chat template enforces two constraints:

  1. At least one user message must be present.
  2. System message must appear first.

The fallback satisfies (1) but violates (2), raising:
TemplateError: System message must be at the beginning.

Reproduce: run multiturn SFT with a dataset where some conversations have a system message and some do not. verl tokenizes each message individually via _process_single_message; when a system message is processed alone, the fallback inserts dummy_user before it, breaking the template constraint.

Fix: detect whether the messages list contains a system role. If so, append the dummy user after the real messages instead, and strip the dummy suffix from the output using the length-difference trick (len(two_users) - len(one_user)) to recover the pure system tokens. The original prefix-stripping path is preserved for all non-system cases.
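The two paths can be sketched with a toy template that mimics the two Qwen3.5 constraints. All names below (`mock_template`, `tokenize_single`, `DUMMY_USER`) are illustrative, not verl's actual API:

```python
def mock_template(messages, add_generation_prompt=False):
    # Toy stand-in for a Qwen-style chat template; returns a token list and
    # enforces the two constraints described above. Purely illustrative.
    if any(m["role"] == "system" for m in messages[1:]):
        raise ValueError("System message must be at the beginning.")
    if not any(m["role"] == "user" for m in messages):
        raise ValueError("At least one user message is required.")
    toks = ["<bos>"]
    for m in messages:
        toks += [f"<{m['role']}>", m["content"], "<end>"]
    if add_generation_prompt:
        toks.append("<assistant>")
    return toks

DUMMY_USER = [{"role": "user", "content": ""}]

def tokenize_single(messages):
    one_user = mock_template(DUMMY_USER)
    if any(m["role"] == "system" for m in messages):
        # Keep the system message first: append the dummy user at the tail,
        # then strip its suffix via the length-difference trick.
        two_users = mock_template(DUMMY_USER + DUMMY_USER)
        user_len = len(two_users) - len(one_user)  # tokens of one dummy user
        return mock_template(messages + DUMMY_USER)[:-user_len]
    # Original path: prepend the dummy user and strip the known prefix.
    return mock_template(DUMMY_USER + messages)[len(one_user):]
```

With this mock, the old path `mock_template(DUMMY_USER + [system_msg])` raises exactly the ordering error, while the suffix-strip path recovers the pure system tokens.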

What does this PR do?

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@khazic khazic changed the title fix(chat_template): handle system message in apply_chat_template fall… [bug](chat_template): handle system message in apply_chat_template fall… Apr 7, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request modifies the apply_chat_template function to handle system messages for models like Qwen3.5 by appending a dummy user message to the end of the sequence and stripping it using a length-difference calculation. A significant issue was identified when add_generation_prompt is set to true: the stripping logic fails to account for the generation prompt tokens appended after the dummy user message, resulting in incorrect truncation and trailing dummy-user tokens in the output.

Comment on lines +114 to +121
    output = processor.apply_chat_template(
        messages + dummy_user_message,
        tokenize=tokenize,
        add_generation_prompt=add_generation_prompt,
        tools=tools,
        return_dict=return_dict,
        **kwargs,
    )

high

The logic in the has_system block fails when add_generation_prompt=True.

When has_system is true, the dummy user message is appended to the end of the conversation. If add_generation_prompt=True, the template typically appends the assistant header (e.g., <|im_start|>assistant\n) after the last message. Thus, the output sequence ends with [... dummy_user_tokens, generation_prompt_tokens].

However, user_len is calculated from two_users - one_user (which both use add_generation_prompt=False), so it only represents the length of the dummy user message. Stripping user_len from the end of output (e.g., output[:-user_len]) will incorrectly remove the generation prompt and leave trailing tokens from the dummy user message in the result.

To fix this, you should account for the length of the generation prompt when slicing, or compute the output without the generation prompt, strip the dummy user, and then append the generation prompt tokens separately.
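The incorrect truncation can be seen directly with a toy Qwen-style template (illustrative names only, not verl's actual code):

```python
def mock_template(messages, add_generation_prompt=False):
    # Toy template: per-message token triples, optional assistant header.
    toks = ["<bos>"]
    for m in messages:
        toks += [f"<{m['role']}>", m["content"], "<end>"]
    if add_generation_prompt:
        toks.append("<assistant>")
    return toks

DUMMY_USER = [{"role": "user", "content": ""}]
one_user = mock_template(DUMMY_USER)               # rendered without gen prompt
two_users = mock_template(DUMMY_USER + DUMMY_USER)
user_len = len(two_users) - len(one_user)          # length of one dummy user

system = [{"role": "system", "content": "S"}]
# Buggy path: render WITH the generation prompt, then strip user_len tokens.
out = mock_template(system + DUMMY_USER, add_generation_prompt=True)
stripped = out[:-user_len]
# out      = [<bos>, <system>, S, <end>, <user>, '', <end>, <assistant>]
# stripped = [<bos>, <system>, S, <end>, <user>]   <- gen prompt gone,
#                                                     dummy-user token left
```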

…ystem messages

When add_generation_prompt=True, the previous implementation computed
output = apply_template(messages + dummy_user, add_generation_prompt=True),
which produced: system_tokens + dummy_user_tokens + gen_prompt_tokens.
Stripping user_len from the tail then removed the generation prompt and
left stray dummy-user tokens in the result.

Fix: always call apply_template with add_generation_prompt=False so the
dummy user sits cleanly at the tail, strip it, then separately compute
the generation prompt tokens via
  one_user_with_gen[len(one_user):]
and re-attach them when add_generation_prompt=True.

Reported by gemini-code-assist in PR review.
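The fix described in this commit can be sketched with the same toy-template assumptions (all names illustrative):

```python
def mock_template(messages, add_generation_prompt=False):
    # Toy Qwen-style template: per-message tokens, optional assistant header.
    toks = ["<bos>"]
    for m in messages:
        toks += [f"<{m['role']}>", m["content"], "<end>"]
    if add_generation_prompt:
        toks.append("<assistant>")
    return toks

DUMMY_USER = [{"role": "user", "content": ""}]

def tokenize_system(messages, add_generation_prompt=False):
    one_user = mock_template(DUMMY_USER)
    two_users = mock_template(DUMMY_USER + DUMMY_USER)
    user_len = len(two_users) - len(one_user)
    # Always render WITHOUT the generation prompt so the dummy user sits
    # cleanly at the tail, then strip it.
    out = mock_template(messages + DUMMY_USER)[:-user_len]
    if add_generation_prompt:
        # Recover the gen-prompt tokens separately and re-attach them.
        one_user_with_gen = mock_template(DUMMY_USER, add_generation_prompt=True)
        out = out + one_user_with_gen[len(one_user):]
    return out
```

This keeps the suffix-strip exact regardless of add_generation_prompt, at the cost of one extra template call when the generation prompt is requested.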
@khazic khazic changed the title [bug](chat_template): handle system message in apply_chat_template fall… [bug]chat_template: handle system message in apply_chat_template fall… Apr 7, 2026
@khazic khazic changed the title [bug]chat_template: handle system message in apply_chat_template fall… [chat_template]: handle system message in apply_chat_template fall… Apr 7, 2026
@khazic
Contributor Author

khazic commented Apr 7, 2026

Note: this bug is triggered whenever apply_chat_template is called with a
single system message (which happens for every row that has a system message
in the dataset), not just for mixed datasets. A pure-system-message dataset
would hit the same error on every sample.

@khazic
Contributor Author

khazic commented Apr 7, 2026

@wuxibin89 Could you please review this fix?

This PR addresses a bug in the apply_chat_template fallback path: when messages contain only a system message (no user message), the original code prepends dummy_user + messages, which violates Qwen3.5's constraint that the system message must come first. This causes a TemplateError.

The fix detects has_system and appends the dummy user after messages instead, then strips the suffix using the difference trick.
