
[chat_template]: handle system message in apply_chat_template fall…#5903

Open
khazic wants to merge 3 commits into verl-project:main from khazic:fix/chat-template-system-message

Conversation

@khazic
Contributor

@khazic khazic commented Apr 7, 2026

For models like Qwen3.5, whose chat templates require at least one user message, the existing fallback unconditionally prepends a dummy user message before the real messages:

apply_template([dummy_user] + messages)

This works for user/assistant messages, but fails when the single message being processed is a system message, because Qwen3.5's chat template enforces two constraints:

  1. At least one user message must be present.
  2. System message must appear first.

The fallback satisfies (1) but violates (2), raising:
TemplateError: System message must be at the beginning.

Reproduce: run multiturn SFT with a dataset where some conversations have a system message and some do not. verl tokenizes each message individually via _process_single_message; when a system message is processed alone, the fallback inserts dummy_user before it, breaking the template constraint.

Fix: detect whether the messages list contains a system role. If so, append the dummy user after the real messages instead, and strip the dummy suffix from the output using the length-difference trick (len(two_users) - len(one_user)) to recover the pure system tokens. The original prefix-stripping path is preserved for all non-system cases.
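The two paths can be sketched with a toy template that mimics the two Qwen3.5 constraints. All names below (`mock_template`, `tokenize_single`, `DUMMY_USER`) are illustrative, not verl's actual API:

```python
def mock_template(messages, add_generation_prompt=False):
    # Toy stand-in for a Qwen-style chat template; returns a token list and
    # enforces the two constraints described above. Purely illustrative.
    if any(m["role"] == "system" for m in messages[1:]):
        raise ValueError("System message must be at the beginning.")
    if not any(m["role"] == "user" for m in messages):
        raise ValueError("At least one user message is required.")
    toks = ["<bos>"]
    for m in messages:
        toks += [f"<{m['role']}>", m["content"], "<end>"]
    if add_generation_prompt:
        toks.append("<assistant>")
    return toks

DUMMY_USER = [{"role": "user", "content": ""}]

def tokenize_single(messages):
    one_user = mock_template(DUMMY_USER)
    if any(m["role"] == "system" for m in messages):
        # Keep the system message first: append the dummy user at the tail,
        # then strip its suffix via the length-difference trick.
        two_users = mock_template(DUMMY_USER + DUMMY_USER)
        user_len = len(two_users) - len(one_user)  # tokens of one dummy user
        return mock_template(messages + DUMMY_USER)[:-user_len]
    # Original path: prepend the dummy user and strip the known prefix.
    return mock_template(DUMMY_USER + messages)[len(one_user):]
```

With this mock, the old path `mock_template(DUMMY_USER + [system_msg])` raises exactly the ordering error, while the suffix-strip path recovers the pure system tokens.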

What does this PR do?

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@khazic khazic changed the title fix(chat_template): handle system message in apply_chat_template fall… [bug](chat_template): handle system message in apply_chat_template fall… Apr 7, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request modifies the apply_chat_template function to handle system messages for models like Qwen3.5 by appending a dummy user message to the end of the sequence and stripping it using a length-difference calculation. A significant issue was identified when add_generation_prompt is set to true: the stripping logic fails to account for the generation prompt tokens appended after the dummy user message, resulting in incorrect truncation and trailing dummy-user tokens in the output.

Comment on lines +114 to +121
    output = processor.apply_chat_template(
        messages + dummy_user_message,
        tokenize=tokenize,
        add_generation_prompt=add_generation_prompt,
        tools=tools,
        return_dict=return_dict,
        **kwargs,
    )

high

The logic in the has_system block fails when add_generation_prompt=True.

When has_system is true, the dummy user message is appended to the end of the conversation. If add_generation_prompt=True, the template typically appends the assistant header (e.g., <|im_start|>assistant\n) after the last message. Thus, the output sequence ends with [... dummy_user_tokens, generation_prompt_tokens].

However, user_len is calculated from two_users - one_user (which both use add_generation_prompt=False), so it only represents the length of the dummy user message. Stripping user_len from the end of output (e.g., output[:-user_len]) will incorrectly remove the generation prompt and leave trailing tokens from the dummy user message in the result.

To fix this, you should account for the length of the generation prompt when slicing, or compute the output without the generation prompt, strip the dummy user, and then append the generation prompt tokens separately.
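The incorrect truncation can be seen directly with a toy Qwen-style template (illustrative names only, not verl's actual code):

```python
def mock_template(messages, add_generation_prompt=False):
    # Toy template: per-message token triples, optional assistant header.
    toks = ["<bos>"]
    for m in messages:
        toks += [f"<{m['role']}>", m["content"], "<end>"]
    if add_generation_prompt:
        toks.append("<assistant>")
    return toks

DUMMY_USER = [{"role": "user", "content": ""}]
one_user = mock_template(DUMMY_USER)               # rendered without gen prompt
two_users = mock_template(DUMMY_USER + DUMMY_USER)
user_len = len(two_users) - len(one_user)          # length of one dummy user

system = [{"role": "system", "content": "S"}]
# Buggy path: render WITH the generation prompt, then strip user_len tokens.
out = mock_template(system + DUMMY_USER, add_generation_prompt=True)
stripped = out[:-user_len]
# out      = [<bos>, <system>, S, <end>, <user>, '', <end>, <assistant>]
# stripped = [<bos>, <system>, S, <end>, <user>]   <- gen prompt gone,
#                                                     dummy-user token left
```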

…ystem messages

When add_generation_prompt=True, the previous implementation computed
output = apply_template(messages + dummy_user, add_generation_prompt=True),
which produced: system_tokens + dummy_user_tokens + gen_prompt_tokens.
Stripping user_len from the tail then removed the generation prompt and
left stray dummy-user tokens in the result.

Fix: always call apply_template with add_generation_prompt=False so the
dummy user sits cleanly at the tail, strip it, then separately compute
the generation prompt tokens via
  one_user_with_gen[len(one_user):]
and re-attach them when add_generation_prompt=True.

Reported by gemini-code-assist in PR review.
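The fix described in this commit can be sketched with the same toy-template assumptions (all names illustrative):

```python
def mock_template(messages, add_generation_prompt=False):
    # Toy Qwen-style template: per-message tokens, optional assistant header.
    toks = ["<bos>"]
    for m in messages:
        toks += [f"<{m['role']}>", m["content"], "<end>"]
    if add_generation_prompt:
        toks.append("<assistant>")
    return toks

DUMMY_USER = [{"role": "user", "content": ""}]

def tokenize_system(messages, add_generation_prompt=False):
    one_user = mock_template(DUMMY_USER)
    two_users = mock_template(DUMMY_USER + DUMMY_USER)
    user_len = len(two_users) - len(one_user)
    # Always render WITHOUT the generation prompt so the dummy user sits
    # cleanly at the tail, then strip it.
    out = mock_template(messages + DUMMY_USER)[:-user_len]
    if add_generation_prompt:
        # Recover the gen-prompt tokens separately and re-attach them.
        one_user_with_gen = mock_template(DUMMY_USER, add_generation_prompt=True)
        out = out + one_user_with_gen[len(one_user):]
    return out
```

This keeps the suffix-strip exact regardless of add_generation_prompt, at the cost of one extra template call when the generation prompt is requested.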
@khazic khazic changed the title [bug](chat_template): handle system message in apply_chat_template fall… [bug]chat_template: handle system message in apply_chat_template fall… Apr 7, 2026
@khazic khazic changed the title [bug]chat_template: handle system message in apply_chat_template fall… [chat_template]: handle system message in apply_chat_template fall… Apr 7, 2026
@khazic
Contributor Author

khazic commented Apr 7, 2026

Note: this bug is triggered whenever apply_chat_template is called with a
single system message (which happens for every row that has a system message
in the dataset), not just for mixed datasets. A pure-system-message dataset
would hit the same error on every sample.

@khazic
Contributor Author

khazic commented Apr 7, 2026

@wuxibin89 Could you please review this fix?

This PR addresses a bug in the apply_chat_template fallback path: when messages contain only a system message (no user message), the original code prepends dummy_user + messages, which violates Qwen3.5's constraint that the system message must come first. This causes a TemplateError.

The fix detects has_system and appends the dummy user after messages instead, then strips the suffix using the difference trick.
