[chat_template]: handle system message in apply_chat_template fallback #5903
khazic wants to merge 3 commits into verl-project:main

Conversation
The existing fallback for models like Qwen3.5 that require at least one
user message unconditionally prepends a dummy user message before the
real messages:
apply_template([dummy_user] + messages)
This works for user/assistant messages, but fails when the single message
being processed is a system message, because Qwen3.5's chat template
enforces two constraints:
1. At least one user message must be present.
2. System message must appear first.
The fallback satisfies (1) but violates (2), raising:
TemplateError: System message must be at the beginning.
Reproduce: run multiturn SFT with a dataset where some conversations
have a system message and some do not. verl tokenizes each message
individually via _process_single_message; when a system message is
processed alone, the fallback inserts dummy_user before it, breaking
the template constraint.
Fix: detect whether the messages list contains a system role. If so,
append the dummy user *after* the real messages instead, and strip the
dummy suffix from the output using the length-difference trick
(len(two_users) - len(one_user)) to recover the pure system tokens.
The original prefix-stripping path is preserved for all non-system cases.
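The two paths can be sketched with a toy template function. Note that mock_apply_template, DUMMY_USER, and tokenize_single below are illustrative names, not verl's actual API, and the toy tokenization ignores BOS/EOS handling that a real chat template would involve:

```python
# Toy stand-in for processor.apply_chat_template: each message becomes
# role-tagged tokens. Illustrative only -- not verl's real implementation.
def mock_apply_template(messages, add_generation_prompt=False):
    tokens = []
    for m in messages:
        tokens += [f"<{m['role']}>", *m["content"].split(), f"</{m['role']}>"]
    if add_generation_prompt:
        tokens.append("<assistant>")  # generation prompt header
    return tokens

DUMMY_USER = {"role": "user", "content": "dummy"}

def tokenize_single(message):
    one_user = mock_apply_template([DUMMY_USER])
    if message["role"] == "system":
        # System must come first, so append the dummy AFTER the real
        # message and strip the suffix via the length-difference trick.
        two_users = mock_apply_template([DUMMY_USER, DUMMY_USER])
        user_len = len(two_users) - len(one_user)  # tokens one dummy user adds
        return mock_apply_template([message, DUMMY_USER])[:-user_len]
    # Original path: prepend the dummy user and strip the prefix.
    return mock_apply_template([DUMMY_USER, message])[len(one_user):]

print(tokenize_single({"role": "system", "content": "be nice"}))
# -> ['<system>', 'be', 'nice', '</system>']
```

The key point is that the length difference between a two-dummy render and a one-dummy render isolates exactly the tokens contributed by one dummy user, regardless of the template's fixed overhead.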
Code Review
This pull request modifies the apply_chat_template function to handle system messages for models like Qwen3.5 by appending a dummy user message to the end of the sequence and stripping it using a length-difference calculation. A significant issue was identified in the logic when add_generation_prompt is set to true: the stripping logic fails to account for the generation prompt tokens appended after the dummy user message, which results in incorrect token truncation and leaves trailing dummy tokens in the output.
output = processor.apply_chat_template(
    messages + dummy_user_message,
    tokenize=tokenize,
    add_generation_prompt=add_generation_prompt,
    tools=tools,
    return_dict=return_dict,
    **kwargs,
)
The logic in the has_system block fails when add_generation_prompt=True.
When has_system is true, the dummy user message is appended to the end of the conversation. If add_generation_prompt=True, the template typically appends the assistant header (e.g., <|im_start|>assistant\n) after the last message. Thus, the output sequence ends with [... dummy_user_tokens, generation_prompt_tokens].
However, user_len is calculated as len(two_users) - len(one_user) (both rendered with add_generation_prompt=False), so it represents only the length of the dummy user message. Stripping user_len from the end of output (e.g., output[:-user_len]) will incorrectly remove the generation prompt and leave trailing tokens from the dummy user message in the result.
To fix this, you should account for the length of the generation prompt when slicing, or compute the output without the generation prompt, strip the dummy user, and then append the generation prompt tokens separately.
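The failure mode is easy to reproduce with a toy template function (mock_apply_template and the token layout below are illustrative, not Qwen's real tokenization):

```python
# Toy stand-in for apply_chat_template: role-tagged tokens per message.
def mock_apply_template(messages, add_generation_prompt=False):
    tokens = []
    for m in messages:
        tokens += [f"<{m['role']}>", *m["content"].split(), f"</{m['role']}>"]
    if add_generation_prompt:
        tokens.append("<assistant>")  # generation prompt header
    return tokens

DUMMY = {"role": "user", "content": "dummy"}
sys_msg = {"role": "system", "content": "be nice"}

# Buggy path: dummy user appended, and the generation prompt follows it.
output = mock_apply_template([sys_msg, DUMMY], add_generation_prompt=True)
# output ends with: ... '<user>', 'dummy', '</user>', '<assistant>'

# user_len measures only the dummy user (computed without a gen prompt)...
user_len = len(mock_apply_template([DUMMY, DUMMY])) - len(mock_apply_template([DUMMY]))

# ...so slicing it off removes the generation prompt instead and
# leaves a stray dummy-user token behind:
print(output[:-user_len])
# -> ['<system>', 'be', 'nice', '</system>', '<user>']
```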
…ystem messages

When add_generation_prompt=True, the previous implementation computed output = apply_template(messages + dummy_user, add_generation_prompt=True), which produced: system_tokens + dummy_user_tokens + gen_prompt_tokens. Stripping user_len from the tail then removed the generation prompt and left stray dummy-user tokens in the result.

Fix: always call apply_template with add_generation_prompt=False so the dummy user sits cleanly at the tail, strip it, then separately compute the generation prompt tokens via one_user_with_gen[len(one_user):] and re-attach them when add_generation_prompt=True.

Reported by gemini-code-assist in PR review.
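The corrected flow can be sketched with a toy template function (the names below are illustrative, not verl's actual code, and the toy ignores BOS/EOS details):

```python
# Toy stand-in for apply_chat_template: role-tagged tokens per message.
def mock_apply_template(messages, add_generation_prompt=False):
    tokens = []
    for m in messages:
        tokens += [f"<{m['role']}>", *m["content"].split(), f"</{m['role']}>"]
    if add_generation_prompt:
        tokens.append("<assistant>")  # generation prompt header
    return tokens

DUMMY = {"role": "user", "content": "dummy"}

def tokenize_system(message, add_generation_prompt=False):
    # 1. Build WITHOUT the generation prompt so the dummy user is the tail.
    no_gen = mock_apply_template([message, DUMMY], add_generation_prompt=False)
    one_user = mock_apply_template([DUMMY], add_generation_prompt=False)
    two_users = mock_apply_template([DUMMY, DUMMY], add_generation_prompt=False)
    user_len = len(two_users) - len(one_user)
    stripped = no_gen[:-user_len]  # pure system tokens
    if add_generation_prompt:
        # 2. Recover the generation prompt tokens by diffing a one-user
        #    render with and without it, then re-attach them.
        one_user_with_gen = mock_apply_template([DUMMY], add_generation_prompt=True)
        stripped = stripped + one_user_with_gen[len(one_user):]
    return stripped

print(tokenize_system({"role": "system", "content": "be nice"}, add_generation_prompt=True))
# -> ['<system>', 'be', 'nice', '</system>', '<assistant>']
```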
Note: this bug is triggered whenever apply_chat_template is called with a system message and add_generation_prompt=True.

@wuxibin89 Could you please review this fix?