fix edge case for qwen3 data processing #626
Conversation
Force-pushed from d9a21d8 to 79c2f2e
Force-pushed from 765eb34 to 414c296
@RobotSail, we are working on CI. In the meantime, please just run the Large e2e job manually on this PR when it's ready for review.
E2E (NVIDIA L40S x4) (python 3.11) workflow launched on this PR: View run
e2e workflow succeeded on this PR: View run, congrats!
Maxusmusti left a comment:
LGTM
@Mergifyio rebase
…_content` fields Signed-off-by: Oleg S <[email protected]>
Signed-off-by: Oleg S <[email protected]>
…: false` on samples functions as expected Signed-off-by: Oleg S <[email protected]>
✅ Branch has been successfully rebased
Force-pushed from 968ba35 to a05d1e4
With Qwen3, there is an edge case that can break the unmask/mask logic during data processing.
Root Cause: The error occurs specifically when using the Qwen/Qwen3-32B tokenizer, not with Qwen/Qwen2.5-32B-Instruct. The problematic sample contains multiple tags in the assistant's response.
Issue Location: The error occurs in data_process.py:555 in the unmask_messages function, where it encounters an <|UNMASK_END|> token while not in an unmasking state.
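For context, the failure mode can be illustrated with a simplified sketch of the unmasking state machine. The real `unmask_messages` in `data_process.py` is more involved; the marker strings below match the ones named in this PR, but the function itself is a hypothetical reduction:

```python
UNMASK_BEGIN = "<|UNMASK_BEGIN|>"
UNMASK_END = "<|UNMASK_END|>"


def collect_unmasked_spans(tokens: list[str]) -> list[list[str]]:
    """Simplified sketch: walk the token stream and collect the spans
    between <|UNMASK_BEGIN|> and <|UNMASK_END|> markers."""
    spans: list[list[str]] = []
    current: list[str] | None = None
    for tok in tokens:
        if tok == UNMASK_BEGIN:
            if current is not None:
                raise ValueError("nested <|UNMASK_BEGIN|> encountered")
            current = []
        elif tok == UNMASK_END:
            if current is None:
                # This is the edge case described above: the Qwen3 chat
                # template renders the markers in an unexpected order, so
                # an END marker shows up with no open BEGIN.
                raise ValueError(
                    "<|UNMASK_END|> encountered while not unmasking"
                )
            spans.append(current)
            current = None
        elif current is not None:
            current.append(tok)
    return spans
```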
Key Findings:
The Problem:
The Qwen/Qwen3-32B chat template processes the <|UNMASK_BEGIN|> and <|UNMASK_END|> tokens in a way that causes them to appear out of order or in an unexpected state, so the algorithm encounters an <|UNMASK_END|> token while it is not actively unmasking.
This is likely due to differences in how the Qwen2.5 and Qwen3 chat templates handle special tokens, particularly when there are multiple special tokens or complex content like the tags present in the assistant's response; the sketch below shows one way to compare the two templates directly.
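One hedged diagnostic (not part of this fix) is to render the same conversation with both tokenizers via Hugging Face transformers and compare the resulting strings; the messages below are placeholders rather than the actual failing sample:

```python
from transformers import AutoTokenizer

# Placeholder conversation; the real failing sample contains multiple
# special tags in the assistant response.
messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "<|UNMASK_BEGIN|>Hi there<|UNMASK_END|>"},
]

for model_id in ("Qwen/Qwen2.5-32B-Instruct", "Qwen/Qwen3-32B"):
    tok = AutoTokenizer.from_pretrained(model_id)
    rendered = tok.apply_chat_template(messages, tokenize=False)
    print(f"--- {model_id} ---")
    print(rendered)
```

Diffing the two rendered outputs makes it easy to see where the Qwen3 template reorders or drops the unmask markers relative to Qwen2.5.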
Signed-off-by: Oleg S [email protected]