[Qwen3VLMoe] Fixed: Expected self.dtype to be equal to src.dtype - routing_weights casting #41420
+1
cc @ArthurZucker for MoE, since this might affect other models too.
I think you didn't run the script that generates the modeling files, which is why you're seeing the difference.
ArthurZucker left a comment:
Thanks! Can you please check that this mistake was not propagated to other models?
…en2_moe, qwen3_moe, qwen3_next,qwen3_omni_moe)
We found the same casting in a few more models (ernie4_5_moe, qwen2_moe, qwen3_moe, qwen3_next, qwen3_omni_moe) and updated them accordingly.
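The same rule applies to MoE blocks that loop over experts and accumulate their outputs with index_add_, which, like scatter_, rejects a source whose dtype differs from the destination's. Below is a minimal sketch, assuming a per-expert loop; toy_moe_forward, the toy nn.Linear gate and experts, and the choice of cast target are illustrative assumptions, not the code changed in this PR. Router probabilities are computed in float32 for stability and cast back to the activations' dtype before they are mixed into the output.

```python
import torch
import torch.nn.functional as F


def toy_moe_forward(hidden_states, gate, experts, top_k):
    """Hypothetical per-expert MoE loop; hidden_states is (num_tokens, hidden_dim)."""
    router_logits = gate(hidden_states)
    # Routing probabilities are computed in float32 for numerical stability ...
    routing_weights = F.softmax(router_logits, dim=-1, dtype=torch.float)
    routing_weights, selected = torch.topk(routing_weights, top_k, dim=-1)
    routing_weights = routing_weights / routing_weights.sum(dim=-1, keepdim=True)
    # ... then cast once to the activations' dtype, so the products below and
    # the index_add_ destination all share a single dtype.
    routing_weights = routing_weights.to(hidden_states.dtype)

    out = torch.zeros_like(hidden_states)
    for expert_id, expert in enumerate(experts):
        # Tokens routed to this expert, and which of their top-k slots it occupies.
        token_idx, slot = torch.where(selected == expert_id)
        if token_idx.numel() == 0:
            continue
        weight = routing_weights[token_idx, slot].unsqueeze(-1)
        out.index_add_(0, token_idx, expert(hidden_states[token_idx]) * weight)
    return out


torch.manual_seed(0)
hidden_dim, num_experts = 16, 4
gate = torch.nn.Linear(hidden_dim, num_experts, bias=False).to(torch.bfloat16)
experts = [torch.nn.Linear(hidden_dim, hidden_dim).to(torch.bfloat16) for _ in range(num_experts)]
x = torch.randn(5, hidden_dim, dtype=torch.bfloat16)

with torch.no_grad():
    y = toy_moe_forward(x, gate, experts, top_k=2)
```

Casting at a single point, right after normalization, keeps every later multiplication and the accumulation in one dtype; if the float32 weights were left uncast, the product with bfloat16 expert outputs would promote to float32 and index_add_ into the bfloat16 buffer would fail.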
[For maintainers] Suggested jobs to run (before merge) run-slow: ernie4_5_moe, qwen2_moe, qwen3_moe, qwen3_next, qwen3_omni_moe, qwen3_vl_moe
@ArthurZucker, can you merge this so that we can train these models properly?
Hey @ArthurZucker, all checks are green and the PR has been referenced a couple of times.
ArthurZucker left a comment:
Thanks, and sorry for getting back to you late.
…uting_weights casting (huggingface#41420)
* Fixed Expected self.dtype to be equal to src.dtype on eval
* Fixed Expected self.dtype to be equal to src.dtype on eval
* Fixed Expected self.dtype to be equal to src.dtype on eval
* generated modeling_qwen3_vl_moe.py file
* Fixed Ernie_4_5_MoE router casting
* Fixed routing_weights dtype casting (ernie4_5_moe, hunyuan_v1_moe, qwen2_moe, qwen3_moe, qwen3_next, qwen3_omni_moe)
* rollback hunyuan_v1_moe changes
---------
Co-authored-by: Daniel Oliveira <[email protected]>
Co-authored-by: Daniel Oliveira <[email protected]>
What does this PR do?
Fixes the error "Expected self.dtype to be equal to src.dtype", which is caused by the dtype casting of routing_weights in Qwen3VLMoe.
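PyTorch's scatter_ raises "Expected self.dtype to be equal to src.dtype" when the tensor being written into and the src tensor have different dtypes. The following is a minimal reproduction, not the PR's diff. It assumes the router scatters its top-k weights into zeros_like(router_logits), and that the router logits and the cast target end up with different dtypes (here float32 logits and a bfloat16 cast). The names dense_routing_weights and cast_to are hypothetical.

```python
import torch
import torch.nn.functional as F


def dense_routing_weights(router_logits, top_k, cast_to):
    """Hypothetical helper: top-k softmax routing scattered into a dense
    (num_tokens, num_experts) tensor shaped like router_logits."""
    # Softmax in float32 for numerical stability.
    probs = F.softmax(router_logits, dim=-1, dtype=torch.float)
    weights, indices = torch.topk(probs, top_k, dim=-1)
    weights = weights / weights.sum(dim=-1, keepdim=True)
    weights = weights.to(cast_to)
    # scatter_ requires src (weights) to share the dtype of the destination,
    # which here inherits router_logits.dtype via zeros_like.
    return torch.zeros_like(router_logits).scatter_(1, indices, weights)


router_logits = torch.randn(4, 8)  # float32, e.g. a gate kept in full precision

# Casting to the destination's dtype works.
dense = dense_routing_weights(router_logits, top_k=2, cast_to=router_logits.dtype)

# Casting to a different dtype (e.g. bfloat16 activations) reproduces the error.
try:
    dense_routing_weights(router_logits, top_k=2, cast_to=torch.bfloat16)
    mismatch_error = None
except RuntimeError as err:
    mismatch_error = str(err)
```

The error appears only when the two dtypes diverge, for example under mixed precision or with a gate kept in a different precision from the activations. Which tensor should dictate the cast in each model is settled by the diff itself, not by this sketch.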
Related issue
Fixes #41418