
@danielquintas8
Contributor

What does this PR do?

Fixes the `Expected self.dtype to be equal to src.dtype` error by correcting how `routing_weights` is cast.
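This is a minimal sketch of the failure mode and the fix, not the exact transformers diff. It assumes the error comes from `Tensor.scatter_`, which raises `Expected self.dtype to be equal to src.dtype` when the source tensor's dtype differs from the destination's. A float32 softmax produces float32 routing weights, and scattering them into a bfloat16 tensor fails unless they are cast first. The function names `route_tokens` and `route_tokens_buggy` are hypothetical.

```python
import torch
import torch.nn.functional as F


def route_tokens(router_logits: torch.Tensor, top_k: int) -> torch.Tensor:
    """Hypothetical helper: dense per-expert routing weights in router_logits.dtype."""
    # Softmax in float32 for numerical stability (common MoE practice).
    routing_weights = F.softmax(router_logits, dim=-1, dtype=torch.float)
    routing_weights, router_indices = torch.topk(routing_weights, top_k, dim=-1)
    # Renormalize the selected top-k weights so each row sums to 1.
    routing_weights = routing_weights / routing_weights.sum(dim=-1, keepdim=True)
    # The fix: cast to the destination dtype before scatter_, which
    # requires self and src to share a dtype.
    routing_weights = routing_weights.to(router_logits.dtype)
    return torch.zeros_like(router_logits).scatter_(1, router_indices, routing_weights)


def route_tokens_buggy(router_logits: torch.Tensor, top_k: int) -> torch.Tensor:
    """Hypothetical helper without the cast: scatters float32 src into a bf16 tensor."""
    routing_weights = F.softmax(router_logits, dim=-1, dtype=torch.float)
    routing_weights, router_indices = torch.topk(routing_weights, top_k, dim=-1)
    routing_weights = routing_weights / routing_weights.sum(dim=-1, keepdim=True)
    # Raises RuntimeError when router_logits is not float32.
    return torch.zeros_like(router_logits).scatter_(1, router_indices, routing_weights)


if __name__ == "__main__":
    logits = torch.randn(4, 8, dtype=torch.bfloat16)  # 4 tokens, 8 experts
    print(route_tokens(logits, top_k=2).dtype)  # torch.bfloat16
    try:
        route_tokens_buggy(logits, top_k=2)
    except RuntimeError as err:
        print("RuntimeError:", err)
```

Casting right before the scatter keeps the softmax and renormalization in float32 while making the output match the dtype of the model's activations.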

Related issue

Fixes #41418

@afonsosilva91

+1

@vasqu
Contributor

vasqu commented Oct 7, 2025

cc @ArthurZucker for MoE, since this might affect more models (?)

@i3hz
Contributor

i3hz commented Oct 8, 2025

I think you didn't run the script that generates the modeling files, which is why you're seeing the difference.

@Rocketknight1
Member

cc @zucchini-nlp

Collaborator

@ArthurZucker left a comment


Thanks! Can you please check that this mistake was not propagated to other models?

@danielquintas8
Contributor Author

danielquintas8 commented Oct 8, 2025

> Thanks, can you check this mistake was not propagated to other models please!

We found the same casting in a few more models (ernie4_5_moe, qwen2_moe, qwen3_moe, qwen3_next, qwen3_omni_moe) and updated them accordingly.

The hunyuan_v1_moe implementation of `route_tokens_to_experts` takes different arguments from the other models, so it falls outside the scope of this issue. I will look into it further in a future issue.

@github-actions
Contributor

github-actions bot commented Oct 8, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: ernie4_5_moe, qwen2_moe, qwen3_moe, qwen3_next, qwen3_omni_moe, qwen3_vl_moe

@jaaabir

jaaabir commented Oct 10, 2025

@ArthurZucker, can you merge this so that we can properly train these models?

@danielquintas8
Contributor Author

Hey @ArthurZucker, all checks are green and the PR has been referenced a couple of times.
Just wanted to check whether anything else is needed before merging. Thanks!

Collaborator

@ArthurZucker left a comment


Thanks, and sorry for getting back to this late!

@ArthurZucker merged commit c620c38 into huggingface:main on Oct 14, 2025
15 checks passed
@ArthurZucker added and removed the `for patch` label (issues / PRs that should be included in the next patch) on Oct 14, 2025
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025
…uting_weights casting (huggingface#41420)

* Fixed Expected self.dtype to be equal to src.dtype on eval

* Fixed Expected self.dtype to be equal to src.dtype on eval

* Fixed Expected self.dtype to be equal to src.dtype on eval

* generated modeling_qwen3_vl_moe.py file

* Fixed Ernie_4_5_MoE router casting

* Fixed routing_weights dtype casting (ernie4_5_moe, hunyuan_v1_moe, qwen2_moe, qwen3_moe, qwen3_next,qwen3_omni_moe)

* rollback hunyuan_v1_moe changes

---------

Co-authored-by: Daniel Oliveira <[email protected]>
Co-authored-by: Daniel Oliveira <[email protected]>


Development

Successfully merging this pull request may close these issues.

Qwen3 VL Moe: Expected self.dtype to be equal to src.dtype

8 participants