@astefanutti
Contributor

What does this PR do?

This PR moves the definition of _no_split_modules for the Llama4 pre-trained models from the base Llama4PreTrainedModel class to its subclasses, so that each model declares only the modules it actually contains.

Without this change, loading Llama4ForCausalLM fails, because that text-only model does not contain any Llama4VisionEncoderLayer module.
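To illustrate the failure mode, here is a minimal, torch-free sketch of the per-subclass pattern. All classes and the resolve_layer_classes helper are hypothetical stand-ins, not the transformers or accelerate implementation. The helper assumes the error in the linked issue comes from a wrapping policy (for example FSDP auto-wrapping) that looks up every name listed in _no_split_modules among the model's submodules and raises when one is missing.

```python
# Hypothetical, torch-free stand-ins used only to illustrate the pattern.
class Module:
    """Minimal stand-in for torch.nn.Module that only tracks child modules."""

    def __init__(self, *children):
        self._children = list(children)

    def modules(self):
        # Depth-first walk over this module and all of its descendants.
        yield self
        for child in self._children:
            yield from child.modules()


class Llama4TextDecoderLayer(Module):
    pass


class Llama4VisionEncoderLayer(Module):
    pass


class Llama4PreTrainedModel(Module):
    # Assumption: before the fix, a single list shared by every subclass lived
    # here, presumably naming both the text and the vision layer classes.
    _no_split_modules = None


class Llama4ForCausalLM(Llama4PreTrainedModel):
    # Text-only model: it never instantiates vision encoder layers.
    _no_split_modules = ["Llama4TextDecoderLayer"]

    def __init__(self, num_layers=2):
        super().__init__(*(Llama4TextDecoderLayer() for _ in range(num_layers)))


class Llama4VisionModel(Llama4PreTrainedModel):
    _no_split_modules = ["Llama4VisionEncoderLayer"]

    def __init__(self, num_layers=2):
        super().__init__(*(Llama4VisionEncoderLayer() for _ in range(num_layers)))


class Llama4ForConditionalGeneration(Llama4PreTrainedModel):
    # Multimodal model: contains both a vision tower and a text model.
    _no_split_modules = ["Llama4TextDecoderLayer", "Llama4VisionEncoderLayer"]

    def __init__(self):
        super().__init__(Llama4VisionModel(), Llama4ForCausalLM())


def resolve_layer_classes(model, class_names):
    """Hypothetical helper: map each class name to a class present in `model`."""
    present = {type(m).__name__: type(m) for m in model.modules()}
    resolved = set()
    for name in class_names:
        if name not in present:
            raise ValueError(f"Could not find the transformer layer class {name} in the model")
        resolved.add(present[name])
    return resolved


if __name__ == "__main__":
    text_model = Llama4ForCausalLM()
    print(resolve_layer_classes(text_model, text_model._no_split_modules))
    try:
        # A shared list naming the vision layer fails on the text-only model.
        resolve_layer_classes(text_model, ["Llama4TextDecoderLayer", "Llama4VisionEncoderLayer"])
    except ValueError as err:
        print(err)
```

With the per-subclass lists, each model's names resolve against its own module tree; only the multimodal model is expected to list both layer classes.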

Fixes #37672

Who can review?

@ArthurZucker @amyeroberts @qubvel

@github-actions bot marked this pull request as draft on April 22, 2025 11:05
@github-actions
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@astefanutti force-pushed the fix-llama4-no-split-modules branch from 771dd68 to e5976aa on April 22, 2025 11:06
@astefanutti marked this pull request as ready for review on April 22, 2025 11:07
@zucchini-nlp
Member

Thanks! Can you merge main or push an empty commit to trigger CI?

@astefanutti force-pushed the fix-llama4-no-split-modules branch 2 times, most recently from bb8718f to 14483b5, on April 22, 2025 12:45
@astefanutti
Contributor Author

@zucchini-nlp thanks, I've just rebased and re-pushed.

@astefanutti
Contributor Author

The CI error seems related to #37675.

@MekkCyber
Contributor

Hey @astefanutti, it should be fixed now! You can just update the branch.

@astefanutti force-pushed the fix-llama4-no-split-modules branch from 14483b5 to 0f7432a on April 22, 2025 13:35
@astefanutti
Contributor Author

@MekkCyber thanks for the update, I've just rebased on it.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp merged commit dde9b03 into huggingface:main on April 22, 2025
20 checks passed
@astefanutti deleted the fix-llama4-no-split-modules branch on April 22, 2025 14:05
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025

Development

Successfully merging this pull request may close these issues.

ValueError: Could not find the transformer layer class Llama4VisionEncoderLayer in the model

4 participants