System Info
transformers==4.51.3
Python version: 3.11
Who can help?
@ArthurZucker @amyeroberts @qubvel
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Load a Llama4ForCausalLM model and pass it to the trainer with the FSDP auto-wrap policy enabled, e.g.:

from transformers import Llama4ForCausalLM
from trl import SFTTrainer

model = Llama4ForCausalLM.from_pretrained(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct", torch_dtype="auto"
)

# Training
trainer = SFTTrainer(
    # model=model_args.model_name_or_path,
    model=model,
    args=training_args,
    ...
)
This produces the following error:
[rank0]: Traceback (most recent call last):
[rank0]: File "/tmp/tmp.RYY4AI2EBM/ephemeral_script.py", line 137, in <module>
[rank0]: main({'model_name_or_path': 'meta-llama/Llama-4-Scout-17B-16E-Instruct', 'model_revision': 'main', 'torch_dtype': 'bfloat16', 'attn_implementation': 'flex_attention', 'use_liger': False, 'use_peft': False, 'lora_r': 16, 'lora_alpha': 8, 'lora_dropout': 0.05, 'lora_target_modules': ['q_proj', 'v_proj', 'k_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'], 'lora_modules_to_save': ['lm_head', 'embed_tokens'], 'load_in_4bit': False, 'load_in_8bit': False, 'dataset_name': 'gsm8k', 'dataset_config': 'main', 'dataset_train_split': 'train', 'dataset_test_split': 'test', 'dataset_text_field': 'text', 'dataset_kwargs': {'add_special_tokens': False, 'append_concat_token': False}, 'max_seq_length': 8192, 'dataset_batch_size': 1000, 'packing': False, 'padding_free': False, 'num_train_epochs': 10, 'per_device_train_batch_size': 64, 'per_device_eval_batch_size': 64, 'auto_find_batch_size': False, 'eval_strategy': 'epoch', 'bf16': True, 'tf32': False, 'learning_rate': 0.0002, 'warmup_steps': 10, 'lr_scheduler_type': 'inverse_sqrt', 'optim': 'adamw_torch_fused', 'max_grad_norm': 1.0, 'seed': 42, 'gradient_accumulation_steps': 1, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': {'use_reentrant': False}, 'fsdp': 'full_shard auto_wrap', 'fsdp_config': {'activation_checkpointing': True, 'cpu_ram_efficient_loading': False, 'sync_module_states': True, 'use_orig_params': True, 'limit_all_gathers': False}, 'save_strategy': 'no', 'save_total_limit': 1, 'resume_from_checkpoint': False, 'log_level': 'info', 'logging_strategy': 'steps', 'logging_steps': 1, 'report_to': ['tensorboard'], 'output_dir': '/mnt/shared/Llama-4-Scout-17B-16E-Instruct'})
[rank0]: File "/tmp/tmp.RYY4AI2EBM/ephemeral_script.py", line 130, in main
[rank0]: trainer.train(resume_from_checkpoint=checkpoint)
[rank0]: File "/opt/app-root/lib64/python3.11/site-packages/transformers/trainer.py", line 2238, in train
[rank0]: return inner_training_loop(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/app-root/lib64/python3.11/site-packages/transformers/trainer.py", line 2357, in _inner_training_loop
[rank0]: self.model = self.accelerator.prepare(self.model)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/app-root/lib64/python3.11/site-packages/accelerate/accelerator.py", line 1446, in prepare
[rank0]: result = tuple(
[rank0]: ^^^^^^
[rank0]: File "/opt/app-root/lib64/python3.11/site-packages/accelerate/accelerator.py", line 1447, in <genexpr>
[rank0]: self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/app-root/lib64/python3.11/site-packages/accelerate/accelerator.py", line 1289, in _prepare_one
[rank0]: return self.prepare_model(obj, device_placement=device_placement)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/opt/app-root/lib64/python3.11/site-packages/accelerate/accelerator.py", line 1630, in prepare_model
[rank0]: self.state.fsdp_plugin.set_auto_wrap_policy(model)
[rank0]: File "/opt/app-root/lib64/python3.11/site-packages/accelerate/utils/dataclasses.py", line 1903, in set_auto_wrap_policy
[rank0]: raise ValueError(f"Could not find the transformer layer class {layer_class} in the model.")
[rank0]: ValueError: Could not find the transformer layer class Llama4VisionEncoderLayer in the model.
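For context: the failure is raised in accelerate's FullyShardedDataParallelPlugin.set_auto_wrap_policy. With fsdp set to "full_shard auto_wrap" and no transformer_layer_cls_to_wrap configured, the plugin appears to take the class names to wrap from model._no_split_modules, which for Llama 4 seemingly lists Llama4VisionEncoderLayer even though the text-only Llama4ForCausalLM contains no vision layers. A simplified sketch of that lookup (not the exact accelerate source, only an illustration of the mechanism):

def get_module_class_from_name(module, name):
    # Simplified stand-in for the helper accelerate uses: walk the module tree
    # and return the class whose __name__ matches `name`, or None if absent.
    if module.__class__.__name__ == name:
        return module.__class__
    for child in module.children():
        found = get_module_class_from_name(child, name)
        if found is not None:
            return found
    return None

def resolve_layer_classes(model, transformer_cls_names_to_wrap=None):
    # Sketch of the lookup in set_auto_wrap_policy: without an explicit
    # transformer_layer_cls_to_wrap, the names come from model._no_split_modules,
    # and any name that never occurs in the model raises the ValueError above.
    names = transformer_cls_names_to_wrap or getattr(model, "_no_split_modules", [])
    resolved = set()
    for layer_class in names:
        cls = get_module_class_from_name(model, layer_class)
        if cls is None:
            # For Llama4ForCausalLM, "Llama4VisionEncoderLayer" never resolves,
            # which is the error shown in the traceback.
            raise ValueError(f"Could not find the transformer layer class {layer_class} in the model.")
        resolved.add(cls)
    return resolved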
Expected behavior
The model should load and be wrapped by FSDP successfully; the auto-wrap policy should not fail on Llama4VisionEncoderLayer, which does not exist in the text-only Llama4ForCausalLM.
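A possible workaround (untested here): name the layer classes to wrap explicitly in fsdp_config so the auto-wrap policy does not fall back to _no_split_modules. This is a minimal sketch; the class name Llama4TextDecoderLayer is an assumption and should be checked against the installed transformers version.

from trl import SFTConfig

# Minimal sketch: tell FSDP auto-wrap exactly which layer class to look for.
# "Llama4TextDecoderLayer" is an assumed class name; verify it against
# transformers' Llama 4 modeling code before relying on this.
training_args = SFTConfig(
    output_dir="/mnt/shared/Llama-4-Scout-17B-16E-Instruct",
    bf16=True,
    fsdp="full_shard auto_wrap",
    fsdp_config={
        "transformer_layer_cls_to_wrap": ["Llama4TextDecoderLayer"],
        "activation_checkpointing": True,
        "use_orig_params": True,
    },
)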