ValueError: Could not find the transformer layer class Llama4VisionEncoderLayer in the model #37672

@astefanutti

Description

System Info

transformers==4.51.3
Python version: 3.11

Who can help?

@ArthurZucker @amyeroberts @qubvel

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Load a Llama4ForCausalLM model with the FSDP auto-wrap policy enabled, e.g.:

from transformers import Llama4ForCausalLM
from trl import SFTTrainer

model = Llama4ForCausalLM.from_pretrained(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct", torch_dtype="auto"
)

# Training
trainer = SFTTrainer(
    # model=model_args.model_name_or_path,
    model=model,
    args=training_args,
    ...
)

This produces the following error:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/tmp/tmp.RYY4AI2EBM/ephemeral_script.py", line 137, in <module>
[rank0]:     main({'model_name_or_path': 'meta-llama/Llama-4-Scout-17B-16E-Instruct', 'model_revision': 'main', 'torch_dtype': 'bfloat16', 'attn_implementation': 'flex_attention', 'use_liger': False, 'use_peft': False, 'lora_r': 16, 'lora_alpha': 8, 'lora_dropout': 0.05, 'lora_target_modules': ['q_proj', 'v_proj', 'k_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'], 'lora_modules_to_save': ['lm_head', 'embed_tokens'], 'load_in_4bit': False, 'load_in_8bit': False, 'dataset_name': 'gsm8k', 'dataset_config': 'main', 'dataset_train_split': 'train', 'dataset_test_split': 'test', 'dataset_text_field': 'text', 'dataset_kwargs': {'add_special_tokens': False, 'append_concat_token': False}, 'max_seq_length': 8192, 'dataset_batch_size': 1000, 'packing': False, 'padding_free': False, 'num_train_epochs': 10, 'per_device_train_batch_size': 64, 'per_device_eval_batch_size': 64, 'auto_find_batch_size': False, 'eval_strategy': 'epoch', 'bf16': True, 'tf32': False, 'learning_rate': 0.0002, 'warmup_steps': 10, 'lr_scheduler_type': 'inverse_sqrt', 'optim': 'adamw_torch_fused', 'max_grad_norm': 1.0, 'seed': 42, 'gradient_accumulation_steps': 1, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': {'use_reentrant': False}, 'fsdp': 'full_shard auto_wrap', 'fsdp_config': {'activation_checkpointing': True, 'cpu_ram_efficient_loading': False, 'sync_module_states': True, 'use_orig_params': True, 'limit_all_gathers': False}, 'save_strategy': 'no', 'save_total_limit': 1, 'resume_from_checkpoint': False, 'log_level': 'info', 'logging_strategy': 'steps', 'logging_steps': 1, 'report_to': ['tensorboard'], 'output_dir': '/mnt/shared/Llama-4-Scout-17B-16E-Instruct'})
[rank0]:   File "/tmp/tmp.RYY4AI2EBM/ephemeral_script.py", line 130, in main
[rank0]:     trainer.train(resume_from_checkpoint=checkpoint)
[rank0]:   File "/opt/app-root/lib64/python3.11/site-packages/transformers/trainer.py", line 2238, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/app-root/lib64/python3.11/site-packages/transformers/trainer.py", line 2357, in _inner_training_loop
[rank0]:     self.model = self.accelerator.prepare(self.model)
[rank0]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/app-root/lib64/python3.11/site-packages/accelerate/accelerator.py", line 1446, in prepare
[rank0]:     result = tuple(
[rank0]:              ^^^^^^
[rank0]:   File "/opt/app-root/lib64/python3.11/site-packages/accelerate/accelerator.py", line 1447, in <genexpr>
[rank0]:     self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/app-root/lib64/python3.11/site-packages/accelerate/accelerator.py", line 1289, in _prepare_one
[rank0]:     return self.prepare_model(obj, device_placement=device_placement)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/app-root/lib64/python3.11/site-packages/accelerate/accelerator.py", line 1630, in prepare_model
[rank0]:     self.state.fsdp_plugin.set_auto_wrap_policy(model)
[rank0]:   File "/opt/app-root/lib64/python3.11/site-packages/accelerate/utils/dataclasses.py", line 1903, in set_auto_wrap_policy
[rank0]:     raise ValueError(f"Could not find the transformer layer class {layer_class} in the model.")
[rank0]: ValueError: Could not find the transformer layer class Llama4VisionEncoderLayer in the model.
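
As far as I can tell, accelerate's transformer-based auto-wrap policy takes the layer class names from the model's _no_split_modules and then looks each name up among the instantiated submodules; since the text-only Llama4ForCausalLM builds no vision tower, the lookup for Llama4VisionEncoderLayer fails. The sketch below only paraphrases that lookup with hypothetical helper functions (it is not the actual accelerate code):

import torch.nn as nn

def find_module_class(model: nn.Module, class_name: str):
    """Return the class of the first submodule whose type name matches, else None."""
    for module in model.modules():
        if module.__class__.__name__ == class_name:
            return module.__class__
    return None

def resolve_layer_classes_to_wrap(model: nn.Module, layer_class_names: list[str]):
    """Paraphrase of the check in FullyShardedDataParallelPlugin.set_auto_wrap_policy."""
    resolved = set()
    for layer_class in layer_class_names:
        cls = find_module_class(model, layer_class)
        if cls is None:
            # Branch hit in the traceback above: Llama4VisionEncoderLayer is
            # requested but never instantiated in the text-only model.
            raise ValueError(f"Could not find the transformer layer class {layer_class} in the model.")
        resolved.add(cls)
    return resolved

If that reading is right, the underlying problem is that the default list of classes to wrap derived from Llama4ForCausalLM still includes the vision encoder layer even though the text-only model never creates one.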

Expected behavior

The model should load and be wrapped by FSDP successfully.
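
As an interim, untested workaround, it may be possible to pin the auto-wrap policy to the text decoder layer so accelerate never has to resolve the vision class. Both the fsdp_config key transformer_layer_cls_to_wrap and the class name Llama4TextDecoderLayer are assumptions from reading the Trainer and modeling_llama4 code, not something verified here:

from trl import SFTConfig

# Sketch only: explicitly name the layer class to wrap instead of relying on
# the model's _no_split_modules. `Llama4TextDecoderLayer` is assumed to be the
# text decoder layer class; adjust if the actual class name differs.
training_args = SFTConfig(
    output_dir="/mnt/shared/Llama-4-Scout-17B-16E-Instruct",
    bf16=True,
    fsdp="full_shard auto_wrap",
    fsdp_config={
        "transformer_layer_cls_to_wrap": ["Llama4TextDecoderLayer"],
        "activation_checkpointing": True,
        "use_orig_params": True,
    },
)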
