
Recent Qwen2VL merge request (#35837) breaks compatibility with DeepSpeed #36187

Description

@ArdalanM

The recent merge request (#35837) works with plain accelerate but breaks under DeepSpeed (both with and without a deepspeed config):

  • distributed_type: MULTI_GPU (works)
  • distributed_type: DEEPSPEED (no longer works)

To be more precise, the issue lies in this section: https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py#L200

    if position_embeddings is None:
        emb = torch.cat((rotary_pos_emb, rotary_pos_emb), dim=-1)
        cos = emb.cos().float()
        sin = emb.sin().float()
    else:
        cos, sin = position_embeddings
    q, k = apply_rotary_pos_emb_flashatt(q.unsqueeze(0), k.unsqueeze(0), cos, sin)

In the else branch (cos, sin = position_embeddings), the embeddings are never cast to float, so their dtype varies with the DeepSpeed and mixed_precision configuration. Only the internally computed branch gets the .float() cast.
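One possible fix would be to move the cast after the branch, so precomputed embeddings get the same treatment as the internally computed ones. A minimal sketch (assuming the missing cast is the only problem, not the maintainers' actual patch):

    if position_embeddings is None:
        emb = torch.cat((rotary_pos_emb, rotary_pos_emb), dim=-1)
        cos = emb.cos()
        sin = emb.sin()
    else:
        cos, sin = position_embeddings
    # Cast in both branches so bf16/fp16 embeddings produced under DeepSpeed
    # mixed precision never reach the flash-attention rotary kernel.
    cos = cos.float()
    sin = sin.float()
    q, k = apply_rotary_pos_emb_flashatt(q.unsqueeze(0), k.unsqueeze(0), cos, sin)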

This accelerate config works:

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
enable_cpu_affinity: false
main_training_function: main
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
mixed_precision: bf16

This accelerate config no longer works:

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 3
downcast_bf16: 'no'
enable_cpu_affinity: false
main_training_function: main
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
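To confirm the mismatch, one can log the dtype that actually reaches each vision attention layer under the two configs. A minimal sketch, assuming the prepared model exposes the vision blocks as model.visual.blocks (the variable `model` and the hook below are illustrative, not from the report):

    def log_rope_dtypes(module, args, kwargs):
        # position_embeddings is passed to the attention layer as a kwarg.
        position_embeddings = kwargs.get("position_embeddings")
        if position_embeddings is not None:
            cos, sin = position_embeddings
            print(f"{module.__class__.__name__}: cos={cos.dtype}, sin={sin.dtype}")

    # `model` is the Qwen2.5-VL model after accelerator.prepare(...).
    for block in model.visual.blocks:
        block.attn.register_forward_pre_hook(log_rope_dtypes, with_kwargs=True)

Under the MULTI_GPU config the embeddings stay float32; under the DEEPSPEED config they would show up as bf16/fp16, which is what then hits apply_rotary_pos_emb_flashatt uncast.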
