Refactor Attention implementation for ViT-based models #36545

qubvel · 2025-03-04T18:24:03Z

What does this PR do?

Updates the way the attention implementation is chosen. Instead of defining separate classes, we use a functional approach and switch the attention implementation on the fly with the config._attn_implementation param.

The following models will have SDPA and FA2 support:

vit
audio_spectrogram_transformer
deit
dinov2
dinov2_with_registers
dpt
ijepa
videomae
vit_mae
vit_msn
vitpose_backbone
vivit
yolos

It also affects the following models:

depth_anything (uses a dinov2 backbone)
zoedepth (uses a dinov2 backbone)
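
To illustrate the functional approach described above, here is a minimal sketch in plain Python (no torch), not the code from this PR. The names `SKETCH_ATTENTION_FUNCTIONS`, `SketchConfig`, `SketchSelfAttention`, and `streaming_attention` are hypothetical and exist only for this example; `streaming_attention` is a stand-in for an optimized kernel such as SDPA or FA2. The point is that a single attention class looks up the attention function from `config._attn_implementation` at call time, so no per-implementation subclass is needed.

```python
import math
from typing import Callable, Dict, List

Matrix = List[List[float]]


def _dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def eager_attention(q: Matrix, k: Matrix, v: Matrix, scaling: float) -> Matrix:
    """Reference implementation: softmax(scaling * q @ k^T) @ v for one head."""
    out = []
    for q_row in q:
        scores = [scaling * _dot(q_row, k_row) for k_row in k]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v_row[d] for w, v_row in zip(weights, v))
                    for d in range(len(v[0]))])
    return out


def streaming_attention(q: Matrix, k: Matrix, v: Matrix, scaling: float) -> Matrix:
    """Hypothetical alternative backend: same result, computed with an
    online softmax that never materializes the full score row."""
    out = []
    for q_row in q:
        running_max, denom = -math.inf, 0.0
        acc = [0.0] * len(v[0])
        for k_row, v_row in zip(k, v):
            score = scaling * _dot(q_row, k_row)
            new_max = max(running_max, score)
            correction = math.exp(running_max - new_max)  # rescale old terms
            p = math.exp(score - new_max)
            denom = denom * correction + p
            acc = [a * correction + p * x for a, x in zip(acc, v_row)]
            running_max = new_max
        out.append([a / denom for a in acc])
    return out


# Hypothetical registry mapping an implementation name to a function.
SKETCH_ATTENTION_FUNCTIONS: Dict[str, Callable[..., Matrix]] = {
    "eager": eager_attention,
    "streaming": streaming_attention,
}


class SketchConfig:
    def __init__(self, attn_implementation: str = "eager") -> None:
        self._attn_implementation = attn_implementation


class SketchSelfAttention:
    """One class for every backend; the function is resolved on each call."""

    def __init__(self, config: SketchConfig, head_dim: int) -> None:
        self.config = config
        self.scaling = head_dim ** -0.5

    def forward(self, q: Matrix, k: Matrix, v: Matrix) -> Matrix:
        attention_fn = SKETCH_ATTENTION_FUNCTIONS[self.config._attn_implementation]
        return attention_fn(q, k, v, self.scaling)
```

Because the lookup happens inside `forward`, changing `config._attn_implementation` on an existing layer changes the backend for the next call. In transformers itself, the implementation is normally selected by passing `attn_implementation="sdpa"` (or `"flash_attention_2"`) to `from_pretrained`.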

Fixes:

Request: Add Flash Attention 2.0 Support for ViTMAEForPreTraining #36527

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2025-03-04T18:56:40Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qubvel · 2025-03-05T09:30:40Z

run-slow: vit, audio_spectrogram_transformer, deit, dinov2, dinov2_with_registers, dpt, ijepa, videomae, vit_mae, vit_msn, vitpose_backbone, vivit, yolos

github-actions · 2025-03-05T09:32:04Z

This comment contains run-slow, running the specified jobs:

models: ['models/audio_spectrogram_transformer', 'models/deit', 'models/dinov2', 'models/dinov2_with_registers', 'models/dpt', 'models/ijepa', 'models/videomae', 'models/vit', 'models/vit_mae', 'models/vit_msn', 'models/vitpose_backbone', 'models/vivit', 'models/yolos']
quantizations: [] ...

qubvel · 2025-03-12T16:34:58Z

run-slow: vit, audio_spectrogram_transformer, deit, dinov2, dinov2_with_registers, dpt, ijepa, videomae, vit_mae, vit_msn, vitpose_backbone, vivit, yolos

github-actions · 2025-03-12T16:36:17Z

This comment contains run-slow, running the specified jobs:

models: ['models/audio_spectrogram_transformer', 'models/deit', 'models/dinov2', 'models/dinov2_with_registers', 'models/dpt', 'models/ijepa', 'models/videomae', 'models/vit', 'models/vit_mae', 'models/vit_msn', 'models/vitpose_backbone', 'models/vivit', 'models/yolos']
quantizations: [] ...

qubvel · 2025-03-14T13:26:41Z

cc @Cyrilvallez for review if you have bandwidth 🤗

ArthurZucker

🧼 clean and perfect! Thanks a lot for working on this, quite tedious!

src/transformers/models/ijepa/modeling_ijepa.py

src/transformers/models/vit/modeling_vit.py

…6545)

* Refactor vit attention
* Refactor ViT-based models
* 🚨🚨🚨 Fix prefix for DPT
* Update params order
* trigger tests
* Fix Dinov2 attention
* Fix DPT attention impl propagation for backbone config
* Common test fix: config is modif. inplace - avoid it
* view->reshape
* Fixup
* Fixup
* Enable IJepa FA2
* Add FA2 in corresponding model docs

qubvel added 3 commits March 4, 2025 13:56

Refactor vit attention

005db82

Refactor ViT-based models

3cf2574

🚨🚨🚨 Fix prefix for DPT

c4ba9c4

Update params order

f0a26d0

Merge branch 'main' into refactor-vit-attention

0b0f614

qubvel marked this pull request as ready for review March 12, 2025 15:33

github-actions bot requested review from ArthurZucker and Rocketknight1 March 12, 2025 15:34

trigger tests

df3efe8

qubvel requested a review from Cyrilvallez March 14, 2025 13:26

qubvel added 5 commits March 14, 2025 13:55

Merge branch 'main' into refactor-vit-attention

22b1fcb

Fix Dinov2 attention

e179f8f

Fix DPT attention impl propagation for backbone config

ac25820

Common test fix: config is modif. inplace - avoid it

e0ec416

view->reshape

0c5aae8

ArthurZucker approved these changes Mar 20, 2025

qubvel added 6 commits March 20, 2025 12:44

Merge branch 'main' into refactor-vit-attention

8c41ee9

Fixup

a699987

Fixup

54500b9

Enable IJepa FA2

b747193

Add FA2 in corresponding model docs

f0bcc99

Merge branch 'main' into refactor-vit-attention

3fa1d84

qubvel merged commit 6629177 into huggingface:main Mar 20, 2025
22 of 23 checks passed

sbucaille mentioned this pull request Mar 22, 2025

Add RF-DETR #36895

Open

5 tasks

ydshieh mentioned this pull request Oct 15, 2025

Remove the head masking block in some vision models #41620

Merged
