sdpa vulkan flash attention with cooperative matrix optimization #6528
nihui merged 23 commits into Tencent:master
8a64a08 to c8bbea2
Codecov Report
❌ Patch coverage is
Additional details and impacted files

Coverage Diff

| | master | #6528 | +/- |
|---|---|---|---|
| Coverage | 92.95% | 92.95% | |
| Files | 809 | 809 | |
| Lines | 256808 | 257186 | +378 |
| Hits | 238713 | 239079 | +366 |
| Misses | 18095 | 18107 | +12 |
Pull request overview
This PR introduces a cooperative-matrix–based Vulkan flash-attention kernel and integrates it into the SDPA_vulkan path to accelerate attention when cooperative matrices and BF16 are available.
Changes:
- Add a new `sdpa_fa_cm` compute shader implementing online softmax and QK/V matmuls using cooperative matrices and shared-memory tiling.
- Extend `SDPA_vulkan` to create/destroy a dedicated flash-attention pipeline and to route the forward path through the new kernel when conditions are met.
- Add helper printing functions to inspect `Mat` and `VkMat` data from the SDPA Vulkan implementation.
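
The "online softmax" mentioned in the change list can be illustrated with a small CPU reference. This is only a sketch under assumptions, not the shader from this PR: the function names, the single-query-row layout, and the `tile` parameter are hypothetical, and nothing here uses cooperative matrices or shared memory. It shows how keys and values can be consumed tile by tile while a running maximum and running sum keep the result identical to an ordinary two-pass softmax.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Two-pass reference: softmax(scale * q.K^T) * V for ONE query row.
// q: [d], k: [n*d] row-major, v: [n*dv] row-major. Requires n > 0.
std::vector<float> attention_row_naive(const std::vector<float>& q, const std::vector<float>& k,
                                       const std::vector<float>& v, std::size_t n, std::size_t d,
                                       std::size_t dv, float scale)
{
    std::vector<float> s(n);
    float mx = -std::numeric_limits<float>::infinity();
    for (std::size_t j = 0; j < n; j++)
    {
        float dot = 0.f;
        for (std::size_t c = 0; c < d; c++) dot += q[c] * k[j * d + c];
        s[j] = dot * scale;
        mx = std::max(mx, s[j]);
    }
    float sum = 0.f;
    std::vector<float> out(dv, 0.f);
    for (std::size_t j = 0; j < n; j++)
    {
        const float p = std::exp(s[j] - mx);
        sum += p;
        for (std::size_t c = 0; c < dv; c++) out[c] += p * v[j * dv + c];
    }
    for (float& o : out) o /= sum;
    return out;
}

// Online (streaming) variant: keys/values are visited in tiles of `tile` rows.
// `tile` is a hypothetical block size, not the tile shape used by the shader.
// Requires n > 0 and tile > 0.
std::vector<float> attention_row_online(const std::vector<float>& q, const std::vector<float>& k,
                                        const std::vector<float>& v, std::size_t n, std::size_t d,
                                        std::size_t dv, float scale, std::size_t tile)
{
    float m = -std::numeric_limits<float>::infinity(); // running max of all scores seen
    float l = 0.f;                                      // running sum of exp(score - m)
    std::vector<float> acc(dv, 0.f);                    // running unnormalized output

    for (std::size_t start = 0; start < n; start += tile)
    {
        const std::size_t end = std::min(n, start + tile);

        // Scores for this tile and their maximum.
        std::vector<float> s(end - start);
        float tile_max = -std::numeric_limits<float>::infinity();
        for (std::size_t j = start; j < end; j++)
        {
            float dot = 0.f;
            for (std::size_t c = 0; c < d; c++) dot += q[c] * k[j * d + c];
            s[j - start] = dot * scale;
            tile_max = std::max(tile_max, s[j - start]);
        }

        // Rescale earlier partial results to the new maximum.
        // On the first tile m is -inf, so the correction factor is exp(-inf) = 0.
        const float m_new = std::max(m, tile_max);
        const float correction = std::exp(m - m_new);
        l *= correction;
        for (float& a : acc) a *= correction;

        // Accumulate this tile's contribution.
        for (std::size_t j = start; j < end; j++)
        {
            const float p = std::exp(s[j - start] - m_new);
            l += p;
            for (std::size_t c = 0; c < dv; c++) acc[c] += p * v[j * dv + c];
        }
        m = m_new;
    }

    for (float& a : acc) a /= l; // final normalization
    return acc;
}
```

Both functions should agree up to floating-point rounding for any positive tile size; the streaming form simply never needs the full score row in memory at once, which is the property a tiled GPU kernel relies on.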
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/layer/vulkan/shader/sdpa_fa_cm.comp | New cooperative-matrix flash-attention shader (QK, online softmax, PV, and packed FP16/BF16 I/O). |
| src/layer/vulkan/sdpa_vulkan.h | Adds flash-attention pipeline pointer and unroll parameters to the SDPA_vulkan class. |
| src/layer/vulkan/sdpa_vulkan.cpp | Creates/destroys the flash-attention pipeline, wires it into forward, and adds CPU/Vulkan pretty-print debugging helpers. |
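
For readers unfamiliar with BF16, the following sketch shows one way to convert between `float` and BF16 and to store two BF16 values in a single 32-bit word. It is an illustration only: the helper names are hypothetical, the two-per-word layout is an assumption about what "packed" might mean, and the rounding mode (round to nearest even) is a choice made here, not something confirmed by the PR or by the shader source.

```cpp
#include <cstdint>
#include <cstring>

// BF16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits of an
// IEEE-754 float32, i.e. its upper 16 bits.
inline std::uint16_t float_to_bf16(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));

    // NaN: keep it a NaN by forcing a mantissa bit that survives the shift.
    if ((bits & 0x7fffffffu) > 0x7f800000u)
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);

    // Round to nearest, ties to even, on the 16 bits that get dropped.
    const std::uint32_t rounding = 0x7fffu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + rounding) >> 16);
}

inline float bf16_to_float(std::uint16_t h)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Hypothetical packing: first value in the low half, second in the high half.
inline std::uint32_t pack_bf16x2(float lo, float hi)
{
    return static_cast<std::uint32_t>(float_to_bf16(lo))
           | (static_cast<std::uint32_t>(float_to_bf16(hi)) << 16);
}

inline void unpack_bf16x2(std::uint32_t w, float& lo, float& hi)
{
    lo = bf16_to_float(static_cast<std::uint16_t>(w & 0xffffu));
    hi = bf16_to_float(static_cast<std::uint16_t>(w >> 16));
}
```

Values that are exactly representable in BF16 (such as 1.0 or -2.0) survive the round trip unchanged; other values lose mantissa precision, which is one reason attention kernels typically keep the softmax statistics and accumulators in a wider type.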
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
eb8a15d to 8b3d79d
z-image-ncnn 1024x1024