
sdpa vulkan flash attention with cooperative matrix optimization #6528

Merged: nihui merged 23 commits into Tencent:master from nihui:opt-vulkan-sdpa-3 on Feb 4, 2026


Conversation

@nihui (Member) commented Jan 29, 2026

z-image-ncnn 1024x1024, end-to-end timings:

              5060 Ti   9060 XT   7900 XTX
  baseline    1m49s     1m27s     47.8s
  +sdpa fa2   1m33s     1m8s      40.4s

@tencent-adm (Member) commented:

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@nihui nihui force-pushed the opt-vulkan-sdpa-3 branch from 8a64a08 to c8bbea2 on January 30, 2026 09:29
@nihui nihui closed this Jan 30, 2026
@nihui nihui reopened this Jan 30, 2026
@codecov-commenter commented Jan 30, 2026

Codecov Report

❌ Patch coverage is 95.91837% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.95%. Comparing base (b0be0c9) to head (b496a4a).

Files with missing lines            Patch %   Lines
src/layer/vulkan/sdpa_vulkan.cpp    95.91%    4 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master    #6528    +/-   ##
========================================
  Coverage   92.95%   92.95%            
========================================
  Files         809      809            
  Lines      256808   257186   +378     
========================================
+ Hits       238713   239079   +366     
- Misses      18095    18107    +12     

☔ View full report in Codecov by Sentry.

Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a cooperative-matrix–based Vulkan flash-attention kernel and integrates it into the SDPA_vulkan path to accelerate attention when cooperative matrices and BF16 are available.

Changes:

  • Add a new sdpa_fa_cm compute shader implementing online softmax and QK/V matmuls using cooperative matrices and shared-memory tiling.
  • Extend SDPA_vulkan to create/destroy a dedicated flash-attention pipeline and to route the forward path through the new kernel when conditions are met.
  • Add helper printing functions to inspect Mat and VkMat data from the SDPA Vulkan implementation.
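At the core of any flash-attention kernel, including the one this PR adds, is the online-softmax recurrence: stream K/V in tiles, keep a running score maximum and normalizer, and rescale the partial output whenever the maximum changes. The sketch below is a scalar C++ illustration of that recurrence for a single query vector; names and the tile size are invented for illustration and this is not the shader's actual code, which operates on cooperative-matrix tiles in shared memory.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Flash-attention-style online softmax for one query q against keys K and
// values V, processed in tiles of T rows (illustrative sketch only).
std::vector<float> attend_online(const std::vector<float>& q,
                                 const std::vector<std::vector<float>>& K,
                                 const std::vector<std::vector<float>>& V,
                                 int T)
{
    const int d = (int)q.size();
    const int n = (int)K.size();
    const float scale = 1.f / std::sqrt((float)d);

    float m = -INFINITY; // running max of scores seen so far
    float l = 0.f;       // running sum of exp(score - m)
    std::vector<float> acc(V[0].size(), 0.f); // running weighted sum of V rows

    for (int t0 = 0; t0 < n; t0 += T)
    {
        const int t1 = std::min(t0 + T, n);
        for (int i = t0; i < t1; i++)
        {
            float s = 0.f;
            for (int k = 0; k < d; k++) s += q[k] * K[i][k];
            s *= scale;

            const float m_new = std::max(m, s);
            const float corr = std::exp(m - m_new); // rescale old state
            const float p = std::exp(s - m_new);    // weight of this key
            l = l * corr + p;
            for (size_t j = 0; j < acc.size(); j++)
                acc[j] = acc[j] * corr + p * V[i][j];
            m = m_new;
        }
    }
    for (size_t j = 0; j < acc.size(); j++) acc[j] /= l;
    return acc;
}
```

Rescaling the accumulator by exp(m - m_new) whenever the running maximum changes is what lets a kernel stream K/V tiles through shared memory without ever materializing the full attention matrix.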

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

  • src/layer/vulkan/shader/sdpa_fa_cm.comp: New cooperative-matrix flash-attention shader (QK, online softmax, PV, and packed FP16/BF16 I/O).
  • src/layer/vulkan/sdpa_vulkan.h: Adds flash-attention pipeline pointer and unroll parameters to the SDPA_vulkan class.
  • src/layer/vulkan/sdpa_vulkan.cpp: Creates/destroys the flash-attention pipeline, wires it into forward, and adds CPU/Vulkan pretty-print debugging helpers.
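For context, the GL_KHR_cooperative_matrix building block such a shader relies on is a subgroup-wide tiled multiply-accumulate. The sketch below is a generic 16x16x16 FP16 matmul with invented buffer names and fixed dimensions, not code from sdpa_fa_cm.comp:

```glsl
#version 450
#extension GL_KHR_cooperative_matrix : require
#extension GL_KHR_memory_scope_semantics : require
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require

layout(local_size_x = 32) in; // one subgroup cooperates on each tile

layout(binding = 0) readonly buffer A_blob { float16_t A_data[]; };
layout(binding = 1) readonly buffer B_blob { float16_t B_data[]; };
layout(binding = 2) writeonly buffer C_blob { float16_t C_data[]; };

void main()
{
    // C = A * B + C on a 16x16x16 tile, distributed across the subgroup
    coopmat<float16_t, gl_ScopeSubgroup, 16, 16, gl_MatrixUseA> A;
    coopmat<float16_t, gl_ScopeSubgroup, 16, 16, gl_MatrixUseB> B;
    coopmat<float16_t, gl_ScopeSubgroup, 16, 16, gl_MatrixUseAccumulator> C =
        coopmat<float16_t, gl_ScopeSubgroup, 16, 16, gl_MatrixUseAccumulator>(float16_t(0.0));

    coopMatLoad(A, A_data, 0, 16, gl_CooperativeMatrixLayoutRowMajor);
    coopMatLoad(B, B_data, 0, 16, gl_CooperativeMatrixLayoutRowMajor);
    C = coopMatMulAdd(A, B, C);
    coopMatStore(C, C_data, 0, 16, gl_CooperativeMatrixLayoutRowMajor);
}
```

A flash-attention kernel chains two such matmuls per tile (QK^T, then P*V) with the online-softmax rescaling in between.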


Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.



@github-actions github-actions bot added the test label Feb 3, 2026
@nihui nihui closed this Feb 3, 2026
@nihui nihui reopened this Feb 3, 2026
@nihui nihui force-pushed the opt-vulkan-sdpa-3 branch from eb8a15d to 8b3d79d on February 3, 2026 07:39
@nihui nihui changed the title [WIP] sdpa vulkan flash attention with cooperative matrix optimization sdpa vulkan flash attention with cooperative matrix optimization Feb 3, 2026
@nihui nihui merged commit e665c48 into Tencent:master Feb 4, 2026
104 of 107 checks passed

3 participants