
sdpa vulkan flash attention with cooperative matrix optimization #6528

Merged: nihui merged 23 commits into Tencent:master from nihui:opt-vulkan-sdpa-3 on Feb 4, 2026


Conversation

@nihui (Member) commented Jan 29, 2026

z-image-ncnn 1024x1024, end-to-end timings:

              5060 Ti   9060 XT   7900 XTX
  baseline    1m49s     1m27s     47.8s
  +sdpa fa2   1m33s     1m8s      40.4s

@tencent-adm (Member) commented:

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@nihui nihui force-pushed the opt-vulkan-sdpa-3 branch from 8a64a08 to c8bbea2 on January 30, 2026 09:29
@nihui nihui closed this Jan 30, 2026
@nihui nihui reopened this Jan 30, 2026
@codecov-commenter commented Jan 30, 2026

Codecov Report

❌ Patch coverage is 95.91837% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.95%. Comparing base (b0be0c9) to head (b496a4a).

Files with missing lines            Patch %   Lines
src/layer/vulkan/sdpa_vulkan.cpp    95.91%    4 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           master    #6528    +/-   ##
========================================
  Coverage   92.95%   92.95%            
========================================
  Files         809      809            
  Lines      256808   257186   +378     
========================================
+ Hits       238713   239079   +366     
- Misses      18095    18107    +12     

☔ View full report in Codecov by Sentry.

Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces a cooperative-matrix–based Vulkan flash-attention kernel and integrates it into the SDPA_vulkan path to accelerate attention when cooperative matrices and BF16 are available.

Changes:

  • Add a new sdpa_fa_cm compute shader implementing online softmax and QK/V matmuls using cooperative matrices and shared-memory tiling.
  • Extend SDPA_vulkan to create/destroy a dedicated flash-attention pipeline and to route the forward path through the new kernel when conditions are met.
  • Add helper printing functions to inspect Mat and VkMat data from the SDPA Vulkan implementation.
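At the core of any flash-attention kernel, including the one this PR adds, is the online-softmax recurrence: stream K/V in tiles, keep a running score maximum and normalizer, and rescale the partial output whenever the maximum changes. The sketch below is a scalar C++ illustration of that recurrence for a single query vector; names and the tile size are invented for illustration and this is not the shader's actual code, which operates on cooperative-matrix tiles in shared memory.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Flash-attention-style online softmax for one query q against keys K and
// values V, processed in tiles of T rows (illustrative sketch only).
std::vector<float> attend_online(const std::vector<float>& q,
                                 const std::vector<std::vector<float>>& K,
                                 const std::vector<std::vector<float>>& V,
                                 int T)
{
    const int d = (int)q.size();
    const int n = (int)K.size();
    const float scale = 1.f / std::sqrt((float)d);

    float m = -INFINITY; // running max of scores seen so far
    float l = 0.f;       // running sum of exp(score - m)
    std::vector<float> acc(V[0].size(), 0.f); // running weighted sum of V rows

    for (int t0 = 0; t0 < n; t0 += T)
    {
        const int t1 = std::min(t0 + T, n);
        for (int i = t0; i < t1; i++)
        {
            float s = 0.f;
            for (int k = 0; k < d; k++) s += q[k] * K[i][k];
            s *= scale;

            const float m_new = std::max(m, s);
            const float corr = std::exp(m - m_new); // rescale old state
            const float p = std::exp(s - m_new);    // weight of this key
            l = l * corr + p;
            for (size_t j = 0; j < acc.size(); j++)
                acc[j] = acc[j] * corr + p * V[i][j];
            m = m_new;
        }
    }
    for (size_t j = 0; j < acc.size(); j++) acc[j] /= l;
    return acc;
}
```

Rescaling the accumulator by exp(m - m_new) whenever the running maximum changes is what lets a kernel stream K/V tiles through shared memory without ever materializing the full attention matrix.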

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

  • src/layer/vulkan/shader/sdpa_fa_cm.comp: New cooperative-matrix flash-attention shader (QK, online softmax, PV, and packed FP16/BF16 I/O).
  • src/layer/vulkan/sdpa_vulkan.h: Adds flash-attention pipeline pointer and unroll parameters to the SDPA_vulkan class.
  • src/layer/vulkan/sdpa_vulkan.cpp: Creates/destroys the flash-attention pipeline, wires it into forward, and adds CPU/Vulkan pretty-print debugging helpers.
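For context, the GL_KHR_cooperative_matrix building block such a shader relies on is a subgroup-wide tiled multiply-accumulate. The sketch below is a generic 16x16x16 FP16 matmul with invented buffer names and fixed dimensions, not code from sdpa_fa_cm.comp:

```glsl
#version 450
#extension GL_KHR_cooperative_matrix : require
#extension GL_KHR_memory_scope_semantics : require
#extension GL_EXT_shader_explicit_arithmetic_types_float16 : require

layout(local_size_x = 32) in; // one subgroup cooperates on each tile

layout(binding = 0) readonly buffer A_blob { float16_t A_data[]; };
layout(binding = 1) readonly buffer B_blob { float16_t B_data[]; };
layout(binding = 2) writeonly buffer C_blob { float16_t C_data[]; };

void main()
{
    // C = A * B + C on a 16x16x16 tile, distributed across the subgroup
    coopmat<float16_t, gl_ScopeSubgroup, 16, 16, gl_MatrixUseA> A;
    coopmat<float16_t, gl_ScopeSubgroup, 16, 16, gl_MatrixUseB> B;
    coopmat<float16_t, gl_ScopeSubgroup, 16, 16, gl_MatrixUseAccumulator> C =
        coopmat<float16_t, gl_ScopeSubgroup, 16, 16, gl_MatrixUseAccumulator>(float16_t(0.0));

    coopMatLoad(A, A_data, 0, 16, gl_CooperativeMatrixLayoutRowMajor);
    coopMatLoad(B, B_data, 0, 16, gl_CooperativeMatrixLayoutRowMajor);
    C = coopMatMulAdd(A, B, C);
    coopMatStore(C, C_data, 0, 16, gl_CooperativeMatrixLayoutRowMajor);
}
```

A flash-attention kernel chains two such matmuls per tile (QK^T, then P*V) with the online-softmax rescaling in between.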


Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.



@github-actions github-actions bot added the test label Feb 3, 2026
@nihui nihui closed this Feb 3, 2026
@nihui nihui reopened this Feb 3, 2026
@nihui nihui force-pushed the opt-vulkan-sdpa-3 branch from eb8a15d to 8b3d79d on February 3, 2026 07:39
@nihui nihui changed the title [WIP] sdpa vulkan flash attention with cooperative matrix optimization sdpa vulkan flash attention with cooperative matrix optimization Feb 3, 2026
@nihui nihui merged commit e665c48 into Tencent:master Feb 4, 2026
104 of 107 checks passed

3 participants