Flash attention no longer working in most recent build? #15650
Master-Pr0grammer started this conversation in General
I keep getting a "FlashAttention without tensor cores only supports head sizes 64 and 128." error, followed by a segfault, whenever I try to run any Gemma 3 model on the most recent build.

I have a GTX 1080 Ti, which I know is old and does not have tensor cores; however, I was able to run this perfectly before updating. I was wondering if anyone had a similar experience and/or a fix that doesn't involve downgrading. Or maybe this is a bug? I wanted to ask before filing a bug report.

Replies: 2 comments 2 replies

-
I have the same problem when I try to run Gemma3-12B, except I have an AMD GPU. Here is my system info: […]
1 reply
-
Flash Attention never worked for this combination; the difference is that Flash Attention is enabled by default now (turn it off with …).
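A minimal sketch of what disabling it could look like on the command line, assuming the `llama-cli` binary and its flash-attention option; the exact flag name, accepted values, and the model path below are assumptions rather than details from this thread, so verify against `llama-cli --help` on your build:

```sh
# Hypothetical example: the model path is a placeholder, and `-fa off`
# (long form `--flash-attn off`) is an assumed way to disable flash
# attention on builds where it defaults to on/auto.
# Confirm the exact syntax with `llama-cli --help`.
llama-cli -m ./gemma-3-12b-it-Q4_K_M.gguf -fa off -p "Hello"
```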
1 reply