Flash attention no longer working in most recent build? #15650
Master-Pr0grammer started this conversation in General
I keep getting a "FlashAttention without tensor cores only supports head sizes 64 and 128." error, followed by a segfault, whenever I try to run any Gemma 3 model on the most recent build.

I have a GTX 1080 Ti, which I know is old and does not have tensor cores; however, I was able to run this perfectly before updating. I was wondering if anyone had a similar experience and/or a fix that doesn't involve downgrading. Or maybe this is a bug? I wanted to ask before filing a bug report.

Replies: 2 comments 2 replies

-
I have the same problem when I try to run Gemma3-12B, except I have an AMD GPU. Here is my system info: […]
1 reply
-
Flash Attention never worked for this combination; the difference is that Flash Attention is enabled by default now (turn it off with …).
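A minimal sketch of what disabling it could look like on the command line, assuming the `llama-cli` binary and its flash-attention option; the exact flag name, accepted values, and the model path below are assumptions rather than details from this thread, so verify against `llama-cli --help` on your build:

```sh
# Hypothetical example: the model path is a placeholder, and `-fa off`
# (long form `--flash-attn off`) is an assumed way to disable flash
# attention on builds where it defaults to on/auto.
# Confirm the exact syntax with `llama-cli --help`.
llama-cli -m ./gemma-3-12b-it-Q4_K_M.gguf -fa off -p "Hello"
```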
1 reply