
memory_efficient_attention fails with half precision tensors, with pre-compiled xformers 0.0.30 on RTX 5090 #1251

@cdancette

Description

🐛 Bug

I am encountering an error with memory_efficient_attention from the pre-compiled xformers 0.0.30 wheel on an RTX 5090, when using half-precision tensors. I am using torch 2.7 with CUDA 12.8.

Calling the function raises the following error:

CUDA error (/__w/xformers/xformers/third_party/flash-attention/hopper/flash_fwd_launch_template.h:175): no kernel image is available for execution on the device

Note that it works in full precision.

Also, I tried installing the latest version from source (commit 8fc8ec5), and I cannot reproduce the issue there; it works as expected. I installed it with poetry run pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
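
For context, a "no kernel image is available" error usually means the wheel's compiled kernels do not target the GPU's compute capability (the RTX 5090 is a Blackwell card). A minimal check, using only standard torch/xformers introspection (the expected (12, 0) output is an assumption about this card):

import torch
import xformers

print(torch.version.cuda)                   # CUDA toolkit torch was built against
print(torch.cuda.get_device_capability(0))  # expected (12, 0) on an RTX 5090
print(xformers.__version__)                 # 0.0.30 for the failing wheel

Running python -m xformers.info also lists which attention operators the installed build considers usable.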

Command to reproduce

import torch
from xformers.ops import memory_efficient_attention

# Half-precision inputs on the GPU, shape (batch, seq_len, embed_dim)
q, k, v = (torch.randn(5, 10, 128, dtype=torch.half, device="cuda") for _ in range(3))
memory_efficient_attention(q, k, v)

Raises

CUDA error (/__w/xformers/xformers/third_party/flash-attention/hopper/flash_fwd_launch_template.h:175): no kernel image is available for execution on the device
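
As a possible workaround until a wheel built for this architecture is available, memory_efficient_attention accepts an op argument that selects a specific backend, which should bypass the flash-attention kernels. A sketch, assuming the cutlass operator in xformers.ops.fmha supports this shape and dtype on the card:

import torch
from xformers.ops import fmha, memory_efficient_attention

q, k, v = (torch.randn(5, 10, 128, dtype=torch.half, device="cuda") for _ in range(3))

# Force the cutlass forward operator instead of the flash-attention one;
# None lets xformers choose the backward operator (unused in this forward-only call).
out = memory_efficient_attention(q, k, v, op=(fmha.cutlass.FwOp, None))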

Environment

  • PyTorch version: 2.7
  • OS: Ubuntu 24.04.2
  • How you installed PyTorch: poetry/pip
  • Python version: 3.12
  • CUDA/cuDNN version: 12.8
  • NVIDIA driver version: 570.133.20
  • GPU models and configuration: RTX 5090
