🐛 Bug
I am encountering an error with xformers 0.0.30 memory_efficient_attention on an RTX 5090, with half-precision tensors. I am using torch 2.7 with CUDA 12.8.
Calling the function raises the following error:
CUDA error (/__w/xformers/xformers/third_party/flash-attention/hopper/flash_fwd_launch_template.h:175): no kernel image is available for execution on the device
Note that it works in full precision.
Also, I tried installing the latest version from source (commit 8fc8ec5) and cannot reproduce the issue there; it works as expected. I installed it with poetry run pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
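For anyone hitting the same thing: "no kernel image is available for execution on the device" usually means the installed binaries were not compiled for the GPU's architecture. A minimal check, assuming the RTX 5090 reports compute capability 12.0 (sm_120), is to compare the device capability against the architectures the installed torch build supports:

import torch

# Compute capability of the current device; an RTX 5090 should report (12, 0).
print(torch.cuda.get_device_capability())
# CUDA architectures the installed torch binaries were compiled for,
# e.g. ['sm_75', 'sm_80', ..., 'sm_120'].
print(torch.cuda.get_arch_list())

Note this only covers the torch binaries; the xformers wheel ships its own flash-attention kernels, so the xformers 0.0.30 build may still lack sm_120 support even when torch itself has it, which would be consistent with the source build working.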
Command to reproduce
import torch
from xformers.ops import memory_efficient_attention

q = torch.randn(5, 10, 128).half().cuda()
k = torch.randn(5, 10, 128).half().cuda()
v = torch.randn(5, 10, 128).half().cuda()
memory_efficient_attention(q, k, v)
Returns
CUDA error (/__w/xformers/xformers/third_party/flash-attention/hopper/flash_fwd_launch_template.h:175): no kernel image is available for execution on the device
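A possible workaround while staying on 0.0.30, assuming the cutlass backend in that wheel was built for sm_120, is to bypass the flash-attention kernels by pinning the op explicitly. A minimal sketch:

import torch
from xformers.ops import memory_efficient_attention, MemoryEfficientAttentionCutlassOp

q = torch.randn(5, 10, 128, device="cuda", dtype=torch.half)
k = torch.randn(5, 10, 128, device="cuda", dtype=torch.half)
v = torch.randn(5, 10, 128, device="cuda", dtype=torch.half)

# Force the cutlass forward/backward ops instead of letting the
# dispatcher select the flash-attention kernels that fail here.
out = memory_efficient_attention(q, k, v, op=MemoryEfficientAttentionCutlassOp)

This only sidesteps the dispatch; the underlying issue is that the flash kernels in the prebuilt wheel do not cover this device.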
Environment
- PyTorch Version (e.g., 1.0): 2.7
- OS (e.g., Linux): Ubuntu 24.04.2
- How you installed PyTorch (conda, pip, source): poetry/pip
- Build command you used (if compiling from source):
- Python version: 3.12
- CUDA/cuDNN version: 12.8
- Nvidia driver version: 570.133.20
- GPU models and configuration: RTX 5090
- Any other relevant information:
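It may also help to attach the output of python -m xformers.info, which prints the xformers version, build flags, and whether each attention backend (flash, cutlass, etc.) is available in the installed build; that should show directly whether the 0.0.30 wheel enables the flash kernels for this GPU.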