Problem
In sam3/model/decoder.py, the else branch (non-FA3 path) of the attention computation uses:
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = torchF.scaled_dot_product_attention(q, k, v, dropout_p=dropout)
sdpa_kernel is exclusive: it disables every backend that is not in the list passed to it. This call therefore disables EFFICIENT_ATTENTION and MATH entirely, leaving Flash Attention as the only option (a minimal repro sketch follows the warnings below).
On Windows, PyTorch doesn't ship with the Flash Attention backend compiled in. This means zero backends are available and you get:
RuntimeError: No available kernel. Aborting execution.
with warnings like:
UserWarning: No available kernel. Aborting execution.
FlashAttention is not supported (Flash Attention was not compiled for this system)
mem_efficient_sdp_enabled was set to False
math_sdp_enabled was set to False
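For context, here is a minimal standalone sketch of the failure mode. This is a hypothetical script, not SAM3 code; the tensor shapes and the torchF alias are assumptions for illustration only:

# Hypothetical repro, not SAM3 code: restrict SDPA to Flash Attention only.
import torch
import torch.nn.functional as torchF
from torch.nn.attention import sdpa_kernel, SDPBackend

# Arbitrary (batch, heads, seq, head_dim) tensors; fp16 on CUDA so Flash
# Attention would normally be eligible.
q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

# sdpa_kernel is exclusive: every backend outside the argument is disabled.
# On a build without Flash Attention kernels nothing is left to dispatch to,
# and this raises "RuntimeError: No available kernel. Aborting execution."
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = torchF.scaled_dot_product_attention(q, k, v, dropout_p=0.0)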
Other files already handle this correctly
vl_combiner.py allows all three backends:
sdpa_context = sdpa_kernel(
    [
        SDPBackend.MATH,
        SDPBackend.EFFICIENT_ATTENTION,
        SDPBackend.FLASH_ATTENTION,
    ]
)
And sam/transformer.py enables all three globally:
torch.backends.cuda.enable_flash_sdp(True)
torch.backends.cuda.enable_math_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(True)
So decoder.py is the only place that forces a single backend exclusively.
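For anyone verifying this on their own platform, a small probe like the one below (a hypothetical helper, not part of SAM3) reports which SDPA backends a given PyTorch build can actually serve:

# Hypothetical probe, not SAM3 code: try each SDPA backend in isolation
# and report whether it can run on this build / GPU.
import torch
import torch.nn.functional as torchF
from torch.nn.attention import sdpa_kernel, SDPBackend

q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

for backend in (SDPBackend.FLASH_ATTENTION, SDPBackend.EFFICIENT_ATTENTION, SDPBackend.MATH):
    try:
        with sdpa_kernel(backend):
            torchF.scaled_dot_product_attention(q, k, v)
        print(f"{backend}: available")
    except RuntimeError as exc:
        print(f"{backend}: unavailable ({exc})")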
Fix
Change the sdpa_kernel call in decoder.py to allow fallback backends, same as vl_combiner.py:
with sdpa_kernel([SDPBackend.FLASH_ATTENTION, SDPBackend.EFFICIENT_ATTENTION, SDPBackend.MATH]):
    out = torchF.scaled_dot_product_attention(q, k, v, dropout_p=dropout)
This way Flash Attention is still preferred when it is available, but PyTorch can fall back to memory-efficient attention or the math backend on platforms that don't have it.
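As a quick sanity check (a hypothetical standalone snippet, not SAM3 code), the same call goes through with the fallback list even when Flash Attention is missing, because the MATH backend is implemented everywhere:

# Hypothetical check, not SAM3 code: with fallbacks allowed, PyTorch
# dispatches to the best backend this build actually has.
import torch
import torch.nn.functional as torchF
from torch.nn.attention import sdpa_kernel, SDPBackend

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
q = k = v = torch.randn(1, 8, 128, 64, device=device, dtype=dtype)

with sdpa_kernel([SDPBackend.FLASH_ATTENTION, SDPBackend.EFFICIENT_ATTENTION, SDPBackend.MATH]):
    out = torchF.scaled_dot_product_attention(q, k, v, dropout_p=0.0)

print(out.shape)  # torch.Size([1, 8, 128, 64])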
Environment
- Windows 11, CUDA 12.8
- PyTorch 2.7.0 (cu128)
- SAM3.1 multiplex video predictor with use_fa3=False