Commit 4dba29f
feat: add SM120 fmha_v2 kernels to AOT pip wheel builds (#2885)
## Summary
`gen_trtllm_fmha_v2_sm120_module()` exists in `jit/attention/modules.py`,
and the JIT runtime path (`generate_kernels.py`) already dispatches to it
correctly. However, `gen_all_modules()` in `aot.py`, which drives the pip
wheel AOT build, was missing it from the `has_sm120 or has_sm121` section.
As a result, SM120/SM121 devices using a pip wheel would never get the
fmha_v2 SM120 kernels compiled into the wheel and would have to fall back
to slower paths.
**Fix:** Add `gen_trtllm_fmha_v2_sm120_module()` to the `has_sm120 or
has_sm121` block in `aot.py`, alongside the other SM120 modules (fused
MoE, GEMM, FP4 quantization).
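For reference, a minimal sketch of what the change plausibly looks like. Only `gen_trtllm_fmha_v2_sm120_module()`, the `has_sm120 or has_sm121` guard, and the module's home in `jit/attention/modules.py` come from this summary; the import prefix, the `jit_specs` list, and the neighboring generators are illustrative assumptions, not the exact diff:

```python
# Hypothetical sketch of the gen_all_modules() change in aot.py.
# Assumed: the flashinfer.jit.attention.modules import path and the
# jit_specs accumulator; only the function name and the SM120/SM121
# guard are confirmed by the commit summary.
from flashinfer.jit.attention.modules import gen_trtllm_fmha_v2_sm120_module

if has_sm120 or has_sm121:
    # ... existing SM120 generators (fused MoE, GEMM, FP4 quantization) ...
    jit_specs.append(gen_trtllm_fmha_v2_sm120_module())  # newly added call
```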
No behavior change for JIT users; only affects AOT pip wheel builds.
Addresses the AOT gap noted in #2555.
Contributed by Second Nature Computing (https://joinsecondnature.com)
## Summary by CodeRabbit
* **Chores**
  * Expanded optimized inference module support for SM120 and SM121 GPUs
    to include attention kernels, in addition to the existing fused MoE
    and GEMM optimizations.
  * Increased kernel coverage for attention-heavy workloads on those
    architectures, improving performance consistency for models that use
    attention.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 file changed (`aot.py`): 3 additions and 1 deletion, in hunks around
lines 44–50 and 534–547.