🚀 Highlights
- Intra‑card CP for the backward pass, developed by @Erix025
- SM100 architecture support: warp-specialized kernels with tcgen05
- Added support for `state_v_first` and aligned the entry function signature with the latest `flash-linear-attention` interface
🛠️ Additional Improvements
- Upgraded the tilelang dependency to v0.1.9
- Updated unit tests
📊 Benchmarks
🙏 Acknowledgements
Special thanks to all contributors and community members for their valuable feedback. We welcome your continued participation through issues and pull requests.