0.1.0
First release. Contains a Triton implementation of FlashAttention-2, based on Tri Dao's paper "FlashAttention-2:
Faster Attention with Better Parallelism and Work Partitioning".
Key Features
- Cross-platform support (Linux and Windows)
- Dual-mode operation: deterministic (sequence-parallel disabled) and non-deterministic (higher performance)
- Hardware-aware optimizations for Turing (CC 7.5) and Ampere+ (CC 8.0+) architectures
- Custom configuration support for older GPU architectures or specialized tuning
- Support for homogeneous and heterogeneous GPU clusters with automatic configuration selection
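As a rough illustration of what architecture-based configuration selection could look like, the sketch below maps a CUDA compute capability to a tuning profile and resolves one profile per device in a mixed cluster. The names PROFILES, select_profile and select_for_cluster, as well as the block sizes and warp counts, are hypothetical placeholders and are not taken from this project; see the README for the actual mechanism.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical tuning profiles. The values are placeholders for illustration,
# not the settings this project actually ships.
PROFILES: Dict[str, Dict[str, int]] = {
    "turing": {"block_m": 64, "block_n": 64, "num_warps": 4},
    "ampere_plus": {"block_m": 128, "block_n": 64, "num_warps": 8},
}


def select_profile(capability: Tuple[int, int]) -> str:
    """Map a compute capability (major, minor) to a profile name."""
    if capability >= (8, 0):   # Ampere and newer
        return "ampere_plus"
    if capability == (7, 5):   # Turing
        return "turing"
    # Older architectures have no built-in profile in this sketch;
    # the caller would have to supply a custom configuration.
    raise ValueError(f"no built-in profile for compute capability {capability}")


def select_for_cluster(capabilities: Iterable[Tuple[int, int]]) -> Dict[int, str]:
    """Pick a profile per device index, so mixed GPU generations each get their own."""
    return {index: select_profile(cap) for index, cap in enumerate(capabilities)}


if __name__ == "__main__":
    # A heterogeneous node with one Turing GPU and one Ampere GPU.
    for index, name in select_for_cluster([(7, 5), (8, 6)]).items():
        print(index, name, PROFILES[name])
```

With PyTorch installed, the capability tuples could be read per device with torch.cuda.get_device_capability(i) rather than hard-coded.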
For more details, refer to the project’s README.
Status
Version 0.1.0 is a Beta release. The API may change in future versions as the project evolves.