0.1.0

egaoharu-kensei released this 12 Dec 09:58


49c4463

First release. Contains a Triton implementation of FlashAttention-2, based on Tri Dao's paper "FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning".
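The core idea of the paper can be illustrated with a minimal pure-Python sketch. This is not the project's Triton kernel. The forward pass visits keys and values one block at a time and keeps a running maximum and softmax denominator for each query row (online softmax), so the full attention matrix is never materialized. As in FlashAttention-2, the division by the denominator is deferred until after the last block. The function names and the `block_k` parameter are illustrative assumptions. A real kernel also tiles the queries and runs query blocks in parallel on the GPU.

```python
import math
import random


def naive_attention(q, k, v):
    """Reference: softmax(q k^T / sqrt(d)) v, computed row by row."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        p = [math.exp(s - m) for s in scores]
        total = sum(p)
        out.append([sum(p[j] * v[j][c] for j in range(len(v))) / total
                    for c in range(len(v[0]))])
    return out


def flash_attention_forward(q, k, v, block_k=4):
    """Tiled forward pass with online softmax (hypothetical illustration).

    Each query row keeps a running max m, a running denominator l and an
    unnormalized accumulator acc; keys/values are consumed block by block.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        m = -math.inf        # running max of the scores seen so far
        l = 0.0              # running sum of exp(score - m)
        acc = [0.0] * dv     # unnormalized output, rescaled when m grows
        for start in range(0, len(k), block_k):
            kb = k[start:start + block_k]
            vb = v[start:start + block_k]
            s = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in kb]
            m_new = max(m, max(s))
            alpha = math.exp(m - m_new)   # 0.0 on the first block (m = -inf)
            p = [math.exp(x - m_new) for x in s]
            l = alpha * l + sum(p)
            acc = [alpha * acc[c] + sum(p[j] * vb[j][c] for j in range(len(vb)))
                   for c in range(dv)]
            m = m_new
        # Normalize once, after the last key/value block.
        out.append([a / l for a in acc])
    return out


if __name__ == "__main__":
    random.seed(0)
    n, d = 10, 8
    q, k, v = ([[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
               for _ in range(3))
    ref = naive_attention(q, k, v)
    tiled = flash_attention_forward(q, k, v, block_k=3)
    err = max(abs(a - b) for r1, r2 in zip(ref, tiled) for a, b in zip(r1, r2))
    print(f"max abs difference vs. naive attention: {err:.2e}")
```

Because the rescaling by `alpha` keeps every partial result consistent with the current maximum, the tiled result matches the naive one up to floating-point rounding for any block size.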

Key Features

Cross-platform support (Linux and Windows)
Dual-mode operation: deterministic (sequence parallelism disabled) and non-deterministic (sequence parallelism enabled, higher performance)
Hardware-aware optimizations for Turing (compute capability 7.5) and Ampere or newer (compute capability 8.0+) architectures
Custom configuration support for older GPU architectures or specialized tuning
Support for homogeneous and heterogeneous GPU clusters, with automatic configuration selection
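How per-device configuration selection might work can be sketched as follows. This is a hypothetical illustration, not the project's API. `KernelConfig`, `select_config`, `configs_for_cluster` and every block size, warp count and stage count below are invented placeholders, not the values shipped in this release. The sketch picks a built-in config from a device's compute capability (Turing 7.5 or Ampere 8.0+), lets a custom config override it for older or specially tuned GPUs, and disables sequence parallelism when deterministic mode is requested. Choosing one config per device is what lets a mixed cluster get a suitable config on every GPU.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class KernelConfig:
    # Hypothetical tuning knobs of the kind a Triton kernel exposes.
    block_m: int
    block_n: int
    num_warps: int
    num_stages: int
    sequence_parallel: bool = False


# Placeholder defaults for illustration only (not the release's values).
_AMPERE_PLUS = KernelConfig(block_m=128, block_n=64, num_warps=4, num_stages=3)
_TURING = KernelConfig(block_m=64, block_n=64, num_warps=4, num_stages=2)


def select_config(capability: Tuple[int, int], *, deterministic: bool,
                  custom: Optional[KernelConfig] = None) -> KernelConfig:
    """Pick a config for one GPU given its (major, minor) compute capability."""
    if custom is not None:
        base = custom                 # user override, e.g. for older GPUs
    elif capability >= (8, 0):
        base = _AMPERE_PLUS           # Ampere and newer
    elif capability == (7, 5):
        base = _TURING
    else:
        raise ValueError(
            f"no built-in config for compute capability {capability}; "
            "pass custom=KernelConfig(...)")
    # Deterministic mode turns sequence parallelism off.
    return replace(base, sequence_parallel=not deterministic)


def configs_for_cluster(capabilities: Sequence[Tuple[int, int]], *,
                        deterministic: bool,
                        custom: Optional[KernelConfig] = None) -> List[KernelConfig]:
    """One config per device, so heterogeneous clusters get per-GPU choices."""
    return [select_config(cap, deterministic=deterministic, custom=custom)
            for cap in capabilities]
```

On a real machine the capability tuples could come from `torch.cuda.get_device_capability(i)`, called once per visible device. Whether the project gathers them this way is not stated in these notes.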

For more details, refer to the project’s README.

Status

Version 0.1.0 is an early stable release (Beta). The API is subject to change in future versions as the project evolves.
