0.1.0

@egaoharu-kensei egaoharu-kensei released this 12 Dec 09:58
· 15 commits to main since this release

First release. Contains a Triton implementation of FlashAttention-2, based on Tri Dao's paper "FlashAttention-2:
Faster Attention with Better Parallelism and Work Partitioning".

Key Features

  • Cross-platform support (Linux and Windows)
  • Dual-mode operation: deterministic (sequence-parallel disabled) and non-deterministic (higher performance)
  • Hardware-aware optimizations for Turing (CC 7.5) and Ampere+ (CC 8.0+) architectures
  • Custom configuration support for older GPU architectures or specialized tuning
  • Support for homogeneous and heterogeneous GPU clusters with automatic configuration selection
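
To make the idea of compute-capability-based configuration selection concrete, here is a minimal sketch in plain Python. It is not the project's API: the function names (`select_config`, `select_cluster_configs`), the dictionary keys, and the tile values are all hypothetical and chosen only for illustration. The only details taken from the notes above are the two thresholds (CC 7.5 for Turing, CC 8.0+ for Ampere and newer) and the idea that older GPUs need a user-supplied configuration.

```python
from typing import Dict, List, Optional, Tuple

Capability = Tuple[int, int]   # (major, minor) compute capability
Config = Dict[str, int]

# Hypothetical tile settings, for illustration only; not the project's values.
TURING_CONFIG: Config = {"block_m": 64, "block_n": 32, "num_warps": 4, "num_stages": 1}
AMPERE_PLUS_CONFIG: Config = {"block_m": 128, "block_n": 64, "num_warps": 4, "num_stages": 2}


def select_config(capability: Capability, custom: Optional[Config] = None) -> Config:
    """Pick a kernel configuration for one GPU (hypothetical helper)."""
    if custom is not None:
        # A user-supplied configuration always takes precedence.
        return dict(custom)
    if capability >= (8, 0):      # Ampere and newer
        return dict(AMPERE_PLUS_CONFIG)
    if capability >= (7, 5):      # Turing
        return dict(TURING_CONFIG)
    raise ValueError(
        f"No built-in configuration for compute capability {capability}; "
        "pass a custom configuration."
    )


def select_cluster_configs(capabilities: List[Capability]) -> List[Config]:
    """Resolve one configuration per device, so mixed clusters work."""
    return [select_config(cc) for cc in capabilities]


if __name__ == "__main__":
    # A heterogeneous cluster: one Turing GPU and one Ampere GPU.
    for cc, cfg in zip([(7, 5), (8, 6)], select_cluster_configs([(7, 5), (8, 6)])):
        print(cc, cfg)
```

With PyTorch installed, the `(major, minor)` tuples could be obtained from `torch.cuda.get_device_capability(device_index)`. For the actual configuration mechanism, refer to the project's README.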

For more details, refer to the project’s README.

Status

Version 0.1.0 is an early, beta-stage release. The API may change in future versions as the project evolves.