
Conversation

@dxqb dxqb (Collaborator) commented Dec 21, 2025

Split attention can significantly speed up higher-resolution training, and low-resolution training to a lesser extent:
(screenshot: benchmark results showing the speed-up)
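
Rough sketch of the idea only, not the actual implementation (that lives in the diffusers PR linked below and may differ in detail); `split_attention` is just an illustrative name, not part of this PR's API:

```python
# Illustration only: run SDPA once per batch element instead of one
# large batched call. The real implementation is in the diffusers PR
# referenced below; `split_attention` is a hypothetical name.
import torch
import torch.nn.functional as F

def split_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, heads, seq_len, head_dim)
    outputs = [
        F.scaled_dot_product_attention(q_i, k_i, v_i)
        for q_i, k_i, v_i in zip(q.split(1), k.split(1), v.split(1))
    ]
    return torch.cat(outputs, dim=0)
```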

To test:

git fetch origin pull/1218/head:pr-1218
git switch pr-1218

then update.

Limitations:

  • only implemented for Qwen, even though it could easily be extended to Chroma, Z-Image, and more
  • probably Linux-only, unless torch has recently improved SDPA in 2.8 or 2.9; still to be tested (see the backend check after this list)
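
If you want to see which SDPA backends your torch build enables (relevant to the Linux/Windows caveat above), a quick check like this works (not part of this PR):

```python
# Not part of this PR: report which SDPA backends the current torch
# build has enabled.
import torch

print("torch:", torch.__version__, "CUDA:", torch.version.cuda)
print("flash SDP enabled:", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient SDP enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math SDP enabled:", torch.backends.cuda.math_sdp_enabled())
```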

On Linux, select the split attention option in the attention selection setting:
(screenshot: attention selection UI)

On Windows, install Flash Attention and select FLASH_SPLIT.
Pre-built wheels for Windows by @zzlol63: https://github.com/zzlol63/flash-attention-prebuild-wheels/releases
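
To confirm the Flash Attention wheel installed correctly before selecting FLASH_SPLIT, a plain import check is enough (again, not part of this PR):

```python
# Verify the flash_attn package imports and report versions; if this
# fails, FLASH_SPLIT will not be usable.
import torch
import flash_attn

print("torch:", torch.__version__, "CUDA:", torch.version.cuda)
print("flash_attn:", flash_attn.__version__)
```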

Uses huggingface/diffusers#12870.

@dxqb dxqb linked an issue Dec 21, 2025 that may be closed by this pull request: [Feat]: Splitting batched attention
dxqb added 2 commits December 26, 2025 19:03:

  • … attention
  • update
  • merge with attention selection, add FLASH_SPLIT
@dxqb dxqb (Collaborator, Author) commented Dec 26, 2025

This should now also work on Windows, with a similar speed-up, though I have no way to test it.

