Replace has_warp_spec with HAS_AUTO_WS env variable check #205

Open

ardaunal wants to merge 17 commits into main from ardau/replace-has-warp-spec-with-has-auto-ws

Conversation

@ardaunal commented Apr 18, 2025

The triton release/3.3.x branch only supports AutoWS, which makes the check `has_warp_spec = hasattr(tl, "async_task")` incorrect. This PR replaces it with an environment variable check: `HAS_AUTO_WS = os.getenv("ENABLE_AUTO_WS")`.
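For reference, a minimal sketch of the change as described above; the comments are my reading of the description, not the exact patch:

```python
import os

import triton.language as tl

# Old check: on release/3.3.x, tl.async_task can exist even though only
# automatic warp specialization (AutoWS) is supported, so this misreports
# manual warp-spec support.
has_warp_spec = hasattr(tl, "async_task")

# New check: gate on the ENABLE_AUTO_WS environment variable instead.
# Note os.getenv returns the raw string, or None when the variable is unset.
HAS_AUTO_WS = os.getenv("ENABLE_AUTO_WS")
```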

Depends on #187

cc: @xuzhao9

@xuzhao9 (Contributor) commented May 1, 2025

Can you please also merge #187 into this PR?

@jeromeku commented May 1, 2025

@ardaunal

Does the version of triton that ships with pytorch 2.8 support WS?

I have these versions:

pytorch-triton           3.3.0+git96316ce5
torch                    2.8.0.dev20250421+cu128

However, `triton.Config` doesn't accept `num_buffers_warp_spec` or `num_consumer_groups`.

Also, in the triton release/3.3.x branch, it seems WS is only enabled on Blackwell (cc >= 10) (see here and here).
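For illustration, this is the kind of autotune config the two kwargs appear in; a sketch assuming a warp-spec-enabled Triton build, since the stock pytorch-triton above rejects them:

```python
import triton

# Sketch of an autotune config using the warp-specialization kwargs in
# question. On stock pytorch-triton 3.3.0 the last two kwargs raise a
# TypeError; they are only accepted by WS-enabled builds (assumption).
config = triton.Config(
    {"BLOCK_M": 128, "BLOCK_N": 128},
    num_warps=8,
    num_stages=3,
    num_buffers_warp_spec=2,
    num_consumer_groups=2,
)
```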

…persistent' into ardau/replace-has-warp-spec-with-has-auto-ws
@ardaunal (Author) commented May 2, 2025

@jeromeku
The release/3.3.x branch should support WS on Hopper. I think your link points to the main branch.

@xuzhao9 (Contributor) commented May 2, 2025

@ardaunal Sorry, but can you do another rebase? There is a recent trunk fix: 0ee663d

y-sq and others added 13 commits May 7, 2025 10:21
Summary:
- Add a target file for TK attention
- Include seq_len_kv in the flash_attention benchmark

Reviewed By: devashishshankar, jackiexu1992, xuzhao9

Differential Revision: D73878472

fbshipit-source-id: 2e16199e57aedbf5d58b7ec07310fdbde272df30
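As an aside, a hypothetical sketch of threading seq_len_kv through the benchmark arguments as the commit above describes; the flag names and defaults are assumptions, not the actual Tritonbench interface:

```python
import argparse

# Hypothetical arg plumbing for a separate KV sequence length in the
# flash_attention benchmark; names and defaults are illustrative only.
parser = argparse.ArgumentParser()
parser.add_argument("--seq-len", type=int, default=4096)
parser.add_argument(
    "--seq-len-kv",
    type=int,
    default=None,
    help="KV sequence length; falls back to --seq-len when unset",
)
args = parser.parse_args([])  # empty list: use defaults for the example
seq_len_kv = args.seq_len_kv if args.seq_len_kv is not None else args.seq_len
```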
Summary:
1. Load gemm/addmm/bmm configs from the inductor log and output JSON and CSV to a given directory.
2. Load the result JSON files into input tensors and run them with Tritonbench.

Reviewed By: PaulZhang12

Differential Revision: D73898451

fbshipit-source-id: 6a89fab13dc9eff8058f7f9a5e6616e0bc61b829
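A rough sketch of the first step, assuming a line-oriented inductor log; the regex and the output schema are placeholders, not the actual implementation:

```python
import csv
import json
import re
from pathlib import Path

# Hypothetical scraper for step 1: pull gemm/addmm/bmm shapes out of an
# inductor log and emit JSON + CSV. Log format and schema are assumptions.
CONFIG_RE = re.compile(r"(gemm|addmm|bmm).*?\[(\d+),\s*(\d+),\s*(\d+)\]")

def dump_configs(log_path: str, out_dir: str) -> None:
    rows = []
    for line in Path(log_path).read_text().splitlines():
        match = CONFIG_RE.search(line)
        if match:
            op, m, n, k = match.groups()
            rows.append({"op": op, "m": int(m), "n": int(n), "k": int(k)})
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "configs.json").write_text(json.dumps(rows, indent=2))
    with open(out / "configs.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["op", "m", "n", "k"])
        writer.writeheader()
        writer.writerows(rows)
```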
Summary: Update the skip test yaml to fix the tests.

Reviewed By: FindHao

Differential Revision: D73932518

fbshipit-source-id: 4b5c0048eb32590853c20a10167adf3fb921d4b5
Summary:
low_mem_dropout has very low tflops and is causing problems: pytorch/test-infra#6594

Remove it from tflops nightly.

Pull Request resolved: pytorch-labs#216

Reviewed By: FindHao

Differential Revision: D73970729

Pulled By: xuzhao9

fbshipit-source-id: fb2c0ca8061025065a2d40f8015adf4592dbce1a
Summary: Changes needed to run the flash_attention variants on servicelab and extract the data to the dashboard.

Reviewed By: minjang

Differential Revision: D73976270

fbshipit-source-id: c2da33763123e5d314a41914575eb929a1874867
Summary: Minor change to select the correct shapes depending on the args for b200 fp8_gemm_rowwise_prefill.

Reviewed By: minjang

Differential Revision: D73976271

fbshipit-source-id: 1b52c0e570554cd4d8774be3725af354cf69c603
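A hypothetical version of that selection; the shape values and the args attribute are illustrative, not Tritonbench's actual values:

```python
# Hypothetical shape selection for fp8_gemm_rowwise_prefill depending on
# the target hardware flag; shapes and attribute name are placeholders.
B200_SHAPES = [(8192, 8192, 8192), (16384, 8192, 8192)]
DEFAULT_SHAPES = [(4096, 4096, 4096)]

def select_shapes(args) -> list[tuple[int, int, int]]:
    return B200_SHAPES if getattr(args, "b200", False) else DEFAULT_SHAPES
```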
Reviewed By: dtolnay

Differential Revision: D74066175

fbshipit-source-id: 5ea56fd74a0c9afa45c13b08dcdeb09d3b87c754
Summary:
pytorch-labs@d4a1e60 broke the code for OSS; this fixes it.

Pull Request resolved: pytorch-labs#217

Reviewed By: FindHao

Differential Revision: D74087811

Pulled By: xuzhao9

fbshipit-source-id: bc941d586b6211655ae2f28179173450277b8604
Summary:
We have fixed the ThunderKittens installation and can now enable it in unit tests.

Pull Request resolved: pytorch-labs#218

Reviewed By: FindHao

Differential Revision: D74107030

Pulled By: xuzhao9

fbshipit-source-id: 19b4c1457e8e6b07d04e269f5aee343c697b78be
Summary: Match the results between two autotune configs.

Reviewed By: FindHao

Differential Revision: D74085680

fbshipit-source-id: c2778efea6a5e440ac7c9e3e442ce550c4542e65
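One plausible way to check that two autotune configs agree numerically; run_with_config is a stand-in name, and the tolerances assume low-precision kernel output:

```python
import torch

# Compare outputs produced under two autotune configs. run_with_config
# is a hypothetical stand-in for running the kernel with a given config.
def assert_configs_match(run_with_config, x, cfg_a, cfg_b) -> None:
    out_a = run_with_config(x, cfg_a)
    out_b = run_with_config(x, cfg_b)
    # Loose tolerances: fp16/bf16 kernels won't match bit-for-bit.
    torch.testing.assert_close(out_a, out_b, rtol=1e-2, atol=1e-2)
```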
Summary:
Load operators and their kernels from metadata YAML files.
This is used to generate the top-10 kernels and their inputs in the data directory.

Reviewed By: FindHao

Differential Revision: D74132693

fbshipit-source-id: fe10b551947eec0691bdf2cc25a576f3a182bae8
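A minimal sketch of that loading step, assuming a simple operator-to-kernel-list YAML schema; the real metadata format may differ:

```python
import yaml  # PyYAML

# Hypothetical loader: map each operator to its top-k kernels from a
# metadata YAML file. The {op: [kernel, ...]} schema is an assumption.
def load_top_kernels(path: str, top_k: int = 10) -> dict:
    with open(path) as f:
        meta = yaml.safe_load(f)
    return {op: kernels[:top_k] for op, kernels in meta.items()}
```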
Summary: Refactor the `jagged_dense_dense_sum` operator to use the input loader from durin data.

Reviewed By: FindHao

Differential Revision: D74135557

fbshipit-source-id: 30dd098010381d088cc66b49b62340e8e5eabfca