Replace has_warp_spec with HAS_AUTO_WS env variable check #205

Open

ardaunal wants to merge 17 commits into main from ardau/replace-has-warp-spec-with-has-auto-ws

Conversation

@ardaunal commented Apr 18, 2025

The triton release/3.3.x branch only supports AutoWS, which makes the check `has_warp_spec = hasattr(tl, "async_task")` incorrect. This PR replaces it with an environment variable check: `HAS_AUTO_WS = os.getenv("ENABLE_AUTO_WS")`.
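For reference, a minimal sketch of the change as described above; the comments are my reading of the description, not the exact patch:

```python
import os

import triton.language as tl

# Old check: on release/3.3.x, tl.async_task can exist even though only
# automatic warp specialization (AutoWS) is supported, so this misreports
# manual warp-spec support.
has_warp_spec = hasattr(tl, "async_task")

# New check: gate on the ENABLE_AUTO_WS environment variable instead.
# Note os.getenv returns the raw string, or None when the variable is unset.
HAS_AUTO_WS = os.getenv("ENABLE_AUTO_WS")
```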

Depends on #187

cc: @xuzhao9

@xuzhao9 (Contributor) commented May 1, 2025

Can you please also merge #187 into this PR?

@jeromeku commented May 1, 2025

@ardaunal

Does the version of triton that ships with pytorch 2.8 support WS?

I have these versions:

pytorch-triton           3.3.0+git96316ce5
torch                    2.8.0.dev20250421+cu128

However, `triton.Config` doesn't accept `num_buffers_warp_spec` or `num_consumer_groups`.

Also, in the triton release/3.3.x branch, it seems WS is only enabled on Blackwell (cc >= 10) (see here and here).
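For illustration, this is the kind of autotune config the two kwargs appear in; a sketch assuming a warp-spec-enabled Triton build, since the stock pytorch-triton above rejects them:

```python
import triton

# Sketch of an autotune config using the warp-specialization kwargs in
# question. On stock pytorch-triton 3.3.0 the last two kwargs raise a
# TypeError; they are only accepted by WS-enabled builds (assumption).
config = triton.Config(
    {"BLOCK_M": 128, "BLOCK_N": 128},
    num_warps=8,
    num_stages=3,
    num_buffers_warp_spec=2,
    num_consumer_groups=2,
)
```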

…persistent' into ardau/replace-has-warp-spec-with-has-auto-ws
@ardaunal (Author) commented May 2, 2025

@jeromeku
The release/3.3.x branch should support WS on Hopper. I think your link points to the main branch.

@xuzhao9 (Contributor) commented May 2, 2025

@ardaunal Sorry, but can you do another rebase? There is a recent trunk fix: 0ee663d

y-sq and others added 13 commits May 7, 2025 10:21
Summary:
- Add a target file for TK attention
- Include seq_len_kv in the flash_attention benchmark

Reviewed By: devashishshankar, jackiexu1992, xuzhao9

Differential Revision: D73878472

fbshipit-source-id: 2e16199e57aedbf5d58b7ec07310fdbde272df30
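As an aside, a hypothetical sketch of threading seq_len_kv through the benchmark arguments as the commit above describes; the flag names and defaults are assumptions, not the actual Tritonbench interface:

```python
import argparse

# Hypothetical arg plumbing for a separate KV sequence length in the
# flash_attention benchmark; names and defaults are illustrative only.
parser = argparse.ArgumentParser()
parser.add_argument("--seq-len", type=int, default=4096)
parser.add_argument(
    "--seq-len-kv",
    type=int,
    default=None,
    help="KV sequence length; falls back to --seq-len when unset",
)
args = parser.parse_args([])  # empty list: use defaults for the example
seq_len_kv = args.seq_len_kv if args.seq_len_kv is not None else args.seq_len
```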
Summary:
1. Load gemm/addmm/bmm configs from the inductor log and output JSON and CSV to a given directory.
2. Load the result JSON files into input tensors and run them with Tritonbench.

Reviewed By: PaulZhang12

Differential Revision: D73898451

fbshipit-source-id: 6a89fab13dc9eff8058f7f9a5e6616e0bc61b829
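A rough sketch of the first step, assuming a line-oriented inductor log; the regex and the output schema are placeholders, not the actual implementation:

```python
import csv
import json
import re
from pathlib import Path

# Hypothetical scraper for step 1: pull gemm/addmm/bmm shapes out of an
# inductor log and emit JSON + CSV. Log format and schema are assumptions.
CONFIG_RE = re.compile(r"(gemm|addmm|bmm).*?\[(\d+),\s*(\d+),\s*(\d+)\]")

def dump_configs(log_path: str, out_dir: str) -> None:
    rows = []
    for line in Path(log_path).read_text().splitlines():
        match = CONFIG_RE.search(line)
        if match:
            op, m, n, k = match.groups()
            rows.append({"op": op, "m": int(m), "n": int(n), "k": int(k)})
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "configs.json").write_text(json.dumps(rows, indent=2))
    with open(out / "configs.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["op", "m", "n", "k"])
        writer.writeheader()
        writer.writerows(rows)
```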
Summary: Update the skip test yaml to fix the tests.

Reviewed By: FindHao

Differential Revision: D73932518

fbshipit-source-id: 4b5c0048eb32590853c20a10167adf3fb921d4b5
Summary:
low_mem_dropout has very low tflops and is causing problems: pytorch/test-infra#6594

Remove it from tflops nightly.

Pull Request resolved: pytorch-labs#216

Reviewed By: FindHao

Differential Revision: D73970729

Pulled By: xuzhao9

fbshipit-source-id: fb2c0ca8061025065a2d40f8015adf4592dbce1a
Summary: Changes needed to run the flash_attention variants on servicelab and extract the data to the dashboard.

Reviewed By: minjang

Differential Revision: D73976270

fbshipit-source-id: c2da33763123e5d314a41914575eb929a1874867
Summary: Minor change to select the correct shapes depending on the args for b200 fp8_gemm_rowwise_prefill.

Reviewed By: minjang

Differential Revision: D73976271

fbshipit-source-id: 1b52c0e570554cd4d8774be3725af354cf69c603
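A hypothetical version of that selection; the shape values and the args attribute are illustrative, not Tritonbench's actual values:

```python
# Hypothetical shape selection for fp8_gemm_rowwise_prefill depending on
# the target hardware flag; shapes and attribute name are placeholders.
B200_SHAPES = [(8192, 8192, 8192), (16384, 8192, 8192)]
DEFAULT_SHAPES = [(4096, 4096, 4096)]

def select_shapes(args) -> list[tuple[int, int, int]]:
    return B200_SHAPES if getattr(args, "b200", False) else DEFAULT_SHAPES
```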
Reviewed By: dtolnay

Differential Revision: D74066175

fbshipit-source-id: 5ea56fd74a0c9afa45c13b08dcdeb09d3b87c754
Summary:
pytorch-labs@d4a1e60 broke the code for OSS; this fixes it.

Pull Request resolved: pytorch-labs#217

Reviewed By: FindHao

Differential Revision: D74087811

Pulled By: xuzhao9

fbshipit-source-id: bc941d586b6211655ae2f28179173450277b8604
Summary:
We have fixed the ThunderKittens installation and can now enable it in unit tests.

Pull Request resolved: pytorch-labs#218

Reviewed By: FindHao

Differential Revision: D74107030

Pulled By: xuzhao9

fbshipit-source-id: 19b4c1457e8e6b07d04e269f5aee343c697b78be
Summary: Match the results between two autotune configs.

Reviewed By: FindHao

Differential Revision: D74085680

fbshipit-source-id: c2778efea6a5e440ac7c9e3e442ce550c4542e65
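One plausible way to check that two autotune configs agree numerically; run_with_config is a stand-in name, and the tolerances assume low-precision kernel output:

```python
import torch

# Compare outputs produced under two autotune configs. run_with_config
# is a hypothetical stand-in for running the kernel with a given config.
def assert_configs_match(run_with_config, x, cfg_a, cfg_b) -> None:
    out_a = run_with_config(x, cfg_a)
    out_b = run_with_config(x, cfg_b)
    # Loose tolerances: fp16/bf16 kernels won't match bit-for-bit.
    torch.testing.assert_close(out_a, out_b, rtol=1e-2, atol=1e-2)
```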
Summary:
Load operators and their kernels from metadata YAML files.
This is used to generate the top-10 kernels and their inputs in the data directory.

Reviewed By: FindHao

Differential Revision: D74132693

fbshipit-source-id: fe10b551947eec0691bdf2cc25a576f3a182bae8
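A minimal sketch of that loading step, assuming a simple operator-to-kernel-list YAML schema; the real metadata format may differ:

```python
import yaml  # PyYAML

# Hypothetical loader: map each operator to its top-k kernels from a
# metadata YAML file. The {op: [kernel, ...]} schema is an assumption.
def load_top_kernels(path: str, top_k: int = 10) -> dict:
    with open(path) as f:
        meta = yaml.safe_load(f)
    return {op: kernels[:top_k] for op, kernels in meta.items()}
```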
Summary: Refactor the `jagged_dense_dense_sum` operator to use the input loader from durin data.

Reviewed By: FindHao

Differential Revision: D74135557

fbshipit-source-id: 30dd098010381d088cc66b49b62340e8e5eabfca