@drisspg (Collaborator) commented Jan 31, 2026

Stacked PRs:


[Draft][Ai-assisted] CLC work stealing

Not ready to land yet. The scheduling abstraction leaks all over the place, which is bad; I'm going to find a better way to encapsulate it.

I had to help a lot on this one, even though Claude and Codex did most of the setup.

Example work-stealing "trace", recreated from printf logs:
[image]

Perf run

We would expect the largest gains for the workloads that are most imbalanced under the current Flex scheduling. We kind of see that: e.g. alibi + causal look the same and don't currently have the LPT schedule set. Document mask also sees a nice boost, which makes sense.
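To make the imbalance argument concrete, here is a toy model, not the scheduler in this PR. It compares a static round-robin tile assignment against a dynamic "grab the next tile when idle" policy, which is the effect work stealing aims for. The cost list and the function names are hypothetical and chosen only for illustration; real tile costs depend on the mask and block sizes.

```python
import heapq


def static_makespan(costs, num_workers):
    """Static schedule: tile t is assigned up front to worker t % num_workers."""
    loads = [0] * num_workers
    for tile, cost in enumerate(costs):
        loads[tile % num_workers] += cost
    return max(loads)


def dynamic_makespan(costs, num_workers):
    """Dynamic schedule: the worker that becomes idle first takes the next tile."""
    finish_times = [0] * num_workers
    heapq.heapify(finish_times)  # min-heap keyed on each worker's finish time
    for cost in costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


if __name__ == "__main__":
    # Hypothetical imbalanced workload: every 4th tile is 8x more expensive,
    # so round-robin over 4 workers piles all heavy tiles onto one worker.
    costs = [8, 1, 1, 1] * 3
    print("static :", static_makespan(costs, 4))
    print("dynamic:", dynamic_makespan(costs, 4))
```

In this model the dynamic policy only helps when the static assignment happens to stack expensive tiles on the same worker; for a perfectly uniform cost list both functions return the same makespan.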

[image]

I don't really know why noop (fully dense FA4 with no sparsity and no score mod) takes a hit for hdim 64, but only on the non-GQA path. The pattern looks too regular to be chance.

What needs to be figured out before landing

  1. A more unified API and a mechanism for turning this on with an env var. I think we should universally enable it for Flex use cases. I think fwd only is fine for now, but we will likely want bwd integration.
  2. I have been debugging the weirdest race condition in the 128x128 test and have narrowed it down somewhat. My current working theory is that response_ptr gets allocated with random smem data. We have num_tiles < num_sms, so only the initial work is needed. We query CLC; if we print after the consumer_wait in warp 15, we see CLC say there is no more work (all invalid). The other consumer warps are not syncing properly, and some of them end up pulling the random response data before it has actually been populated. Racecheck shows me some errors, but it is not helping find the source of this race.
  3. The register spills for the no-op MHA case are weird, and I spent some time debugging them too. NCU points pretty much to a huge register spill. However, there is not really any good reason for this to happen in this case and not others (AFAIK). I dumped the PTX and compiled it with the 13.1 ptxas, and it showed no spills in the SASS (Claude was helping here). I'm not 100% convinced this is just a random ptxas edge case, but I'm leaning that way. Also, the ptxas patch thing didn't seem to be really working; something else to look at.
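The working theory in item 2 can be modeled on the host with plain threads. This is only a sketch of the suspected ordering bug, not the kernel code: `ResponseSlot`, `GARBAGE`, and `NO_MORE_WORK` are hypothetical stand-ins for the smem response buffer, its uninitialized contents, and an invalid CLC response. The point is that a consumer must wait on the "response ready" signal before reading, otherwise it can observe whatever was in the slot at allocation time.

```python
import threading

GARBAGE = 0xDEADBEEF  # hypothetical stand-in for uninitialized shared memory
NO_MORE_WORK = -1     # hypothetical stand-in for an invalid CLC response


class ResponseSlot:
    """Toy model of a response buffer guarded by a ready signal."""

    def __init__(self):
        self.value = GARBAGE            # contents before the producer writes
        self.ready = threading.Event()  # plays the role of the barrier

    def publish(self, value):
        # Producer: write the response first, then signal the consumers.
        self.value = value
        self.ready.set()

    def read_synced(self):
        # Correct consumer: block until the producer has signaled.
        self.ready.wait()
        return self.value

    def read_unsynced(self):
        # Buggy consumer: read without waiting; may return stale data.
        return self.value


def run_synced_consumers(num_consumers=4):
    slot = ResponseSlot()
    seen = [None] * num_consumers

    def consumer(i):
        seen[i] = slot.read_synced()

    threads = [threading.Thread(target=consumer, args=(i,))
               for i in range(num_consumers)]
    for t in threads:
        t.start()
    slot.publish(NO_MORE_WORK)
    for t in threads:
        t.join()
    return seen
```

With `read_synced`, every consumer sees the published response regardless of thread timing. Swapping in `read_unsynced` reproduces the suspected symptom whenever a consumer runs before `publish`.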

drisspg added a commit that referenced this pull request Jan 31, 2026
stack-info: PR: #2218, branch: drisspg/stack/8
@tridao (Member) commented Jan 31, 2026

I like the direction; CLC is the right thing to do.

@tridao (Member) commented Jan 31, 2026

Cc @tzadouri who’s thinking about scheduling and persistence

@drisspg drisspg marked this pull request as draft February 1, 2026 22:23
@drisspg drisspg marked this pull request as ready for review February 10, 2026 04:02