Add splitk as a post op for gemm
q = self.q_proj(hidden_states).view(-1, self.num_local_heads, self.qk_head_dim)
q_nope, q_pe = q.split([self.qk_nope_head_dim, self.qk_rope_head_dim], dim=-1)
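For readers without a PyTorch setup, the two lines above can be mirrored in NumPy. This is a minimal sketch under stated assumptions: the sizes are small and hypothetical (not the real MLA dimensions), and a plain weight matrix `w_q` stands in for `self.q_proj`. It shows that splitting the last axis returns strided views into `q` rather than new contiguous buffers.

```python
import numpy as np

# Hypothetical small sizes standing in for the real MLA shapes.
tokens, hidden = 4, 16
num_local_heads, qk_nope_head_dim, qk_rope_head_dim = 2, 8, 4
qk_head_dim = qk_nope_head_dim + qk_rope_head_dim

rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((tokens, hidden)).astype(np.float32)
# Stand-in for self.q_proj: weight stored as (out_features, in_features), like nn.Linear.
w_q = rng.standard_normal((num_local_heads * qk_head_dim, hidden)).astype(np.float32)

# q = self.q_proj(hidden_states).view(-1, num_local_heads, qk_head_dim)
q = (hidden_states @ w_q.T).reshape(-1, num_local_heads, qk_head_dim)

# q.split([qk_nope_head_dim, qk_rope_head_dim], dim=-1): two slices of the last axis
q_nope, q_pe = np.split(q, [qk_nope_head_dim], axis=-1)

print(q_nope.shape, q_pe.shape)      # (4, 2, 8) (4, 2, 4)
print(q_nope.flags["C_CONTIGUOUS"])  # False: a strided view into q
print(np.shares_memory(q_nope, q))   # True: no copy was made
```

Both halves alias the GEMM output, so a consumer that needs contiguous tensors would have to copy them afterward.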
For the MLA shapes, the GEMM dimensions are M=512, N=24576, K=1536, with NUM_HEAD=128, NOPE_DIM=128, ROPE_DIM=64 (so N = NUM_HEAD × (NOPE_DIM + ROPE_DIM)).
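These numbers are consistent: each head contributes NOPE_DIM + ROPE_DIM = 192 output columns, so N = 128 × 192 = 24576. The plain-Python sketch below checks that arithmetic and shows how a split post op could map a flat GEMM output column `n` to its destination, assuming the head-major layout implied by `reshape(512, 128, 192)`. The helper name `split_column` is hypothetical.

```python
M, N, K = 512, 24576, 1536
NUM_HEAD, NOPE_DIM, ROPE_DIM = 128, 128, 64
HEAD_DIM = NOPE_DIM + ROPE_DIM  # 192, i.e. qk_head_dim

# The GEMM's N dimension is exactly the flattened (head, head_dim) pair.
assert N == NUM_HEAD * HEAD_DIM


def split_column(n):
    """Hypothetical helper: map flat output column n (0 <= n < N) to
    (target, head, offset), where target is "nope" or "rope" and offset
    indexes the last axis of y_nope (size NOPE_DIM) or y_rope (size ROPE_DIM)."""
    head, d = divmod(n, HEAD_DIM)
    if d < NOPE_DIM:
        return "nope", head, d
    return "rope", head, d - NOPE_DIM


print(split_column(0))      # ('nope', 0, 0)
print(split_column(127))    # ('nope', 0, 127)
print(split_column(128))    # ('rope', 0, 0)
print(split_column(192))    # ('nope', 1, 0)
print(split_column(N - 1))  # ('rope', 127, 63)
```

Because each head's columns are one contiguous run of 192, the routing is a single `divmod` plus a comparison per column.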
y = x @ w.T  # x: 512x1536, w: 24576x1536 -> y: 512x24576
y = y.reshape(512, 128, 192)
y_nope, y_rope = y.splitk(128, 64)  # same result as y.split([128, 64], dim=-1)
# y_nope: (512, 128, 128)
# y_rope: (512, 128, 64)
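To make the intended semantics concrete, here is a NumPy reference sketch. It assumes the post op means: compute the GEMM tile by tile along N and, in the epilogue, write each tile's columns straight into two separate contiguous outputs instead of materializing `y` and splitting it afterward. The function `gemm_splitk_epilogue`, the tile size `block_n`, and the small test shapes are hypothetical. A real kernel would do this in the GEMM's store stage, not in a Python loop.

```python
import numpy as np


def gemm_splitk_epilogue(x, w, num_head, nope_dim, rope_dim, block_n=64):
    """Hypothetical reference: y = x @ w.T, with the store step scattering each
    output column into y_nope (M, H, nope_dim) or y_rope (M, H, rope_dim)."""
    m, k = x.shape
    n, k_w = w.shape
    head_dim = nope_dim + rope_dim
    assert k == k_w and n == num_head * head_dim

    # Two separate, contiguous output buffers (no intermediate y of shape (M, N)).
    y_nope = np.empty((m, num_head, nope_dim), dtype=x.dtype)
    y_rope = np.empty((m, num_head, rope_dim), dtype=x.dtype)

    for n0 in range(0, n, block_n):
        n1 = min(n0 + block_n, n)
        tile = x @ w[n0:n1].T  # (M, n1 - n0) slice of y, as one GEMM tile
        # Epilogue: route every column of the tile to its destination buffer.
        for j, col in enumerate(range(n0, n1)):
            head, d = divmod(col, head_dim)
            if d < nope_dim:
                y_nope[:, head, d] = tile[:, j]
            else:
                y_rope[:, head, d - nope_dim] = tile[:, j]
    return y_nope, y_rope


# Small hypothetical shapes; block_n=16 deliberately does not align with head_dim=12.
rng = np.random.default_rng(0)
M, K, H, NOPE, ROPE = 8, 32, 4, 8, 4
x = rng.standard_normal((M, K))
w = rng.standard_normal((H * (NOPE + ROPE), K))

y_nope, y_rope = gemm_splitk_epilogue(x, w, H, NOPE, ROPE, block_n=16)

# Reference path: y = x w^T, reshape, then split the last axis.
y = (x @ w.T).reshape(M, H, NOPE + ROPE)
ref_nope, ref_rope = y[..., :NOPE], y[..., NOPE:]
print(np.allclose(y_nope, ref_nope), np.allclose(y_rope, ref_rope))  # True True
print(y_nope.flags["C_CONTIGUOUS"], y_rope.flags["C_CONTIGUOUS"])    # True True
```

Choosing a tile width that straddles head boundaries exercises the case where one tile feeds both outputs, which is where an epilogue-level split has to get the index math right.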