Integrate the Cutlass Memory Efficient Attention kernel into the Group Query
Attention operator.
### Motivation and Context
Before this change, the Group Query Attention operator was backed only by
the Flash-Attention kernel. While that is the most efficient kernel for the
operation, it requires sm >= 80. The Cutlass Memory Efficient Attention
kernel supports sm >= 53, which lets the operator run on a broader range of
GPU hardware.
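The fallback described above can be sketched as a small dispatch function. This is only an illustration under stated assumptions: `select_gqa_kernel` and the returned kernel names are hypothetical and are not ONNX Runtime APIs, and the real operator may check further conditions (data type, head size, build flags) before choosing a kernel. Only the two sm thresholds come from the description.

```python
# Hypothetical sketch of kernel selection by GPU architecture (sm).
# The thresholds come from the PR description; every name is illustrative.
FLASH_ATTENTION_MIN_SM = 80
MEMORY_EFFICIENT_MIN_SM = 53


def select_gqa_kernel(sm: int) -> str:
    """Pick a kernel name for a GPU whose architecture number is `sm`."""
    if sm >= FLASH_ATTENTION_MIN_SM:
        # Preferred path: described as the most efficient kernel.
        return "flash_attention"
    if sm >= MEMORY_EFFICIENT_MIN_SM:
        # Fallback path added by this change.
        return "memory_efficient_attention"
    raise ValueError(f"no Group Query Attention kernel for sm {sm}")
```

For example, `select_gqa_kernel(75)` takes the fallback branch, while `select_gqa_kernel(86)` stays on the Flash-Attention path.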
docs/ContribOperators.md
3 additions & 3 deletions
@@ -2422,14 +2422,14 @@ This version of the operator has been available since version 1 of the 'com.micr
 <dd>When buffered past_key and past_value is used (present_key uses same tensor as past_key), requiredto specify past_sequence_length (could be 0). Otherwise, past_sequence_length inferred from past_key.</dd>
 </dl>
 
-#### Outputs (1 - 3)
+#### Outputs
 
 <dl>
 <dt><tt>output</tt> : T</dt>
 <dd>3D output tensor with shape (batch_size, sequence_length, hidden_size)</dd>
-<dt><tt>present_key</tt> (optional) : T</dt>
+<dt><tt>present_key</tt> : T</dt>
 <dd>present state key with support for format BSNH or BNSH. When past_key uses same tensor as present_key(k-v buffer), it is of length max_sequence_length... otherwise of length past_sequence_length +kv_sequence_length.</dd>
-<dt><tt>present_value</tt> (optional) : T</dt>
+<dt><tt>present_value</tt> : T</dt>
 <dd>present state value with support for format BSNH or BNSH. When past_value uses same tensor as present_value(k-v buffer), it is of length max_sequence_length... otherwise of length past_sequence_length +kv_sequence_length.</dd>
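The length rule in the `present_key` and `present_value` descriptions can be illustrated with a small shape helper. This is a sketch, not the operator's code: `present_kv_shape` and its parameter names (`kv_num_heads`, `head_size`, `shared_buffer`) are hypothetical, and it assumes the usual reading of the layout letters (B = batch, S = sequence, N = heads, H = head size).

```python
from typing import Optional, Tuple


def present_kv_shape(
    batch_size: int,
    kv_num_heads: int,
    head_size: int,
    past_sequence_length: int,
    kv_sequence_length: int,
    layout: str = "BNSH",
    shared_buffer: bool = False,
    max_sequence_length: Optional[int] = None,
) -> Tuple[int, int, int, int]:
    """Return the assumed 4D shape of present_key / present_value."""
    if shared_buffer:
        # past and present share one k-v buffer: length is the buffer size.
        if max_sequence_length is None:
            raise ValueError("max_sequence_length is required for a shared buffer")
        seq_len = max_sequence_length
    else:
        # Separate tensors: the new keys/values are appended to the past ones.
        seq_len = past_sequence_length + kv_sequence_length

    if layout == "BNSH":
        return (batch_size, kv_num_heads, seq_len, head_size)
    if layout == "BSNH":
        return (batch_size, seq_len, kv_num_heads, head_size)
    raise ValueError(f"unsupported layout: {layout}")
```

With separate tensors, 10 past tokens plus 1 new token give a sequence length of 11; with a shared buffer, the sequence dimension equals `max_sequence_length` regardless of how many tokens have been written so far.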