
Refactor benchmarks for Flash Attention Prefill #447


Open

muhammad-tanvir-1211 wants to merge 6 commits into sycl-develop from flash_prefill_separate_out_benchmarks
Conversation

muhammad-tanvir-1211
Collaborator

This PR adds benchmarks for all the data types supported by Flash Attention Prefill. It is a continuation of PR #443.

@muhammad-tanvir-1211 muhammad-tanvir-1211 requested a review from a team June 26, 2025 09:39
@muhammad-tanvir-1211 muhammad-tanvir-1211 force-pushed the flash_prefill_separate_out_benchmarks branch 2 times, most recently from c863845 to e7ec638 on June 26, 2025 11:36
@muhammad-tanvir-1211 muhammad-tanvir-1211 force-pushed the flash_prefill_separate_out_benchmarks branch from e7ec638 to 6a7ad57 on June 27, 2025 13:43
aacostadiaz added a commit that referenced this pull request Jun 28, 2025
This PR separates the output type and accumulator type for Flash
Attention Prefill. Combinations supported are:

* bf16 inputs, fp32 accumulator, bf16 | fp32 output
* fp16 inputs, fp32 accumulator, fp16 | fp32 output
* fp8 inputs, fp32 accumulator, fp8 | fp32 output

Tests added in: #446 
Benchmarks added in: #447

---------

Co-authored-by: Alejandro Acosta <[email protected]>
@@ -77,12 +77,17 @@ struct FMHAOptions {
     cmd.get_cmd_line_argument("num_heads_kv", num_heads_kv, num_heads_q);
     cmd.get_cmd_line_argument("seq_len_qo", seq_len_qo, 512);
     cmd.get_cmd_line_argument("seq_len_kv", seq_len_kv, seq_len_qo);
-    cmd.get_cmd_line_argument("head_size_vo", head_size_vo, 128);
+    cmd.get_cmd_line_argument("head_size_vo", head_size_vo, HEAD_DIM);
Collaborator
Why even keep this as an option if it must be set to a certain value?

Collaborator Author
This helps throw an error on line 88 when the head_size_vo received by a benchmark does not match the head_size_vo the benchmark was built for. For example, if we pass head_size_vo=96 to the cutlass_benchmark_flash_attention_prefill_h64_xe executable, it should not run the benchmark and should throw an error instead.
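For context, here is a minimal standalone sketch of the check being described. This is not the actual benchmark code (which uses CUTLASS's command-line parser); it only assumes that HEAD_DIM is a compile-time constant baked into each benchmark binary, and that a runtime head_size_vo value is rejected if it does not match.

```cpp
// Minimal sketch (hypothetical, not the PR code): validate the runtime
// head_size_vo argument against the compile-time HEAD_DIM this binary
// was built for, and refuse to run the benchmark on a mismatch.
#include <cstdio>

#ifndef HEAD_DIM
#define HEAD_DIM 64  // e.g. the h64 benchmark build
#endif

bool parse_head_size(int argc, const char** argv, int& head_size_vo) {
  head_size_vo = HEAD_DIM;  // default to the value this binary was built for
  for (int i = 1; i < argc; ++i) {
    if (std::sscanf(argv[i], "--head_size_vo=%d", &head_size_vo) == 1) break;
  }
  if (head_size_vo != HEAD_DIM) {
    std::fprintf(stderr,
                 "Error: head_size_vo=%d, but this benchmark was built for HEAD_DIM=%d\n",
                 head_size_vo, HEAD_DIM);
    return false;  // caller skips the benchmark instead of running it
  }
  return true;
}
```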

Collaborator
What I am trying to point out is that, given the head size is already in the name of the benchmark, we should not need a head-size argument at all. Without the argument there is no chance of passing a wrong value.
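For comparison, a hypothetical sketch of what is being suggested here: with the head size fixed at build time there is no runtime argument to validate, so a mismatching value can never be passed.

```cpp
// Hypothetical sketch of the suggestion, not code from this PR: the head size
// is taken directly from the compile-time constant, so no --head_size_vo
// command-line option exists.
#ifndef HEAD_DIM
#define HEAD_DIM 64  // set per benchmark target by the build system
#endif
constexpr int head_size_vo = HEAD_DIM;
```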

Labels: None yet
Projects: None yet
2 participants