Disabling ggml_flash_attn_ext_set_prec #15838
Referring to line 1308 in commit 3c3635d: does disabling this call have any implications? Without it, the FP16 kernel gets used instead (see ggml/src/ggml-cuda/fattn.cu, lines 414-415 in 3c3635d), which gives about 5-10% more performance in PP (prompt processing). I tested GPQA with it disabled and the results look fine.
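For context, here is a minimal sketch of what the call in question looks like; this is not the actual llama.cpp source, and the `build_fattn` wrapper and its arguments are illustrative. `ggml_flash_attn_ext_set_prec()` attaches a precision hint to the flash-attention node; removing the call leaves the node at `GGML_PREC_DEFAULT`, which lets the CUDA backend pick the faster FP16 kernel.

```c
#include "ggml.h"

// Illustrative wrapper (hypothetical name), not the llama.cpp source:
// build one flash-attention node and request FP32 precision for it.
static struct ggml_tensor * build_fattn(
        struct ggml_context * ctx,
        struct ggml_tensor  * q,
        struct ggml_tensor  * k,
        struct ggml_tensor  * v,
        struct ggml_tensor  * mask,
        float                 scale) {
    struct ggml_tensor * cur = ggml_flash_attn_ext(
            ctx, q, k, v, mask, scale,
            /*max_bias      =*/ 0.0f,
            /*logit_softcap =*/ 0.0f);

    // The call under discussion: store a hint on the node asking the
    // backend to accumulate in FP32. Dropping this line leaves the node
    // at GGML_PREC_DEFAULT, so the backend may choose the FP16 kernel.
    ggml_flash_attn_ext_set_prec(cur, GGML_PREC_F32);

    return cur;
}
```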
Answered by pt13762104 (Sep 7, 2025):
Since #15769, this is a no-op.
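To make "no-op" concrete, a hedged sketch of how the hint can be read back, assuming the standard ggml API: the setter only records the requested precision on the node, and per the answer, since #15769 the CUDA kernel selection no longer depends on that value, so setting it (or not) does not change which kernel runs.

```c
#include "ggml.h"
#include <stdio.h>

// Sketch: read back the precision hint stored on a flash-attention node.
// The hint is still recorded on the tensor, but a backend is free to
// ignore it when choosing a kernel.
static void check_prec(const struct ggml_tensor * fattn_node) {
    enum ggml_prec prec = ggml_flash_attn_ext_get_prec(fattn_node);
    printf("requested prec: %s\n",
           prec == GGML_PREC_F32 ? "GGML_PREC_F32" : "GGML_PREC_DEFAULT");
}
```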