Disabling ggml_flash_attn_ext_set_prec #15838
Referring to line 1308 in commit 3c3635d: does disabling this call have any implications? Without it, the FP16 kernel gets used instead (see ggml/src/ggml-cuda/fattn.cu, lines 414-415 in 3c3635d), which gives about 5-10% more performance in PP (prompt processing). I tested GPQA with it disabled and the results look fine.
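For context, here is a minimal sketch of what the call in question looks like; this is not the actual llama.cpp source, and the `build_fattn` wrapper and its arguments are illustrative. `ggml_flash_attn_ext_set_prec()` attaches a precision hint to the flash-attention node; removing the call leaves the node at `GGML_PREC_DEFAULT`, which lets the CUDA backend pick the faster FP16 kernel.

```c
#include "ggml.h"

// Illustrative wrapper (hypothetical name), not the llama.cpp source:
// build one flash-attention node and request FP32 precision for it.
static struct ggml_tensor * build_fattn(
        struct ggml_context * ctx,
        struct ggml_tensor  * q,
        struct ggml_tensor  * k,
        struct ggml_tensor  * v,
        struct ggml_tensor  * mask,
        float                 scale) {
    struct ggml_tensor * cur = ggml_flash_attn_ext(
            ctx, q, k, v, mask, scale,
            /*max_bias      =*/ 0.0f,
            /*logit_softcap =*/ 0.0f);

    // The call under discussion: store a hint on the node asking the
    // backend to accumulate in FP32. Dropping this line leaves the node
    // at GGML_PREC_DEFAULT, so the backend may choose the FP16 kernel.
    ggml_flash_attn_ext_set_prec(cur, GGML_PREC_F32);

    return cur;
}
```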
Answered by pt13762104 (Sep 7, 2025):
Since #15769, this is a no-op.
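To make "no-op" concrete, a hedged sketch of how the hint can be read back, assuming the standard ggml API: the setter only records the requested precision on the node, and per the answer, since #15769 the CUDA kernel selection no longer depends on that value, so setting it (or not) does not change which kernel runs.

```c
#include "ggml.h"
#include <stdio.h>

// Sketch: read back the precision hint stored on a flash-attention node.
// The hint is still recorded on the tensor, but a backend is free to
// ignore it when choosing a kernel.
static void check_prec(const struct ggml_tensor * fattn_node) {
    enum ggml_prec prec = ggml_flash_attn_ext_get_prec(fattn_node);
    printf("requested prec: %s\n",
           prec == GGML_PREC_F32 ? "GGML_PREC_F32" : "GGML_PREC_DEFAULT");
}
```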