Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows #10723

stduhpf · 2024-12-08T16:46:01Z

I noticed when running test-backend-ops that the TANH op can sometimes output NaNs with Vulkan. I didn't experience any issues because of it, but it was a simple fix that should hopefully not cause any noticeable slowdown.

Master:

> .\build\bin\Release\test-backend-ops.exe  | Select-String -Pattern "TANH"
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: none
ggml_vulkan: Compiling shaders..........................Done!

  TANH(type=f32,ne_a=[128,2,2,2],v=0): [TANH] NaN at index 0 (Vulkan0=-nan(ind) CPU=-1.000000) FAIL
  TANH(type=f32,ne_a=[5,7,11,13],v=0): [TANH] NaN at index 2 (Vulkan0=-nan(ind) CPU=1.000000) FAIL
  TANH(type=f32,ne_a=[128,2,2,2],v=1): not supported [Vulkan0]
  TANH(type=f32,ne_a=[5,7,11,13],v=1): not supported [Vulkan0]

PR:

> .\build\bin\Release\test-backend-ops.exe  | Select-String -Pattern "TANH"
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 5700 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | warp size: 64 | matrix cores: none
ggml_vulkan: Compiling shaders..........................Done!

  TANH(type=f32,ne_a=[128,2,2,2],v=0): OK
  TANH(type=f32,ne_a=[5,7,11,13],v=0): OK
  TANH(type=f32,ne_a=[128,2,2,2],v=1): not supported [Vulkan0]
  TANH(type=f32,ne_a=[5,7,11,13],v=1): not supported [Vulkan0]

0cc4m · 2024-12-08T17:14:50Z

We've had this issue before (#5260); last time I fixed it by replacing tanh(x) with 1 - 2 / (e^(2x) + 1). I'm not sure which version performs better.
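For reference, the rewrite follows from the identity tanh(x) = (e^(2x) - 1) / (e^(2x) + 1) = 1 - 2 / (e^(2x) + 1). The sketch below, in Python rather than GLSL, illustrates why the second form cannot produce a NaN from overflow: once e^(2x) overflows to +inf, the quotient form evaluates inf / inf, while the rewritten form evaluates 1 - 2 / inf = 1. The helper `exp_f32` and both function names are hypothetical, the float32 overflow is only emulated (the remaining arithmetic runs in double precision), and nothing here claims to reproduce what the AMD driver's built-in tanh actually does internally.

```python
import math

FLT_MAX = 3.4028234663852886e38  # largest finite float32 value


def exp_f32(x: float) -> float:
    """Hypothetical helper: exp() that saturates to +inf past the float32 range."""
    try:
        y = math.exp(x)
    except OverflowError:  # Python raises instead of returning inf
        return math.inf
    return math.inf if y > FLT_MAX else y


def tanh_quotient(x: float) -> float:
    """(e^(2x) - 1) / (e^(2x) + 1): becomes inf / inf = NaN once e^(2x) overflows."""
    e = exp_f32(2.0 * x)
    return (e - 1.0) / (e + 1.0)


def tanh_rewritten(x: float) -> float:
    """1 - 2 / (e^(2x) + 1): overflow gives 1 - 0 = 1, underflow gives 1 - 2 = -1."""
    return 1.0 - 2.0 / (exp_f32(2.0 * x) + 1.0)


if __name__ == "__main__":
    for x in (-50.0, -1.0, 0.0, 0.5, 50.0):
        print(x, tanh_quotient(x), tanh_rewritten(x), math.tanh(x))
```

The denominator e^(2x) + 1 is always at least 1, so the rewritten form never divides by zero either; the trade-off is some loss of relative precision for x very close to 0, where the result comes from subtracting two nearly equal numbers.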

stduhpf · 2024-12-08T17:27:43Z

It looks like doing 1 - 2 / (e^(2x) + 1) is faster, at least on my end: https://www.shadertoy.com/view/MfGBzR

Edit: Actually, it looks like it's even faster than built-in tanh? 🤔 Maybe webGLSL isn't such a good way to test performance....

0cc4m · 2024-12-08T18:18:30Z

I think it's unlikely that any of these versions is slow enough to cause an issue. Looks good, thank you for the fix.

This just leaves the q2_k and q3_k MMV issue on Windows AMD; all the other unit test failures are barely above the threshold.

Vulkan: fix NaN in tanh.comp

93bdbc6

github-actions bot added the labels Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) on Dec 8, 2024

Faster NaN-free tanh

d94ff95

0cc4m approved these changes Dec 8, 2024

0cc4m merged commit 06d7014 into ggml-org:master Dec 8, 2024
2 checks passed

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024

Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows (g…

e136690

…gml-org#10723) * Vulkan: fix NaN in tanh.comp * Faster NaN-free tanh

tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025

Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows (g…

ff28bf2

…gml-org#10723) * Vulkan: fix NaN in tanh.comp * Faster NaN-free tanh
