cuBLAS: arch= detection broken since b1795, with clunky workaround #5046

themanyone · 2024-01-20T06:42:49Z

To reproduce: compile with cuBLAS support and launch with the -ngl flag.
(-allow-unsupported-compiler is problematic but not related to this particular issue)

~~NVCCFLAGS=-allow-unsupported-compiler make LLAMA_CUBLAS=1 -j 8~~

Run the resulting binary with the -ngl flag set to any number > 0:

$ ./main -ngl 1 ...

Expected result: it works.
This has worked fine for me up until this point.

Actual result: on b1795 and later, this results in a crash:
`ERROR: ggml-cuda was compiled without support for the current GPU architecture.`
[nasty crash messages deleted to save eye strain]

Workaround:
It is now necessary to tack on a clunky CUDA_DOCKER_ARCH variable to make everything work again:
~~CUDA_DOCKER_ARCH=compute_50 NVCCFLAGS=-allow-unsupported-compiler make LLAMA_CUBLAS=1 -j 8~~

Details

I'm no longer using `-allow-unsupported-compiler`, as it's unsupported...

For now, you may have to look up your card's compute capability here and adjust the compile line above accordingly. My card is compute_50 (compute capability 5.0); yours will probably be different. Hopefully this workaround will soon become unnecessary. Alternatively, use git checkout to try an earlier version that compiles without problems.

```
$ nvidia-smi
Fri Jan 19 21:32:15 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro M3000M                  Off | 00000000:01:00.0  On |                  N/A |
| N/A   40C    P8               9W /  75W |    296MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      3734      G   /usr/libexec/Xorg                           148MiB |
|    0   N/A  N/A      4426      G   /usr/lib64/firefox/firefox                  143MiB |
+---------------------------------------------------------------------------------------+
```
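The workaround needs the card's compute capability turned into a `compute_XX` name. Below is a minimal Python sketch of that lookup. It assumes a driver whose `nvidia-smi` supports the `compute_cap` query field, and that the Makefile hands `CUDA_DOCKER_ARCH` to nvcc's `-arch` option unchanged. The helper names (`cc_to_arch`, `query_compute_caps`, `make_command`) are hypothetical and not part of llama.cpp.

```python
import shlex
import subprocess


def cc_to_arch(compute_cap: str) -> str:
    """Turn a compute capability like "5.0" into an nvcc name like "compute_50"."""
    major, minor = compute_cap.strip().split(".")
    return f"compute_{int(major)}{int(minor)}"


def query_compute_caps() -> list:
    """Return the compute capability string of each visible GPU.

    Assumption: the installed nvidia-smi supports the compute_cap query field.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def make_command(arch: str, jobs: int = 8) -> str:
    """Build the workaround make invocation (without -allow-unsupported-compiler)."""
    return f"CUDA_DOCKER_ARCH={shlex.quote(arch)} make LLAMA_CUBLAS=1 -j {jobs}"
```

For example, `make_command(cc_to_arch("5.0"))` gives `CUDA_DOCKER_ARCH=compute_50 make LLAMA_CUBLAS=1 -j 8`, which matches the reporter's compute_50 card. On a machine with a GPU, `[cc_to_arch(c) for c in query_compute_caps()]` lists one name per card.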


LostRuins · 2024-01-20T09:45:38Z

We have observed the same issue with users on Pascal cards when compiling with -arch=all-major; strangely, newer cards are unaffected.

JohannesGaessler · 2024-01-22T08:01:10Z

I don't understand why this is happening. You are not compiling with LLAMA_CUDA_F16=1, right?

LostRuins · 2024-01-22T14:14:16Z

Nope. Strangely, compiling with -arch=all seems to work for everyone but -arch=all-major does not work for Pascal users, although both are fine on Turing (RTX 2060).

I suspect (but have not confirmed) that this only started after the traps for bad_arch were added in #4556, as it seemed to work before that. I cannot verify this myself because I don't have a Pascal card.
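CUDA's binary-compatibility rule for compiled SASS suggests one possible explanation for the Pascal-versus-Turing difference. This is a hypothesis and is not established in this thread. The sketch below is a minimal Python model of that rule only. The target lists are illustrative assumptions, not nvcc's authoritative output, and `pick_cubin` and `effective_cuda_arch` are hypothetical helpers.

```python
from typing import List, Optional, Tuple

Arch = Tuple[int, int]  # (major, minor) compute capability

# Illustrative SASS target lists (assumption: roughly what a CUDA 12 nvcc
# embeds for -arch=all-major and -arch=all; check your toolkit's docs).
ALL_MAJOR: List[Arch] = [(5, 0), (6, 0), (7, 0), (8, 0), (9, 0)]
ALL: List[Arch] = [
    (5, 0), (5, 2), (5, 3), (6, 0), (6, 1), (6, 2),
    (7, 0), (7, 2), (7, 5), (8, 0), (8, 6), (8, 7), (8, 9), (9, 0),
]


def pick_cubin(device: Arch, sass_targets: List[Arch]) -> Optional[Arch]:
    """Pick the SASS image a device can run: a cubin for sm_XY runs on a
    CC X.Z device only if the major matches and Y <= Z. In this model the
    newest such image wins. PTX JIT fallback is deliberately left out."""
    compatible = [t for t in sass_targets if t[0] == device[0] and t[1] <= device[1]]
    return max(compatible, default=None)


def effective_cuda_arch(device: Arch, sass_targets: List[Arch]) -> Optional[int]:
    """Value of __CUDA_ARCH__ the selected image was compiled with, or None."""
    target = pick_cubin(device, sass_targets)
    if target is None:
        return None
    major, minor = target
    return major * 100 + minor * 10
```

With these assumed lists, a CC 6.1 Pascal card built with `-arch=all-major` runs code compiled for `__CUDA_ARCH__` 600. A CC 7.5 Turing card such as the RTX 2060 runs code compiled for 700. Suppose some kernels are compiled out below 610 (the `__dp4a` intrinsic, for example, requires CC 6.1). In that case the Pascal build would take the bad-arch path while Turing would not, which would fit the reports above.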

JohannesGaessler · 2024-01-23T12:35:21Z

I changed the error message to be more informative. @themanyone can you try running the latest master code without the workaround and post the error message that you get?

themanyone · 2024-01-24T09:25:18Z

> I changed the error message to be more informative. @themanyone can you try running the latest master code without the workaround and post the error message that you get?

The latest pull produces an nvcc warning, but it compiles (after I spent part of a day hacking around #5042):

`nvcc warning : Cannot find valid GPU for '-arch=native', default arch is used`

`./main` now generates this error when I tried LLAMA_CUDA_F16=1.
I guess my card isn't compatible with that option:

`ggml-cuda.cu:5857: ERROR: CUDA kernel soft_max_f16 has no device code compatible with CUDA arch 500. ggml-cuda.cu was compiled for: 500`

Other than that, it builds and works fine now. Thanks!

themanyone added the bug-unconfirmed label Jan 20, 2024

themanyone changed the title ~~arch detection broken since b1795, with clunky workaround~~ cuBLAS: arch= detection broken since b1795, with clunky workaround Jan 20, 2024

themanyone mentioned this issue Jan 20, 2024

When use the GPU, llama-cpp-python[server] keeps returning # #5014

Closed

LostRuins mentioned this issue Jan 20, 2024

Koboldcpp 1.55: ERROR: ggml-cuda was compiled without support for the current GPU architecture. LostRuins/koboldcpp#610

Closed

themanyone closed this as completed Jan 25, 2024
