cuBLAS: arch= detection broken since b1795, with clunky workaround #5046

Closed
themanyone opened this issue Jan 20, 2024 · 5 comments
@themanyone
themanyone commented Jan 20, 2024

To reproduce: Compile with cuBLAS support and launch with the -ngl flag.
(-allow-unsupported-compiler is problematic but not related to this particular issue)

NVCCFLAGS=-allow-unsupported-compiler make LLAMA_CUBLAS=1 -j 8

Run the resulting binary with the -ngl flag and any number greater than 0:

$ ./main -ngl 1 ...

Expected result: Works.
And this has worked fine for me up until this point.

Actual result: On b1795 and later, this results in a crash.
ERROR: ggml-cuda was compiled without support for the current GPU architecture.
[nasty crash messages deleted to save eye strain]

Workaround:
It is now necessary to tack on a clunky CUDA_DOCKER_ARCH setting to make everything work again:
CUDA_DOCKER_ARCH=compute_50 NVCCFLAGS=-allow-unsupported-compiler make LLAMA_CUBLAS=1 -j 8

Details
I'm no longer using -allow-unsupported-compiler, as it's unsupported...

For now, you may have to look up your card's compute capability in NVIDIA's list of CUDA GPUs and modify the above compile line accordingly. My card is compute_50 (compute capability 5.0); yours will probably be different. Hopefully this workaround will go away soon, once it is no longer necessary. Alternatively, use git checkout to try earlier versions that compile with no problems.
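
Instead of looking the value up by hand, the compute capability can usually be read from the driver. This is only a sketch: it assumes the installed nvidia-smi supports the compute_cap query field (present in recent drivers), it looks at the first GPU only, and cap_to_arch is a hypothetical helper name.

```shell
#!/bin/sh
# cap_to_arch: hypothetical helper that turns a compute capability such as
# "5.0" into the "compute_50" form used by CUDA_DOCKER_ARCH above.
cap_to_arch() {
    # Remove the dot: "5.0" -> "50", "8.6" -> "86".
    printf 'compute_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Assumption: nvidia-smi is on PATH and understands the compute_cap field.
if command -v nvidia-smi >/dev/null 2>&1; then
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    case "$cap" in
        # Only accept something that looks like "major.minor".
        [0-9]*.[0-9]*) echo "CUDA_DOCKER_ARCH=$(cap_to_arch "$cap")" ;;
        *) echo "could not detect compute capability" >&2 ;;
    esac
fi
```

The printed value can then be placed in front of the make command in the same way as CUDA_DOCKER_ARCH=compute_50 in the workaround.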

$ nvidia-smi
Fri Jan 19 21:32:15 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro M3000M                  Off | 00000000:01:00.0  On |                  N/A |
| N/A   40C    P8               9W /  75W |    296MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      3734      G   /usr/libexec/Xorg                           148MiB |
|    0   N/A  N/A      4426      G   /usr/lib64/firefox/firefox                  143MiB |
+---------------------------------------------------------------------------------------+
themanyone changed the title from "arch detection broken since b1795, with clunky workaround" to "cuBLAS: arch= detection broken since b1795, with clunky workaround" on Jan 20, 2024

@LostRuins
Collaborator

We have observed the same issue with users on Pascal cards when compiling with -arch=all-major. Strangely, newer cards are unaffected.

@JohannesGaessler
Collaborator

I don't understand why this is happening. You are not compiling with LLAMA_CUDA_F16=1, right?

@LostRuins
Collaborator

Nope. Strangely, compiling with -arch=all seems to work for everyone but -arch=all-major does not work for Pascal users, although both are fine on Turing (RTX 2060).

I suspect that this only started after the traps for bad_arch were added in #4556, as it seemed to be working prior to that, but I cannot confirm this because I don't have a Pascal card myself.
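
To compare what -arch=all and -arch=all-major actually put into a build, the embedded device code can be listed with cuobjdump from the CUDA toolkit. A rough sketch, assuming the listing names each cubin with an sm_NN token (for example main.1.sm_50.cubin); list_archs is a hypothetical helper, and ./main is the binary from this issue.

```shell
#!/bin/sh
# list_archs: hypothetical helper that reads a cuobjdump listing on stdin
# and prints each distinct sm_NN architecture token once.
list_archs() {
    grep -o 'sm_[0-9][0-9]*' | sort -u
}

# Assumption: cuobjdump is on PATH and ./main was built with LLAMA_CUBLAS=1.
if command -v cuobjdump >/dev/null 2>&1 && [ -f ./main ]; then
    cuobjdump --list-elf ./main | list_archs
fi
```

A build that embeds only PTX would print nothing here; cuobjdump --list-ptx lists the PTX entries instead.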

@JohannesGaessler
Collaborator

I changed the error message to be more informative. @themanyone can you try running the latest master code without the workaround and post the error message that you get?

@themanyone
Author

themanyone commented Jan 24, 2024

> I changed the error message to be more informative. @themanyone can you try running the latest master code without the workaround and post the error message that you get?

The latest pull generates an nvcc warning, but it compiles (after spending part of a day hacking around error #5042).

nvcc warning : Cannot find valid GPU for '-arch=native', default arch is used

./main now generates this error when I try LLAMA_CUDA_F16=1.
I guess my card isn't compatible with that option.

ggml-cuda.cu:5857: ERROR: CUDA kernel soft_max_f16 has no device code compatible with CUDA arch 500. ggml-cuda.cu was compiled for: 500

Other than that, it builds and works fine now. Thanks!
