cuBLAS: arch= detection broken since b1795, with clunky workaround #5046
Comments
We have observed the same issue with users on Pascal cards when compiling with
I don't understand why this is happening. You are not compiling with
Nope. Strangely, compiling with I suspect (but not confirmed) that this only started after the traps for
I changed the error message to be more informative. @themanyone can you try running the latest master code without the workaround and post the error message that you get?
The latest pull generates
Other than that, it builds and works fine now. Thanks!
To reproduce: Compile with cuBLAS support and launch with the -ngl flag.
(-allow-unsupported-compiler is problematic but not related to this particular issue.)
NVCCFLAGS=-allow-unsupported-compiler make LLAMA_CUBLAS=1 -j 8
Run the resulting binary with the -ngl flag (and any number > 0):
$ ./main -ngl 1 ...
Expected result: Works.
And this has worked fine for me up until this point.
Exception: On b1795 and later, this results in a crash.
ERROR: ggml-cuda was compiled without support for the current GPU architecture.
[nasty crash messages deleted to save eye strain]
Workaround:
It is now necessary to tack on a clunky CUDA_DOCKER_ARCH tag to make everything work again.
CUDA_DOCKER_ARCH=compute_50 NVCCFLAGS=-allow-unsupported-compiler make LLAMA_CUBLAS=1 -j 8
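The compute_50 value above is the one that worked on the reporter's machine. Other cards may need a different value. As a rough sketch (not part of the project's build system), the script below derives a candidate value from the GPU's reported compute capability and prints the resulting build command. The helper name cap_to_arch is hypothetical. The script assumes the installed nvidia-smi supports the compute_cap query field, which older drivers may lack. It also assumes the Makefile passes CUDA_DOCKER_ARCH straight through to nvcc's -arch option.

```shell
#!/bin/sh
# Hypothetical helper: turn a compute capability such as "6.1"
# into an nvcc virtual architecture name such as "compute_61".
cap_to_arch() {
    printf 'compute_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Assumption: nvidia-smi is installed and understands the compute_cap field.
# If it is missing or the query fails, the script prints nothing.
if command -v nvidia-smi >/dev/null 2>&1; then
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    if [ -n "$cap" ]; then
        # Print the suggested build command instead of running it.
        echo "CUDA_DOCKER_ARCH=$(cap_to_arch "$cap") make LLAMA_CUBLAS=1 -j 8"
    fi
fi
```

On a multi-GPU machine this only looks at the first device listed, so the printed value may not suit every card in the system.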