[User] Running perplexity for LLaMA2 with CLBlast segfaults #2736
Comments
Hi, I don't have the hardware to test your case, but Android with CLBlast shows wild perplexity. hchenphd mentions perplexity so high that his system can't represent the value, which further suggests that ggml/gguf is probably unrelated.
Hmm, not sure if this is the same thing since my problem seems specific to LLaMA2. I didn't have an issue running it on a LLaMA1 7B model.
Any difference using other ctx sizes or batch sizes?
Good question. Got some weird results:
With Clover and CLBlast I've never had any issues with LLaMA 2 perplexity on my AMD card. I tried running with some of the ctx/batch sizes that caused you to crash on a 13B and was able to process a couple of blocks with no issues.
Does your RX 6600 even support Clover, or is it ROCm OpenCL only? It might be worth a try to see if that fixes things, though keep in mind Clover is known to be slow.
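If it helps, here is a minimal sketch of how to check which OpenCL platform llama.cpp is picking up and how to point it at a specific one. This assumes clinfo is installed; the platform name, model path, and layer count below are only example values, not ones from this issue:

```sh
# List the available OpenCL platforms and devices (Clover, ROCm OpenCL, etc.).
clinfo -l

# Tell llama.cpp's CLBlast backend which platform/device to use.
# "Clover" is just an example; use whatever name clinfo reports on your system.
GGML_OPENCL_PLATFORM=Clover GGML_OPENCL_DEVICE=0 \
  ./perplexity -m models/your-model.gguf -f wikitext-2-raw/wiki.test.raw -ngl 33
```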
But, but, but I don't want to use the slow thing! :) Using CLBlast is actually a lot slower than the ROCm patch (which was finally merged 🎉) too. There seems to be Clover stuff in Mesa, but Clover also seems to be on the way out: https://www.phoronix.com/news/Mesa-Delete-Clover-Discussion - it would probably be better to just use the normal OpenCL stuff in Mesa, I would think.
Yep, I'm excited for Rusticl - but by then we should have Vulkan support in llama.cpp, and that's the better choice going forward.
I recently fixed a buffer overflow in the CLBlast backend. Maybe it was the cause here. If someone can confirm that my change fixes the crash, I will link this issue when I merge the fix. It may be triggered by certain models and batch sizes, so the known problematic combinations can be used for testing.
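A rough sketch of how one could test that change against a known-bad combination (the PR number, branch name, model path, and -c/-b/-ngl values are placeholders, not confirmed details from this thread):

```sh
# Fetch the branch with the proposed CLBlast fix; replace <pr-number> with the actual PR.
git fetch origin pull/<pr-number>/head:clblast-fix
git checkout clblast-fix

# Rebuild with CLBlast enabled and rerun one of the ctx/batch combinations that crashed before.
make clean && make LLAMA_CLBLAST=1
./perplexity -m models/your-llama2-13b.q5_K_M.gguf -f wikitext-2-raw/wiki.test.raw -c 512 -b 512 -ngl 43
```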
Current Behavior
Running perplexity segfaults. Seems like this occurs right at the end of calculating the first block.
Environment and Context
Tested with b8ad1b6 but this issue has been around for a while. Notably from before the GGUF stuff got merged, so it's not a problem with GGUF or subsequent changes.
GPU is an AMD Radeon RX 6600.
Linux 6.4.11-arch2-1 #1 SMP PREEMPT_DYNAMIC Sat, 19 Aug 2023 15:38:34 +0000 x86_64 GNU/Linux
Not sure if it matters but the CLBlast version is 1.6.1.
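For reference, roughly how the build looks with CLBlast enabled - a sketch assuming the standard Makefile flags; the debug flag is optional and only there to get symbols for GDB:

```sh
# Build with the CLBlast backend; LLAMA_DEBUG=1 adds debug symbols.
make clean && make LLAMA_CLBLAST=1 LLAMA_DEBUG=1
```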
Failure Information (for bugs)
Steps to Reproduce
Seems like this happens with LLaMA2 models specifically. I can confirm it definitely happens with openorca-platypus2-13b.ggmlv3.q5_K_M. Can anyone else with CLBlast + AMD GPU replicate the issue on a 13B LLaMA2 model? Roughly the kind of invocation involved is sketched below.
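A sketch of the invocation - the model path, dataset, and exact -c/-b/-ngl values are placeholders rather than the precise command from this report; the ctx/batch combinations discussed in the comments above are the ones worth trying:

```sh
# Offload layers to the AMD GPU via CLBlast and run the standard perplexity pass.
./perplexity -m models/openorca-platypus2-13b.q5_K_M.gguf \
             -f wikitext-2-raw/wiki.test.raw \
             -c 512 -b 512 -ngl 43
```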
Failure Logs
I tried compiling with LLAMA_DEBUG=1 and running in GDB, but the results weren't too helpful. It crashes deep in the AMD libraries doing a memcpy in some separate thread.
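For what it's worth, here is a sketch of how a fuller multi-thread backtrace could be captured - standard GDB usage, nothing llama.cpp-specific, and the model path is again a placeholder:

```sh
# Run perplexity under GDB and dump backtraces for every thread after the crash.
gdb --args ./perplexity -m models/openorca-platypus2-13b.q5_K_M.gguf -f wikitext-2-raw/wiki.test.raw -ngl 43
(gdb) set pagination off
(gdb) run
# ... wait for the SIGSEGV ...
(gdb) thread apply all bt
```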