[User] Running perplexity for LLaMA2 with CLBlast segfaults #2736

Closed

KerfuffleV2 opened this issue Aug 23, 2023 · 9 comments

Comments

@KerfuffleV2
Collaborator

Current Behavior

Running perplexity segfaults. Seems like this occurs right at the end of calculating the first block.

Environment and Context

Tested with b8ad1b6, but this issue has been around for a while. Notably, it predates the GGUF merge, so it's not a problem with GGUF or subsequent changes.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

GPU is an AMD Radeon RX 6600.

ggml_opencl: selecting platform: 'AMD Accelerated Parallel Processing'                                                                                               
ggml_opencl: selecting device: 'gfx1030'
ggml_opencl: device FP16 support: true
  • Operating System, e.g. for Linux:

Linux 6.4.11-arch2-1 #1 SMP PREEMPT_DYNAMIC Sat, 19 Aug 2023 15:38:34 +0000 x86_64 GNU/Linux

  • SDK version, e.g. for Linux:

Not sure if it matters, but the CLBlast version is 1.6.1.

Failure Information (for bugs)

Steps to Reproduce

Seems like this happens with LLaMA2 models specifically. I can confirm it definitely happens with openorca-platypus2-13b.ggmlv3.q5_K_M.

Can anyone else with CLBlast + an AMD GPU replicate the issue on a 13B LLaMA2 model?
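For reference, a rough sketch of the kind of invocation involved (the exact model filename, the wikitext path, and the -ngl layer count below are placeholders, not exact values from this report):

# Build with CLBlast support
make clean && make LLAMA_CLBLAST=1

# Run perplexity on the 13B LLaMA2 model with layers offloaded to the GPU;
# the segfault hits right after the first block is computed
./perplexity -m openorca-platypus2-13b.ggmlv3.q5_K_M.bin \
    -f wikitext-2-raw/wiki.test.raw -ngl 43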

Failure Logs

I tried compiling with LLAMA_DEBUG=1 and running in GDB, but the results weren't too helpful: it crashes deep in the AMD libraries doing a memcpy in a separate thread.
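For anyone who wants to dig further, the debug run was roughly along these lines (model and text file paths are placeholders):

# Debug build with CLBlast enabled
make clean && LLAMA_DEBUG=1 make LLAMA_CLBLAST=1

# Run under GDB and dump backtraces for all threads after the segfault,
# since the crash happens in a separate thread inside the AMD libraries
gdb -ex run -ex 'thread apply all bt' --args \
    ./perplexity -m model.bin -f wiki.test.raw -ngl 43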

@ghost

ghost commented Aug 23, 2023

Hi, I don't have the hardware to test your case, but Android with CLBlast shows wildly wrong perplexity values.

hchenphd mentions perplexity so high that his system can't represent the value, which further suggests that ggml/gguf is unrelated.

@KerfuffleV2
Collaborator Author

Hmm, not sure if this is the same thing since my problem seems specific to LLaMA2. I didn't have an issue running it on a LLaMA1 7B model.

@klosax
Contributor

klosax commented Aug 23, 2023

Any difference using other ctx sizes or batch sizes?

@KerfuffleV2
Collaborator Author

Any difference using other ctx sizes or batch sizes?

Good question. Got some weird results:

args               result
-b 512  -c 128     CRASH (with "corrupted double-linked list")
-b 512  -c 512     CRASH
-b 520  -c 520     CRASH
-b 528  -c 528     CRASH
-b 544  -c 544     CRASH
-b 570  -c 570     CRASH (but after time for first block)
-b 576  -c 576     OK
-b 512  -c 1024    CRASH
-b 512  -c 2048    CRASH
-b 640  -c 640     OK
-b 768  -c 768     OK
-b 1024 -c 1024    OK
-b 1024 -c 1026    OK

CRASH by itself means it crashes immediately after (apparently) computing the first block, with no other output. I'm pretty sure this is a memory corruption issue, and memory corruption can break things either immediately or later on, so the fact that some sizes work might not mean those cases are actually okay.
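For what it's worth, the matching-size part of that sweep is easy to redo with a small loop like this (model and text file paths are placeholders):

# Sweep matching batch/context sizes and note which ones survive
for n in 512 520 528 544 570 576 640 768 1024; do
    echo "=== -b $n -c $n ==="
    ./perplexity -m model.bin -f wiki.test.raw -b "$n" -c "$n" -ngl 43 \
        || echo "CRASHED with -b $n -c $n"
done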

@netrunnereve
Collaborator

With Clover and CLBlast I've never had any issues with LLaMA 2 perplexity on my AMD card. I tried running a 13B with some of the ctx/batch sizes that crashed for you and was able to process a couple of blocks with no issues.

ggml_opencl: selecting platform: 'Clover'
ggml_opencl: selecting device: 'AMD Radeon FirePro W8100 (hawaii, LLVM 15.0.7, DRM 3.42, 5.15.0-79-generic)'
ggml_opencl: device FP16 support: false

Does your RX 6600 even support Clover, or is it ROCm OpenCL only? It might be worth a try to see if that fixes things, though keep in mind Clover is known to be slow.

@KerfuffleV2
Collaborator Author

It might be worth a try to see if that fixes things, though keep in mind Clover is known to be slow.

But, but, but I don't want to use the slow thing! :) Also, using CLBlast is actually a lot slower than the ROCm patch (which was finally merged 🎉).

Seems like there might be Clover stuff in Mesa, but it also seems like Clover is on the way out: https://www.phoronix.com/news/Mesa-Delete-Clover-Discussion - it would probably be better to just use the normal OpenCL stuff in Mesa, I would think.

@netrunnereve
Collaborator

Yep, I'm excited for Rusticl - but by then we should have Vulkan support in llama.cpp, and that's the better choice going forward.

@shibe2
Contributor

shibe2 commented Oct 12, 2023

I recently fixed a buffer overflow in the CLBlast backend. Maybe it was the cause here. If someone confirms that my change fixes the crash, I will link this issue when I merge the fix. The overflow may be triggered by certain models and batch sizes, so the known problematic combinations can be used for testing.
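For example, one way to confirm would be to rebuild with the fix included and retry a combination from the table above that previously crashed (model and text file paths, and the -ngl value, are placeholders):

# Rebuild with CLBlast and retry a previously crashing batch/context combination
make clean && make LLAMA_CLBLAST=1
./perplexity -m openorca-platypus2-13b.ggmlv3.q5_K_M.bin \
    -f wiki.test.raw -b 512 -c 512 -ngl 43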

@KerfuffleV2
Collaborator Author

@shibe2 Thanks, I think you already fixed it in one of your previous changes. I can't reproduce the issue anymore even without #3603.
