Can't run codellama #2829
@guranu did you try to lower the number of offloaded GPU layers?
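For illustration, a minimal sketch of lowering the offloaded layer count with the `main` binary; the model filename is a placeholder, and `-ngl` (`--n-gpu-layers`) controls how many layers are sent to the GPU backend:

```sh
# Placeholder model path; -ngl 8 offloads only 8 layers instead of everything.
# Drop the value further (or to 0) if the device still runs out of memory.
./main -m models/codellama-7b.Q4_K_M.gguf -ngl 8 -p "Write hello world in C"
```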
Tried that and it worked, thanks for the help.
But now it gives me a "Segmentation fault" error.
Hi, it appears you're using an Android device with CLBlast, which is okay, except I don't know of any Android device that currently performs better on GPU. It's also notable that OpenCL for Android is buggy. It'll run, but I'd use CPU for now. As for the segmentation fault, if I understand correctly this should've been resolved already, but try adding … Edit: Assuming I'm correct about Android, where are you storing your model? Navigate to the model in Termux and type …
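The exact command wasn't captured in the thread; as a rough sketch, verifying the model from Termux usually comes down to listing the file and comparing its size against the one shown on the model card (the path below is the one given in the reply further down):

```sh
cd /data/data/com.termux/files/home/llama.cpp/models
ls -lh   # compare the file size with the size listed on TheBloke's model page
```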
Well, I don't get a segmentation fault error anymore, but it doesn't do anything, as in it doesn't type anything. Oh, and the model is stored in /data/data/com.termux/files/home/llama.cpp/models, and here are the "results":
It did absolutely nothing.
@klosax If I understand correctly, he shouldn't have to add a prompt, but he's getting a seg fault even though this was resolved with https://github.com/ggerganov/llama.cpp/releases/tag/b1046. Adding … @guranu Your model location is good. You may be over-loading your device with …
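Since the linked fix shipped in release b1046, one way to be sure the running binary actually contains it is to pull and rebuild; a minimal sketch, assuming a plain Makefile CPU build in the default clone location:

```sh
cd ~/llama.cpp
git pull
make clean && make      # rebuild so the binary includes the b1046 fix
git log --oneline -1    # confirm which commit the build is based on
```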
Nope, that didn't help at all. Also, my device is a OnePlus 7.
Well, it doesn't do nothing now; it does "something": it says that it uses 0.48 tokens per second, but I didn't get any kind of text:
The device is over-loaded, so try Q4_K_M or Q4_K_S, as Q5_K_S is quite large. GSMArena shows that device has 8 cores, but you may need to experiment to find the best thread count. I'd lower …
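As a sketch of that thread-count experiment (the model filename is a placeholder); `-t` sets the thread count, and on big.LITTLE phone SoCs a value below the physical core count is often fastest:

```sh
# Try a few thread counts and compare the tokens/s reported in llama_print_timings.
for t in 2 4 6; do
  echo "=== threads: $t ==="
  ./main -m models/codellama-7b.Q4_K_M.gguf -t "$t" -n 64 -p "def fib(n):"
done
```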
After setting the thread count to 2 I managed to get some kind of text; however, the text is stupid:
At least it's generating something now. I find the instruct model slightly easier to use, but it's personal taste. Here's a prompt template for CodeLlama Instruct:
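The template itself wasn't captured here; the commonly published CodeLlama Instruct format wraps the request in [INST] ... [/INST] tags, so an invocation would look roughly like this (model filename and exact wording are assumptions):

```sh
./main -m models/codellama-7b-instruct.Q4_K_M.gguf -t 2 -n 128 \
  -p "[INST] Write a Python function that reverses a string. [/INST]"
```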
It seemed to be working "fine" on the CPU, though for some reason I had to type the prompt twice:
llama_print_timings: load time = 7100.79 ms
To improve the responsiveness slightly, try adding … Looks resolved.
Not yet. When I added …
then … or 2) Download this .zip file. Then extract the file, cd into the folder you extracted, and run it with … If you don't know how to download, cd, or any other terminal command, please see this link. And if you don't know what a .zip file is, see this link. 3) You can find the …
llama_print_timings: load time = 7159.67 ms
It's resolved because CodeLlama/llama.cpp is working, though it's unclear why you're prompting it that way, as that was addressed 2 days ago.
Hello, I downloaded CodeLlama from TheBloke and I compiled llama.cpp with OpenBLAS and CLBlast, but I get an error when I try to launch it. I am using the latest code, as the last commit was an hour ago:
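For reference, a minimal sketch of how the OpenBLAS and CLBlast Makefile builds were commonly invoked around that point in the repository's history (the build flags are assumptions for the mid-2023 tree):

```sh
cd ~/llama.cpp
make clean
make LLAMA_CLBLAST=1      # OpenCL backend via CLBlast
# or, for a CPU-only build accelerated by OpenBLAS:
# make LLAMA_OPENBLAS=1
```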