Phi-2 q4_km generating gibberish on ARM devices #4618
Comments
Not sure, it works on M2 Ultra. What is the output from: make tests && ./tests/test-backend-ops Any failures?
Sorry, that took really long to compile on my Android. The tests crash halfway, consistently at the same point, right after I modified the cpp file to remove https://github.com/ggerganov/llama.cpp/blob/master/tests/test-backend-ops.cpp#L1580 and recompiled. And all other tests seem to be passing for
Is there an easy way to modify the makefile to skip either NEON or FMA? I'd like to see if I can pinpoint which one is causing issues. Also, I checked with another user over Discord: they used Termux with compile and run settings identical to mine, and their output was coherent. So it might be a device-specific thing? I'm slightly confused and wondering if anybody else has the same issue. For reference, my device (that didn't work)
Does it work with this patch:

diff --git a/ggml-quants.c b/ggml-quants.c
index a15a2404..b5c76f00 100644
--- a/ggml-quants.c
+++ b/ggml-quants.c
@@ -5602,7 +5602,7 @@ void ggml_vec_dot_q4_K_q8_K(const int n, float * restrict s, const void * restri
     const int nb = n / QK_K;
-#ifdef __ARM_NEON
+#ifdef __ARM_NEON_XXX
     const uint8x16_t m4b = vdupq_n_u8(0xf);
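The patch above renames the guard macro so the NEON branch is never compiled and the function falls back to its scalar path. A minimal sketch of that pattern follows; `dot_i8`, `dot_i8_scalar` and the `FORCE_SCALAR` switch are hypothetical names for illustration and are not part of ggml.

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Reference scalar path: always available, used as the fallback. */
static int32_t dot_i8_scalar(const int8_t *a, const int8_t *b, int n) {
    int32_t sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}

/* Dispatching version. Defining FORCE_SCALAR (hypothetical switch) has the
 * same effect as renaming __ARM_NEON in the guard: the SIMD branch is
 * compiled out and only the scalar code runs. */
static int32_t dot_i8(const int8_t *a, const int8_t *b, int n) {
#if defined(__ARM_NEON) && !defined(FORCE_SCALAR)
    int32x4_t acc = vdupq_n_s32(0);
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        /* Widen 8-bit products to 16 bits, then accumulate pairs into 32 bits. */
        int16x8_t prod = vmull_s8(vld1_s8(a + i), vld1_s8(b + i));
        acc = vpadalq_s16(acc, prod);
    }
    int32_t sum = vgetq_lane_s32(acc, 0) + vgetq_lane_s32(acc, 1)
                + vgetq_lane_s32(acc, 2) + vgetq_lane_s32(acc, 3);
    /* Handle the tail that does not fill a full vector. */
    return sum + dot_i8_scalar(a + i, b + i, n - i);
#else
    return dot_i8_scalar(a, b, n);
#endif
}
```

If the two paths disagree on the same input, the SIMD branch is the suspect; if they agree, the problem is likely elsewhere, which is what happened with this patch.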
Unfortunately not, it's still generating rubbish.
Another update: I have done a full search and replace in all files replacing all instances of I will try to slowly replace each instance until I find the one responsible, unless you have a better approach to suggest.
By trial and error, I have narrowed it down to By skipping https://github.com/ggerganov/llama.cpp/blob/master/ggml-quants.c#L5879 , this model works perfectly on my device.
Could you verify that #4630 also fixes the issue on that device?
Yes, that seems to have fixed it! Awesome. Though I am wondering why the unit tests didn't catch that.
I checked q4_k_m on my build with #4630 and that's fine, but I think there's still something off with q5_k_m. https://huggingface.co/afrideva/phi-2-uncensored-GGUF/blob/main/phi-2-uncensored.q5_k_m.gguf
The tests cannot catch integer overflows. Should be fixed now.
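To illustrate why a test can pass while an overflow bug remains, here is a small sketch. It is not the code changed in #4630; the function names and the 16-bit accumulator width are assumptions chosen for the example. A narrow accumulator gives the right answer for small inputs and silently wraps for large ones, so a test that never hits the extreme value range will not notice.

```c
#include <stdint.h>

/* Reference: accumulate 4-bit x 8-bit products in 32 bits (no overflow here). */
static int32_t dot_acc32(const uint8_t *q4, const int8_t *q8, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; ++i) {
        acc += (int32_t)q4[i] * (int32_t)q8[i];
    }
    return acc;
}

/* Same sum, but kept in 16 bits and wrapping modulo 2^16, the way a
 * single int16 SIMD lane would behave (illustrative assumption). */
static int32_t dot_acc16(const uint8_t *q4, const int8_t *q8, int n) {
    uint16_t acc = 0;
    for (int i = 0; i < n; ++i) {
        acc = (uint16_t)(acc + (uint16_t)((int)q4[i] * (int)q8[i]));
    }
    /* Reinterpret the 16-bit pattern as a signed two's-complement value. */
    return acc >= 0x8000u ? (int32_t)acc - 0x10000 : (int32_t)acc;
}
```

With 32 elements of `q4 = 3` and `q8 = 5`, both functions return 480. With `q4 = 15` and `q8 = 127`, the true sum is 60960, which does not fit in a signed 16-bit value, and the narrow version returns -4576.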
Running the latest commit, testing the model https://huggingface.co/afrideva/phi-2-uncensored-GGUF/blob/main/phi-2-uncensored.q4_k_m.gguf in Termux.
./main -m ../phi-2-uncensored.q4_k_m.gguf -n 10 -p "Hi, my name is"
Hi, my name is vocwich Reeves TABLEeco Feahar Reeves sill Reeves
It generates complete gibberish.
The same model works fine on my x86_64 Windows device.
Also, q2_k works fine on both systems.
Is it possible that the ARM FMA or ARM NEON intrinsics are responsible for this issue?
Also tagging @ebeyabraham
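When comparing two devices or two builds, it can help to confirm which ARM features the compiler actually enabled. The sketch below reports the standard ACLE feature macros `__ARM_NEON`, `__ARM_FEATURE_FMA` and `__ARM_FEATURE_DOTPROD`; the helper name `format_simd_report` is hypothetical, and the report reflects compile-time flags only, not what the CPU supports at run time.

```c
#include <stddef.h>
#include <stdio.h>

/* Each helper returns 1 if the corresponding feature macro was defined
 * by the compiler for this build, 0 otherwise. */
static int built_with_neon(void) {
#if defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}

static int built_with_fma(void) {
#if defined(__ARM_FEATURE_FMA)
    return 1;
#else
    return 0;
#endif
}

static int built_with_dotprod(void) {
#if defined(__ARM_FEATURE_DOTPROD)
    return 1;
#else
    return 0;
#endif
}

/* Writes a one-line summary into buf and returns snprintf's result. */
static int format_simd_report(char *buf, size_t len) {
    return snprintf(buf, len, "NEON=%d FMA=%d DOTPROD=%d",
                    built_with_neon(), built_with_fma(), built_with_dotprod());
}
```

Two builds that print different summaries take different code paths through any function guarded by these macros, which could explain a mismatch between otherwise identical Termux setups.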