Huge perplexity score generated by CLBLAST based on GPU of Android phone? #2133
Comments
How are the values with CPU-only? This is unexpected.
Thanks for your reply.
Sorry for my carelessness. I had deleted some content from wiki.test.raw earlier (I thought the file was too large for perplexity calculation on a mobile device), so the chunk count was wrong (160 chunks instead of 655). However, I have now rerun the perplexity calculation with the original wiki.test.raw file (I can confirm this file is complete). The perplexity is still very large, like [1]2717.5986, [2]3794.2774.
```
/bin/perplexity -m ../../storage/downloads/llm-models/ggml-model-q4_0.bin -f ../../storage/downloads/wikitext-2-raw/wiki.test.raw -t 8

system_info: n_threads = 8 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
```
Hi, it may not be related, but llama.cpp performance improves when models are loaded from the $HOME path, so try moving the model to $HOME. Edit: It's not related; I hit the same error. Here's the CPU working as expected:
Here's GPU with high perplexity:
GPU perplexity >2500! Here are my device specs: lscpu:
uname -a
OpenCL:
Yep, I can reproduce the bug.
Since the CPU results look OK, this means that for some reason the OpenCL implementation fails on these devices.
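This is not llama.cpp's actual test harness, but a generic way to localize this kind of backend bug is to run the same tensor operation on both the CPU and the OpenCL path and compare the outputs within a tolerance (quantized kernels are not bit-exact, so exact equality is too strict). A minimal sketch of that comparison, with hypothetical `cpu_out`/`gpu_out` vectors standing in for real layer outputs:

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two output vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

def backends_agree(cpu_out, gpu_out, tol=1e-3):
    """True if the GPU output matches the CPU reference within `tol`.

    Quantized matmul kernels accumulate rounding differently, so a small
    tolerance is expected; a huge difference points at a broken kernel.
    """
    return max_abs_diff(cpu_out, gpu_out) <= tol

# Illustrative values only (not real llama.cpp layer outputs):
print(backends_agree([0.10, -0.52, 1.31], [0.1002, -0.5199, 1.3095]))  # small drift
print(backends_agree([0.10, -0.52, 1.31], [37.0, -900.0, 2.0]))        # broken kernel
```

Bisecting layer by layer with a check like this usually narrows the failure down to a specific kernel or quantization format.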
The model was quantized on a MacBook but inference runs on Android. Is it possible that the endianness differs when reading or writing model files?
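An endianness mismatch is unlikely here: ggml model files store multi-byte values little-endian, and both macOS hosts (x86-64 or arm64) and Android phones (arm64) are little-endian platforms. A quick sketch to check the host byte order and read a file magic with an explicit little-endian format (the `read_magic` helper and its path argument are hypothetical, for illustration):

```python
import struct
import sys

# Both macOS and Android report "little" here, so a byte-order mismatch
# between quantization and inference machines is not the likely culprit.
print(sys.byteorder)

def read_magic(path):
    """Read the first 4 bytes of a model file as a little-endian uint32.

    Using an explicit "<I" format makes the little-endian assumption
    visible instead of relying on host byte order.
    """
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return magic
```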
New info, same result: OpenCL produces enormous perplexity, while the CPU works as expected.
New observation: if you try a larger model (like Q4_1 or Q5_0), the perplexity score becomes infinite, i.e. too large for the system to represent.
This issue was closed because it has been inactive for 14 days since being marked as stale.
I tried to run perplexity with CLBlast on the GPU of a Xiaomi phone. The same model works fine on a MacBook, but the perplexity scores are huge, like [1]2717.5986, [2]3794.2774.
I am confused; can anyone give some hints about this issue?