Misc. bug: Speed degradation in bin-win-cpu-x64 compared to bin-win-avx2-x64 on Intel Core i7-12700H #13664
Comments
Does it happen with fewer threads? Try running ...
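A minimal sketch of such a thread sweep, assuming llama-bench from the same release and the model file named in the report below; the -t flag accepts a comma-separated list of thread counts:
llama-bench -m gemma-3-1b-it-Q4_K_M.gguf -t 1,2,4,8 -p 512 -n 128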
No speed degradation for ... Results of ...:
build: 4245e62 (5432)
Don't know if this is related, but I've noticed a marked decrease in generation speed since I updated my local build to the current version (I can't pinpoint the exact previous commit, but it was from about 3-4 days ago). So I ran:
(dev-venv) ilintar@LinuksowaJaskinia:/mnt/win/k/models/unsloth/Qwen3-30B-A3B-GGUF$ llama-bench -fa 1 -ctk q8_0 -ctv q8_0 -m Qwen3-30B-A3B-UD-Q4_K_XL.gguf -ot "(up_exps|down_exps)=CPU" -t 2,4,6,8 -p 512 -n 512 -r 10 -d 4096
It's not the same issue; this one is related to a change in the Windows releases. Open a new issue and try to find which commit introduced your problem.
Okay, I'll try to do a binary search tomorrow to narrow it down and open an issue afterwards.
The latest release should have fixed this issue. |
Unfortunately, the latest release (build b5476: 17fc817) does not work for me. When running ...
Note that the following line, present in previous releases, is missing even though the file ...
When using ...
The latest release uses OpenMP, so if you don't have it installed it may fail to load. Apparently this library is not included in the VC redistributable. I have added it now, and it should be bundled with the next llama.cpp release.
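As an illustrative check, not something from this thread: the import list of the CPU backend DLL can be printed with dumpbin from a Visual Studio developer prompt (the DLL name here is an assumption about the release layout):
dumpbin /dependents ggml-cpu.dll
If the MSVC OpenMP runtime, vcomp140.dll, shows up as a dependency but is not present on the machine, loading the backend will fail.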
I confirm that this issue is fixed in the current release b5478 (...). Results of ...:
build: f5cd27b (5478)
For the fastest speed, it seems best to run ...
Name and Version
First bad version: b5276 (llama-b5276-bin-win-cpu-x64.zip, commit 9f2da58)
Current version, still affected: b5432 (llama-b5432-bin-win-cpu-x64.zip)
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-cli
Command line
llama-cli -m gemma-3-1b-it-Q4_K_M.gguf -no-cnv -p "Tell me about the capital of France" --seed 3
Problem description & steps to reproduce
I see a degradation of the eval time on Windows 11 with an Intel Core i7-12700H CPU since commit 9f2da58 with llama-b5276-bin-win-cpu-x64.zip (release b5276): the eval time is now about ten times slower than in the previous build b5275 with llama-b5275-bin-win-avx2-x64.zip. Also in the current release b5432, which provides a llama-b5432-bin-win-cpu-x64.zip and not a ...-win-avx2-x64.zip, the eval time is slow.
Tested with gemma-3-1b-it-Q4_K_M.gguf (also verified with Qwen3-30B-A3B-UD-Q2_K_XL.gguf and other models):
llama-b5275-bin-win-avx2-x64: eval time = 61.90 ms per token
llama-b5276-bin-win-cpu-x64: eval time = 1043.79 ms per token
llama-b5432-bin-win-cpu-x64: eval time = 1132.96 ms per token
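For cross-checking such numbers, a hedged sketch using llama-bench instead of llama-cli's timing output; the extracted directory names are assumptions, and 1000 divided by the reported generation t/s gives the ms-per-token figure quoted above:
llama-b5275-bin-win-avx2-x64\llama-bench.exe -m gemma-3-1b-it-Q4_K_M.gguf -p 512 -n 128 -r 5
llama-b5276-bin-win-cpu-x64\llama-bench.exe -m gemma-3-1b-it-Q4_K_M.gguf -p 512 -n 128 -r 5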
First Bad Commit
9f2da58
Relevant log output