JohannesGaessler changed the title from "Misc. bug:" to "Misc. bug: Starting from b5450 to latest version, token generation rate for model Qwen3-30B-A3B is reduced to ~5 tok/s." on May 24, 2025.
Name and Version
from b5450 to latest version
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Command line
Problem description & steps to reproduce
Starting from b5450 up to the latest version, the token generation rate for the Qwen3-30B-A3B model drops to ~5 tok/s; with b5449 or earlier it is about 22 tok/s. I'm using the Windows Vulkan x64 binary. Notebook platform: Lenovo ThinkBook 14 G7+ IAH, Intel Core Ultra 7 255H CPU, Intel Arc 140T iGPU, 32 GB RAM, Windows 11 24H2.
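A minimal way to quantify the regression is to run `llama-bench` (shipped with the llama.cpp release binaries) against both builds and compare the generation rate. This is a sketch, not from the report: the directory layout and model filename below are assumptions, and the small helper just converts a hand-timed token count into tok/s when measuring `llama-server` manually instead.

```shell
#!/bin/sh
# Hedged sketch: run the same generation benchmark on the last-good (b5449)
# and first-bad (b5450) Vulkan builds. Paths and model file are placeholders.
# ./llama-b5449/llama-bench -m Qwen3-30B-A3B-Q4_K_M.gguf -n 128 -ngl 99
# ./llama-b5450/llama-bench -m Qwen3-30B-A3B-Q4_K_M.gguf -n 128 -ngl 99

# Helper for manual timing: tokens generated and elapsed seconds -> tok/s.
toks_per_sec() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f", n / s }'
}

toks_per_sec 110 5   # e.g. 110 tokens in 5 s is the reported ~22 tok/s
```

Comparing the `tg` (text generation) rows of the two `llama-bench` runs should reproduce the ~22 tok/s vs ~5 tok/s gap described above.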
First Bad Commit
No response
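Since the first bad commit is not yet identified, one way to narrow it down is `git bisect run` between the two release tags named in the report (b5449 good, b5450 bad): `git bisect start b5450 b5449 && git bisect run ./bench-regression.sh`, where `bench-regression.sh` is a hypothetical script like the sketch below. The build and benchmark-parsing steps are placeholders; only the exit-code logic is concrete.

```shell
#!/bin/sh
# Hypothetical `git bisect run` script: exit 0 (good) when the measured
# generation rate is above a threshold between the two reported speeds.
THRESHOLD=15   # tok/s, chosen between the reported 22 (good) and 5 (bad)

rate_is_good() {
  # $1 = measured tok/s; compare via awk since rates may be fractional
  awk -v r="$1" -v t="$THRESHOLD" 'BEGIN { exit !(r > t) }'
}

# In a real run: rebuild the Vulkan backend at the current bisect commit,
# run llama-bench, parse the tok/s figure into RATE, then:
#   rate_is_good "$RATE"; exit $?
rate_is_good 22 && echo good   # prints "good"
rate_is_good 5  || echo bad    # prints "bad"
```

`git bisect run` interprets exit 0 as "good" and 1-127 (except 125) as "bad", so this converges on the first bad commit automatically.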
Relevant log output