Releases: ggml-org/llama.cpp
b8286
ggml-cuda: gdn use shared mem for HIP (#20366)
Suggested-by: Aman Gupta amangupta052@gmail.com
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b8285
cuda/hip: fix loop unrolling in ssm-conv (#20369)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b8281
fix op rope, add rope_back (#20293)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b8280
fix for failed UT case: ACC, L2_NORM, UPSCALE, fused_glu, unary (#20283)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b8279
model : qwen3vl reranker text support (#20332)
- model : fix qwen3vl reranker support
- Remove CLS_OUT
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b8278
llama-quant : correct n_attention_wv usage (#20357)
- llama-quant : correct n_attention_wv usage
In #19770, I introduced a regression in the way the
quantize_state_impl counter values were initialized. I was
incrementing and using n_attention_wv in the same loop, when it should
have been fixed by the time we're deciding tensor types in
llama_tensor_get_type_impl (for use_more_bits).
I never observed a difference in any of my tests - it was only after
@bartowski kindly pointed this out that I realized it was incorrect.
(Thanks!)
- simplify
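The counter issue described in this release can be illustrated with a small toy model. This is a sketch only, not llama.cpp code: the function names, the tensor-name check, and the "first and last eighth" rule are all hypothetical stand-ins, chosen so that the effect of deciding with a partial count is easy to see.

```python
# Toy model of the counter bug described above. Not llama.cpp code:
# every name here is hypothetical and the selection rule is simplified.

def is_attn_v(name: str) -> bool:
    # Assumption: attention-V tensors are recognised by a name suffix.
    return name.endswith("attn_v.weight")

def wants_more_bits(i: int, n: int) -> bool:
    # Simplified stand-in for a position-dependent rule: favour the first
    # and last eighth of the n tensors. The result depends on the total n.
    return i < n // 8 or i >= 7 * n // 8

def single_pass(names):
    # Buggy shape: the total is incremented and consumed in the same loop,
    # so each decision sees only the count so far.
    n_seen = 0
    i = 0
    decisions = []
    for name in names:
        if is_attn_v(name):
            n_seen += 1
            decisions.append(wants_more_bits(i, n_seen))
            i += 1
    return decisions

def two_pass(names):
    # Fixed shape: finish counting first, then decide with the final total.
    n_total = sum(1 for name in names if is_attn_v(name))
    i = 0
    decisions = []
    for name in names:
        if is_attn_v(name):
            decisions.append(wants_more_bits(i, n_total))
            i += 1
    return decisions

# Eight toy layers, each with one attn_q and one attn_v tensor.
names = [f"blk.{layer}.{kind}.weight"
         for layer in range(8)
         for kind in ("attn_q", "attn_v")]
```

With the eight toy layers above, single_pass(names) and two_pass(names) return different lists, because the partial count shifts the thresholds on every iteration. Whether such a difference shows up in a real model depends on the actual selection logic; as the commit message notes, it went unobserved in the author's tests.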
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b8277
ggml : bump RPC version (#20330)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b8276
ggml webgpu: faster normal quant and some k-quant matrix operations, better shader parameter handling (#20173)
- K quant speedup (#20)
- Basic JIT compilation for mul_mat, get_rows, and scale (#17)
- scale jit working
- preliminary working jit for getrows and mulmat, needs refining
- simplified mul_mat preprocessing switch statement
- get_rows fixes, mul_mat refinement
- formatted + last edits
- removed some extraneous prints
- fixed get_rows, fixed workgroup dispatch in mul_mat. no gibberish
- small fix
- some changes, working
- get_rows and mul_mat jit fixed and working
- Update formatting
- formatting
- Add header
Co-authored-by: Neha Abbas nehaabbas@ReeseLevines-MacBook-Pro.local
Co-authored-by: Reese Levine reeselevine1@gmail.com
- Start work on all-encompassing shader library
- refactor argmax, set_rows
- Refactor all but flashattention, mat mul
- no gibberish, all k quants added, merged
- vec memory fix
- q6_k matching metal on my machine, tests passing
- Set tile size for q6_k separately
- Separate out fast shaders
Co-authored-by: neha-ha 137219201+neha-ha@users.noreply.github.com
- Move towards writeBuffer for params
- Move away from multiple buffers for set_rows errors, remove host buffer for parameter buffers, minor cleanups
- Remove extra file
- Formatting
Co-authored-by: neha-ha 137219201+neha-ha@users.noreply.github.com
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b8272
models : fix assert in mamba2 (cont) (#20335)
- models : fix assert in mamba2 (cont)
- cont : add n_group mod
Co-authored-by: Sigbjørn Skjæret sigbjorn.skjaeret@scala.com
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b8271
server : make 2 checkpoints near the end of the prompt (#20288)
- server : make 2 checkpoints near the end of the prompt
- cont : adjust checkpoints
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: