Releases · ggml-org/llama.cpp

Release list

b8286

github-actions released this 12 Mar 07:47

5f91b1d

ggml-cuda: gdn use shared mem for HIP (#20366)

Suggested-by: Aman Gupta <amangupta052@gmail.com>

b8285

github-actions released this 12 Mar 07:45

9ef7523

cuda/hip: fix loop unrolling in ssm-conv (#20369)

b8281

github-actions released this 12 Mar 03:40

0cec84f

fix op rope, add rope_back (#20293)

b8280

github-actions released this 12 Mar 01:39

b2e1427

fix for failed UT case: ACC, L2_NORM, UPSCALE, fused_glu, unary (#20283)

b8279

github-actions released this 12 Mar 00:54

4d99d45

model : qwen3vl reranker text support (#20332)

model : fix qwen3vl reranker support
Remove CLS_OUT

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

b8278

github-actions released this 11 Mar 22:20

10e5b14

llama-quant : correct n_attention_wv usage (#20357)

llama-quant : correct n_attention_wv usage

In #19770, I introduced a regression in the way the
quantize_state_impl counter values were initialized. I was
incrementing and using n_attention_wv in the same loop, when it should
have been fixed by the time we're deciding tensor types in
llama_tensor_get_type_impl (for use_more_bits).

I never observed a difference in any of my tests; it was only after
@bartowski kindly pointed this out that I realized it was incorrect.
(Thanks!)

simplify
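
The pitfall described in this commit message can be sketched in isolation. The snippet below is a simplified illustration, not the llama.cpp code: `is_attn_v`, `in_second_half`, `decide_single_pass`, and `decide_two_pass` are hypothetical names, and the rule "treat the second half of the matching tensors differently" merely stands in for any decision that depends on the final count.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: does this tensor name look like an attention V weight?
static bool is_attn_v(const std::string & name) {
    return name.find("attn_v") != std::string::npos;
}

// Hypothetical rule that depends on the TOTAL count n:
// true for tensors in the second half of the sequence.
static bool in_second_half(int i, int n) {
    return i >= n / 2;
}

// Problematic pattern: the total is incremented and consumed in the same
// loop, so every decision is made against a partial count.
static std::vector<bool> decide_single_pass(const std::vector<std::string> & names) {
    std::vector<bool> out;
    int n = 0; // running count, not yet final
    int i = 0; // index of the current matching tensor
    for (const auto & name : names) {
        if (!is_attn_v(name)) {
            continue;
        }
        ++n;
        out.push_back(in_second_half(i, n));
        ++i;
    }
    return out;
}

// Safer pattern: a first pass fixes the count, a second pass uses it.
static std::vector<bool> decide_two_pass(const std::vector<std::string> & names) {
    int n = 0;
    for (const auto & name : names) {
        if (is_attn_v(name)) {
            ++n;
        }
    }

    std::vector<bool> out;
    int i = 0;
    for (const auto & name : names) {
        if (!is_attn_v(name)) {
            continue;
        }
        out.push_back(in_second_half(i, n)); // n is final here
        ++i;
    }
    return out;
}
```

With four matching tensors, the single-pass version marks all four as "second half", while the two-pass version marks only the last two.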

Contributors

bartowski

b8277

github-actions released this 11 Mar 19:28

90b2731

ggml : bump RPC version (#20330)

b8276

github-actions released this 11 Mar 19:08

aa2d278

ggml webgpu: faster normal quant and some k-quant matrix operations, better shader parameter handling (#20173)

K quant speedup (#20)
Basic JIT compilation for mul_mat, get_rows, and scale (#17)
scale jit working
preliminary working jit for getrows and mulmat, needs refining
simplified mul_mat preprocessing switch statement
get_rows fixes, mul_mat refinement
formatted + last edits
removed some extraneous prints
fixed get_rows, fixed workgroup dispatch in mul_mat. no gibberish
small fix
some changes, working
get_rows and mul_mat jit fixed and working
Update formatting
formatting
Add header

Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

Start work on all-encompassing shader library
refactor argmax, set_rows
Refactor all but flashattention, mat mul
no gibberish, all k quants added, merged
vec memory fix
q6_k matching metal on my machine, tests passing
Set tile size for q6_k separately
Separate out fast shaders

Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>

Move towards writeBuffer for params
Move away from multiple buffers for set_rows errors, remove host buffer for parameter buffers, minor cleanups
Remove extra file
Formatting

Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>

b8272

github-actions released this 11 Mar 14:00

1274fbe

models : fix assert in mamba2 (cont) (#20335)

models : fix assert in mamba2 (cont)
cont : add n_group mod

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

b8271

github-actions released this 11 Mar 13:54

a7b3dee

server : make 2 checkpoints near the end of the prompt (#20288)

server : make 2 checkpoints near the end of the prompt
cont : adjust checkpoints
