Releases · ggml-org/llama.cpp

Release list

b9846

github-actions released this 30 Jun 11:03

b9846

f708a5b

vulkan: roll bk loop in matmul for asahi linux (#24663)

vulkan: roll bk loop in matmul for asahi linux
vulkan: fix inline comment
vulkan: revert BK-loop unroll change
vulkan: edit spirv directly for asahi roll bk loop
vulkan: remove trailing whitespace at the end of comments

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

Assets 27

b9844

github-actions released this 30 Jun 08:56

b9844

6c5de1c

ggml-webgpu: add support for NVFP4 (#25143)

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

Assets 27

b9843

github-actions released this 30 Jun 01:17

b9843

86b9470

Revert "sched : reintroduce less synchronizations during split compute (#20793)" (#25138)

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

Assets 27

b9842

github-actions released this 29 Jun 16:18

b9842

6f4f53f

common : dedup preset and cached model entries in /v1/models (#25131)

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

Assets 27

b9840

github-actions released this 29 Jun 10:25

b9840

8c146a8

DeepSeek V4 (#24162)

convert: add dsv4 conversion
add basic setup
add llm_graph_input_dsv4
add save-load state
add sinkhorn eps - correction by @fairydreaming
add rope fix
cleanup dead code
fix bugs
support pro model: added by @fairydreaming
remove redundant V cache
Chat template
remove debugging leftovers
Add mechanism for inlining templates based on architecture
s/deepseek-v4-flash/deepseek4/g
s/deepseek-v4-flash/deepseek4/g continued
enable graph reuse
enable FA
fix test llama archs
rename
compatibility with antirez ds4 GGUFs
simplified set_gguf_parameters() by calling super class method, replaced moe.score_func with expert_gating_func.
reserve worst-case kv-cache
revert max split inputs
address review comments
add padding to enable FA
pad only the final value of plan.n_kv to 256
remove built-in cpp chat template
cont: remove cpp built-in template
rm outdated test
replace ggml_view_3d() with ggml_reshape_3d()

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

only support n_seq=1 for now
remove unused var
cont: remove unused var
use scale bias
use correct ptr for can_reuse
remove gen-chat-inline-templates.py
simplify graph reuse
cont: cleanup
remove unused inputs
enable partial checkpointing
add correct shape for kq_mask + set llama_model_n_swa to 0 for dsv4
precompute source_idx + add comment about dummy write
support multi-seq
remove restored_trim_pos
use split_equal when possible
fix indent
address review comments
use LLM_KV
fix ci

Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: fairydreaming <166155368+fairydreaming@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

Contributors

fairydreaming

Assets 27

b9839

github-actions released this 29 Jun 09:45

b9839

6cb18b2

tools/ui: restore Tailwind scanning in ignored worktrees (#24879)

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

Assets 27

b9838

github-actions released this 29 Jun 07:33

b9838

277a105

common : remove unused regex-partial (#25118)

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

Assets 27

b9837

github-actions released this 29 Jun 00:05

b9837

b3fed31

jinja, chat: add --reasoning-preserve flag (#25105)

jinja, chat: add --reasoning-preserve flag
correct help message

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

Assets 27

b9835

github-actions released this 28 Jun 19:37

b9835

7cb8576

ui: fix stop and reasoning skip in single-model mode (#25084)

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

Assets 27

b9833

github-actions released this 28 Jun 15:32

b9833

c818263

chat : implement minicpm5 parser (#24889)

Add minicpm5 tool call parser
Refactor MiniCPM5 PEG parser per review feedback
Fix jinja min/max API to match Jinja2
modify by review
MiniCPM5: use autoparser for XML tool calls and fix grammar preserved-token triggers
MiniCPM5: fix streaming tool-arg placeholder and remove alt XML markers
skip min/max attribute tests in -py mode
test-jinja: use real expected output for min/max attribute tests
MiniCPM5: revert shared mapper and history fallbacks per review

Drop streaming tool-arg placeholder workarounds from the generic PEG
mapper and restore strict tool-call argument JSON parsing so MiniCPM5
support stays limited to autoparser/diff-analyzer changes.

chat : refactor minicpm5 back to dedicated parser
cont : simplify grammar
cont : refactor
cont : fixes
cont : rename template to openbmb-MiniCPM5-1B.jinja
cont : add message delimiters
cont : fix tests

Co-authored-by: zhangtao <zhangtao2@modelbest.cn>
Co-authored-by: 张涛 <>

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler:

DISABLED
openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

Assets 27
