Releases: ggml-org/llama.cpp

b9846

github-actions released this 30 Jun 11:03
f708a5b

vulkan: roll bk loop in matmul for asahi linux (#24663)

  • vulkan: roll bk loop in matmul for asahi linux

  • vulkan: fix inline comment

  • vulkan: revert BK-loop unroll change

  • vulkan: edit spirv directly for asahi roll bk loop

  • vulkan: remove trailing whitespace at the end of comments

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

b9844

github-actions released this 30 Jun 08:56
6c5de1c

b9843

github-actions released this 30 Jun 01:17
86b9470

b9842

github-actions released this 29 Jun 16:18
6f4f53f

b9840

github-actions released this 29 Jun 10:25
8c146a8

DeepSeek V4 (#24162)

  • convert: add dsv4 conversion

  • add basic setup

  • add llm_graph_input_dsv4

  • add save-load state

  • add sinkhorn eps - correction by @fairydreaming

  • add rope fix

  • cleanup dead code

  • fix bugs

  • support pro model: added by @fairydreaming

  • remove redundant V cache

  • Chat template

  • remove debugging leftovers

  • Add mechanism for inlining templates based on architecture

  • s/deepseek-v4-flash/deepseek4/g

  • s/deepseek-v4-flash/deepseek4/g continued

  • enable graph reuse

  • enable FA

  • fix test llama archs

  • rename

  • compatibility with antirez ds4 GGUFs

  • simplified set_gguf_parameters() by calling super class method, replaced moe.score_func with expert_gating_func.

  • reserve worst-case kv-cache

  • revert max split inputs

  • address review comments

  • add padding to enable FA

  • pad only the final value of plan.n_kv to 256

  • remove built-in cpp chat template

  • cont: remove cpp built-in template

  • rm outdated test

  • replace ggml_view_3d() with ggml_reshape_3d()

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

  • only support n_seq=1 for now

  • remove unused var

  • cont: remove unused var

  • use scale bias

  • use correct ptr for can_reuse

  • remove gen-chat-inline-templates.py

  • simplify graph reuse

  • cont: cleanup

  • remove unused inputs

  • enable partial checkpointing

  • add correct shape for kq_mask + set llama_model_n_swa to 0 for dsv4

  • precompute source_idx + add comment about dummy write

  • support multi-seq

  • remove restored_trim_pos

  • use split_equal when possible

  • fix indent

  • address review comments

  • use LLM_KV

  • fix ci


Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: fairydreaming <166155368+fairydreaming@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
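
Note on the "pad only the final value of plan.n_kv to 256" item above: the commit title does not spell out the arithmetic. As a rough illustration only, assuming "pad to 256" means rounding a count up to the next multiple of 256, the calculation could be sketched in Python as follows. The helper name pad_to_multiple and the sample values are hypothetical and are not taken from llama.cpp.

```python
def pad_to_multiple(n: int, multiple: int = 256) -> int:
    """Round n up to the nearest multiple of `multiple`.

    Hypothetical helper for illustration; not part of llama.cpp.
    """
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    # Integer ceiling division, then scale back up to a multiple.
    return ((n + multiple - 1) // multiple) * multiple


# Illustrative values only; they do not come from the release notes.
for n_kv in (1, 256, 257, 1000):
    print(n_kv, "->", pad_to_multiple(n_kv))
```

Values that are already a multiple of 256 are returned unchanged, so the padding adds at most 255 extra entries.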

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

b9839

github-actions released this 29 Jun 09:45
6cb18b2

b9838

github-actions released this 29 Jun 07:33
277a105

b9837

github-actions released this 29 Jun 00:05
b3fed31

jinja, chat: add --reasoning-preserve flag (#25105)

  • jinja, chat: add --reasoning-preserve flag

  • correct help message

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

b9835

github-actions released this 28 Jun 19:37
7cb8576

b9833

github-actions released this 28 Jun 15:32
c818263

chat : implement minicpm5 parser (#24889)

  • Add minicpm5 tool call parser

  • Refactor MiniCPM5 PEG parser per review feedback

  • Fix jinja min/max API to match Jinja2

  • modify by review

  • MiniCPM5: use autoparser for XML tool calls and fix grammar preserved-token triggers

  • MiniCPM5: fix streaming tool-arg placeholder and remove alt XML markers

  • skip min/max attribute tests in -py mode

  • test-jinja: use real expected output for min/max attribute tests

  • MiniCPM5: revert shared mapper and history fallbacks per review

Drop streaming tool-arg placeholder workarounds from the generic PEG
mapper and restore strict tool-call argument JSON parsing so MiniCPM5
support stays limited to autoparser/diff-analyzer changes.

  • chat : refactor minicpm5 back to dedicated parser

  • cont : simplify grammar

  • cont : refactor

  • cont : fixes

  • cont : rename template to openbmb-MiniCPM5-1B.jinja

  • cont : add message delimiters

  • cont : fix tests


Co-authored-by: zhangtao <zhangtao2@modelbest.cn>
Co-authored-by: 张涛 <>

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)
