Pull requests: ggml-org/llama.cpp

Pull requests list

Optimized flash attention (FA) for OpenCL backend, and add Q4/Q8 KV cache quantization with FA for Adreno GPUs.
Labels: ggml, OpenCL
#23501 opened May 21, 2026 by wanghqc Contributor
perplexity: fix integer overflow
Labels: examples, merge ready
#23496 opened May 21, 2026 by fairydreaming Collaborator
opencl: batch profiling to prevent resource exhaustion
Labels: ggml, OpenCL
#23495 opened May 21, 2026 by shaofeiqi Contributor
ggm-cpu: ARM Repack kernels for Q1_0
Labels: ggml
#23492 opened May 21, 2026 by pl752 Contributor
server: add margin for draft model for fit
Labels: examples, server
#23485 opened May 21, 2026 by am17an Contributor
GGML/llama.cpp: Add scaled GEMMs for more robust NVFP4 support
Labels: ggml, model, testing
#23484 opened May 21, 2026 by ORippler Collaborator Draft
Add missing buffer set in allreduce fallback !COMPUTE clear
Labels: ggml
#23480 opened May 21, 2026 by TheBlueMatt Contributor
Optimize ggml_vec_dot_q4_K_q8_K_generic
Labels: ggml
#23474 opened May 21, 2026 by pauser0000001
common : fix state save in common_prompt_batch_decode
Labels: examples, testing
#23468 opened May 21, 2026 by danbev Member Draft
ui: media attachments before text
Labels: examples, server/ui
#23467 opened May 21, 2026 by sfallah Contributor
vocab : keep DNA k-mer ids distinct from colliding BPE tokens
Labels: merge ready, python
#23466 opened May 21, 2026 by kashif Contributor
[WebGPU] Check batch_compute_passes before sending passes when not doing GPU profiling
Labels: ggml, WebGPU
#23457 opened May 21, 2026 by nikhilJain17 Contributor
hexagon: apply repl optimization in flash attn softmax as #22993
Labels: ggml, Hexagon
#23455 opened May 21, 2026 by njsyw1997 Contributor
Generalize Adreno MoE kernels on size M
Labels: ggml, OpenCL
#23449 opened May 20, 2026 by shawngu-quic Contributor
Hip fattn expf approx
Labels: ggml, Nvidia GPU
#23441 opened May 20, 2026 by a-huk
MoE disk offloading for Metal
Labels: Apple Metal, ggml
#23440 opened May 20, 2026 by kisasexypantera94 Draft
ggml/cpu: skip zero-scale blocks in TQ1_0 and TQ2_0 vec_dot kernels
Labels: ggml
#23439 opened May 20, 2026 by eriirfos-eng
json-schema-to-grammar: expand PCRE shorthands in pattern strings
Labels: testing
#23436 opened May 20, 2026 by iOptimizeThings
ggml: replace fixed 1GB context pool with growable buffer in meta backend (#22404)
Labels: ggml
#23432 opened May 20, 2026 by nonml