forked from ggml-org/llama.cpp
[pull] master from ggml-org:master #191
Open

pull wants to merge 17 commits into teleprint-me:master from ggml-org:master
* model : disable SWA for Phi models (ggml-ci)
* model : update warning message
* model : print warning only if n_swa > 0
* model : fix typo
* kv-cache : simplify the interface (ggml-ci)
* context : revert llama_batch_allocr position change (ggml-ci)
* server : fix first message identification. When using the OpenAI SDK (https://github.com/openai/openai-node/blob/master/src/lib/ChatCompletionStream.ts#L623-L626) we noticed that the expected assistant role was missing from the first streaming message. Fix this by correctly checking for the first message (a sketch of the idea follows this list).
  Co-authored-by: Piotr Stankiewicz <[email protected]>
  Signed-off-by: Dorin Geman <[email protected]>
* server : fix checks for the first role message for stream=True
  Co-authored-by: Piotr Stankiewicz <[email protected]>
  Signed-off-by: Dorin Geman <[email protected]>
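As a rough illustration of the fix described above (not the actual server code; the helper name and exact JSON shape are simplified), the first streamed delta is the one that needs to carry the assistant role:

```cpp
// Minimal sketch, assuming nlohmann::json as used by the llama.cpp server.
// make_stream_delta and the JSON layout shown here are illustrative only.
#include <nlohmann/json.hpp>
#include <string>

using json = nlohmann::json;

json make_stream_delta(const std::string & content, bool is_first_chunk) {
    json delta = { { "content", content } };
    if (is_first_chunk) {
        // The OpenAI SDK's ChatCompletionStream expects the first chunk to
        // declare the role; without it the assistant role appears missing.
        delta["role"] = "assistant";
    }
    return json {
        { "object",  "chat.completion.chunk" },
        { "choices", json::array({ { { "index", 0 }, { "delta", delta } } }) },
    };
}
```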
* Add the /api/tags and /api/chat endpoints, and improve the model metadata response (a sketch of the endpoint wiring follows this list)
* Remove trailing whitespace
* Remove code that is not needed for Copilot to work
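A rough sketch of how such an endpoint could be registered with cpp-httplib (the HTTP library the llama.cpp server is built on); the handler body and the metadata fields returned are placeholders, not the PR's actual implementation:

```cpp
// Sketch only: wiring an Ollama-style /api/tags endpoint with cpp-httplib.
// The model metadata returned here is illustrative.
#include <httplib.h>
#include <nlohmann/json.hpp>
#include <string>

using json = nlohmann::json;

static void register_api_tags(httplib::Server & svr, const std::string & model_name) {
    svr.Get("/api/tags", [model_name](const httplib::Request &, httplib::Response & res) {
        json body = {
            { "models", json::array({ { { "name", model_name }, { "model", model_name } } }) },
        };
        res.set_content(body.dump(), "application/json");
    });
}
```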
* ggml : add ggml_gelu_na (not approximated; a reference sketch follows this list)
* fix naming order
* rename na --> erf
* apply review suggestions
* revert naming order
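For reference, the non-approximated GELU uses the error function, GELU(x) = 0.5 · x · (1 + erf(x/√2)), in contrast to the tanh approximation ggml also ships. A scalar sketch, not the actual ggml kernel:

```cpp
// Scalar reference for the exact, erf-based GELU. Not the ggml kernel itself.
#include <cmath>

static float gelu_erf_ref(float x) {
    const float kSqrt1_2 = 0.70710678118654752440f; // 1/sqrt(2)
    return 0.5f * x * (1.0f + erff(x * kSqrt1_2));
}
```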
Signed-off-by: Emmanuel Ferdman <[email protected]>
* switch retrieval to llama_encode
* enable --no-warmup for retrieval
* opencl : fix a couple of crashes
  * fix kernel launches that failed on devices which do not support non-uniform work-groups: when non-uniform work-groups are not supported, set `local_work_size` to NULL (i.e. let the driver choose the work-group sizes). This patch does not cover everything - just the cases tested by test-backend-ops.
  * fix sub-buffer creation failing because `cl_buffer_region::origin` was not aligned to `CL_DEVICE_MEM_BASE_ADDR_ALIGN` (a sketch of both fixes follows this list)
* opencl : query non-uniform work-group sizes only on OpenCL 3.0+
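A rough sketch of the two fixes using the standard OpenCL C API (error handling trimmed; `launch` and `make_sub_buffer` are illustrative helpers, not the backend's real functions):

```cpp
// Sketch of the OpenCL fixes described above; surrounding context
// (queue, kernel, parent buffer) is assumed.
#include <CL/cl.h>

// 1) If the device does not support non-uniform work-groups, pass NULL as
//    local_work_size and let the driver pick the work-group sizes.
cl_int launch(cl_command_queue q, cl_kernel k, const size_t gws[3], const size_t lws[3],
              bool non_uniform_wg_supported) {
    return clEnqueueNDRangeKernel(q, k, 3, NULL, gws,
                                  non_uniform_wg_supported ? lws : NULL,
                                  0, NULL, NULL);
}

// 2) Align cl_buffer_region::origin to CL_DEVICE_MEM_BASE_ADDR_ALIGN before
//    creating a sub-buffer (the alignment is reported in bits).
cl_mem make_sub_buffer(cl_mem buf, cl_device_id dev, size_t offset, size_t size) {
    cl_uint align_bits = 0;
    clGetDeviceInfo(dev, CL_DEVICE_MEM_BASE_ADDR_ALIGN, sizeof(align_bits), &align_bits, NULL);
    const size_t align  = align_bits / 8;
    const size_t origin = (offset / align) * align; // round down to the alignment
    // The kernel-side offset must then account for the extra (offset - origin) bytes.
    cl_buffer_region region = { origin, size + (offset - origin) };
    cl_int err = CL_SUCCESS;
    return clCreateSubBuffer(buf, CL_MEM_READ_WRITE, CL_BUFFER_CREATE_TYPE_REGION, &region, &err);
}
```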
* opencl : add support for multiple devices, but limited to one platform; a platform with a GPU will be preferred. Additionally:
  * filter out devices that lack capabilities needed by the backend implementation (half support, OpenCL 2.0+, etc.) - a sketch of this filtering follows the list
  * make ggml_backend_opencl_reg() thread-safe
* fixup : fix an error in sync_with_other_backends when there is only one OpenCL device available
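A sketch of the kind of capability filtering described above (the exact criteria the backend applies may differ):

```cpp
// Keep only devices that report fp16 support and a post-1.x OpenCL version.
#include <CL/cl.h>
#include <string>
#include <vector>

static std::vector<cl_device_id> filter_devices(const std::vector<cl_device_id> & devs) {
    std::vector<cl_device_id> out;
    for (cl_device_id d : devs) {
        char exts[4096] = {0};
        char ver[128]   = {0};
        clGetDeviceInfo(d, CL_DEVICE_EXTENSIONS, sizeof(exts), exts, NULL);
        clGetDeviceInfo(d, CL_DEVICE_VERSION,    sizeof(ver),  ver,  NULL);
        const bool has_fp16 = std::string(exts).find("cl_khr_fp16") != std::string::npos;
        const bool is_cl2   = std::string(ver).compare(0, 9, "OpenCL 1.") != 0; // 2.0+ or 3.0
        if (has_fp16 && is_cl2) {
            out.push_back(d);
        }
    }
    return out;
}
```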
Currently, when running `GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0`, two operations throw an exception from blocking waits during queue recording:

* `-o CONCAT` : use of blocking waits on a queue that is being recorded (https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187)
* `-o MUL_MAT_ID` : blocking wait on a recording queue for a copy to host memory (https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074)

We noticed that `ggml-cuda.cu` has the [check_node_graph_compatibility_and_refresh_copy_ops](https://github.com/ggml-org/llama.cpp/blob/39e73ae0d69f882d7e29cecc6dd8f5052fca6731/ggml/src/ggml-cuda/ggml-cuda.cu#L2458-L2458) method for checking whether a graph can be used, even if graph use is enabled. This PR takes a similar approach by adding a method to `ggml-sycl.cpp` that checks whether a graph can be used for the operations, even if the user has asked for it to be enabled.
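A rough sketch of the shape of such a check (the function name and the exact op list are illustrative of the approach, not a copy of the PR's implementation; it assumes the internal `ggml_cgraph` layout that backends already access):

```cpp
// Skip SYCL graph capture when the graph contains ops whose SYCL
// implementation still performs blocking waits; fall back to eager submission.
#include "ggml.h"
#include "ggml-impl.h" // internal ggml_cgraph definition

static bool ggml_sycl_graph_compatible(const struct ggml_cgraph * cgraph) {
    for (int i = 0; i < cgraph->n_nodes; ++i) {
        switch (cgraph->nodes[i]->op) {
            case GGML_OP_CONCAT:     // blocking waits in concat.cpp
            case GGML_OP_MUL_MAT_ID: // blocking copy to host memory
                return false;
            default:
                break;
        }
    }
    return true;
}
```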
* remove the waits in async memcpy functions
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.1)
Can you help keep this open source service alive? 💖 Please sponsor : )