Name and Version
sudo docker run ghcr.io/ggml-org/llama.cpp:server-cuda13 --version
version: 9404 (241cbd4)
built with GNU 14.2.0 for Linux x86_64
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
No response
Command line
llama-server --models-preset config.ini
Problem description & steps to reproduce
I am running llama-server in Docker with LLAMA_CACHE set to "/models", which already contains some downloaded models. When I define a preset for a model such as gpt-oss-120b, the result depends on how the preset is specified: the server either recognizes the already-downloaded model and applies the preset to it, or it creates a new entry in the models endpoint.
This produces a duplicate:
[gpt-oss-120b]
hf = ggml-org/gpt-oss-120b-GGUF
top-p = 1.0
This correctly recognizes the cached model:
[ggml-org/gpt-oss-120b-GGUF]
top-p = 1.0
Ideally, neither form would create a duplicate entry in the model list.
First Bad Commit
No response
Relevant log output
/v1/models shows both "id":"ggml-org/gpt-oss-120b-GGUF:MXFP4" and "id":"gpt-oss-120b"
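To make the duplication easier to spot, here is a minimal Python sketch that groups model ids referring to the same underlying model. It assumes the endpoint returns an OpenAI-style payload with a "data" list of objects that each carry an "id". The helper names (normalize_model_id, find_duplicate_entries) and the normalization heuristic (drop the ":quant" tag, the "org/" prefix, and a trailing "-GGUF") are my own illustration, not how llama-server matches presets to cached models.

```python
from collections import defaultdict


def normalize_model_id(model_id):
    """Reduce a model id to a bare name (hypothetical heuristic, not llama.cpp logic)."""
    name = model_id.split(":", 1)[0]      # drop a quantization tag such as ":MXFP4"
    name = name.rsplit("/", 1)[-1]        # drop an "org/" prefix such as "ggml-org/"
    if name.lower().endswith("-gguf"):    # drop a trailing "-GGUF" repo suffix
        name = name[: -len("-gguf")]
    return name.lower()


def find_duplicate_entries(models_payload):
    """Return {normalized name: [ids]} for names that appear more than once."""
    groups = defaultdict(list)
    for entry in models_payload.get("data", []):
        groups[normalize_model_id(entry["id"])].append(entry["id"])
    return {name: ids for name, ids in groups.items() if len(ids) > 1}


# Payload reduced to the two ids reported in this issue (assumed response shape).
payload = {
    "data": [
        {"id": "ggml-org/gpt-oss-120b-GGUF:MXFP4"},
        {"id": "gpt-oss-120b"},
    ]
}

duplicates = find_duplicate_entries(payload)
print(duplicates)
```

Run against the two ids above, both entries collapse to the same normalized name, which is the duplicate this report describes.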