Note: Ideally, we'd stream the thoughts as a reasoning_content delta (now trivial to implement), but for now we are just aiming for compatibility with DeepSeek's API (when --reasoning-format deepseek is set, which is the default).
I just tested using the official deepseek API and thoughts are separated.
Official deepseek API: "choices":[{"index":0,"delta":{"content":null,"reasoning_content":"Okay"},"logprobs":null,"finish_reason":null}]}
llama.cpp server API: "choices":[{"finish_reason":null,"index":0,"delta":{"content":"<think>Okay"}}]
Oh, that's an interesting development! I can't confirm the DeepSeek API change (nor contribute any code at the moment), but it should be easy to modify the code to support this: uncomment the reasoning_content_delta bits and disable the reasoning_in_content behaviour.
Name and Version
version: 5523 (aa6dff0)
built with cc (GCC) 15.1.1 20250425 for x86_64-pc-linux-gnu
Which llama.cpp modules do you know to be affected?
llama-server
Command line
Problem description & steps to reproduce
Thinking content should be separated into reasoning_content when streaming, too.
@ochafik in #12379 said:
I just tested using the official deepseek API and thoughts are separated.
Official deepseek API:
"choices":[{"index":0,"delta":{"content":null,"reasoning_content":"Okay"},"logprobs":null,"finish_reason":null}]}
llama.cpp server API:
"choices":[{"finish_reason":null,"index":0,"delta":{"content":"<think>Okay"}}]
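Until the server emits reasoning_content deltas itself, a client can approximate DeepSeek's streamed format by splitting the inline <think>...</think> tags out of each llama.cpp content chunk. The sketch below is an illustrative client-side shim, not llama.cpp's actual implementation; it assumes the <think> and </think> tags each arrive unsplit within a single chunk.

```python
def normalize_delta(delta, state):
    """Split one streamed llama.cpp-style delta into DeepSeek-style fields.

    `delta` is the parsed `choices[0].delta` object from an SSE chunk;
    `state` carries the "are we inside <think>?" flag across chunks.
    Returns a dict with separate `reasoning_content` and `content` keys,
    mirroring the DeepSeek delta fragment shown above.
    """
    text = delta.get("content") or ""
    reasoning, answer = [], []
    while text:
        if state["in_think"]:
            end = text.find("</think>")
            if end == -1:                      # still inside the thought block
                reasoning.append(text)
                break
            reasoning.append(text[:end])
            text = text[end + len("</think>"):]
            state["in_think"] = False
        else:
            start = text.find("<think>")
            if start == -1:                    # plain answer text
                answer.append(text)
                break
            answer.append(text[:start])
            text = text[start + len("<think>"):]
            state["in_think"] = True
    return {
        "reasoning_content": "".join(reasoning) or None,
        "content": "".join(answer) or None,
    }

state = {"in_think": False}
# The llama.cpp chunk from the example above:
print(normalize_delta({"content": "<think>Okay"}, state))
# -> {'reasoning_content': 'Okay', 'content': None}
```

Keeping the in_think flag in external state is what lets the shim work chunk-by-chunk over a stream, since a single delta may contain only part of the thought.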