Description
Have you searched for similar requests?
Yes
Is your feature request related to a problem? If so, please describe.
I tried to use a prefill to disable thinking in chat completions. First I tried "Start Reply With", then a prefill in the Chat Completion (CC) preset. Watching the backend debug output, it seems that an EOS token is appended after the assistant message regardless.
This means that both continues and prefills are sent as a normal message instead of a completion, and in essence do nothing.
To visualize it:

```
user: something something
assistant: <think></think> + EOS
assistant: <think> I'm going to think anyway because the last message was over
```
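A minimal sketch of why the prefill fails, assuming a ChatML-style template (hypothetical; real templates are Jinja and vary per model). Because every message, including the trailing assistant prefill, is closed with the end-of-turn/EOS token and a fresh assistant header is appended, the model starts a brand-new turn:

```python
# Hypothetical ChatML-style renderer illustrating the current behavior.
def render(messages, add_generation_prompt=True):
    out = ""
    for m in messages:
        # Every message, including a trailing assistant "prefill",
        # is closed with <|im_end|> (end of turn / EOS).
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # A fresh assistant header is appended, opening a new turn.
        out += "<|im_start|>assistant\n"
    return out

prompt = render([
    {"role": "user", "content": "something something"},
    {"role": "assistant", "content": "<think></think>"},  # intended prefill
])
print(prompt)
# The prefill is closed with <|im_end|> and followed by a new assistant
# header, so generation starts a fresh turn and can think anyway.
```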
Tested with the llama.cpp server and the behavior is the same. They have features for continue (ggml-org/llama.cpp#13174) and prefill (ggml-org/llama.cpp#13174).
Describe the solution you'd like
Is this part of the spec? theroyallab/tabbyAPI#276 suggests there is now a parameter that disables this behavior, at least in tabby. I don't know whether the side effect is that no other assistant templating is added.

```
add_generation_prompt: false
```
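A sketch of the desired behavior, again assuming a hypothetical ChalML-style renderer: with `add_generation_prompt` off and the final assistant message left open (no end-of-turn token), the prefill becomes a true completion that the model continues in place:

```python
# Hypothetical renderer showing the requested behavior: leave the last
# assistant message open so generation continues it directly.
def render(messages, add_generation_prompt=False, continue_final_message=True):
    out = ""
    for i, m in enumerate(messages):
        out += f"<|im_start|>{m['role']}\n{m['content']}"
        last = i == len(messages) - 1
        if not (last and continue_final_message and m["role"] == "assistant"):
            out += "<|im_end|>\n"  # close every turn except the open prefill
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

prompt = render([
    {"role": "user", "content": "something something"},
    {"role": "assistant", "content": "<think></think>"},  # prefill
])
print(prompt)
# Generation now continues directly after "<think></think>", so the
# model cannot open a fresh thinking block.
```

For comparison, HuggingFace `transformers` exposes a similar pair of knobs on `apply_chat_template` (`add_generation_prompt` and `continue_final_message`), which trims the end-of-turn tokens from the final message for exactly this use case.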
Describe alternatives you've considered
Text completion works, but it is not always available for tools and images. A damned-if-you-do, damned-if-you-don't situation.
Additional context
No response
Priority
Low (Nice-to-have)
Are you willing to test this on staging/unstable branch if this is implemented?
Yes