
[FEATURE_REQUEST] Currently prefills not really possible on generic OpenAI chat completion. #4429

@Ph0rk0z

Description


Have you searched for similar requests?

Yes

Is your feature request related to a problem? If so, please describe.

I tried to use a prefill to disable thinking in chat completions. First I tried "Start Reply With", then a prefill in the Chat Completion preset. Watching the backend debug output, it seems that an EOS token is added after the assistant message regardless.

This means that both continues and prefills are sent as a normal, closed message instead of a completion, and in essence do nothing.

To visualize it:

user: something something
assistant: <think></think> + EOS
assistant: <think> I'm going to think anyway because the last message was over

Tested with the llama.cpp server and the behavior is the same there. They have features for continue and prefill: ggml-org/llama.cpp#13174

Describe the solution you'd like

Is this part of the spec? theroyallab/tabbyAPI#276 suggests there is now a parameter that disables this behavior, at least in tabby. I don't know whether the side effect is that no other assistant templating is added:
add_generation_prompt: false
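A minimal sketch of what seems to be happening, using a toy ChatML-style template renderer. The token strings and the renderer itself are illustrative assumptions, not the actual llama.cpp or tabbyAPI code; in this toy version, turning the generation prompt off also leaves the final assistant turn open, which is the behavior the prefill needs (akin to HF transformers' `continue_final_message`):

```python
# Toy ChatML-style chat-template renderer (illustrative assumptions,
# not any backend's real implementation).
EOS = "<|im_end|>"

def render(messages, add_generation_prompt=True):
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}{EOS}\n")
    if add_generation_prompt:
        # The backend seals every message with EOS and then opens a
        # *new* assistant turn -- so the prefill above is already a
        # finished message and the model starts thinking from scratch.
        out.append("<|im_start|>assistant\n")
    else:
        # Leave the last assistant message open so generation
        # continues it instead of starting a fresh turn.
        out[-1] = out[-1].removesuffix(f"{EOS}\n")
    return "".join(out)

msgs = [
    {"role": "user", "content": "something something"},
    {"role": "assistant", "content": "<think></think>"},  # intended prefill
]

print(render(msgs, add_generation_prompt=True))   # prefill sealed, new empty turn
print(render(msgs, add_generation_prompt=False))  # prompt ends mid-turn, as desired
```

With the first call the prompt ends in a fresh `<|im_start|>assistant` header, matching the "prefill does nothing" symptom; with the second it ends right after `<think></think>`, so the model would complete that message.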

Describe alternatives you've considered

Text completion works, but it isn't always available for tools and images. A damned-if-you-do, damned-if-you-don't situation.

Additional context

No response

Priority

Low (Nice-to-have)

Are you willing to test this on staging/unstable branch if this is implemented?

Yes

Metadata

Labels

🦄 Feature Request [ISSUE] Suggestion for new feature, update or change