Description
Have you searched for similar requests?
Yes
Is your feature request related to a problem? If so, please describe.
I tried to use a prefill to disable thinking in chat completions. First I tried "Start Reply With", then a prefill in the Chat Completion (CC) preset. Watching the backend debug output, it seems that an EOS token is appended after the assistant message regardless.
This means that both continues and prefills are sent as a normal message instead of a completion, and in essence do nothing.
To visualize it:

```
user: something something
assistant: <think></think> + EOS
assistant: <think> I'm going to think anyway because the last message was over
```
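A minimal sketch of why the prefill fails, assuming a ChatML-style template (hypothetical; real templates are Jinja and vary per model). Because every message, including the trailing assistant prefill, is closed with the end-of-turn/EOS token and a fresh assistant header is appended, the model starts a brand-new turn:

```python
# Hypothetical ChatML-style renderer illustrating the current behavior.
def render(messages, add_generation_prompt=True):
    out = ""
    for m in messages:
        # Every message, including a trailing assistant "prefill",
        # is closed with <|im_end|> (end of turn / EOS).
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # A fresh assistant header is appended, opening a new turn.
        out += "<|im_start|>assistant\n"
    return out

prompt = render([
    {"role": "user", "content": "something something"},
    {"role": "assistant", "content": "<think></think>"},  # intended prefill
])
print(prompt)
# The prefill is closed with <|im_end|> and followed by a new assistant
# header, so generation starts a fresh turn and can think anyway.
```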
Tested with the llama.cpp server and the behavior is the same. They have features for continue (ggml-org/llama.cpp#13174) and prefill (ggml-org/llama.cpp#13174).
Describe the solution you'd like
Is this part of the spec? theroyallab/tabbyAPI#276 suggests there is now a parameter that disables this behavior, at least in tabby. I don't know whether the side effect is that no other assistant templating is added.

```
add_generation_prompt: false
```
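A sketch of the desired behavior, again assuming a hypothetical ChalML-style renderer: with `add_generation_prompt` off and the final assistant message left open (no end-of-turn token), the prefill becomes a true completion that the model continues in place:

```python
# Hypothetical renderer showing the requested behavior: leave the last
# assistant message open so generation continues it directly.
def render(messages, add_generation_prompt=False, continue_final_message=True):
    out = ""
    for i, m in enumerate(messages):
        out += f"<|im_start|>{m['role']}\n{m['content']}"
        last = i == len(messages) - 1
        if not (last and continue_final_message and m["role"] == "assistant"):
            out += "<|im_end|>\n"  # close every turn except the open prefill
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

prompt = render([
    {"role": "user", "content": "something something"},
    {"role": "assistant", "content": "<think></think>"},  # prefill
])
print(prompt)
# Generation now continues directly after "<think></think>", so the
# model cannot open a fresh thinking block.
```

For comparison, HuggingFace `transformers` exposes a similar pair of knobs on `apply_chat_template` (`add_generation_prompt` and `continue_final_message`), which trims the end-of-turn tokens from the final message for exactly this use case.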
Describe alternatives you've considered
Text completion works, but it is not always available for tools and images. A damned-if-you-do, damned-if-you-don't situation.
Additional context
No response
Priority
Low (Nice-to-have)
Are you willing to test this on staging/unstable branch if this is implemented?
Yes