"If this is set, the chat will be formatted so that the final "
"message in the chat is open-ended, without any EOS tokens. The "
"model will continue this message rather than starting a new one. "
"This allows you to \"prefill\" part of the model's response for it. "
"Cannot be used at the same time as `add_generation_prompt`."
Hope `llama-server` supports this feature too.
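For reference, a minimal sketch of how the parameter is used with transformers' `apply_chat_template` (the model name and messages are illustrative):

```python
from transformers import AutoTokenizer

# Model name is illustrative; any chat model with a chat template works.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

messages = [
    {"role": "user", "content": "Write a haiku about the sea."},
    # The final message is a partial assistant message to be continued ("prefill").
    {"role": "assistant", "content": "Waves fold into foam,"},
]

# continue_final_message leaves the last turn open-ended (no EOS / end-of-turn
# tokens), so generation resumes inside that message instead of starting a new
# one. It cannot be combined with add_generation_prompt.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    continue_final_message=True,
)
print(prompt)
```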
Motivation
This is very helpful for user-controllable generation.
When a response is truncated by `max_tokens`, the user can continue generating a longer response.
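As a rough illustration of that workflow against vLLM's OpenAI-compatible server (the base URL, model name, and token limits are assumptions; `continue_final_message` is passed as an extra request parameter via `extra_body`):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
MODEL = "meta-llama/Llama-3.1-8B-Instruct"

messages = [{"role": "user", "content": "Explain how a KV cache works."}]
first = client.chat.completions.create(model=MODEL, messages=messages, max_tokens=64)
reply = first.choices[0].message.content

if first.choices[0].finish_reason == "length":
    # Feed the truncated text back as the final assistant message and ask the
    # server to continue it rather than open a new assistant turn.
    messages.append({"role": "assistant", "content": reply})
    cont = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        max_tokens=256,
        extra_body={"continue_final_message": True, "add_generation_prompt": False},
    )
    reply += cont.choices[0].message.content

print(reply)
```

If `llama-server` adds the flag, the continuation request would presumably look the same.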
Possible Implementation
No response
I agree that it would be nice to have such functionality, but instead of adding a separate parameter, I think it should be supported by default if the last message provided is an assistant message, just like Claude does.