Feature Request: llama-server support continue_final_message #11755

Closed

DIYer22 opened this issue Feb 8, 2025 · 3 comments

Labels: enhancement (New feature or request), stale

Comments


DIYer22 commented Feb 8, 2025

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Both transformers and vLLM support the continue_final_message parameter, which lets the model continue writing the last (assistant) message of the conversation instead of starting a new turn.

Description from vLLM:

     "If this is set, the chat will be formatted so that the final "
     "message in the chat is open-ended, without any EOS tokens. The "
     "model will continue this message rather than starting a new one. "
     "This allows you to \"prefill\" part of the model's response for it. "
     "Cannot be used at the same time as `add_generation_prompt`."

I hope llama-server can support this feature too.
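
For reference, a minimal sketch of how the same thing looks with transformers today (the model name here is only an example, and exact availability of the parameter depends on your transformers version):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

    messages = [
        {"role": "user", "content": "List the first three prime numbers."},
        # Partially written assistant turn that the model should continue.
        {"role": "assistant", "content": "Sure, the first three primes are 2,"},
    ]

    # continue_final_message=True leaves the last assistant message open-ended
    # (no EOS / end-of-turn token), so generation resumes mid-message.
    # It cannot be combined with add_generation_prompt=True.
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        continue_final_message=True,
    )
    print(prompt)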

Motivation

  1. This is very helpful for user-controllable generation.
  2. When a response is truncated by max_tokens, the user can resend it and continue generating a longer response (see the sketch below).
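
A hypothetical client-side sketch of point 2, assuming llama-server adopted the same parameter name as vLLM on its OpenAI-compatible /v1/chat/completions endpoint (the field does not exist in llama-server yet; adding it is what this request asks for):

    import requests

    URL = "http://localhost:8080/v1/chat/completions"  # llama-server's default port

    history = [
        {"role": "user", "content": "Write a long story about a lighthouse keeper."},
        # Previous reply that was cut off by max_tokens.
        {"role": "assistant", "content": "The keeper woke before dawn and climbed the"},
    ]

    resp = requests.post(URL, json={
        "messages": history,
        "max_tokens": 256,
        # Proposed parameter (borrowed from vLLM): format the prompt so the final
        # assistant message stays open-ended and the model keeps writing it.
        "continue_final_message": True,
    })
    continuation = resp.json()["choices"][0]["message"]["content"]
    print(history[-1]["content"] + continuation)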

Possible Implementation

No response

DIYer22 added the enhancement (New feature or request) label on Feb 8, 2025
@remixer-dec

I agree that it would be nice to have such functionality, but instead of adding a separate parameter, I think it should be supported by default if the last message provided is an assistant message, just like Claude does.
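
A minimal sketch of that implicit behaviour: no extra field, the server would simply detect that the final message is from the assistant and leave it open-ended (request body assumed, same OpenAI-compatible shape as above):

    # No new parameter: the trailing assistant message alone signals continuation.
    payload = {
        "messages": [
            {"role": "user", "content": "Finish this sentence."},
            {"role": "assistant", "content": "The quick brown fox"},  # to be continued
        ],
        "max_tokens": 64,
    }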


This issue was closed because it has been inactive for 14 days since being marked as stale.

@remixer-dec

implemented in #13174
