[llm] Add generate_from_pos API to LLM runner #11570
Open
+134
−14
As titled, this API allows us to support multi-turn conversation by passing a `start_pos` argument to `generate_from_pos`.

This pull request introduces a new feature to support text generation from a specific starting position (`generate_from_pos`) and includes updates to ensure proper error handling and functionality when `max_new_tokens` is negative. The changes primarily focus on extending the `TextLLMRunner` class and its associated methods to accommodate this new feature while maintaining backward compatibility.

New Feature: Text Generation from a Specific Starting Position
- Added `generate_from_pos` Method: Introduced a new method, `generate_from_pos`, in `TextLLMRunner` to allow text generation starting from a specified position in the KV cache. This includes updates to the method signature, logic, and error handling. (`extension/llm/runner/text_llm_runner.cpp` [1] [2] [3] [4]; `extension/llm/runner/text_llm_runner.h` [5])
- Updated Documentation: Enhanced method documentation in `TextLLMRunner` to describe the new functionality, including parameters like `start_pos` and the expected behavior. (`extension/llm/runner/text_llm_runner.h` [1] [2])

Error Handling Improvements
- Validation for `max_new_tokens`: Added checks to ensure `max_new_tokens` is positive. If it is not, an `InvalidArgument` error is returned. This prevents invalid configurations during text generation. (`extension/llm/runner/text_llm_runner.cpp`, R129-R156)
- Unit Test for Negative `max_new_tokens`: Created a new test case (`GenerateFromPosErrorsWithNegativeMaxNewTokens`) to verify that `generate_from_pos` reports an error when `max_new_tokens` is negative. (`extension/llm/runner/test/test_text_llm_runner.cpp`, R325-R379)
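
The validation described above can be sketched roughly as follows. This is a hedged illustration, not the code in `text_llm_runner.cpp`: `ToyError`, `toy_generate_from_pos`, and its parameter list are hypothetical, and only the rule itself (a non-positive `max_new_tokens` yields an `InvalidArgument` result) is taken from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical error type for this sketch; the real runner uses its own
// error type, which is not reproduced here.
enum class ToyError { Ok, InvalidArgument };

// Hypothetical entry point that rejects a non-positive token budget
// before doing any generation work.
inline ToyError toy_generate_from_pos(
    const std::string& prompt,
    int64_t start_pos,
    int32_t max_new_tokens) {
  (void)prompt;    // unused in this sketch
  (void)start_pos; // unused in this sketch
  if (max_new_tokens <= 0) {
    // Nothing can be generated with a zero or negative budget.
    return ToyError::InvalidArgument;
  }
  // ... generation would happen here ...
  return ToyError::Ok;
}
```

A check in the spirit of the new unit test would then assert that a negative budget is rejected, for example `assert(toy_generate_from_pos("hi", 0, -5) == ToyError::InvalidArgument);`.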