
[llm] Add generate_from_pos API to LLM runner #11570


Status: Open. Wants to merge 2 commits into base: main.
Conversation

larryliu0820 (Contributor) commented:

As titled, this API allows us to support multi-turn conversations by passing a start_pos argument to generate_from_pos.

This pull request introduces a new feature to support text generation from a specific starting position (generate_from_pos) and includes updates to ensure proper error handling and functionality when max_new_tokens is negative. The changes primarily focus on extending the TextLLMRunner class and its associated methods to accommodate this new feature while maintaining backward compatibility.

New Feature: Text Generation from a Specific Starting Position

  • Added generate_from_pos Method: Introduced a new method generate_from_pos in TextLLMRunner to allow text generation starting from a specified position in the KV cache. This includes updates to the method signature, logic, and error handling. (extension/llm/runner/text_llm_runner.cpp [1] [2] [3] [4]; extension/llm/runner/text_llm_runner.h [5])

  • Updated Documentation: Enhanced method documentation in TextLLMRunner to describe the new functionality, including parameters like start_pos and the expected behavior. (extension/llm/runner/text_llm_runner.h [1] [2])

Error Handling Improvements

  • Validation for max_new_tokens: Added checks to ensure max_new_tokens is positive. If it is not, an InvalidArgument error is returned. This prevents invalid configurations during text generation. (extension/llm/runner/text_llm_runner.cpp, R129-R156)

  • Unit Test for Negative max_new_tokens: Created a new test case (GenerateFromPosErrorsWithNegativeMaxNewTokens) to verify that the generate_from_pos method correctly handles scenarios where max_new_tokens is negative. (extension/llm/runner/test/test_text_llm_runner.cpp, R325-R379)


pytorch-bot bot commented Jun 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/11570

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 4644bb9 with merge base 72a095f:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 11, 2025
@larryliu0820 larryliu0820 added the release notes: llm To capture llm specific changes in release notes label Jun 11, 2025
@facebook-github-bot (Contributor) commented:

@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

