server: fix regression on streamed non-chat completion w/ stops #13785

Merged (2 commits) on May 26, 2025

Conversation

ochafik
Collaborator

@ochafik ochafik commented May 25, 2025

Fixes #13780 (regression from #12379)

Make the new message differ less strict: server_context::process_token does not erase partial stop words, but it does erase full stop words. That asymmetry broke an incorrect assumption in the diffing logic.

(added slow server test for non-chat completion + stream + stop combo)
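
To illustrate the relaxed check described above, here is a minimal Python sketch, not the server's actual C++ code. The function name `stream_delta`, the stop word `STOP`, and the sample snapshots are all hypothetical. It assumes the differ compares the previously seen accumulated text with the current one, and that the accumulated text can shrink when a partial stop word that was kept turns into a full stop word that gets erased.

```python
def stream_delta(previous: str, current: str) -> str:
    """Return the text that is new in `current` relative to `previous`.

    Hypothetical helper: a sketch of a lenient streaming differ, not the
    real llama.cpp implementation.
    """
    if current.startswith(previous):
        # Normal case: the text only grew, so emit the new suffix.
        return current[len(previous):]
    if previous.startswith(current):
        # Assumed regression case: `previous` ended on a partial stop word
        # (kept), while `current` hit the full stop word (erased), so the
        # text shrank. There is nothing new to emit.
        return ""
    # Anything else is a real inconsistency between the two snapshots.
    raise ValueError(f"invalid diff: {previous!r} is not a prefix of {current!r}")


# Hypothetical accumulated texts with the stop word "STOP":
snapshots = ["Hello", "Hello ST", "Hello "]
deltas = []
last = ""
for snap in snapshots:
    deltas.append(stream_delta(last, snap))
    last = snap
print(deltas)  # ['Hello', ' ST', '']
```

A stricter differ that accepted only the first branch would raise on the last snapshot. The sketch covers the diffing step alone; how much text is actually sent to the client is a separate concern that it does not model.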

@ochafik ochafik marked this pull request as ready for review May 25, 2025 22:28
@ochafik ochafik requested a review from ngxson as a code owner May 25, 2025 23:25
@github-actions github-actions bot added the examples, python (python script changes), and server labels May 25, 2025
@ochafik ochafik changed the title from "server: fix completion diff regression" to "server: fix streamed completion regression" May 25, 2025
@ochafik ochafik changed the title from "server: fix streamed completion regression" to "server: fix streamed non-chat completion regression" May 25, 2025
@ochafik ochafik changed the title from "server: fix streamed non-chat completion regression" to "server: fix regression on streamed non-chat completion w/ stops" May 26, 2025
@ochafik ochafik added the bugfix (fixes an issue or bug) label May 26, 2025
@ochafik ochafik merged commit f13847c into ggml-org:master May 26, 2025
48 checks passed
Labels
bugfix (fixes an issue or bug), examples, python (python script changes), server
Development

Successfully merging this pull request may close these issues.

server: terminate called after throwing an instance of 'std::runtime_error'
2 participants