Normalize and denormalize llamacpp streaming reply #121
At the moment we really need the denormalizer so that the blocking pipeline can return a stream of `ModelResponse` objects and the denormalizer converts them into the `CreateChatCompletionStreamResponse` structure that is then serialized to the client. This avoids any guessing or special casing that would otherwise be needed in `llamacpp_stream_generator`, which previously expected `Iterator[CreateChatCompletionStreamResponse]`.
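
As a rough illustration of the conversion described above, a denormalizer could look roughly like the sketch below. The function name `denormalize_stream`, the attribute access on the chunks (assuming a litellm `ModelResponse`-like shape), and the exact output fields are assumptions for illustration; they mirror the OpenAI-style chunk schema that `CreateChatCompletionStreamResponse` follows rather than the PR's exact mapping.

```python
from typing import Any, AsyncIterator, Dict


async def denormalize_stream(
    normalized: AsyncIterator[Any],  # stream of ModelResponse-like chunks
) -> AsyncIterator[Dict[str, Any]]:
    """Turn normalized chunks back into the llama.cpp streaming shape."""
    async for chunk in normalized:
        yield {
            # OpenAI-style streaming fields mirrored by
            # CreateChatCompletionStreamResponse; illustrative only.
            "id": chunk.id,
            "object": "chat.completion.chunk",
            "created": chunk.created,
            "model": chunk.model,
            "choices": [
                {
                    "index": choice.index,
                    "delta": {"content": choice.delta.content},
                    "finish_reason": choice.finish_reason,
                }
                for choice in chunk.choices
            ],
        }
```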
Another change that simplifies the logic is that `llamacpp_stream_generator` now accepts an `AsyncIterator` instead of just the `Iterator` that the llamacpp completion handler was returning. Again, this is to simplify the logic and pass the iterator from the blocking pipeline straight through. On the completion side we have a simple sync-to-async wrapper.
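
For reference, a minimal sketch of such a sync-to-async wrapper is shown below. The name `sync_to_async_iterator` is hypothetical, and pulling items through the default executor is just one way to keep the event loop responsive while the blocking llama.cpp iterator produces the next chunk.

```python
import asyncio
from typing import AsyncIterator, Iterator, TypeVar

T = TypeVar("T")


async def sync_to_async_iterator(it: Iterator[T]) -> AsyncIterator[T]:
    """Wrap a blocking iterator so it can be consumed with `async for`."""
    loop = asyncio.get_running_loop()
    sentinel = object()
    while True:
        # Fetch the next item off the event loop thread so a slow
        # blocking iterator does not stall other tasks.
        item = await loop.run_in_executor(None, next, it, sentinel)
        if item is sentinel:
            break
        yield item
```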
Fixes: #94