
Normalize and denormalize llamacpp streaming reply #121

Merged
merged 1 commit into stacklok:main on Nov 29, 2024

Conversation

@jhrozek (Contributor) commented Nov 28, 2024

At the moment we really need the denormalizer so that the blocking
pipeline can return a stream of `ModelResponse`s and the denormalizer
can convert them to the `CreateChatCompletionStreamResponse` structure
that is then serialized to the client. This avoids the guessing or
special casing that would otherwise be needed in
`llamacpp_stream_generator`, which until now expected an
`Iterator[CreateChatCompletionStreamResponse]`.
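To make the shape of that conversion concrete, here is a minimal sketch of what such a denormalizer could look like. It is not the PR's actual code: the `ModelResponse` attribute layout follows litellm's streaming conventions and is an assumption here, and `CreateChatCompletionStreamResponse` is a TypedDict in llama-cpp-python, so yielding a plain dict with the matching keys is enough for serialization.

```python
from typing import Any, AsyncIterator, Dict


async def denormalize_stream(
    model_responses: AsyncIterator[Any],
) -> AsyncIterator[Dict]:
    # Rebuild the llama.cpp-shaped chunk from each normalized reply.
    # Attribute names (choices, delta, finish_reason) are assumed to
    # follow litellm's streaming ModelResponse layout.
    async for resp in model_responses:
        yield {
            "id": resp.id,
            "object": "chat.completion.chunk",
            "created": resp.created,
            "model": resp.model,
            "choices": [
                {
                    "index": choice.index,
                    "delta": {"content": choice.delta.content},
                    "finish_reason": choice.finish_reason,
                }
                for choice in resp.choices
            ],
        }
```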

Another change that simplifies the logic is that
`llamacpp_stream_generator` now accepts an `AsyncIterator` instead of
just the `Iterator` that the llamacpp completion handler was returning.
Again, this is to simplify the logic and pass the iterator straight
from the blocking pipeline. On the completion side we have a simple
sync-to-async wrapper.
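The sync-to-async wrapper can be as small as an async generator that re-yields each item; a sketch under that assumption (the helper name is illustrative, not necessarily the PR's):

```python
from typing import AsyncIterator, Iterator, TypeVar

T = TypeVar("T")


async def to_async_iterator(sync_iter: Iterator[T]) -> AsyncIterator[T]:
    # Naive adapter: drives the blocking iterator inline. Fine for a
    # local in-process producer, but a production version might offload
    # next() to a thread so slow iterators don't stall the event loop.
    for item in sync_iter:
        yield item
```

With this in place, the completion handler's plain `Iterator` and the blocking pipeline's async stream can both be fed to `llamacpp_stream_generator` through the same `AsyncIterator` interface.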

Fixes: #94

@jhrozek jhrozek marked this pull request as ready for review November 29, 2024 08:09
Originally, I wanted to add the normalizers to convert the
`im_start`/`im_end` tags, but we worked around that by setting llamacpp
to use the OpenAI format.

We'll still need a normalizer for the vllm provider though.
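For context, the normalization that was sidestepped here would essentially strip the ChatML-style markers from raw output. A hypothetical sketch (the exact tag format a vllm normalizer would need to handle is an assumption):

```python
import re

# ChatML-style role markers, e.g. "<|im_start|>assistant\n ... <|im_end|>".
_IM_TAG_RE = re.compile(r"<\|im_start\|>(?:\w+)?\n?|<\|im_end\|>")


def strip_im_tags(text: str) -> str:
    return _IM_TAG_RE.sub("", text)
```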

@jhrozek jhrozek merged commit dc988a3 into stacklok:main Nov 29, 2024
Development

Successfully merging this pull request may close these issues:

Implement the im_start / im_stop funkiness (#94)