Normalize and denormalize llamacpp streaming reply #121
At the moment we really need the denormalizer so that the blocking pipeline can return a stream of `ModelResponse` objects and the denormalizer converts them into the `CreateChatCompletionStreamResponse` structure that is then serialized to the client. This avoids any guessing or special casing that would otherwise be needed in `llamacpp_stream_generator`, which previously expected `Iterator[CreateChatCompletionStreamResponse]`.
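
As a rough illustration of the conversion described above, a denormalizer could look roughly like the sketch below. The function name `denormalize_stream`, the attribute access on the chunks (assuming a litellm `ModelResponse`-like shape), and the exact output fields are assumptions for illustration; they mirror the OpenAI-style chunk schema that `CreateChatCompletionStreamResponse` follows rather than the PR's exact mapping.

```python
from typing import Any, AsyncIterator, Dict


async def denormalize_stream(
    normalized: AsyncIterator[Any],  # stream of ModelResponse-like chunks
) -> AsyncIterator[Dict[str, Any]]:
    """Turn normalized chunks back into the llama.cpp streaming shape."""
    async for chunk in normalized:
        yield {
            # OpenAI-style streaming fields mirrored by
            # CreateChatCompletionStreamResponse; illustrative only.
            "id": chunk.id,
            "object": "chat.completion.chunk",
            "created": chunk.created,
            "model": chunk.model,
            "choices": [
                {
                    "index": choice.index,
                    "delta": {"content": choice.delta.content},
                    "finish_reason": choice.finish_reason,
                }
                for choice in chunk.choices
            ],
        }
```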
Another change that simplifies the logic is that `llamacpp_stream_generator` now accepts an `AsyncIterator` instead of just the `Iterator` that the llamacpp completion handler was returning. Again, this is to simplify the logic and pass the iterator from the blocking pipeline straight through. On the completion side we have a simple sync-to-async wrapper.
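
For reference, a minimal sketch of such a sync-to-async wrapper is shown below. The name `sync_to_async_iterator` is hypothetical, and pulling items through the default executor is just one way to keep the event loop responsive while the blocking llama.cpp iterator produces the next chunk.

```python
import asyncio
from typing import AsyncIterator, Iterator, TypeVar

T = TypeVar("T")


async def sync_to_async_iterator(it: Iterator[T]) -> AsyncIterator[T]:
    """Wrap a blocking iterator so it can be consumed with `async for`."""
    loop = asyncio.get_running_loop()
    sentinel = object()
    while True:
        # Fetch the next item off the event loop thread so a slow
        # blocking iterator does not stall other tasks.
        item = await loop.run_in_executor(None, next, it, sentinel)
        if item is sentinel:
            break
        yield item
```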
Fixes: #94