This repository was archived by the owner on Jun 5, 2025. It is now read-only.

Add input processing pipeline + codegate-version pipeline step #91

Merged

3 commits merged into stacklok:main from the pipeline branch on Nov 27, 2024
Conversation

@jhrozek jhrozek (Contributor) commented Nov 25, 2024

This adds pipeline processing before the completion is run, where the
request can either be changed or short-circuited. The pipeline consists
of steps; for now we implement a single step, CodegateVersion, that
responds with the codegate version if the verbatim codegate-version
string is found in the input.

The pipeline also passes along a context. For now it is unused, but I
expect this is where we would store extracted code snippets etc.

To avoid import loops, we also move the BaseCompletionHandler class to
a new completion package.

Since the shortcut replies are more or less simple strings, we add yet
another package, providers/formatting, whose responsibility is to
convert the string returned by a shortcut step into the format expected
by the client, meaning either a single reply or a stream of replies in
the LLM-specific format. We use the BaseCompletionHandler as the way to
convert to the LLM-specific format.

Fixes: #93
Related: #45
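
A minimal sketch of the step interface described above (the PipelineContext, PipelineResult, and PipelineStep names and the hard-coded version string are illustrative assumptions, not the exact code in this PR):

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PipelineContext:
    """Carried through the steps; unused for now, intended to hold extracted code snippets etc."""
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class PipelineResult:
    """Either a (possibly modified) request to forward, or a shortcut text response."""
    request: Optional[dict] = None
    response: Optional[str] = None

    @property
    def shortcuts(self) -> bool:
        return self.response is not None


class PipelineStep:
    """Base class for a single input-processing step."""

    async def process(self, request: dict, context: PipelineContext) -> PipelineResult:
        raise NotImplementedError


class CodegateVersion(PipelineStep):
    """Shortcuts with the codegate version when the verbatim 'codegate-version' string is found."""

    async def process(self, request: dict, context: PipelineContext) -> PipelineResult:
        for message in request.get("messages", []):
            if "codegate-version" in str(message.get("content", "")):
                # placeholder version string for the sketch
                return PipelineResult(response="codegate version: 0.1.0")
        return PipelineResult(request=request)
```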

@lukehinds

Nice work @jhrozek !

@jhrozek jhrozek force-pushed the pipeline branch 2 times, most recently from 0717a9c to c74841e, on November 26, 2024 12:17
@jhrozek jhrozek marked this pull request as ready for review November 26, 2024 12:17
Optional[tuple[str, int]]: A tuple containing the message content and
its index, or None if no user message is found
"""
if request.get("messages") is None:
Contributor

In the case of the llama.cpp provider, the Continue plugin does not use the OpenAI request format with "messages". Instead it sends a request with "prompt"; the prompt string contains all messages, separated by tokens (im_start, im_end).

Contributor Author

Yes, I found that out. That's why I added the condition that just skips the chat request when there is no messages attribute. I'm now working on improvements to the pipeline that would convert the request to the OpenAI format and then convert the augmented OpenAI request back to the model-specific format.
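
For reference, a guard along these lines is what the comment describes; the helper name get_last_user_message is an assumption based on the docstring above, and the body is a sketch rather than the PR's exact code:

```python
from typing import Any, Optional


def get_last_user_message(request: dict[str, Any]) -> Optional[tuple[str, int]]:
    """Return the last user message content and its index, or None when the request
    has no OpenAI-style "messages" list (e.g. llama.cpp requests carrying only "prompt")."""
    messages = request.get("messages")
    if messages is None:
        return None
    for idx in range(len(messages) - 1, -1, -1):
        if messages[idx].get("role") == "user":
            return messages[idx].get("content", ""), idx
    return None
```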

pass


class PipelineProcessor:
Contributor

Since this is a sequential pipeline processor, we could call it SequentialPipelineProcessor.
In the future, we can implement ParallelPipelineProcessor, GraphPipelineProcessor, etc.

Contributor Author

Thank you, good idea. I will do the rename.
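
The rename would only touch the class name; as a rough sketch (reusing the PipelineStep, PipelineContext, and PipelineResult types from the earlier sketch, so not the PR's exact code):

```python
class SequentialPipelineProcessor:
    """Runs the configured steps in order; the first step that shortcuts wins."""

    def __init__(self, steps: list[PipelineStep]):
        self.steps = steps

    async def process_request(self, request: dict) -> PipelineResult:
        context = PipelineContext()
        result = PipelineResult(request=request)
        for step in self.steps:
            # each non-shortcutting step returns the (possibly modified) request
            result = await step.process(result.request, context)
            if result.shortcuts:
                break
        return result
```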

@@ -12,6 +12,7 @@ async def sse_stream_generator(stream: AsyncIterator[Any]) -> AsyncIterator[str]
"""OpenAI-style SSE format"""
try:
async for chunk in stream:
print(chunk)
Contributor

Do we need this print?

Contributor Author

Sorry, of course not; that was leftover debugging.
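
With the debug print dropped, the generator would look roughly like this (the chunk serialization and error handling are assumptions for the sketch, not the file's actual code):

```python
import json
from typing import Any, AsyncIterator


async def sse_stream_generator(stream: AsyncIterator[Any]) -> AsyncIterator[str]:
    """OpenAI-style SSE format, without the leftover print(chunk)."""
    try:
        async for chunk in stream:
            if not isinstance(chunk, str):
                chunk = json.dumps(chunk)
            yield f"data: {chunk}\n\n"
        yield "data: [DONE]\n\n"
    except Exception as exc:
        yield f"data: {json.dumps({'error': str(exc)})}\n\n"
```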

@lukehinds lukehinds merged commit f0c0b38 into stacklok:main Nov 27, 2024
@lukehinds lukehinds deleted the pipeline branch November 27, 2024 10:36
Successfully merging this pull request may close these issues:

Introduce generic top level handler