Add input processing pipeline + codegate-version pipeline step #91
Conversation
Nice work @jhrozek!
0717a9c to c74841e
        Optional[tuple[str, int]]: A tuple containing the message content and
            its index, or None if no user message is found
    """
    if request.get("messages") is None:
In the case of the llama.cpp provider, the Continue plugin does not use the OpenAI request format with "messages". Instead it sends a request with "prompt"; the prompt string contains all messages, separated by tokens (im_start, im_end).
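For illustration, roughly what the two request shapes look like; the field names and delimiter tokens below are assumptions based on this description, not copied from the provider code:

```python
# OpenAI-style chat request: messages are structured objects.
openai_style_request = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "codegate-version"},
    ],
}

# llama.cpp-style request from the Continue plugin: a single "prompt" string
# with the messages delimited by im_start/im_end tokens (format assumed here).
llamacpp_style_request = {
    "prompt": (
        "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
        "<|im_start|>user\ncodegate-version<|im_end|>\n"
    ),
}
```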
Yes, I found that out. That's why I added the condition that just skips the chat request when there is no messages attribute. I'm now working on improvements to the pipeline that would convert the request to the OpenAI format and then convert the augmented OpenAI request back to the model format.
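A minimal sketch of the kind of round-trip conversion I have in mind, assuming ChatML-style `<|im_start|>`/`<|im_end|>` delimiters; the helper names and token format are placeholders, not the final implementation:

```python
import re


def prompt_to_messages(prompt: str) -> list[dict[str, str]]:
    """Split a ChatML-style prompt string into OpenAI-style messages."""
    messages = []
    # Each block is assumed to look like: <|im_start|>role\ncontent<|im_end|>
    for role, content in re.findall(
        r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", prompt, flags=re.DOTALL
    ):
        messages.append({"role": role, "content": content.strip()})
    return messages


def messages_to_prompt(messages: list[dict[str, str]]) -> str:
    """Convert augmented OpenAI-style messages back to the prompt format."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
```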
src/codegate/pipeline/base.py (outdated)
    pass


class PipelineProcessor:
Since this is a sequential pipeline processor, we can call it SequentialPipelineProcessor.
In the future, we can implement ParallelPipelineProcessor, GraphPipelineProcessor, etc.
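For illustration, something along these lines; the step interface and result type below are simplified assumptions, not the actual code in this PR:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineResult:
    """Simplified stand-in for what a pipeline step returns."""
    request: dict
    shortcut_response: Optional[str] = None


class SequentialPipelineProcessor:
    """Run pipeline steps one after another, stopping early on a shortcut response."""

    def __init__(self, steps: list):
        self.steps = steps

    async def process_request(self, request: dict, context: dict) -> PipelineResult:
        result = PipelineResult(request=request)
        for step in self.steps:
            # Each step is assumed to expose an async process(request, context).
            result = await step.process(result.request, context)
            if result.shortcut_response is not None:
                # A step answered the request itself; skip the remaining steps.
                return result
        return result
```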
Thank you, good idea. I will do the rename.
@@ -12,6 +12,7 @@ async def sse_stream_generator(stream: AsyncIterator[Any]) -> AsyncIterator[str]
    """OpenAI-style SSE format"""
    try:
        async for chunk in stream:
            print(chunk)
Do we need this print?
Sorry, of course not, that was leftover debugging.
This adds pipeline processing before the completion is run, where the request can either be changed or short-circuited. The pipeline consists of steps; for now we implement a single step, `CodegateVersion`, that responds with the codegate version if the verbatim `codegate-version` string is found in the input.

The pipeline also passes along a context. For now that is unused, but I thought this would be where we store extracted code snippets etc.

To avoid import loops, we also move the `BaseCompletionHandler` class to a new `completion` package.

Since the shortcut replies are more or less simple strings, we add yet another package, `providers/formatting`, whose responsibility is to convert the string returned by the shortcut response to the format expected by the client, meaning either a reply or a stream of replies in the LLM-specific format. We use the `BaseCompletionHandler` as a way to convert to the LLM-specific format.
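To make the shape of a pipeline step concrete, here is a minimal sketch of a step like `CodegateVersion`, under the assumption of a simplified step interface; the `PipelineResult` type, method names, and version constant are illustrative stand-ins, not the code this PR adds:

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder; the real step would read the installed package version.
CODEGATE_VERSION = "0.0.0"


@dataclass
class PipelineResult:
    """Simplified stand-in for the result type a step returns."""
    request: dict
    shortcut_response: Optional[str] = None


class CodegateVersionStep:
    """Answer directly when the verbatim trigger string appears in the input."""

    TRIGGER = "codegate-version"

    async def process(self, request: dict, context: dict) -> PipelineResult:
        # Only OpenAI-style requests with "messages" are inspected here;
        # other request shapes pass through unchanged.
        messages = request.get("messages") or []
        user_contents = [m.get("content", "") for m in messages if m.get("role") == "user"]
        last_user_message = user_contents[-1] if user_contents else ""

        if self.TRIGGER in last_user_message:
            # Shortcut: the completion handler later formats this plain string
            # into the LLM-specific reply or stream of replies.
            return PipelineResult(
                request=request,
                shortcut_response=f"codegate version: {CODEGATE_VERSION}",
            )
        return PipelineResult(request=request)
```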
Fixes: #93
Related: #45