This repository was archived by the owner on Jun 5, 2025. It is now read-only.

Copilot DB integration. Keep DB objects in context to record at the end. #331

Merged (4 commits) on Dec 14, 2024

Conversation

@aponcedeleonch (Contributor) commented Dec 13, 2024

The main purpose of this PR is to:

  1. Use the pipeline context to hold the objects that will later be stored in the DB. This should make the integration with Copilot simple.
  2. Record the request at the end of the input pipeline. That way the secrets included in the request message are not stored.

Closes: #322, #297, #281

@aponcedeleonch aponcedeleonch marked this pull request as draft December 13, 2024 13:06
@aponcedeleonch aponcedeleonch marked this pull request as ready for review December 14, 2024 00:16
@aponcedeleonch aponcedeleonch changed the title from "WIP: Record the prompt at the end of pipeline. Keep DB objects in context" to "Record the prompt at the end of pipeline. Keep DB objects in context" Dec 14, 2024
Comment on lines 310 to 317
    # Create the input request at the end so we make sure the secrets are obfuscated
    self.context.add_input_request(
        current_request, is_fim_request=self.is_fim, provider=provider
    )

This bit creates the request object that will be stored in the DB at the end of the pipeline steps. This way we don't store the secrets that arrive in the request message.
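
To illustrate why the ordering matters, here is a minimal sketch, not the code from this PR. It assumes a hypothetical `redact_secrets` step, a made-up secret pattern, and a stand-in `SketchContext`; only the idea of calling `add_input_request` after all steps have run comes from the diff above.

```python
import json
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical pattern for illustration; a real step would use a proper secrets scanner.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]+")


def redact_secrets(request: dict) -> dict:
    """Hypothetical pipeline step: mask anything that looks like a secret."""
    messages = [
        {**m, "content": SECRET_PATTERN.sub("REDACTED", m["content"])}
        for m in request["messages"]
    ]
    return {**request, "messages": messages}


@dataclass
class SketchContext:
    """Stand-in for the pipeline context; holds what will later go to the DB."""

    input_request: Optional[str] = None

    def add_input_request(self, request: dict) -> None:
        self.input_request = json.dumps(request)


def run_input_pipeline(
    request: dict,
    steps: List[Callable[[dict], dict]],
    context: SketchContext,
) -> dict:
    current_request = request
    for step in steps:
        current_request = step(current_request)
    # Record only after every step has run, so the stored copy is already redacted.
    context.add_input_request(current_request)
    return current_request
```

If the request were recorded before the loop instead, the raw secret would end up in `context.input_request` and, later, in the DB.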

Comment on lines +119 to +156
    def add_input_request(
        self, normalized_request: ChatCompletionRequest, is_fim_request: bool, provider: str
    ) -> None:
        try:
            if self.prompt_id is None:
                self.prompt_id = str(uuid.uuid4())

            request_str = json.dumps(normalized_request)

            self.input_request = Prompt(
                id=self.prompt_id,
                timestamp=datetime.datetime.now(datetime.timezone.utc),
                provider=provider,
                type="fim" if is_fim_request else "chat",
                request=request_str,
            )
            logger.debug(f"Added input request to context: {self.input_request}")
        except Exception as e:
            logger.warning(f"Failed to serialize input request: {normalized_request}", error=str(e))

    def add_output(self, model_response: ModelResponse) -> None:
        try:
            if self.prompt_id is None:
                logger.warning(f"Tried to record output without response: {model_response}")
                return

            if isinstance(model_response, BaseModel):
                output_str = model_response.model_dump_json(exclude_none=True, exclude_unset=True)
            else:
                output_str = json.dumps(model_response)

            self.output_responses.append(
                Output(
                    id=str(uuid.uuid4()),
                    prompt_id=self.prompt_id,
                    timestamp=datetime.datetime.now(datetime.timezone.utc),
                    output=output_str,
                )

Moves the storing of DB objects to the context. Notably, if no prompt id exists yet when an alert is added, one is created. This makes it possible to create alerts before the actual Prompt object.
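
The lazy id creation described here can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `LazyIdContext`, `SketchAlert`, `_ensure_prompt_id` and the `trigger` field are hypothetical names.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchAlert:
    """Hypothetical, simplified stand-in for the Alert DB object."""

    id: str
    prompt_id: str
    trigger: str


@dataclass
class LazyIdContext:
    """Hypothetical context that creates the prompt id on first use."""

    prompt_id: Optional[str] = None
    alerts: List[SketchAlert] = field(default_factory=list)

    def _ensure_prompt_id(self) -> str:
        # Whoever needs the id first creates it; later callers reuse it.
        if self.prompt_id is None:
            self.prompt_id = str(uuid.uuid4())
        return self.prompt_id

    def add_alert(self, trigger: str) -> None:
        self.alerts.append(
            SketchAlert(
                id=str(uuid.uuid4()),
                prompt_id=self._ensure_prompt_id(),
                trigger=trigger,
            )
        )

    def add_input_request(self) -> str:
        # The prompt created later picks up the id the alert already references.
        return self._ensure_prompt_id()
```

With this shape, an alert raised by an early pipeline step and the prompt recorded at the end share the same `prompt_id`, regardless of which one is created first.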

Comment on lines 349 to 354
    # This should work for recording FIM and Chat to DB. Now we keep the objects that are going
    # to be written to DB in the pipeline `context`. Copilot also uses pipelines and `context`.
    # For some reason it is only working for FIM. Investigating and enabling in a future PR.
    # finally:
    #     if self.context_tracking:
    #         await self._db_recorder.record_context(self.context_tracking)

Since the pipeline now stores the DB objects Prompt, Output and Alert, we would only need to execute the following at the end of the Copilot provider to get everything into the DB:

    self._db_recorder.record_context(self.context_tracking)

However, it is not working as expected. I will look into it more later.
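
For readers unfamiliar with the pattern, a recorder that writes a whole context in one go might look roughly like the sketch below. Everything here is an assumption for illustration: the `RecordedContext` shape, the table names and the use of the standard-library `sqlite3` module do not come from this PR, and the project's actual recorder may differ.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RecordedContext:
    """Hypothetical, flattened view of what the pipeline context holds."""

    # (id, request) for the prompt
    input_request: Optional[Tuple[str, str]] = None
    # (id, prompt_id, output) for each response
    output_responses: List[Tuple[str, str, str]] = field(default_factory=list)


def record_context(conn: sqlite3.Connection, context: RecordedContext) -> None:
    if context.input_request is None:
        return  # no prompt was captured, so there is nothing to link outputs to
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT INTO prompts (id, request) VALUES (?, ?)",
            context.input_request,
        )
        conn.executemany(
            "INSERT INTO outputs (id, prompt_id, output) VALUES (?, ?, ?)",
            context.output_responses,
        )
```

Writing the prompt and its outputs in a single transaction means a failure part-way through leaves no orphaned outputs behind.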

@aponcedeleonch aponcedeleonch force-pushed the move-db-to-pipeline branch 2 times, most recently from 564d287 to b2bd4bc Compare December 14, 2024 07:58
With this change the objects that are going to be stored in DB are
kept in the `context` of the pipeline. The pipeline and its `context`
are used by all providers, including Copilot. We would need to find
a good place in Copilot provider to record the context in DB, e.g.
when all the chunks have been transmitted and the stream is about to
be closed.
Comment on lines +143 to +153
    # **Needed for Copilot**. This is a hacky way of recording the context in the DB
    # when we see the last chunk. Ideally this should be done in a `finally` or
    # `StopAsyncIteration`, but Copilot streams in an infinite while loop so it is not
    # possible
    if len(chunk.choices) > 0 and chunk.choices[0].get("finish_reason", "") == "stop":
        await self._record_to_db()


This part is what allows Copilot chats to be recorded to the DB.
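
A self-contained sketch of the same last-chunk check is shown below. It is a simplification under assumptions: chunks are plain dicts rather than the response objects used in the diff, and `forward_stream`, `is_last_chunk` and `on_finished` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Dict, List


def is_last_chunk(chunk: Dict) -> bool:
    """True when the first choice reports finish_reason == "stop"."""
    choices = chunk.get("choices", [])
    return len(choices) > 0 and choices[0].get("finish_reason", "") == "stop"


async def forward_stream(
    chunks: AsyncIterator[Dict],
    on_finished: Callable[[], Awaitable[None]],
) -> List[Dict]:
    forwarded = []
    async for chunk in chunks:
        forwarded.append(chunk)
        # The consumer may never see the iterator end, so the recording hook
        # is triggered by the chunk content rather than by loop exit.
        if is_last_chunk(chunk):
            await on_finished()
    return forwarded
```

The trade-off is that recording depends on the upstream actually sending a `"stop"` chunk; a stream that is cut off early would not trigger the hook.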

@aponcedeleonch aponcedeleonch changed the title from "Record the prompt at the end of pipeline. Keep DB objects in context" to "Copilot DB integration. Keep DB objects in context to record at the end." Dec 14, 2024
@aponcedeleonch aponcedeleonch merged commit 7154d67 into main Dec 14, 2024
3 checks passed
Development

Successfully merging this pull request may close these issues.

Redact secrets in database
1 participant