@jhrozek jhrozek commented Jan 9, 2025

  • Add invokehttp to dummy data - We'd like to write unit tests that work equally well against the released Docker container and against a local development setup. Let's add a well-known malicious library to the data set for testing.

  • Add data types for pipeline routing, add route for OpenAI - We just had a simple if-else for pipeline routing. Instead, let's add a data type that includes the paths and optionally a target_url. We can use that to add a routing for OpenAI which we'll use for testing the Copilot provider. In the future we can make the routings pluggable.

  • When processing forwarded HTTP request, don't process until we have received the whole body - It might happen that the proxied request arrives in two chunks - first just the headers and then the body. In that case we would have sent just the headers with an empty body through the pipeline, which might trigger errors like "400 Client Error"; the body would only arrive in the next chunk. This patch changes the logic to keep the whole body in the buffer until it is complete and can be processed by the pipeline and sent off to the server.

  • Add integration tests for the copilot provider - Since the copilot provider is a proxy, we add a "requester" module that, depending on the provider, makes a request either directly with the Python requests library as before, or through a proxy using a CA cert file.
    To be able to add more tests, we also add more kinds of checks: in addition to the existing one, which compares the reply to the expected one using cosine distance, we add checks that make sure the LLM reply contains or does not contain a given string.

We use those to add a test that ensures that the copilot provider chat works and that the copilot chat refuses to generate a code snippet that uses a malicious package.

To be able to run a subset of the tests, we also add the ability to select tests by provider (codegate_providers) or by test name (codegate_test_names).

These serve as the base for further integration tests.

To run them, call:

```
CODEGATE_PROVIDERS=copilot \
CA_CERT_FILE=/Users/you/devel/codegate/codegate_volume/certs/ca.crt \
ENV_COPILOT_KEY=your-openapi-key \
python tests/integration/integration_tests.py
```

Related: #402
Fixes: #517

lukehinds and others added 30 commits December 14, 2024 08:02
Placeholder needed for UI Cert Download
With this change the objects that are going to be stored in the DB are
kept in the `context` of the pipeline. The pipeline and its `context`
are used by all providers, including Copilot. We would need to find
a good place in the Copilot provider to record the context in the DB,
e.g. when all the chunks have been transmitted and the stream is about
to be closed.
Copilot DB integration. Keep DB objects in context to record at the end.
When querying Copilot, the LLM requires certain headers to be explicitly
set to accept a request. We were already parsing them, storing them in
the context and even using them in the input pipeline, but forgot to use
them in the output pipeline to extract packages from the inbound
(LLM-to-client) snippets.

Fixes: #335
Pass in copilot extra headers when analyzing output packages
Allows certificate download via the dashboard
Create models directory under codegate_volume
This is not a real fix but a hotfix. The part that is needed and will be
reused later is that we need to select the proper pipeline for
FIM/non-FIM. What makes it a hotfix is that we shortcut for FIM requests
as if we had no pipeline - this is OK for now because our FIM output
pipeline is empty anyway, but we'll have to fix it. For some reason,
piping the FIM output through the output pipeline throws errors.

I'll look into that a bit more, but if we want to get the FIM output
unblocked and without errors, this helps.

Related: #351
A hotfix for the FIM pipeline
Remove unnecessary lazy loading mechanism from signatures since they are always
loaded during server startup when CodegateSecrets is created. This simplifies
the code and makes the initialization flow singular, which will reduce
unneeded lazy loading on every call.
Bumps [uvicorn](https://github.com/encode/uvicorn) from 0.32.1 to 0.34.0.
- [Release notes](https://github.com/encode/uvicorn/releases)
- [Changelog](https://github.com/encode/uvicorn/blob/master/CHANGELOG.md)
- [Commits](Kludex/uvicorn@0.32.1...0.34.0)

---
updated-dependencies:
- dependency-name: uvicorn
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.24.0 to 0.25.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](pytest-dev/pytest-asyncio@v0.24.0...v0.25.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
lukehinds and others added 29 commits January 7, 2025 12:18
fix: invert the condition for certs output
You may need to run `poetry update` and `poetry install`

To set your env:

`poetry env use /your/path/to/python/3.12.1/bin/python3`
Reduce false positive matches by reducing the similarity distance
This will result in the following tags:

:latest
:v0
:v0.0
:v0.0.5

Along with the existing ref tag

Closes: #508
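
For illustration, a small sketch of how the cascade of tags above could be derived from a version string. This is only a reconstruction of the idea; the function name and the workflow that actually publishes the images are assumptions, not the real implementation.

```python
def version_tags(version: str) -> list[str]:
    """Hypothetical helper: expand "0.0.5" into the tag cascade
    ["latest", "v0", "v0.0", "v0.0.5"] published alongside the ref tag."""
    parts = version.split(".")
    tags = ["latest"]
    for i in range(1, len(parts) + 1):
        tags.append("v" + ".".join(parts[:i]))
    return tags


print(version_tags("0.0.5"))  # ['latest', 'v0', 'v0.0', 'v0.0.5']
```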
Fix copilot errors which cause the client to hang up during FIM
feat: add action to publish openapi.json
Closes: #512

This PR drops sqlc-related files and adds the code needed to use
SQLAlchemy directly. In most parts of the code we were already using it,
so the changes are not drastic. See issue #512 for the reasoning on why
to remove `sqlc`.
fix: add monitoring for idle connections and close them
Fix issue causing copilot to hang after creating multiple sessions
Revert "fix: add monitoring for idle connections and close them"
We'd like to write unit tests that work equally well against the released
Docker container and against a local development setup. Let's add a
well-known malicious library to the data set for testing.
We just had a simple if-else for pipeline routing. Instead, let's add a
data type that includes the paths and optionally a `target_url`. We can
use that to add a routing for OpenAI which we'll use for testing the
Copilot provider.

In the future we can make the routings pluggable.
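
A minimal sketch of what such a routing data type could look like, assuming a simple path-to-route lookup; the `PipelineRoute` name, its fields, and `find_route` are illustrative, not the actual codegate types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineRoute:
    """Hypothetical routing entry: the paths it matches and, optionally,
    a target_url to forward the request to instead of the provider default."""
    provider: str
    paths: list[str]
    target_url: Optional[str] = None


# Illustrative routing table replacing the previous if-else chain
ROUTES = [
    PipelineRoute(provider="openai", paths=["/v1/chat/completions"],
                  target_url="https://api.openai.com/v1/chat/completions"),
    PipelineRoute(provider="copilot", paths=["/chat/completions"]),
]


def find_route(path: str) -> Optional[PipelineRoute]:
    """Return the first route whose paths include the requested path."""
    for route in ROUTES:
        if path in route.paths:
            return route
    return None
```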
When processing forwarded HTTP request, don't process until we have received the whole body

It might happen that the proxied request arrives in two chunks - first
just the headers and then the body. In that case we would have sent just
the headers with an empty body through the pipeline, which might trigger
errors like "400 Client Error"; the body would only arrive in the next
chunk.

This patch changes the logic to keep the whole body in the buffer until
it is complete and can be processed by the pipeline and sent off to the
server.

Fixes: #517
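
A rough sketch of the buffering idea, assuming the proxied request carries a Content-Length header; the function name and exact parsing here are illustrative, not the actual codegate proxy code.

```python
def is_request_complete(buffer: bytes) -> bool:
    """Hypothetical check: return True only once the buffered data
    contains the full body.

    The proxied request may arrive in several chunks: headers first,
    body later. The request is only handed to the pipeline once the
    bytes after the header terminator match Content-Length."""
    header_end = buffer.find(b"\r\n\r\n")
    if header_end == -1:
        return False  # the headers themselves are not complete yet

    headers = buffer[:header_end].decode("latin-1", errors="replace")
    content_length = 0
    for line in headers.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            content_length = int(value.strip())
            break

    body = buffer[header_end + 4:]
    return len(body) >= content_length
```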
Since the copilot provider is a proxy, we add a "requester" module that,
depending on the provider, makes a request either directly with the Python
requests library as before, or through a proxy using a CA cert file.
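
As an illustration of how such a requester might branch, here is a sketch that relies on the standard `proxies`/`verify` parameters of the requests library; the function and parameter names are made up for this example.

```python
import requests


def make_request(provider: str, url: str, payload: dict,
                 proxy_url: str | None = None,
                 ca_cert_file: str | None = None) -> requests.Response:
    """Hypothetical requester: send the test request directly, or through
    the codegate proxy when the provider (e.g. copilot) requires it."""
    if provider == "copilot":
        # Route the request through the proxy and trust the codegate CA cert
        proxies = {"http": proxy_url, "https": proxy_url}
        return requests.post(url, json=payload, proxies=proxies,
                             verify=ca_cert_file, timeout=60)
    # Other providers keep using plain requests as before
    return requests.post(url, json=payload, timeout=60)
```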

To be able to add more tests, we also add more kinds of checks: in
addition to the existing one, which compares the reply to the expected
one using cosine distance, we add checks that make sure the LLM reply
contains or does not contain a given string.
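
A small sketch of what the contains/not-contains style of check could look like next to the existing cosine-distance check; the helper names are hypothetical.

```python
def check_contains(reply: str, expected_substring: str) -> bool:
    """Pass if the LLM reply mentions the expected string."""
    return expected_substring in reply


def check_does_not_contain(reply: str, forbidden_substring: str) -> bool:
    """Pass if the LLM reply avoids the forbidden string,
    e.g. the name of a malicious package such as invokehttp."""
    return forbidden_substring not in reply
```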

We use those to add a test that ensures that the copilot provider chat
works and that the copilot chat refuses to generate a code snippet that
uses a malicious package.

To be able to run a subset of the tests, we also add the ability to select
tests by provider (`codegate_providers`) or by test name (`codegate_test_names`).
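
A sketch of how that selection could be driven from environment variables. `CODEGATE_PROVIDERS` appears in the run command below; the `CODEGATE_TEST_NAMES` variable name and the comma-separated parsing are assumptions for this example, not the actual implementation.

```python
import os


def should_run(test_name: str, provider: str) -> bool:
    """Hypothetical filter: run a test only if it matches the optional
    CODEGATE_PROVIDERS / CODEGATE_TEST_NAMES selections.

    Both variables are treated as comma-separated lists; an unset
    variable means no filtering on that dimension."""
    providers = os.environ.get("CODEGATE_PROVIDERS", "")
    test_names = os.environ.get("CODEGATE_TEST_NAMES", "")

    if providers and provider not in providers.split(","):
        return False
    if test_names and test_name not in test_names.split(","):
        return False
    return True
```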

These serve as the base for further integration tests.

To run them, call:
```
CODEGATE_PROVIDERS=copilot \
CA_CERT_FILE=/Users/you/devel/codegate/codegate_volume/certs/ca.crt \
ENV_COPILOT_KEY=your-openapi-key \
python tests/integration/integration_tests.py
```

Related: #402
@jhrozek jhrozek closed this Jan 9, 2025