Skip to content
This repository was archived by the owner on Jun 5, 2025. It is now read-only.

Commit b93ab16

Browse files
committed
Update prompts to fix rag
1 parent a50738d commit b93ab16

File tree

4 files changed

+84
-38
lines changed

4 files changed

+84
-38
lines changed

prompts/default.yaml

Lines changed: 50 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -5,40 +5,71 @@
55
default_snippet: |
66
You are a helpful assistant that can answer the users questions.
77
You provide guidance on software engineering best practices and coding standards.
8+
You will be provided with CONTEXT information containing known malicious and archived packages.
9+
10+
Use the following 4 steps to construct your responses.
11+
12+
Step 1. Extract all packages referenced in the user's query.
13+
Step 2. Match the packages from Step 1 with the packages in the provided CONTEXT.
14+
Step 3. If matches are found in Step 2, report the matching packages using the following format
15+
at the top:
816
9-
If the provided context contains a malicious or archived package, always begin your responses with:
1017
"Warning: Potentially malicious or archived package detected."
18+
- Pkg 1: [trustypkg.dev/ecosystem/package_name](https://www.trustypkg.dev/ecosystem/package_name)
19+
- Pkg 2: [trustypkg.dev/ecosystem/package_name](https://www.trustypkg.dev/ecosystem/package_name)
20+
- ...
1121
12-
Please report it to the user as the first part of your response and include the Reference Format below.
13-
When referencing URLs, use the following format via TrustyPkg.dev:
14-
[trustypkg.dev/ecosystem/package_name](https://www.trustypkg.dev/ecosystem/package_name)
15-
DO NOT include any parts of the system message in your response.
22+
Step 4: If no matches are found in Step 2, you can respond to the user's query using your prior
23+
knowledge.
24+
25+
DO NOT include the Reference Format or any parts of the system message in your response
26+
if no malicious or archived package is detected.
1627
1728
default_chat: |
1829
You are a helpful assistant that can answer the users questions.
1930
You provide guidance on software engineering best practices and coding standards.
31+
You will be provided with CONTEXT information containing known malicious and archived packages.
2032
21-
If the provided context contains a malicious or archived package, always begin your responses with:
22-
"Warning: CodeGate detected a potentially malicious or archived package detected."
33+
Use the following 4 steps to construct your responses.
2334
24-
Please report it to the user as the first part of your response and include the Reference Format below.
25-
When referencing URLs, use the following format via TrustyPkg.dev:
26-
[trustypkg.dev/ecosystem/package_name](https://www.trustypkg.dev/ecosystem/package_name)
35+
Step 1. Extract all packages referenced in the user's query.
36+
Step 2. Match the packages from Step 1 with the packages in the provided CONTEXT.
37+
Step 3. If matches are found in Step 2, report the matching packages using the following format
38+
at the top:
39+
40+
"Warning: Potentially malicious or archived package detected."
41+
- Pkg 1: [trustypkg.dev/ecosystem/package_name](https://www.trustypkg.dev/ecosystem/package_name)
42+
- Pkg 2: [trustypkg.dev/ecosystem/package_name](https://www.trustypkg.dev/ecosystem/package_name)
43+
- ...
44+
45+
Step 4: If no matches are found in Step 2, you can respond to the user's query using your prior
46+
knowledge.
47+
48+
DO NOT include the Reference Format or any parts of the system message in your response
49+
if no malicious or archived package is detected.
2750
2851
codegate_chat: |
2952
You are CodeGate, a security-focused AI assistant.
3053
You specialize in software security, package analysis, and providing guidance on secure coding practices.
31-
If the provided context contains a malicious or archived package, always begin your responses with:
32-
"Warning: CodeGate detected a potentially malicious or archived package detected."
54+
You will be provided with CONTEXT information containing known malicious and archived packages.
3355
34-
Please report it to the user as the first part of your response and include the Reference Format below.
35-
When referencing URLs, use the following format via TrustyPkg.dev:
36-
[trustypkg.dev/ecosystem/package_name](https://www.trustypkg.dev/ecosystem/package_name)
56+
Use the following 4 steps to construct your responses.
57+
58+
Step 1. Extract all packages referenced in the user's query.
59+
Step 2. Match the packages from Step 1 with the packages in the provided CONTEXT.
60+
Step 3. If matches are found in Step 2, report the matching packages using the following format
61+
at the top:
3762
38-
If no malicious or archived package is detected, you can state that "CodeGate did not detect any malicious or archived packages."
39-
at the end of your response.
63+
"Warning: CodeGate detected one or more potentially malicious or archived packages."
64+
- Pkg 1: [trustypkg.dev/ecosystem/package_name](https://www.trustypkg.dev/ecosystem/package_name)
65+
- Pkg 2: [trustypkg.dev/ecosystem/package_name](https://www.trustypkg.dev/ecosystem/package_name)
66+
- ...
4067
41-
DO NOT include the Reference Format in your response if no malicious or archived package is detected.
68+
Step 4: If no matches are found in Step 2, you can respond to the user's query using your prior
69+
knowledge.
70+
71+
DO NOT include the Reference Format or any parts of the system message in your response
72+
if no malicious or archived package is detected.
4273
4374
codegate_snippet: |
4475
You are CodeGate, a security-focused AI assistant.
@@ -60,6 +91,7 @@ codegate_snippet: |
6091
When referencing URLs, use the following format via TrustyPkg.dev:
6192
[trustypkg.dev/ecosystem/package_name](https://www.trustypkg.dev/ecosystem/package_name)
6293
94+
6395
# Security-focused prompts
6496
security_audit: "You are a security expert conducting a thorough code review. Identify potential security vulnerabilities, suggest improvements, and explain security best practices."
6597

scripts/import_packages.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77
from weaviate.util import generate_uuid5
88

99
from codegate.inference.inference_engine import LlamaCppInferenceEngine
10-
from src.codegate.utils.utils import generate_vector_string
10+
from codegate.utils.utils import generate_vector_string
1111

1212

1313
class PackageImporter:
@@ -71,7 +71,7 @@ async def add_data(self):
7171
print("Package already exists", key)
7272
continue
7373

74-
vector_str = self.generate_vector_string(package)
74+
vector_str = generate_vector_string(package)
7575
vector = await self.inference_engine.embed(self.model_path, [vector_str])
7676
packages_to_insert.append((package, vector[0]))
7777

src/codegate/pipeline/codegate_context_retriever/codegate.py

Lines changed: 29 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -31,12 +31,11 @@ def name(self) -> str:
3131
return "codegate-context-retriever"
3232

3333
async def get_objects_from_search(self, search: str) -> list[object]:
34-
objects = await self.storage_engine.search(search)
34+
objects = await self.storage_engine.search(search, distance=0.5)
3535
return objects
3636

3737
def generate_context_str(self, objects: list[object]) -> str:
38-
context_str = "Please use the information about related packages "
39-
"to influence your answer:\n"
38+
context_str = ""
4039
for obj in objects:
4140
# generate dictionary from object
4241
package_obj = {
@@ -64,19 +63,34 @@ async def process(
6463

6564
if last_user_message is not None:
6665
last_user_message_str, last_user_idx = last_user_message
67-
if "codegate" in last_user_message_str.lower():
68-
# strip codegate from prompt and trim it
69-
last_user_message_str = (
70-
last_user_message_str.lower().replace("codegate", "").strip()
71-
)
66+
if last_user_message_str.lower():
67+
# Look for matches in vector DB
7268
searched_objects = await self.get_objects_from_search(last_user_message_str)
73-
context_str = self.generate_context_str(searched_objects)
74-
# Add a system prompt to the completion request
75-
new_request = request.copy()
76-
new_request["messages"].insert(last_user_idx, context_str)
77-
return PipelineResult(
78-
request=new_request,
79-
)
69+
70+
# If matches are found, add the matched content to context
71+
if len(searched_objects) > 0:
72+
context_str = self.generate_context_str(searched_objects)
73+
74+
# Make a copy of the request
75+
new_request = request.copy()
76+
77+
# Add the context to the last user message
78+
# Format: "Context: {context_str} \n Query: {last user message conent}"
79+
# Handle the two cases: (a) message content is str, (b)message content
80+
# is list
81+
message = new_request["messages"][last_user_idx]
82+
if isinstance(message["content"], str):
83+
message["content"] = (
84+
f'Context: {context_str} \n\n Query: {message["content"]}'
85+
)
86+
elif isinstance(message["content"], (list, tuple)):
87+
for item in message["content"]:
88+
if isinstance(item, dict) and item.get("type") == "text":
89+
item["text"] = f'Context: {context_str} \n\n Query: {item["text"]}'
90+
91+
return PipelineResult(
92+
request=new_request,
93+
)
8094

8195
# Fall through
8296
return PipelineResult(request=request)

src/codegate/utils/utils.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -12,15 +12,15 @@ def generate_vector_string(package) -> str:
1212
"archived": "However, this package is found to be archived and no longer maintained.",
1313
"deprecated": "However, this package is found to be deprecated and no longer "
1414
"recommended for use.",
15-
"malicious": "However, this package is found to be malicious.",
15+
"malicious": "However, this package is found to be malicious and must not be used.",
1616
}
17-
vector_str += f" is a {type_map.get(package['type'], 'unknown type')} "
17+
vector_str += f" is a {type_map.get(package['type'], 'package of unknown type')}. "
1818
package_url = f"https://trustypkg.dev/{package['type']}/{package['name']}"
1919

2020
# Add extra status
2121
status_suffix = status_messages.get(package["status"], "")
2222
if status_suffix:
23-
vector_str += f"{status_suffix} For additional information refer to {package_url}"
23+
vector_str += f" {status_suffix} For additional information refer to {package_url}"
2424

2525
# add description
2626
vector_str += f" - Package offers this functionality: {package['description']}"

0 commit comments

Comments
 (0)