Skip to content
This repository was archived by the owner on Jun 5, 2025. It is now read-only.

On-the-fly secret scanning #119

Merged
merged 4 commits into from
Nov 28, 2024
Merged

On-the-fly secret scanning #119

merged 4 commits into from
Nov 28, 2024

Conversation

lukehinds
Copy link

This change introduces secrets on the fly scanning. It still requires

  • Response interception, e.g. outout from the LLM
  • Validation with an end to end IDE <> LLM flow

Luke Hinds added 4 commits November 28, 2024 15:23
This change introduces secrets on the fly scanning. It still requires

* Response interception, e.g. outout from the LLM
* Validation with an end to end IDE <> LLM flow
@lukehinds
Copy link
Author

Note, this will fail when #120 merges as I can't push formatted code and not keep getting caught up in format rebases from merge conflicts.

@lukehinds lukehinds linked an issue Nov 28, 2024 that may be closed by this pull request
Comment on lines +19 to +23
def translate_completion_input_params(self, kwargs: Dict) -> Optional[ChatCompletionRequest]:
try:
return ChatCompletionRequest(**kwargs)
except Exception as e:
raise ValueError(f"Invalid completion parameters: {str(e)}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this functionality moved to the LLamaCppInputNormalizer and is not needed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, there was a lot of movement in the core engine and not enough docs. Let's merge this and clean up in a follow up. We have too many PRs in flight now.

Copy link
Contributor

@aponcedeleonch aponcedeleonch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, just left one comment :)


def generate_session_key(self, session_id):
"""Generates a session key with an associated timestamp."""
key = os.urandom(32) # Generate a 256-bit key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question, is this the recommended way in python to get random data? I see that we are importing cryptography, does it provide some function to securely generate randomness? (I know this is urandom and won't block so this is OK, just asking)

self.session_keys[session_id] = (key, time.time())
return key

def get_session_key(self, session_id):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one caller currently and it uses None as the session ID. Were you thinking about using the chat ID as the session? I guess it should be something the client doesn't set?


def decrypt_token(self, encrypted_token, session_id):
"""Decrypts a token and validates its timestamp to prevent replay attacks."""
key = self.get_session_key(session_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I was asking about the session ID was that I was wondering if we could make the session key single-use and delete after use to help against replay attacks.

Returns:
Absolute position in text
"""
lines = text.split("\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and now I understand why you asked me about line splits in the normalization..

so we don't handle those ourselves. I think we should test on windows how are the snippets sent from the client to the server..

if isinstance(request["messages"], list):
for msg in request["messages"]:
if msg.get("role") == "user" and msg.get("content") == extracted_string:
msg["content"] = protected_string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but get_last_user_message returns (msg, index), so you could just replace the message at that index instead of looking for it.
Let's not change it now and maybe file an issue, I think it would be nice to change get_last_user_message to return a named tuple instead so that it's more obvious

def _ensure_signatures_loaded(cls) -> None:
"""Ensure signatures are loaded before use."""
if not cls._signatures_loaded:
with cls._instance_lock:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I will borrow this pattern for the snippet extraction :-)

Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of comments inline. We should merge this and iterate.

@lukehinds
Copy link
Author

Left a couple of comments inline. We should merge this and iterate.

Sounds good, I am going to review the crypto code a few times again, I had to do a last minute refactor to a different library as py crypto is no longer maintained

@lukehinds lukehinds merged commit 906e84d into main Nov 28, 2024
2 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Secret System: Build Obfuscate on-the-fly secret protects
3 participants