Assess fidelity of secret pattern matching #209
Description
A considerable number of the patterns in signatures.yml will match on secret variable names, such as OPENAI_API_KEY, in addition to or in place of the secret key or token itself. This is an artefact of the previous secret blocking implementation.
- Amazon:
  - Access Key: (?:A3T[A-Z0-9]|AKIA|AGPA|AIDA|AROA|AIPA|ANPA|ANVA|ASIA|ABIA|ACCA)[A-Z0-9]{16}
  - Secret Access Key Variable: (?i)(amazon|amz|aws)[-_]{0,1}(secret)[-_]{0,1}((access)[-_]{0,1}){0,1}key
  # - Cognito User Pool ID: (?i)us-[a-z]{2,}-[a-z]{4,}-\d{1,}
  - RDS Password: (?i)(rds\-master\-password|db\-password)
  - S3 Private Key Variable: (?i)AWS_S3_PRIVATE_KEY|s3_key|S3_PRIVATE_KEY
  - Security Token Header Variable: (?i)X-Amz-Security-Token
  - API Gateway Key Source Header Variable: (?i)x-amazon-apigateway-api-key-source
  - S3 Bucket: (?i)AWS_S3_BUCKET|s3_bucket
  - SNS Confirmation URL: (?i)https:\/\/sns\.[a-z0-9-]+\.amazonaws\.com\/?Action=ConfirmSubscription&Token=[a-zA-Z0-9-=_]+
  - SES SMTP Password Variable: (?i)ses_smtp_password
  - AWS Private Key Variable: (?i)ec2\-private\-key|EC2_PRIVATE_KEY
  - MWS Token: (amzn\.mws\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})
  - AppSync GraphQL Key: \bda2-[a-z0-9]{26}
- Microsoft:
  - Azure API Key Variable: (?i)Ocp-Apim-Subscription-Key
  - Azure Functions Key Header Variable: (?i)x-functions-key
Now that secrets are encrypted on the fly, we must be precise about which strings get encrypted. If we obfuscate an entire line in a user's code prompt, including the variable name, the LLM could produce mangled output. We also want to avoid spurious claims in the response that some number of secrets were encrypted when no secrets were actually present.
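To illustrate the problem, here is a minimal Python sketch that runs the "Secret Access Key Variable" pattern from the list above against a made-up prompt line. The prompt line and the placeholder value are assumptions for illustration only, and Python's re module stands in for whatever regex engine the project actually uses.

```python
import re

# Pattern copied verbatim from the "Secret Access Key Variable" entry above.
VARIABLE_PATTERN = re.compile(
    r"(?i)(amazon|amz|aws)[-_]{0,1}(secret)[-_]{0,1}((access)[-_]{0,1}){0,1}key"
)

# Hypothetical prompt line with an obviously fake value.
line = 'AWS_SECRET_ACCESS_KEY = "EXAMPLE-PLACEHOLDER-VALUE-0000"'

match = VARIABLE_PATTERN.search(line)
matched_text = match.group(0) if match else None

# The match covers the variable name only; the value is never captured.
print(matched_text)
```

Here the match ends before the assignment, so encrypting the matched span would rewrite the variable name and leave the actual secret untouched.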
This task will focus on assessing what changes are needed to the way the patterns are matched in order to improve matching fidelity. For example, we can still detect on OPENAI_API_KEY : <key>, but the key itself should be in a separate matching group so that it alone can be extracted and encrypted.
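One possible shape for such a pattern, sketched in Python under stated assumptions: the pattern still keys on the variable name, but the value sits in its own named group. The pattern itself, the group name secret, the helper encrypt_secrets, and the encrypt callback are all hypothetical and not taken from signatures.yml or the existing implementation.

```python
import re

# Hypothetical revised pattern: the variable name anchors the match, and the
# value is captured separately in the named group "secret".
OPENAI_KEY_ASSIGNMENT = re.compile(
    r"(?i)\bOPENAI_API_KEY\b\s*[:=]\s*[\"']?(?P<secret>[^\s\"']+)"
)

def encrypt_secrets(line, encrypt):
    """Replace only the captured value; return (new_line, number_replaced)."""
    def _swap(match):
        whole = match.group(0)
        offset = match.start()
        start, end = match.span("secret")
        # Rebuild the match with just the secret span swapped out.
        return whole[: start - offset] + encrypt(match.group("secret")) + whole[end - offset:]
    return OPENAI_KEY_ASSIGNMENT.subn(_swap, line)

# Placeholder "encryption" for demonstration only.
fake_encrypt = lambda value: "<ENC>"

print(encrypt_secrets("OPENAI_API_KEY : sk-placeholder123", fake_encrypt))
print(encrypt_secrets("Remember to set OPENAI_API_KEY before running.", fake_encrypt))
```

Because the pattern requires a value after the variable name, a bare mention of OPENAI_API_KEY produces no match, so the replacement count reflects only values that were actually encrypted.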