fix(redteam): redact harmful generation logs #8657

Open
ianw-oai wants to merge 8 commits into main from jimothy/redact-harmful-generation-logs

@ianw-oai
Contributor

Stops harmful generation debug logs from dumping request content.

Copilot AI review requested due to automatic review settings April 12, 2026 00:00
Contributor

Copilot AI left a comment

Pull request overview

This PR prevents harmful generation debug logs from including sensitive request content (e.g., purpose, config prompt, user email) by logging only non-sensitive metadata.

Changes:

  • Replaces the harmful-generation debug log body dump with a metadata-only log payload (lengths/flags/keys).
  • Adds a focused unit test ensuring secret request content is not present in debug logs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/providers/promptfoo.ts | Introduces a helper to log only safe metadata and updates the debug statement to use it. |
| test/providers/promptfoo.test.ts | Adds a regression test verifying secrets (purpose/config/email) are not emitted via logger.debug. |

Contributor

@promptfoo-scanner promptfoo-scanner bot left a comment

👍 All Clear

I reviewed the changes to the Promptfoo harmful generation provider and its tests, focusing on LLM data flows, logging, and potential execution paths. The PR replaces detailed request logging with redacted metadata, reducing exposure of sensitive inputs like prompts and emails. No new LLM capabilities, inputs, or execution sinks were introduced. Overall, this change improves security posture and I did not find LLM security vulnerabilities introduced by this PR.

Minimum severity threshold: 🟡 Medium | To re-scan after changes, comment @promptfoo-scanner

@coderabbitai
Contributor

coderabbitai bot commented Apr 12, 2026

Walkthrough

The changes add structured logging to the PromptfooHarmfulCompletionProvider to reduce exposure of sensitive request data. A new helper function getHarmfulGenerationLogMetadata() extracts non-sensitive fields (remote URL, harm category, count, purpose length, email presence, version, config presence, and config keys) from the request body. The debug logging call is updated to use this structured metadata instead of stringifying the entire request body. A corresponding unit test verifies that sensitive values (purpose text, config prompt, email) are excluded from debug output while expected metadata fields remain present.
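The walkthrough above can be illustrated with a minimal sketch of what a metadata-only helper might look like. This is not the PR's implementation: the request shape, the `url` parameter, and every field name other than `purposeLength` and `configKeys` (which appear in the test assertions) are assumptions made for illustration.

```typescript
// Hypothetical request shape. Field names here are assumptions, not taken
// from the PR diff; only purposeLength and configKeys appear in the review.
interface HarmfulGenerationRequest {
  harmCategory: string;
  n: number;
  purpose?: string;
  email?: string | null;
  version?: string;
  config?: Record<string, unknown>;
}

// Returns only lengths, flags, and key names, never the underlying values.
function getHarmfulGenerationLogMetadata(
  url: string,
  body: HarmfulGenerationRequest,
): Record<string, unknown> {
  return {
    url,
    harmCategory: body.harmCategory,
    n: body.n,
    purposeLength: body.purpose?.length ?? 0,
    hasEmail: Boolean(body.email),
    version: body.version,
    hasConfig: body.config !== undefined,
    // Key names only; the values (for example a config prompt) are dropped.
    configKeys: body.config ? Object.keys(body.config).sort() : [],
  };
}

// Example call using the sentinel values from the regression test.
const metadata = getHarmfulGenerationLogMetadata('https://example.invalid/generate', {
  harmCategory: 'example-category',
  n: 1,
  purpose: 'secret-purpose-sentinel',
  email: 'test@example.com',
  version: '0.0.0',
  config: { prompt: 'secret-config-sentinel' },
});
console.log(JSON.stringify(metadata));
```

Logging derived values (a length, a boolean, a list of keys) keeps the debug line useful for diagnosing malformed requests while making it impossible for the raw purpose, prompt, or email to reach the log.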

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding redaction to harmful generation logs to prevent dumping sensitive request content. |
| Description check | ✅ Passed | The description directly relates to the changeset, explaining that the PR stops harmful generation debug logs from dumping request content, which aligns with the code changes. |

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
test/providers/promptfoo.test.ts (1)

101-107: Optional: assert structured debug args directly for stronger test precision.

Stringifying all debug calls can become noisy over time. Consider asserting message/context arguments from the targeted call.

💡 Suggested test assertion refinement
-      const debugLogs = JSON.stringify(debugSpy.mock.calls);
-      expect(debugLogs).toContain('[HarmfulCompletionProvider] Calling generate harmful API');
-      expect(debugLogs).toContain('purposeLength');
-      expect(debugLogs).toContain('configKeys');
-      expect(debugLogs).not.toContain('secret-purpose-sentinel');
-      expect(debugLogs).not.toContain('secret-config-sentinel');
-      expect(debugLogs).not.toContain('test@example.com');
+      expect(debugSpy).toHaveBeenCalledWith(
+        '[HarmfulCompletionProvider] Calling generate harmful API',
+        expect.objectContaining({
+          purposeLength: 'secret-purpose-sentinel'.length,
+          configKeys: ['prompt'],
+        }),
+      );
+      const loggedContext = debugSpy.mock.calls[0]?.[1];
+      const contextStr = JSON.stringify(loggedContext);
+      expect(contextStr).not.toContain('secret-purpose-sentinel');
+      expect(contextStr).not.toContain('secret-config-sentinel');
+      expect(contextStr).not.toContain('test@example.com');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/providers/promptfoo.test.ts` around lines 101-107, replace the broad
stringified assertions on debugSpy with targeted assertions on the specific
debug call: find the call in debugSpy.mock.calls whose message is
'[HarmfulCompletionProvider] Calling generate harmful API', assert that its
context argument contains keys such as purposeLength and configKeys, and assert
that it does not contain the sensitive values 'secret-purpose-sentinel',
'secret-config-sentinel', or 'test@example.com'. Assert on the structured
arguments rather than on the full JSON string.
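
The prompt asks for the targeted call to be located by its message, whereas the suggested diff reads `debugSpy.mock.calls[0]`, which holds only if that call happens to be the first debug call. A minimal sketch of message-based lookup follows, using a plain array as a stand-in for `debugSpy.mock.calls`; the helper names `findCallByMessage` and `leakedValues` are hypothetical and not part of the PR or of any test framework.

```typescript
// A recorded logger.debug call: message first, optional context second.
type DebugCall = [message: string, context?: Record<string, unknown>];

// Hypothetical helper: locate the call by its message instead of by index.
function findCallByMessage(calls: DebugCall[], message: string): DebugCall | undefined {
  return calls.find(([loggedMessage]) => loggedMessage === message);
}

// Hypothetical helper: list the secrets that appear in the serialized context.
function leakedValues(context: unknown, secrets: string[]): string[] {
  const serialized = JSON.stringify(context ?? {});
  return secrets.filter((secret) => serialized.includes(secret));
}

// Stand-in for debugSpy.mock.calls, with an unrelated call recorded first.
const recordedCalls: DebugCall[] = [
  ['unrelated debug line', { detail: 'noise' }],
  [
    '[HarmfulCompletionProvider] Calling generate harmful API',
    { purposeLength: 'secret-purpose-sentinel'.length, configKeys: ['prompt'] },
  ],
];

const target = findCallByMessage(
  recordedCalls,
  '[HarmfulCompletionProvider] Calling generate harmful API',
);
const leaks = leakedValues(target?.[1], [
  'secret-purpose-sentinel',
  'secret-config-sentinel',
  'test@example.com',
]);
console.log(target !== undefined, leaks);
```

In a real test the same lookup would run against the spy's recorded calls, and the assertions would stay valid even if other debug statements are later added ahead of the harmful-generation log line.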

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e1c97205-49e6-4784-9160-88669d6c29b6

📥 Commits

Reviewing files that changed from the base of the PR and between 3517c0f and cd624ae.

📒 Files selected for processing (2)
  • src/providers/promptfoo.ts
  • test/providers/promptfoo.test.ts
