fix(redteam): redact harmful generation logs by ianw-oai · Pull Request #8657 · promptfoo/promptfoo

ianw-oai · 2026-04-12T00:00:52Z

Stops harmful generation debug logs from dumping request content.

Copilot

Pull request overview

This PR prevents harmful generation debug logs from including sensitive request content (e.g., purpose, config prompt, user email) by logging only non-sensitive metadata.

Changes:

Replaces the harmful-generation debug log body dump with a metadata-only log payload (lengths/flags/keys).
Adds a focused unit test ensuring secret request content is not present in debug logs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
`src/providers/promptfoo.ts`	Introduces a helper to log only safe metadata and updates the debug statement to use it.
`test/providers/promptfoo.test.ts`	Adds a regression test verifying secrets (purpose/config/email) are not emitted via `logger.debug`.

promptfoo-scanner

👍 All Clear

I reviewed the changes to the Promptfoo harmful generation provider and its tests, focusing on LLM data flows, logging, and potential execution paths. The PR replaces detailed request logging with redacted metadata, reducing exposure of sensitive inputs like prompts and emails. No new LLM capabilities, inputs, or execution sinks were introduced. Overall, this change improves security posture and I did not find LLM security vulnerabilities introduced by this PR.

coderabbitai · 2026-04-12T00:03:54Z

📝 Walkthrough

Walkthrough

The changes add structured logging to the PromptfooHarmfulCompletionProvider to reduce exposure of sensitive request data. A new helper function getHarmfulGenerationLogMetadata() extracts non-sensitive fields (remote URL, harm category, count, purpose length, email presence, version, config presence, and config keys) from the request body. The debug logging call is updated to use this structured metadata instead of stringifying the entire request body. A corresponding unit test verifies that sensitive values (purpose text, config prompt, email) are excluded from debug output while expected metadata fields remain present.
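As a rough illustration of the metadata-only approach described in the walkthrough, the sketch below derives a log payload from a request body without copying any free-text values. Only `getHarmfulGenerationLogMetadata`, `purposeLength`, and `configKeys` are names that appear in this PR's discussion; the request shape, the remaining field names, and the function signature are assumptions made for the example and may not match `src/providers/promptfoo.ts`.

```typescript
// Hypothetical request shape; the real body type in the provider may differ.
interface HarmfulGenerationBody {
  harmCategory: string;
  n: number;
  purpose?: string;
  email?: string | null;
  version?: string;
  config?: Record<string, unknown>;
}

// Metadata that is safe to log: lengths, flags, and key names only.
interface HarmfulGenerationLogMetadata {
  url: string;
  harmCategory: string;
  n: number;
  purposeLength: number;
  hasEmail: boolean;
  version: string | undefined;
  hasConfig: boolean;
  configKeys: string[];
}

// Assumed signature: the PR names the helper but this page does not show its parameters.
function getHarmfulGenerationLogMetadata(
  url: string,
  body: HarmfulGenerationBody,
): HarmfulGenerationLogMetadata {
  return {
    url,
    harmCategory: body.harmCategory,
    n: body.n,
    // Log the length of the purpose, never the text itself.
    purposeLength: body.purpose?.length ?? 0,
    // Log whether an email was supplied, never the address.
    hasEmail: Boolean(body.email),
    version: body.version,
    hasConfig: body.config !== undefined,
    // Key names only; config values (such as a prompt) are dropped.
    configKeys: body.config ? Object.keys(body.config) : [],
  };
}
```

The payload is built field by field rather than by deleting keys from a copy of the body, so a field added to the request later stays out of the logs unless someone adds it deliberately.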

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: adding redaction to harmful generation logs to prevent dumping sensitive request content.
Description check	✅ Passed	The description directly relates to the changeset, explaining that the PR stops harmful generation debug logs from dumping request content, which aligns with the code changes.

coderabbitai

🧹 Nitpick comments (1)

test/providers/promptfoo.test.ts (1)

101-107: Optional: assert structured debug args directly for stronger test precision.

Stringifying all debug calls can become noisy over time. Consider asserting message/context arguments from the targeted call.

💡 Suggested test assertion refinement

-      const debugLogs = JSON.stringify(debugSpy.mock.calls);
-      expect(debugLogs).toContain('[HarmfulCompletionProvider] Calling generate harmful API');
-      expect(debugLogs).toContain('purposeLength');
-      expect(debugLogs).toContain('configKeys');
-      expect(debugLogs).not.toContain('secret-purpose-sentinel');
-      expect(debugLogs).not.toContain('secret-config-sentinel');
-      expect(debugLogs).not.toContain('test@example.com');
+      expect(debugSpy).toHaveBeenCalledWith(
+        '[HarmfulCompletionProvider] Calling generate harmful API',
+        expect.objectContaining({
+          purposeLength: 'secret-purpose-sentinel'.length,
+          configKeys: ['prompt'],
+        }),
+      );
+      const loggedContext = debugSpy.mock.calls[0]?.[1];
+      const contextStr = JSON.stringify(loggedContext);
+      expect(contextStr).not.toContain('secret-purpose-sentinel');
+      expect(contextStr).not.toContain('secret-config-sentinel');
+      expect(contextStr).not.toContain('test@example.com');

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/providers/promptfoo.test.ts` around lines 101 - 107, Replace the
stringified-wide assertions on debugSpy in test/providers/promptfoo.test.ts with
targeted assertions on the specific debug call: find the call in
debugSpy.mock.calls that includes the '[HarmfulCompletionProvider] Calling
generate harmful API' message, assert that that call's context/args contain keys
like purposeLength and configKeys and do not contain sensitive values
'secret-purpose-sentinel', 'secret-config-sentinel' or 'test@example.com'; use
debugSpy, the literal message '[HarmfulCompletionProvider] Calling generate
harmful API' and the keys purposeLength/configKeys to locate and assert the
structured args rather than checking the full JSON string.
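The prompt above asks for the targeted call to be located by its message rather than by position. A minimal sketch of that lookup follows, written against plain arrays shaped like a spy's `mock.calls` so it does not depend on any test framework. The helper names `findDebugCall` and `findLeakedSentinels` are hypothetical and are not part of the PR.

```typescript
// One recorded logger call: [message, ...contextArgs], as stored in a spy's mock.calls.
type DebugCall = readonly unknown[];

// Return the first recorded call whose message contains the given text.
function findDebugCall(calls: readonly DebugCall[], message: string): DebugCall | undefined {
  for (const call of calls) {
    const first = call[0];
    if (typeof first === 'string' && first.indexOf(message) !== -1) {
      return call;
    }
  }
  return undefined;
}

// Return the sentinel strings that appear anywhere in the call's context arguments.
function findLeakedSentinels(call: DebugCall, sentinels: readonly string[]): string[] {
  const serialized = JSON.stringify(call.slice(1));
  return sentinels.filter((sentinel) => serialized.indexOf(sentinel) !== -1);
}
```

In a test, `findDebugCall(debugSpy.mock.calls, '[HarmfulCompletionProvider] Calling generate harmful API')` would select the call to inspect, and an empty result from `findLeakedSentinels` would indicate that none of the sentinel values reached the log context. Unlike indexing `mock.calls[0]`, this still works if another debug message is logged first.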

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 3517c0f and cd624ae.

📒 Files selected for processing (2)

src/providers/promptfoo.ts
test/providers/promptfoo.test.ts

…l-generation-logs

fix(redteam): redact harmful generation logs

cd624ae

Copilot AI review requested due to automatic review settings April 12, 2026 00:00

ianw-oai requested review from mldangelo-oai and zcrab-oai as code owners April 12, 2026 00:00

Copilot started reviewing on behalf of ianw-oai April 12, 2026 00:01

Copilot AI reviewed Apr 12, 2026

promptfoo-scanner bot reviewed Apr 12, 2026

coderabbitai bot reviewed Apr 12, 2026

mldangelo-oai added 7 commits April 12, 2026 15:43

fix(redteam): redact harmful generation errors

5c73539

Merge remote-tracking branch 'origin/main' into jimothy/redact-harmfu…

47cb609

…l-generation-logs

fix(redteam): harden harmful generation log metadata

a339d08

Merge remote-tracking branch 'origin/main' into jimothy/redact-harmfu…

2a1ed4f

…l-generation-logs

test(providers): cover harmful generation redaction edges

699bf16

Merge remote-tracking branch 'origin/main' into mdangelo/codex/docs-8657

22c7a84

docs(site): clarify redacted debug logs

0859b29

fix(redteam): redact harmful generation logs #8657
ianw-oai wants to merge 8 commits into main from jimothy/redact-harmful-generation-logs

3 participants
