Summary
Agents running Hermes consistently leak credential values in chat output and reasoning/thinking blocks. Despite having credential-safety instructions, the model writes literal passwords, API keys, and tokens into responses -- especially when explaining what it "fixed."
This is a systemic, recurring failure -- not a one-time mistake.
Root Cause
- Credentials stored in agent memory get quoted verbatim when referenced in explanations
- "Meta-discussion" is the most dangerous trigger -- an agent that just fixed a credential exposure will write the value again to "show what it fixed"
- Reasoning/thinking blocks leak credentials in interfaces where they are visible (Discord, Telegram, CLI)
- Post-hoc instructions are not sufficient -- the model violates them in the same turn when explaining past actions
Impact
- Severity: Critical. Actual passwords, API keys, and auth tokens are written into chat transcripts.
- Requires credential rotation each time it occurs.
- Erodes user trust permanently.
Proposed Solutions
Short-term (output pipeline)
- Pre-delivery regex scrubber: scan model output for patterns matching known credential formats and auto-redact before delivery
- Think-block redaction: apply the same filtering to reasoning blocks, not just final output (a sketch covering both follows this list)
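A minimal sketch of what the scrubber could look like, assuming a response object with separate output and reasoning fields. The pattern list, `scrub`, and `deliver` are illustrative names, not an existing API, and the regexes cover only a few common key formats:

```python
import re

# Hypothetical patterns -- tune to the credential formats your deployment
# actually uses. These cover a few common shapes, not an exhaustive set.
CREDENTIAL_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                        # OpenAI-style API keys
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                        # GitHub personal access tokens
    re.compile(r"AKIA[0-9A-Z]{16}"),                           # AWS access key IDs
    re.compile(r"(?i)(password|passwd|secret)\s*[:=]\s*\S+"),  # key=value leaks
]

def scrub(text: str) -> str:
    """Redact anything matching a known credential pattern."""
    for pattern in CREDENTIAL_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def deliver(response: dict) -> dict:
    # Scrub BOTH channels: the final answer and the reasoning block,
    # since thinking output is visible in Discord/Telegram/CLI surfaces.
    return {
        "reasoning": scrub(response.get("reasoning", "")),
        "output": scrub(response.get("output", "")),
    }
```

A scrubber like this is necessarily best-effort: it only catches credentials whose shape it already knows, which is why it pairs with the taint-tracking and abstraction work below.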
Medium-term (framework)
- Taint tracking: mark strings originating from credential sources (.env, credential pool) and refuse to emit tainted values in user-facing output (sketch below)
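A sketch of a registry-based flavor of taint tracking, assuming the framework can intercept both credential loads and user-facing emission; `register_credential` and `emit_to_user` are hypothetical hook names:

```python
# Values read from credential sources (.env, credential pool) are
# registered at load time; emission checks output against the registry.
TAINT_REGISTRY: set[str] = set()

def register_credential(value: str) -> str:
    """Call at the boundary where a credential enters the process."""
    if value:
        TAINT_REGISTRY.add(value)
    return value

def emit_to_user(text: str) -> str:
    """Refuse to emit any user-facing output containing a tainted value.
    Substring matching survives concatenation and templating, which a
    str-subclass taint wrapper would not."""
    for secret in TAINT_REGISTRY:
        if secret in text:
            raise PermissionError("output contains a credential value; refusing to emit")
    return text
```

Raising instead of silently redacting makes the failure visible to the framework, which can then retry the turn with the value stripped from context.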
Long-term (architecture)
- Abstract credentials: agents should never hold credential VALUES in context, only opaque references resolved by the tool layer at execution time (sketch below)
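One way the reference scheme could look, assuming a `{{cred:name}}` handle syntax and a vault lookup inside the tool layer; all names here are illustrative:

```python
import os
import re

# The agent's context only ever holds opaque handles like
# "{{cred:db_password}}"; the value is resolved here, at execution time.
CRED_REF = re.compile(r"\{\{cred:([A-Za-z0-9_]+)\}\}")

def lookup_secret(name: str) -> str:
    """Stand-in for a vault or credential-pool lookup."""
    value = os.environ.get(name.upper())
    if value is None:
        raise KeyError(f"unknown credential reference: {name}")
    return value

def resolve_refs(arg: str) -> str:
    """Substitute real values immediately before tool execution.
    The resolved string goes to the external tool only; tool results
    echoed back into model context must keep the opaque form."""
    return CRED_REF.sub(lambda m: lookup_secret(m.group(1)), arg)
```

With this split the model cannot leak what it never saw: an explanation of what it "fixed" can only ever quote the opaque handle.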
Note
Users are trying to solve this with custom skills and system prompt hardening -- none of it works because the model violates its own instructions. The fix must be in the output pipeline, not the prompt.