Problem
Write/Edit content inspection is currently deterministic regex patterns only. While this catches common destructive patterns, secrets, and obfuscation, creative or domain-specific payloads can slip through. The deterministic layer is fast and catches the obvious cases, but has blind spots for:
- Obfuscated code that doesn't match known patterns
- Context-dependent payloads (e.g., modifying package.json scripts to run something malicious)
- Novel exfiltration techniques that avoid pattern signatures
Proposed solution
Add an optional LLM inspection layer for Write/Edit content:
- When content inspection flags something as suspicious (partial match, heuristic trigger) but not definitively malicious, route to the LLM with the content + file path + recent conversation context
- For high-risk file types (shell scripts, CI configs, package manifests, credential files), optionally always route through LLM regardless of deterministic results
- LLM prompt context: "This content is about to be written to [path]. Given the recent conversation context, should this write be allowed?"
This complements the deterministic layer — the fast path handles 95% of cases, the LLM handles the ambiguous remainder.
Context
Raised in the Show HN discussion by several commenters:
- gruez: "given that you allow npm test, it's not too hard to bypass protections by first modifying package.json so npm test runs an evil command"
- injidup: "what about simply a base64 encoded string of text dropped into the code designed to be unpacked and evaluated later... Will any of these fast scanning heuristics work against such attacks?"
- ibrahim_h: "the scariest exfiltration pattern isn't a single bad command, it's a chain of totally normal ones. Agent reads .env, writes a script that includes those values, then runs it. Every step looks fine individually."
Committed to on HN: "LLM inspection for Write/Edit: for content that's suspicious but doesn't match any deterministic pattern, route it to the LLM for a second opinion"
Problem
Write/Edit content inspection is currently deterministic regex patterns only. While this catches common destructive patterns, secrets, and obfuscation, creative or domain-specific payloads can slip through. The deterministic layer is fast and catches the obvious cases, but has blind spots for:
Proposed solution
Add an optional LLM inspection layer for Write/Edit content:
This complements the deterministic layer — the fast path handles 95% of cases, the LLM handles the ambiguous remainder.
Context
Raised in the Show HN discussion by several commenters:
Committed to on HN: "LLM inspection for Write/Edit: for content that's suspicious but doesn't match any deterministic pattern, route it to the LLM for a second opinion"