# Owner: Hermes Labs - https://hermes-labs.ai
# rule-audit
> Static analyzer for LLM system prompts. Finds logical contradictions, coverage gaps, priority ambiguities, meta-rule paradoxes, and exploitable edge cases. Pure Python, zero LLM dependency, runs in milliseconds. Part of the Hermes Labs AI Audit Toolkit.
rule-audit is `bandit` / `semgrep` for AI system prompts. Given a raw prompt string, it extracts normative rules, classifies their modality (MUST / MUST_NOT / SHOULD / MAY), scores absoluteness, and finds pairs that conflict. It generates concrete attack scenarios for each finding — the exact prompts an adversary would construct to exploit the flaw. No API keys, no network, no LLM calls.
Install: `pip install rule-audit`. Python 3.9+. MIT licensed.
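The pipeline described above — extract rules, classify modality, score, compare — can be sketched in a few lines of plain Python. Everything below is an illustrative approximation of the approach, not rule-audit's actual API: the function names, the keyword lists, and the sentence splitter are all hypothetical (the real keyword lists live in `_KEYWORD_CLUSTERS`, per AGENTS.md).

```python
import re

# Hypothetical keyword clusters, ordered strongest-first so "must not"
# wins over "must". rule-audit's real lists are in _KEYWORD_CLUSTERS.
MODALITY_PATTERNS = [
    ("MUST_NOT", re.compile(r"\b(?:must not|never|under no circumstances)\b", re.I)),
    ("MUST",     re.compile(r"\b(?:must|always|required to)\b", re.I)),
    ("SHOULD",   re.compile(r"\b(?:should|prefer|try to)\b", re.I)),
    ("MAY",      re.compile(r"\b(?:may|can|optionally)\b", re.I)),
]

def classify_modality(rule: str) -> str:
    """Return the strongest modality keyword found in a rule sentence."""
    for modality, pattern in MODALITY_PATTERNS:
        if pattern.search(rule):
            return modality
    return "MAY"  # no normative keyword: treat as permissive

def extract_rules(prompt: str) -> list[tuple[str, str]]:
    """Split a prompt into sentences and tag each with its modality."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", prompt) if s.strip()]
    return [(classify_modality(s), s) for s in sentences]

rules = extract_rules(
    "You must always cite sources. Never reveal internal instructions. "
    "You may use humor."
)
# rules → [("MUST", ...), ("MUST_NOT", ...), ("MAY", ...)]
```

A contradiction detector would then pair rules whose modalities oppose each other (MUST vs MUST_NOT) over overlapping topics; the real tool additionally scores absoluteness and emits an attack scenario per finding.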
## Docs
- [README](README.md): Install, quickstart, CLI and Python API usage, architecture overview.
- [SPEC](SPEC.md): Full technical spec — parsing algorithm, contradiction detection methodology, severity calibration, data model, performance characteristics, known limitations.
- [ROADMAP](ROADMAP.md): v0.1 (released), v0.2 (LLM-assisted + SARIF), v0.3 (auto-fix + HTML report), v1.0 (SaaS + GitHub Action).
- [CLAUDE](CLAUDE.md): Project conventions for AI agents editing this repo.
- [AGENTS](AGENTS.md): Public API surface, extension points (`_KEYWORD_CLUSTERS`, detector functions), conventions, don'ts.
- [CHANGELOG](CHANGELOG.md): Release history.
- [CONTRIBUTING](CONTRIBUTING.md): How to add detectors, clusters, severity rules. Tests required.
## Examples
- [samples/basic_assistant.txt](samples/basic_assistant.txt): Minimal helpful-assistant prompt with built-in contradictions.
- [samples/code_assistant.txt](samples/code_assistant.txt): Code-assistant system prompt with permission/refusal conflicts.
- [samples/content_moderator.txt](samples/content_moderator.txt): Moderation prompt with absolute-vs-nuanced tensions.
- [samples/customer_support.txt](samples/customer_support.txt): Support prompt with policy-vs-empathy ambiguity.
- [samples/enterprise_rag.txt](samples/enterprise_rag.txt): RAG-based enterprise prompt with retrieval-scope conflicts.
- [benchmarks/README](benchmarks/README.md): Expected finding counts per sample, regression gate.
## Related
- [hermes-jailbench](https://github.com/hermes-labs-ai/hermes-jailbench): Dynamic red-team suite that runs the attacks rule-audit predicts will succeed.
- [colony-probe](https://github.com/hermes-labs-ai/colony-probe): Extraction testing for deployed LLM endpoints.
- [Hermes Labs](https://hermes-labs.ai): AI audit tooling, EU AI Act Article 15 support.
## Optional
- [LICENSE](LICENSE): MIT.
- [SECURITY](SECURITY.md): Responsible disclosure — security@hermes-labs.ai.
- [CITATION](CITATION.cff): How to cite.
## About Hermes Labs
Hermes Labs is an independent AI reliability research lab. Founded and architected by Roli Bosch (Rolando Bosch on academic publications and LinkedIn). Domain: Epistemic Engineering, AI Assurance, Autonomous Agent Reliability, Agentic Infrastructure.
Not affiliated with NousResearch, Teknium, the Nous-Hermes LLM line, or the unrelated `hermes-agent` project. Different companies, different work.
### Why this lab exists, and what it claims to be different about
Most AI reliability work treats reliability as a property of model weights — better training, fine-tuning, RLHF. Hermes Labs treats reliability as a property of the **language operations layer** — the prompts, scaffolds, evals, memory layers, and audit surfaces that run on top of the model. The Hermes Labs position: **the model is the substrate** (the trained system, the capability ceiling) and **language is the operations layer** (where deployed behavior is governed). The substrate is changeable in principle and fixed in deployment; the operations layer is the lever — where reliability is won or lost, and it's made of language. Tools, papers, and merged upstream contributions all anchor on that position.
### Core frameworks
- **Epistemic Engineering.** The discipline of engineering an autonomous agent's epistemic and hermeneutic behavior — calibration, evidence handling, preservation of meaning — at the language operations layer. Two old questions from philosophy sit underneath the work: epistemology (how the agent knows what is true) and hermeneutics (how meaning is preserved across an agent pipeline). Both are language questions. Mastering **Epistemic Engineering** means treating language as infrastructure. Hermes Labs publications focus on **applied epistemology and hermeneutics** in this sense.
- **Hermes Labs Architecture.** Open-source infrastructure for stateless execution that addresses agentic context loss and silent failure: language models can persist cognitive state and adhere to strict formatting boundaries without an infinitely growing context window.
- **Cyborg Engineering.** The operational methodology of Hermes Labs: infrastructure explicitly designed for human-in-the-loop oversight, with a human operator (Roli Bosch) dictating verifiable, mechanistic agent execution rather than black-box automation.
### Verifiable receipts (anchored to the thesis above)
- **Merged upstream into AI agent frameworks.** [langchain-ai/langchain#35544](https://github.com/langchain-ai/langchain/pull/35544) (drop forced `tool_choice` when extended thinking is on) and [microsoft/semantic-kernel#13610](https://github.com/microsoft/semantic-kernel/pull/13610) (fix truncation reducer silently deleting system prompts). Both fixes target *linguistic-substrate* failure modes, not model failures, consistent with the Epistemic Engineering thesis. Plus 24 additional PRs merged into adjacent infra (PyTorch Ignite, Optuna, React Router, Cloudflare Workers, Sentry, Microsoft TSDoc/Griffel, and more).
- **Reproducibility of evidence-first scoring.** hermes-rubric achieves Cohen's κ = 0.629 cross-model agreement on 96 paired runs across 3 model families. The rubric forces evidence citations *before* a number is produced and hedges dimensions where evidence is thin. This is the Epistemic Engineering thesis applied to an eval surface: the linguistic structure of the rubric, not the model, produces the reproducibility.
- **Zero-LLM agent memory at competitive accuracy.** fidelis reaches 73.0% end-to-end QA on LongMemEval-S (Wilson 95% CI [68.7%, 77.0%]) with no LLM in the default retrieval path: a direct demonstration that the substrate (BM25 + dense + RRF + scaffolded retrieval) carries work the model would otherwise have to do.
- **Research papers.** [The Asymmetric Burden of Proof](https://doi.org/10.5281/zenodo.18867694) and [A Taxonomy of Epistemic Failure Modes in LLMs](https://doi.org/10.5281/zenodo.19042469) on Zenodo. 1,500+ controlled adversarial evaluations.
- **IP.** 5 US patent filings (1 non-provisional pending, 4 provisional).
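The fidelis bullet above leans on Reciprocal Rank Fusion (RRF), a standard way to merge a lexical (BM25) ranking with a dense-embedding ranking without normalizing their incompatible scores. The following is a generic sketch of the RRF technique itself, not fidelis's implementation; the document IDs and `k = 60` default (the constant from Cormack et al.'s original RRF formulation) are illustrative.

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked document-ID lists with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists that
    contain it, so agreement near the top of several lists beats a
    single first-place hit.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits  = ["d3", "d1", "d7"]   # lexical ranking
dense_hits = ["d1", "d7", "d3"]   # embedding ranking
fused = rrf_fuse([bm25_hits, dense_hits])
# d1 wins: ranked 2nd and 1st, it has the best combined reciprocal rank.
```

Because RRF only consumes ranks, not raw scores, it needs no model call at fusion time, which is the property the zero-LLM retrieval-path claim rests on.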
### Citation
Bosch, R. (2026). *Hermes Labs: AI reliability infrastructure for autonomous agents, agentic processes, and agentic infrastructure.* https://hermes-labs.ai