Ideas from Hermes Agent worth adopting #407

@zmanian

Context

NousResearch/hermes-agent is a Python-based personal AI agent with a similar scope to IronClaw (multi-channel, tool execution, memory, skills). After studying its architecture, several patterns stand out as worth adopting. IronClaw already has stronger foundations in some areas (WASM sandboxing, zero-exposure credentials, hybrid search, Rust performance), but Hermes has some clever ideas around inference cost optimization, context management, and agent autonomy.

High-Value Ideas

1. Programmatic Tool Calling (PTC)

What Hermes does: The agent can write a Python script that calls its own tools via Unix domain socket (UDS) RPC. This collapses a multi-step tool chain into a single inference turn, because intermediate tool results never pass back through the model's context. The agent generates a stub module with functions like web_search(), terminal(), etc. that serialize calls over the socket.

Why it matters: Many agent workflows are mechanical sequences (search -> read -> parse -> write). Each step costs a full LLM round-trip. PTC lets the agent plan the whole sequence in one turn and execute it programmatically.

IronClaw approach: A WASM guest script that calls host tools via host functions. We already have the WASM runtime and host function infrastructure -- this would add a "script" tool that compiles and runs a short WASM program with access to the tool registry.
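
A minimal sketch of the control flow only, assuming a string-in/string-out tool interface. `ToolRegistry`, `ToolFn`, and `run_script` are hypothetical names, and a plain list of tool names stands in for the compiled WASM guest, so nothing here reflects IronClaw's actual host function API. The point it illustrates is that intermediate outputs stay on the host side and only the final value is returned to the model:

```rust
use std::collections::HashMap;

/// Hypothetical tool signature: string in, string out.
pub type ToolFn = fn(&str) -> Result<String, String>;

/// Hypothetical stand-in for the real tool registry.
pub struct ToolRegistry {
    tools: HashMap<String, ToolFn>,
}

impl ToolRegistry {
    pub fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    pub fn register(&mut self, name: &str, tool: ToolFn) {
        self.tools.insert(name.to_string(), tool);
    }

    pub fn call(&self, name: &str, input: &str) -> Result<String, String> {
        let tool = self
            .tools
            .get(name)
            .ok_or_else(|| format!("unknown tool: {name}"))?;
        tool(input)
    }
}

/// Runs a "script" expressed as an ordered list of tool names.
/// The output of each step becomes the input of the next one.
/// Only the final output is returned, so intermediate results
/// never enter the LLM context.
pub fn run_script(
    registry: &ToolRegistry,
    steps: &[&str],
    input: &str,
) -> Result<String, String> {
    let mut value = input.to_string();
    for step in steps {
        value = registry.call(step, &value)?;
    }
    Ok(value)
}
```

A real guest program would need branching and multiple arguments, which a linear pipeline cannot express; the sketch only shows why the approach saves round-trips.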

2. Frozen Memory Snapshots for Prompt Caching

What Hermes does: Captures memory/user profile at session start and never mutates the system prompt mid-session, even when memory is updated on disk. Refreshes the snapshot only on compression or new session.

Why it matters: Anthropic and other providers charge significantly less for cached prompt prefixes (Anthropic prices cache reads at roughly 10% of the base input rate). Caching matches on an exact prefix, so mutating the system prompt mid-session invalidates the cache. IronClaw's workspace identity files (AGENTS.md, SOUL.md, USER.md, IDENTITY.md) are injected into the system prompt -- if these change mid-session, we lose caching benefits.

IronClaw approach: Snapshot identity files at session start. memory_write updates disk immediately but the system prompt snapshot stays frozen until explicit refresh (session start, compaction, or manual trigger).
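
One way this could look, as a sketch under assumptions: `Session`, `render`, and the in-memory `disk` map are hypothetical stand-ins for the real session type and workspace files. Writes land in the live store immediately, while the rendered system prompt only changes on an explicit `refresh`:

```rust
use std::collections::BTreeMap;

/// Hypothetical session type. `disk` stands in for the on-disk
/// identity files; `frozen_prompt` is what gets sent to the provider.
pub struct Session {
    disk: BTreeMap<String, String>,
    frozen_prompt: String,
}

/// Renders identity files into a system prompt. BTreeMap iteration
/// is ordered by key, so the same files always render identically.
fn render(files: &BTreeMap<String, String>) -> String {
    files
        .iter()
        .map(|(name, body)| format!("## {name}\n{body}\n"))
        .collect()
}

impl Session {
    /// Snapshot the identity files once, at session start.
    pub fn start(disk: BTreeMap<String, String>) -> Self {
        let frozen_prompt = render(&disk);
        Self { disk, frozen_prompt }
    }

    /// Updates the live store immediately; the snapshot is untouched.
    pub fn memory_write(&mut self, file: &str, content: &str) {
        self.disk.insert(file.to_string(), content.to_string());
    }

    /// Byte-identical across turns until `refresh` is called.
    pub fn system_prompt(&self) -> &str {
        &self.frozen_prompt
    }

    /// Explicit refresh point: compaction, new session, or manual trigger.
    pub fn refresh(&mut self) {
        self.frozen_prompt = render(&self.disk);
    }
}
```

Deterministic rendering matters as much as freezing: if file order varied between renders, even an unchanged workspace would produce a different prefix.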

3. Pre-Compression Memory Flush

What Hermes does: Before context compression, the agent gets one extra turn with an injected message: "Context is about to be compressed. Save any important information to memory now." Only then does compression proceed.

Why it matters: IronClaw's compaction system summarizes turns, but summarization is lossy. Giving the agent a chance to explicitly save important details before compression prevents information loss, especially for multi-step tasks where intermediate results matter.

IronClaw approach: Add a pre-compaction hook in src/agent/compaction.rs that inserts a system message and gives the agent one turn to call memory_write before summarization begins.
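
A rough sketch of the hook ordering, not the actual contents of src/agent/compaction.rs. `Message`, `Role`, and `compact_with_flush` are hypothetical, and the agent turn and summarizer are passed in as closures so the example stays self-contained:

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    System,
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub content: String,
}

/// Notice text quoted from the Hermes behavior described above.
pub const FLUSH_NOTICE: &str =
    "Context is about to be compressed. Save any important information to memory now.";

/// Gives the agent exactly one turn to persist details, then compacts.
/// `agent_turn` stands in for a full LLM turn (which may call memory_write);
/// `summarize` stands in for the existing compaction summarizer.
pub fn compact_with_flush<A, S>(history: &mut Vec<Message>, mut agent_turn: A, summarize: S)
where
    A: FnMut(&[Message]) -> Message,
    S: FnOnce(&[Message]) -> String,
{
    // 1. Inject the notice as a system message.
    history.push(Message {
        role: Role::System,
        content: FLUSH_NOTICE.to_string(),
    });

    // 2. One extra agent turn, with the full uncompressed history visible.
    let reply = agent_turn(history.as_slice());
    history.push(reply);

    // 3. Only now replace the history with its summary.
    let summary = summarize(history.as_slice());
    history.clear();
    history.push(Message {
        role: Role::System,
        content: summary,
    });
}
```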

4. Cheap Auxiliary Model for Side Tasks

What Hermes does: Uses a separate cheap/fast model (Gemini Flash) for context compression, session search summarization, and other mechanical tasks. The primary model handles reasoning; the auxiliary model handles grunt work.

Why it matters: IronClaw routes everything through the primary model. Summarization, search result ranking, and context compression don't need frontier-model intelligence. Using a cheap model for these tasks could significantly reduce costs.

IronClaw approach: Add an LLM_CHEAP_MODEL config option (note: #379 already proposes this). Route compaction summarization, workspace search re-ranking, and session search through the cheap model. The existing LlmProvider trait supports this -- it just needs a second provider instance.
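
The routing itself could be as small as the sketch below. The `LlmProvider` trait here is a simplified synchronous stand-in rather than IronClaw's real trait, and `Task`, `ModelRouter`, and `StubProvider` are hypothetical. When no cheap model is configured, every task falls back to the primary provider:

```rust
/// Simplified stand-in for the real LlmProvider trait (assumed shape).
pub trait LlmProvider {
    fn complete(&self, prompt: &str) -> String;
}

/// Stub used only to make the routing observable in this sketch.
pub struct StubProvider(pub &'static str);

impl LlmProvider for StubProvider {
    fn complete(&self, prompt: &str) -> String {
        format!("[{}] {}", self.0, prompt)
    }
}

/// Hypothetical task categories, taken from the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Task {
    Reasoning,
    CompactionSummary,
    SearchRerank,
    SessionSearch,
}

pub struct ModelRouter {
    primary: Box<dyn LlmProvider>,
    /// Populated when LLM_CHEAP_MODEL is set, None otherwise.
    cheap: Option<Box<dyn LlmProvider>>,
}

impl ModelRouter {
    pub fn new(primary: Box<dyn LlmProvider>, cheap: Option<Box<dyn LlmProvider>>) -> Self {
        Self { primary, cheap }
    }

    /// Reasoning always uses the primary model; mechanical tasks use
    /// the cheap model when one is configured.
    pub fn provider_for(&self, task: Task) -> &dyn LlmProvider {
        match (task, &self.cheap) {
            (Task::Reasoning, _) | (_, None) => &*self.primary,
            (_, Some(cheap)) => &**cheap,
        }
    }
}
```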

5. Subagent Delegation with Context Isolation

What Hermes does: Spawns child AIAgent instances with isolated context, restricted toolsets, and their own terminal sessions. Children cannot delegate further, write to memory, or interact with the user. Only the final summary enters the parent's context -- intermediate tool calls are invisible.

Why it matters: Context-heavy subtasks (research, code analysis) currently pollute the parent context with intermediate results. Delegation keeps the parent context clean and focused.

IronClaw approach: The existing job/worker system handles parallel execution but doesn't have the "return only summary" pattern. Add a delegate tool that spawns a child worker with restricted tools and returns only the final summary to the calling context. Enforce: no recursive delegation, no memory writes, no user-facing messages.
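
The two enforcement points can be sketched as follows. Everything here is hypothetical: the tool names `delegate`, `memory_write`, and `send_message` are assumed labels for the three forbidden capabilities, and a closure stands in for spawning a child worker:

```rust
use std::collections::HashSet;

/// Assumed names for tools a child must never receive:
/// recursive delegation, memory writes, and user-facing messages.
const FORBIDDEN_FOR_CHILDREN: [&str; 3] = ["delegate", "memory_write", "send_message"];

/// What a child worker produces. Only `summary` reaches the parent.
pub struct ChildResult {
    pub summary: String,
    pub transcript: Vec<String>,
}

/// A child gets the requested tools, minus anything the parent lacks
/// and minus the forbidden set.
pub fn child_toolset(parent_tools: &HashSet<String>, requested: &[&str]) -> HashSet<String> {
    requested
        .iter()
        .copied()
        .filter(|t| parent_tools.contains(*t) && !FORBIDDEN_FOR_CHILDREN.contains(t))
        .map(String::from)
        .collect()
}

/// Runs the child and appends only its summary to the parent context.
/// The transcript of intermediate tool calls is dropped here.
pub fn delegate<F>(parent_context: &mut Vec<String>, run_child: F)
where
    F: FnOnce() -> ChildResult,
{
    let result = run_child();
    parent_context.push(format!("[delegate] {}", result.summary));
}
```

Filtering at spawn time means a child cannot call a forbidden tool even if the model asks for it, which is stronger than instructing the child not to.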

6. Interruptible API Calls with Connection Teardown

What Hermes does: On interrupt, force-closes the HTTP connection (stopping token generation mid-stream), then rebuilds the client. Pending tool calls are skipped with "[Tool execution cancelled]" messages. Interrupt propagates to child agents.

Why it matters: Setting an interrupt flag and waiting for the current response to finish still bills for all generated tokens. Tearing down the connection stops generation immediately, saving money on long responses the user doesn't need.

IronClaw approach: Use reqwest request cancellation (drop the response future) on interrupt. Add interrupt propagation to child jobs via the existing job context system.
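
The sketch below models only the control flow and deliberately avoids reqwest so it stays self-contained: an iterator of chunks stands in for the streamed response, and returning early drops that iterator, which is the analogue of dropping the in-flight request. `InterruptFlag`, `consume_stream`, and `cancel_pending` are hypothetical names:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Shared interrupt flag. Clones observe the same underlying value,
/// so handing a clone to a child job propagates the interrupt.
#[derive(Clone, Default)]
pub struct InterruptFlag(Arc<AtomicBool>);

impl InterruptFlag {
    pub fn trigger(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    pub fn is_set(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }

    /// Handle for a child job.
    pub fn child(&self) -> Self {
        self.clone()
    }
}

/// Reads chunks until the stream ends or the flag is set.
/// Returns the text received so far and whether it was interrupted.
pub fn consume_stream<I>(mut chunks: I, flag: &InterruptFlag) -> (String, bool)
where
    I: Iterator<Item = String>,
{
    let mut out = String::new();
    loop {
        // Check before pulling the next chunk. Returning drops `chunks`,
        // standing in for tearing down the connection.
        if flag.is_set() {
            return (out, true);
        }
        match chunks.next() {
            Some(chunk) => out.push_str(&chunk),
            None => return (out, false),
        }
    }
}

/// Pending tool calls are skipped with the placeholder message from above.
pub fn cancel_pending(tool_calls: &[&str]) -> Vec<String> {
    tool_calls
        .iter()
        .map(|name| format!("{name}: [Tool execution cancelled]"))
        .collect()
}
```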

Medium-Value Ideas

7. Tiered Command Approval Persistence

Hermes's dangerous command detection offers four options: once / session / always / deny. "Always" persists across sessions via a JSON allowlist file. IronClaw has safety patterns but no persistent allowlist -- users re-approve the same safe commands repeatedly.
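
A sketch of how the four tiers could map onto state, with hypothetical names throughout (`Decision`, `Approvals`). Hermes persists a JSON file; to stay dependency-free this example persists one command per line instead, which is an assumption of the sketch and not a format recommendation:

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Once,
    Session,
    Always,
    Deny,
}

#[derive(Debug, Default)]
pub struct Approvals {
    /// Cleared when the session ends.
    session: HashSet<String>,
    /// Persisted across sessions.
    always: HashSet<String>,
}

impl Approvals {
    /// Loads the persistent allowlist (one command per line in this sketch).
    pub fn from_persisted(text: &str) -> Self {
        let always = text
            .lines()
            .map(str::trim)
            .filter(|line| !line.is_empty())
            .map(String::from)
            .collect();
        Self { session: HashSet::new(), always }
    }

    /// Serializes only the "always" tier, sorted for stable output.
    pub fn to_persisted(&self) -> String {
        let mut cmds: Vec<&str> = self.always.iter().map(String::as_str).collect();
        cmds.sort_unstable();
        cmds.join("\n")
    }

    /// True if the user does not need to be asked again.
    pub fn is_preapproved(&self, cmd: &str) -> bool {
        self.session.contains(cmd) || self.always.contains(cmd)
    }

    /// Records the user's choice; returns whether to run the command now.
    pub fn record(&mut self, cmd: &str, decision: Decision) -> bool {
        match decision {
            Decision::Once => true,
            Decision::Session => {
                self.session.insert(cmd.to_string());
                true
            }
            Decision::Always => {
                self.always.insert(cmd.to_string());
                true
            }
            Decision::Deny => false,
        }
    }
}
```

Exact string matching is the simplest policy; whether "always" should match a command prefix or a pattern is a design question this sketch leaves open.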

8. Periodic Behavioral Nudges

After N turns without using memory, Hermes injects: "Consider whether there's anything worth saving." Similarly for skill creation after long tool chains. Low-cost pattern for reinforcing good agent habits without hard-coding behavior.
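
The memory nudge reduces to a counter that resets on use. A minimal sketch, where `NudgeCounter` is a hypothetical name and the threshold N is left to configuration:

```rust
/// Nudge text quoted from the Hermes behavior described above.
pub const MEMORY_NUDGE: &str = "Consider whether there's anything worth saving.";

pub struct NudgeCounter {
    threshold: u32,
    turns_since_memory: u32,
}

impl NudgeCounter {
    pub fn new(threshold: u32) -> Self {
        Self { threshold, turns_since_memory: 0 }
    }

    /// Call once per completed turn. Returns a message to inject
    /// when the agent has gone `threshold` turns without using memory.
    pub fn on_turn(&mut self, used_memory: bool) -> Option<&'static str> {
        if used_memory {
            self.turns_since_memory = 0;
            return None;
        }
        self.turns_since_memory += 1;
        if self.turns_since_memory >= self.threshold {
            // Reset so the nudge repeats every N turns instead of every turn.
            self.turns_since_memory = 0;
            Some(MEMORY_NUDGE)
        } else {
            None
        }
    }
}
```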

9. Toolsets as Composable Capability Groups

Named groupings like "web", "terminal", "browser" that can be enabled/disabled per channel or context. More explicit and user-facing than IronClaw's trust-based attenuation. Could complement the existing skill/trust system.
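
As an illustration, toolsets can be resolved to a flat set of tools per channel. The group names come from the list above; the tool names inside each group (apart from web_search and terminal) are made up for the example, and `toolset` and `enabled_tools` are hypothetical functions:

```rust
use std::collections::BTreeSet;

/// Maps a toolset name to its member tools.
/// Group contents are illustrative, not Hermes's or IronClaw's real lists.
pub fn toolset(name: &str) -> &'static [&'static str] {
    match name {
        "web" => &["web_search", "web_fetch"],
        "terminal" => &["terminal"],
        "browser" => &["browser_open", "browser_click"],
        // Unknown toolsets contribute nothing rather than failing.
        _ => &[],
    }
}

/// Resolves the toolsets enabled for a channel into the set of
/// individual tools the agent may call there.
pub fn enabled_tools(toolsets: &[&str]) -> BTreeSet<&'static str> {
    toolsets
        .iter()
        .flat_map(|name| toolset(name).iter().copied())
        .collect()
}
```

For example, a chat channel configured with `["web"]` would expose search and fetch but never the terminal, regardless of which skills are installed.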

10. Skills Content Scanning

Hermes scans downloaded skills for exfiltration patterns (env variable leaks, DNS exfiltration), prompt injection (role hijacking, instruction override), destructive operations, and obfuscation before allowing installation. IronClaw has trust levels for skills but no content scanning -- this would harden the ClawHub install path.
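
A deliberately naive sketch of the scanning step, using case-insensitive substring rules. The categories follow the list above, but the specific patterns are illustrative guesses and not Hermes's actual rule set; `Finding`, `RULES`, and `scan_skill` are hypothetical names:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Finding {
    Exfiltration,
    PromptInjection,
    Destructive,
    Obfuscation,
}

/// Illustrative (lowercase pattern, category) pairs. A real scanner
/// would need a much larger, regularly updated rule set.
const RULES: &[(&str, Finding)] = &[
    ("printenv", Finding::Exfiltration),
    ("ignore previous instructions", Finding::PromptInjection),
    ("you are now", Finding::PromptInjection),
    ("rm -rf", Finding::Destructive),
    ("base64 -d", Finding::Obfuscation),
];

/// Returns each category that matched at least once, in rule order.
/// An empty result means no rule fired, not that the skill is safe.
pub fn scan_skill(content: &str) -> Vec<Finding> {
    let lowered = content.to_lowercase();
    let mut findings = Vec::new();
    for (needle, finding) in RULES {
        if lowered.contains(*needle) && !findings.contains(finding) {
            findings.push(*finding);
        }
    }
    findings
}
```

Substring matching is easy to evade, so a scan like this would complement the existing trust levels rather than replace them.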

What IronClaw Already Does Better

For context, areas where IronClaw's existing approach is stronger:

  • WASM sandboxing -- stronger isolation than Hermes's Docker-or-nothing
  • Zero-exposure credential injection via proxy -- Hermes passes secrets as container env vars
  • Multi-provider with circuit breaker/failover -- Hermes relies on OpenRouter as a single gateway
  • Hybrid search (FTS + vector via RRF) -- Hermes uses basic FTS5 only
  • Rust performance and type safety -- Hermes is Python

Labels: scope: agent (Agent core: agent loop, router, scheduler)
