[Bug]: Background review agent and curator can overwrite bundled/hub skills via skill_manage #20273

@DanielMaly

Describe the bug

skill_manage has no code-level write guard for bundled or hub-installed skills. The only protections are _pinned_guard() (blocks pinned skills) and _security_scan_skill() (blocks dangerous content, off by default). Any agent session with access to skill_manage can freely edit, patch, or delete bundled skills.

This affects two autonomous subsystems that run without user supervision:

1. Background review agent (most urgent)

_spawn_background_review() (run_agent.py:3465) runs after every conversation turn with ≥10 tool iterations. It forks an AIAgent with enabled_toolsets=["memory", "skills"] and a prompt (_SKILL_REVIEW_PROMPT at run_agent.py:3270) that says:

"Be ACTIVE — most sessions produce at least one skill update, even if small. A pass that does nothing is a missed learning opportunity."

This prompt has no instruction to avoid bundled or hub-installed skills. Combined with the lack of a code guard, the background review agent can and does patch bundled skills during normal conversations. The main agent is unaware — the review runs in a background thread after the response is delivered.

2. Curator

The curator (agent/curator.py) runs on a 7-day schedule. Its prompt (CURATOR_REVIEW_PROMPT at line 261) does include the instruction "DO NOT touch bundled or hub-installed skills", but this is a prompt-level instruction, not an enforced boundary. A poisoned community skill could inject instructions that override it (prompt injection → persistent code modification via skill_manage).

To reproduce

  1. Have bundled skills installed in ~/.hermes/skills/
  2. Have a conversation that triggers the background review (≥10 tool iterations, configurable via skills.creation_nudge_interval)
  3. The background review agent can call skill_manage(action='patch', name='<bundled-skill>', ...) — no error is returned
  4. The bundled skill is now modified in ~/.hermes/skills/
  5. On next hermes update, the skills sync detects the hash divergence and prints ~ N user-modified (kept) — the agent's modification is silently preserved, not overwritten. There is no path back to the upstream version short of manually deleting the skill directory.

Why this is a problem

The issue is not that edits get overwritten — they don't. The skills sync (tools/skills_sync.py) uses hash-based detection: if the user copy differs from the origin hash, it's treated as "user-modified" and skipped. This means:

  1. Silent corruption: The agent patches a bundled skill, the sync preserves the patch, and nothing flags a problem to the user. The ~ N user-modified (kept) message sounds benign, like intentional customization.
  2. No path back to upstream: Once the hash diverges, hermes update will never restore the original bundled content. The user's copy stays forked with whatever the agent wrote until the skill directory is deleted manually.
  3. The sync can't distinguish agent accidents from user intent: Both look like hash divergence. The agent's unintended edits are indistinguishable from deliberate user customization.
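
To illustrate why the two cases look identical, here is a minimal sketch of hash-based sync detection. It is not the code in tools/skills_sync.py: the function names, the use of SHA-256, and the returned status strings are assumptions made for illustration only.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash skill content (SHA-256 is an assumption, not the project's choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_decision(installed_text: str, origin_hash: str, upstream_text: str) -> str:
    """Hypothetical sync rule: any divergence from the origin hash is kept."""
    if content_hash(installed_text) != origin_hash:
        # The hash carries no information about who made the change.
        return "user-modified (kept)"
    if content_hash(upstream_text) != origin_hash:
        return "update"
    return "up-to-date"


original = "# bundled skill v1\n"
origin = content_hash(original)
upstream = "# bundled skill v2\n"

agent_patched = original + "step added by the background review agent\n"
user_edited = original + "step added deliberately by the user\n"

agent_result = sync_decision(agent_patched, origin, upstream)
user_result = sync_decision(user_edited, origin, upstream)
untouched_result = sync_decision(original, origin, upstream)
```

Under these assumptions, agent_result and user_result are the same status, while only the untouched copy is eligible for an upstream update.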

Expected behavior

skill_manage should refuse write operations (edit, patch, delete, write_file, remove_file) on bundled and hub-installed skills at the tool level, returning a clear error. The background review prompt should also include an explicit "DO NOT touch bundled or hub-installed skills" instruction as defense-in-depth.
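
A minimal sketch of what such a tool-level guard could look like follows. The function name provenance_guard, the origins mapping, and the origin labels "bundled", "hub", and "agent" are hypothetical; how skill_manage actually tracks where a skill came from is not described in this issue.

```python
from typing import Mapping, Optional

# Write actions taken from the list in this issue.
WRITE_ACTIONS = frozenset({"edit", "patch", "delete", "write_file", "remove_file"})
# Hypothetical origin labels for skills that must not be modified by agents.
PROTECTED_ORIGINS = frozenset({"bundled", "hub"})


def provenance_guard(action: str, name: str, origins: Mapping[str, str]) -> Optional[str]:
    """Return an error message if the write must be refused, else None."""
    if action not in WRITE_ACTIONS:
        return None  # read-only actions are always allowed
    # Assumption: skills with no recorded origin are treated as agent-created.
    origin = origins.get(name, "agent")
    if origin in PROTECTED_ORIGINS:
        return (
            f"Refusing {action!r} on {origin} skill {name!r}: "
            "bundled and hub-installed skills are read-only for agents."
        )
    return None


origins = {"git-helper": "bundled", "community-lint": "hub", "my-notes": "agent"}
blocked = provenance_guard("patch", "git-helper", origins)
allowed = provenance_guard("patch", "my-notes", origins)
```

Because the check runs in code rather than in a prompt, an injected instruction cannot talk the agent past it. Treating unknown skills as agent-created is a fail-open choice; a stricter variant could refuse writes whenever the origin is unknown.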

Additional context

  • SKILLS_GUIDANCE in agent/prompt_builder.py:176 (injected into every session where skill_manage is available) actively encourages the behavior: "When using a skill and finding it outdated, incomplete, or wrong, patch it immediately — don't wait to be asked." No caveat about bundled vs. agent-created skills.
  • The curator.auxiliary.model / curator.auxiliary.provider config options defined in hermes_cli/config.py:952 are not wired up — the curator always uses the main model config regardless of these settings.
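
For the second point, wiring up the auxiliary settings could amount to a lookup with a fallback to the main model config, as in the sketch below. The nested dictionary layout, the "model"/"name"/"provider" keys for the main config, and the function name resolve_curator_model are assumptions; the real structure in hermes_cli/config.py may differ.

```python
from typing import Any, Mapping, Optional, Tuple


def resolve_curator_model(config: Mapping[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Prefer curator.auxiliary.* and fall back to the main model settings."""
    main = config.get("model", {})  # hypothetical location of the main model config
    aux = config.get("curator", {}).get("auxiliary", {})
    model = aux.get("model") or main.get("name")
    provider = aux.get("provider") or main.get("provider")
    return model, provider


config = {
    "model": {"name": "main-model", "provider": "main-provider"},
    "curator": {"auxiliary": {"model": "small-model"}},
}
curator_model = resolve_curator_model(config)
```

Here the curator would use the auxiliary model with the main provider, because only the model is overridden.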

    Labels

    • P1: High (major feature broken, no workaround)
    • comp/agent: Core agent loop, run_agent.py, prompt builder
    • tool/skills: Skills system (list, view, manage)
    • type/security: Security vulnerability or hardening
