[Feature]: Skills lifecycle management — usage tracking, stale detection, and auto-cleanup #11425

@LehaoLin

Problem or Use Case

As users install more skills over time (currently 89+ in my installation), the skills list grows unbounded with no lifecycle management:

  1. No usage tracking: There is no way to know which skills are actually used and which are dead weight. Skills installed months ago for a one-off task remain forever.

  2. No stale/unused detection: Skills that have never been loaded (or not loaded in N days) continue to appear in the system prompt index, consuming tokens on every single turn.

  3. No auto-cleanup mechanism: Users must manually audit and prune skills. With 89+ skills across 10+ categories, this is tedious and error-prone.

  4. No importance classification: Usage frequency alone cannot distinguish between a rarely-used-but-critical skill (e.g., a debugging skill used only during incidents) and a genuinely unused skill (e.g., a game server skill never needed). Any cleanup mechanism based purely on usage stats risks false positives on critical skills.

  5. Token cost recurs on every turn: The full skills list is injected into the system prompt every turn. At 89 skills (~50KB, per #10164, "feat: context-aware skills prompt and system prompt budget management"), this wastes thousands of tokens per API call. Even if #10164 is merged to truncate descriptions, the fundamental problem remains: unused skills should not be in the list at all.
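To put a rough number on the last point, here is a back-of-the-envelope sketch. It assumes about 4 bytes of text per token, which is only a common rule of thumb and not a property of any particular tokenizer, and the function names are made up for illustration.

```python
# Rough estimate of tokens spent re-sending the skills index.
# ASSUMPTION: ~4 bytes of English text per token (rule of thumb only).
BYTES_PER_TOKEN = 4


def index_tokens(index_bytes: int, bytes_per_token: int = BYTES_PER_TOKEN) -> int:
    """Approximate token count of the skills index injected on one turn."""
    return index_bytes // bytes_per_token


def session_overhead(index_bytes: int, turns: int) -> int:
    """Approximate tokens spent on the same index across `turns` turns."""
    return index_tokens(index_bytes) * turns


if __name__ == "__main__":
    size = 50 * 1024  # the ~50KB figure cited from #10164
    print(index_tokens(size))                # 12800
    print(session_overhead(size, turns=40))  # 512000
```

The turn count of 40 is arbitrary; the point is only that the overhead is paid again on every call, regardless of whether any of the listed skills is used.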

Real-world example

My installation has skills for Minecraft servers, Pokemon emulation, Stable Diffusion, Axolotl training, Polymarket prediction, songwriting, etc. — none of which I have ever used or will ever use. Yet their names and descriptions are loaded into every API call, every turn, every session.

At the same time, I have skills like systematic-debugging or hermes-model-metadata-debug that are only used occasionally during incidents — but are absolutely critical when needed. A pure usage-frequency metric would misclassify these as candidates for removal.

Related: #10164 addresses the symptom (truncate/limit skills prompt size), but not the root cause (unused skills accumulating without lifecycle management).

Proposed Solution

Phase 1: Usage tracking (low risk, high value)

  • Track last_used_at and use_count for each skill (persisted in a small JSON file or SQLite database alongside skills/)
  • Expose usage stats via CLI: hermes skills stats or hermes skills audit
  • Show "never used" / "last used X days ago" in skills_list
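A minimal sketch of what Phase 1 could look like, using the JSON option. The class name `SkillUsageTracker`, the file layout, and the method names are all hypothetical; nothing here reflects how the existing skill manager is actually structured.

```python
import json
import time
from pathlib import Path
from typing import Optional


class SkillUsageTracker:
    """Hypothetical tracker: persists {skill: {last_used_at, use_count}} as JSON."""

    def __init__(self, path: Path):
        self.path = path
        # Load previous stats if the file exists, otherwise start empty.
        self.stats = json.loads(path.read_text()) if path.exists() else {}

    def record_use(self, skill: str, now: Optional[float] = None) -> None:
        """Bump use_count and set last_used_at (Unix seconds) for one skill."""
        entry = self.stats.setdefault(skill, {"last_used_at": None, "use_count": 0})
        entry["last_used_at"] = time.time() if now is None else now
        entry["use_count"] += 1
        self._save()

    def describe(self, skill: str, now: Optional[float] = None) -> str:
        """Return the 'never used' / 'last used X days ago' label."""
        entry = self.stats.get(skill)
        if not entry or entry["last_used_at"] is None:
            return "never used"
        now = time.time() if now is None else now
        days = int((now - entry["last_used_at"]) // 86400)
        return f"last used {days} days ago"

    def _save(self) -> None:
        # Write to a temp file, then rename, so a crash cannot leave
        # a half-written stats file behind.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.stats, indent=2))
        tmp.replace(self.path)
```

Calling `record_use(name)` wherever a skill is loaded would be the only hook needed; `describe(name)` could then feed both skills_list and the audit report.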

Phase 2: Importance classification

Each skill gets an importance level that protects it from automated lifecycle actions:

| Level | Meaning | Behavior |
|---|---|---|
| critical | Core/essential; must never be archived | Immune to stale detection and archival |
| important | Rarely used but critical when needed (e.g., debugging, incident response) | Warned but never auto-archived |
| normal | Standard skill, used occasionally | Subject to stale detection and archival |
| low | Installed for a one-off task, likely not needed | First candidate for archival |

  • Defaults to normal for new skills
  • Users can set importance via CLI: hermes skills set-importance <name> critical
  • Optional: Auto-classify based on heuristics (e.g., skills with debug/troubleshoot in name default to important)
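The four levels and the optional name-based heuristic could be sketched roughly as follows. The enum, the keyword tuple, and both function names are illustrative assumptions, not existing code.

```python
from enum import Enum


class Importance(str, Enum):
    CRITICAL = "critical"
    IMPORTANT = "important"
    NORMAL = "normal"
    LOW = "low"


# Levels that automated lifecycle actions must never archive.
PROTECTED = {Importance.CRITICAL, Importance.IMPORTANT}

# ASSUMPTION: illustrative keyword list taken from the example in the proposal.
IMPORTANT_KEYWORDS = ("debug", "troubleshoot")


def default_importance(skill_name: str,
                       fallback: Importance = Importance.NORMAL) -> Importance:
    """Guess an importance for a newly installed skill from its name."""
    name = skill_name.lower()
    if any(keyword in name for keyword in IMPORTANT_KEYWORDS):
        return Importance.IMPORTANT
    return fallback


def parse_importance(value: str) -> Importance:
    """Validate a user-supplied level, e.g. from a set-importance command."""
    try:
        return Importance(value.strip().lower())
    except ValueError:
        allowed = ", ".join(level.value for level in Importance)
        raise ValueError(
            f"unknown importance {value!r}; expected one of: {allowed}"
        ) from None
```

A substring match like this is deliberately crude. It would only set the initial default, and an explicit user setting should always win over the heuristic.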

Phase 3: Stale detection and archived state

Instead of deleting or disabling skills, introduce an archived state:

  • Skills not loaded in N days AND with importance normal or low are flagged as stale
  • Stale skills are moved to archived state: excluded from system prompt injection, but remain fully on disk
  • hermes skills archive <name> — manually archive a skill
  • hermes skills restore <name> — restore an archived skill
  • hermes skills audit — shows a report: used/recent, stale candidates, archived, critical
  • Archived skills are visible via hermes skills list --archived and can be restored instantly
  • This is safe and reversible — no data loss, no re-download needed
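The stale rule and the archived state above can be sketched as below. The `Skill` dataclass and helper names are hypothetical. Treating a never-loaded skill as stale immediately is an assumption on my part; a real implementation might want a grace period after install.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Only these importance levels are eligible for stale detection.
ARCHIVABLE = {"normal", "low"}


@dataclass
class Skill:
    name: str
    importance: str = "normal"
    last_used_at: Optional[datetime] = None  # None = never loaded
    archived: bool = False


def is_stale(skill: Skill, now: datetime, stale_after_days: int) -> bool:
    """True if the skill has not been loaded in N days and may be archived."""
    if stale_after_days <= 0:  # 0 = stale detection disabled
        return False
    if skill.archived or skill.importance not in ARCHIVABLE:
        return False
    if skill.last_used_at is None:
        return True  # ASSUMPTION: never-loaded skills count as stale
    return now - skill.last_used_at > timedelta(days=stale_after_days)


def archive(skill: Skill) -> None:
    """Exclude from the prompt; the files on disk are left untouched."""
    skill.archived = True


def restore(skill: Skill) -> None:
    skill.archived = False


def prompt_index(skills: List[Skill]) -> List[str]:
    """Names that would still be injected into the system prompt."""
    return [s.name for s in skills if not s.archived]
```

Because archiving only flips a flag, restore is instant and nothing needs to be re-downloaded.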

Phase 4: Optional auto-archival (opt-in)

  • Config option to auto-archive stale skills on startup or via cron
  • Only affects skills with importance low or normal (never critical or important)
  • Runs as a background check, not blocking startup
  • Always logs what was archived for user review
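A rough sketch of the opt-in pass, written against plain dicts so it stands alone. The function names, the dict keys on each skill, and the logger name are assumptions; the config keys follow the config sketch in this issue.

```python
import logging
import threading
from datetime import datetime, timedelta
from typing import Any, Dict, List

log = logging.getLogger("skills.lifecycle")  # hypothetical logger name


def auto_archive(skills: List[Dict[str, Any]],
                 config: Dict[str, Any],
                 now: datetime) -> List[str]:
    """Archive eligible skills in place and return the names that were archived.

    Each skill dict is assumed to have: name, importance,
    last_used_at (datetime or None), archived (bool).
    """
    if not config.get("auto_archive", False):  # strictly opt-in
        return []
    cutoff = now - timedelta(days=config.get("auto_archive_after_days", 60))
    archived = []
    for skill in skills:
        if skill["archived"] or skill["importance"] not in ("low", "normal"):
            continue  # critical and important skills are never touched
        last = skill["last_used_at"]
        if last is None or last < cutoff:
            skill["archived"] = True
            archived.append(skill["name"])
            log.info("auto-archived skill %s (last used: %s)",
                     skill["name"], last or "never")
    return archived


def start_background_check(skills: List[Dict[str, Any]],
                           config: Dict[str, Any]) -> threading.Thread:
    """Run the pass on a daemon thread so startup is not blocked."""
    thread = threading.Thread(
        target=auto_archive, args=(skills, config, datetime.now()), daemon=True
    )
    thread.start()
    return thread
```

The returned list (and the log lines) give the user something to review after the fact, which matters since this is the only phase that acts without being asked.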

Config sketch

skills:
  track_usage: true              # default: true
  stale_after_days: 30           # example value; 0 = disabled (default)
  auto_archive: false            # default: false — must be explicitly opted in
  auto_archive_after_days: 60    # only if auto_archive is true
  default_importance: normal     # default importance for newly installed skills
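For loading and validating these keys, something like the following could work. The class name is hypothetical, and the default of 60 for auto_archive_after_days is an assumption taken from the example value above (the sketch does not say what its default is).

```python
from dataclasses import dataclass, fields
from typing import Any, Dict

VALID_IMPORTANCE = ("critical", "important", "normal", "low")


@dataclass(frozen=True)
class SkillsLifecycleConfig:
    track_usage: bool = True
    stale_after_days: int = 0           # 0 = stale detection disabled
    auto_archive: bool = False          # must be explicitly opted in
    auto_archive_after_days: int = 60   # ASSUMPTION: default not stated in the issue
    default_importance: str = "normal"

    def __post_init__(self) -> None:
        if self.stale_after_days < 0 or self.auto_archive_after_days < 0:
            raise ValueError("day thresholds must be >= 0")
        if self.default_importance not in VALID_IMPORTANCE:
            raise ValueError(
                f"default_importance must be one of {VALID_IMPORTANCE}"
            )

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "SkillsLifecycleConfig":
        """Build from the parsed `skills:` mapping, rejecting unknown keys."""
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"unknown skills config keys: {sorted(unknown)}")
        return cls(**raw)
```

Rejecting unknown keys is a design choice to catch typos such as `auto_archvie`; silently ignoring them would leave auto-archival off without telling the user why.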

CLI commands sketch

hermes skills audit                          # full report: usage stats, stale candidates, archived
hermes skills set-importance <name> <level>  # set importance: critical/important/normal/low
hermes skills archive <name>                 # manually archive (exclude from prompt)
hermes skills restore <name>                 # restore from archive
hermes skills list --archived                # list archived skills
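To show the command surface is small, here is a standalone argparse sketch of the subcommands listed above. It is not based on how the hermes CLI is actually implemented; `build_parser` and the wiring are purely illustrative.

```python
import argparse

LEVELS = ("critical", "important", "normal", "low")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the proposed `hermes skills ...` subcommands."""
    parser = argparse.ArgumentParser(prog="hermes")
    groups = parser.add_subparsers(dest="group", required=True)
    skills = groups.add_parser("skills").add_subparsers(
        dest="command", required=True
    )

    skills.add_parser("audit", help="usage stats, stale candidates, archived")

    set_imp = skills.add_parser("set-importance", help="set a skill's importance")
    set_imp.add_argument("name")
    set_imp.add_argument("level", choices=LEVELS)

    for command in ("archive", "restore"):
        sub = skills.add_parser(command, help=f"{command} a skill")
        sub.add_argument("name")

    listing = skills.add_parser("list", help="list skills")
    listing.add_argument("--archived", action="store_true",
                         help="show archived skills only")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Using `choices=LEVELS` means an invalid level is rejected by the parser itself, before any skill metadata is touched.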

Alternatives Considered

  1. Manual pruning: Current approach. Does not scale with 89+ skills.
  2. Category-based disable: Disable entire categories (e.g., all "gaming" skills). Useful but coarse — some skills in a category may be needed.
  3. Lazy skills loading (Feature: Lazy Tool Schema Loading — Two-Pass Tool Injection to Reduce Token Overhead #6839): Would help by not loading skill schemas upfront, but the skills index (names + descriptions) is still injected in the system prompt regardless. Complementary, not a replacement.
  4. Skills prompt truncation (feat: context-aware skills prompt and system prompt budget management #10164): Addresses the token-waste symptom but not the root cause. Dead skills are still listed, just with shorter descriptions.
  5. Pure frequency-based cleanup: Too aggressive — would misclassify rarely-used-but-critical skills (debugging, incident response) as removable. The importance classification layer prevents this.

Feature Type

  • Performance / reliability

Scope

  • Medium (few files, < 300 lines) — usage tracking in skill manager, importance metadata, archived state, new CLI commands

Metadata

Assignees

No one assigned

    Labels

    P3 (Low: cosmetic, nice to have), tool/skills (Skills system: list, view, manage), type/feature (New feature or request)
