feat(skills): smart ranking, usage tracking, and lifecycle management by fathah · Pull Request #4406 · NousResearch/hermes-agent

fathah · 2026-04-01T06:37:36Z

What does this PR do?

Skills in the system prompt are now ranked by usage frequency + keyword relevance to the user's message, replacing the alphabetical dump that buried the right skills.

Also adds usage tracking, opt-in token budgets, auto-archival of stale skills, and CLI commands to manage skill health.

Problem

Every skill is injected into the system prompt alphabetically with no limits. With 98 skills, ml-paper-writing sits at position 86 and systematic-debugging at 95. The LLM scans through dozens of irrelevant skills before finding the one that matches — or gives up and improvises.

The system prompt is immune to context compression, so this gets worse over time as skills accumulate.

How it works

Usage tracking — skill_usage table (schema v7) records every view, invoke, and slash command. Scored with recency-weighted frequency in a single SQL query.
Keyword relevance — Jaccard similarity between user message and skill metadata (name, description, tags), expanded with suffix stemming and a domain synonym map (tweet -> twitter, bug -> debug).
Normalized merge — both signals normalized to 0-1 before combining. Relevance weighted 3x so query-relevant skills beat daily-driver habits.
Flat output — when scores are active, skills listed in score order instead of grouped by category.
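
The keyword-relevance and merge steps above can be sketched roughly as follows. This is a minimal illustration, not the code in agent/prompt_builder.py: the function names, the suffix list, divide-by-max normalization, the additive merge, and tie-breaking by name are all assumptions. Only the two synonym pairs and the 3x relevance weight come from the description.

```python
import re

# Only these two pairs are named in the PR; the real map is presumably larger.
SYNONYMS = {"tweet": "twitter", "bug": "debug"}
# Hypothetical suffix list for a crude stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text: str) -> set[str]:
    """Lowercase, split on non-alphanumerics, stem, then map synonyms."""
    out = set()
    for raw in re.findall(r"[a-z0-9]+", text.lower()):
        word = stem(raw)
        out.add(SYNONYMS.get(word, word))
    return out


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of intersection over size of union; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores into 0-1 by dividing by the maximum (assumed scheme)."""
    top = max(scores.values(), default=0.0)
    return {k: (v / top if top > 0 else 0.0) for k, v in scores.items()}


def rank(message, skills, usage, relevance_weight=3.0):
    """Return skill names ordered by usage + weighted relevance.

    skills maps name -> metadata text (name, description, tags joined);
    usage maps name -> raw usage score. Ties are broken by name.
    """
    query = tokens(message)
    relevance = normalize({n: jaccard(query, tokens(meta)) for n, meta in skills.items()})
    usage_n = normalize({n: usage.get(n, 0.0) for n in skills})
    combined = {n: usage_n[n] + relevance_weight * relevance[n] for n in skills}
    return sorted(skills, key=lambda n: (-combined[n], n))
```

With relevance weighted 3x, a skill that matches the query outranks a heavily used skill that does not, which is the behavior the description asks for.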

Related Issue

#4356 #4379 #4319 #4391 #4404

Type of Change

✨ New feature (non-breaking change that adds functionality)
✅ Tests (adding or improving test coverage)

Changes Made

agent/prompt_builder.py — keyword relevance scoring, suffix stemmer, synonym map, token budget, normalized merge, flat ranked output
hermes_state.py — schema v7 migration with skill_usage table, ranking/stats/last-used queries, self-cleaning purge
tools/skill_manager_tool.py — archive/restore, bundled skill detection, dedup check on create, find_archivable_skills()
tools/skills_tool.py — usage tracking on skill_view, .archive exclusion, include_archived param, archive fallback with restore hint
agent/skill_commands.py — usage tracking on slash command invocations
agent/skill_utils.py — .archive added to EXCLUDED_SKILL_DIRS
hermes_cli/config.py — skills config block (token_budget, max_prompt_skills, pinned_skills, auto_archive_days)
hermes_cli/main.py — argparse for stats/archive/restore/prune subcommands
hermes_cli/skills_config.py — CLI implementations for stats, archive, restore, prune
run_agent.py — loads skills config, computes usage scores, passes user_message to prompt builder, background auto-archive
tests/test_skills_overflow.py — 47 tests covering all new features

All config defaults preserve existing behavior (0 = unlimited/disabled). No breaking changes.
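
One way the opt-in caps could behave is sketched below, with 0 meaning "no limit" as stated above. The function name `select_skills`, the `token_cost` input, and the choices that pinned skills always survive, count against the budget, and that oversized skills are skipped rather than ending the scan are assumptions for illustration, not the PR's confirmed logic.

```python
def select_skills(ranked, token_cost, token_budget=0,
                  max_prompt_skills=0, pinned_skills=()):
    """Return (kept_in_rank_order, omitted_count).

    ranked: skill names, best first.
    token_cost: name -> estimated tokens for that skill's prompt entry
                (how tokens are estimated is left open here).
    A value of 0 for either cap disables that cap.
    """
    pinned = set(pinned_skills)
    # Pinned skills are kept unconditionally (assumption).
    kept = {name for name in ranked if name in pinned}
    used = sum(token_cost[name] for name in kept)

    for name in ranked:
        if name in kept:
            continue
        over_count = max_prompt_skills > 0 and len(kept) >= max_prompt_skills
        over_budget = token_budget > 0 and used + token_cost[name] > token_budget
        if over_count or over_budget:
            continue  # skip this skill, a cheaper one may still fit
        kept.add(name)
        used += token_cost[name]

    selected = [name for name in ranked if name in kept]
    return selected, len(ranked) - len(selected)
```

The omitted count is what a footer such as "N skills omitted" would report.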

How to Test

pytest tests/test_skills_overflow.py -v — 47 tests, all pass
pytest tests/ -k skill -q — full skill test suite, 0 new regressions
Start hermes with default config — all skills appear as before
Set skills.token_budget: 4000 — skills section capped, footer shows omitted count
hermes skills stats — shows usage data after interacting with skills
hermes skills archive <name> then hermes skills restore <name>
hermes skills prune --days 90 — lists unused skills, prompts for confirmation
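
The staleness check behind `prune --days 90` and auto_archive_days might look something like the sketch below. It is not the PR's find_archivable_skills(): the name `find_stale_skills`, the `last_used` mapping, the `protected` parameter (standing in for pinned or bundled skills), and the treatment of never-used skills as stale are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def find_stale_skills(last_used, days, now=None, protected=()):
    """Return sorted names of skills not used within the last `days` days.

    last_used: name -> timezone-aware datetime of last use, or None if the
               skill has never been used (treated as stale here; a real
               implementation might fall back to creation time instead).
    days:      0 or less disables the check, matching "0 = disabled".
    protected: names that must never be reported (e.g. pinned or bundled).
    """
    if days <= 0:
        return []
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    skip = set(protected)
    return sorted(
        name
        for name, ts in last_used.items()
        if name not in skip and (ts is None or ts < cutoff)
    )
```

A CLI would list the returned names and ask for confirmation before archiving anything.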

Benchmark (98 real skills)

Query	Position before	Position after
"write a research paper for NeurIPS"	ml-paper-writing 86, arxiv 82	2, 3
"set up a vector database for RAG"	qdrant 73, pinecone 72, chroma 70	5, 7, 8
"post a tweet about my project"	xitter 90	2
"debug my python code that crashes"	systematic-debugging 95	9
"find a restaurant nearby"	find-nearby 27	1

Right skill in top 20: 29% -> 93%

End-to-end with gemma-3-4b: the LLM picked the correct skill in 6/6 queries with ranked ordering vs 4/6 with alphabetical ordering.

… Skills in the system prompt are now ranked by a combination of usage frequency and keyword relevance to the user's message, replacing the previous alphabetical dump. Adds a skill_usage table (schema v7) that tracks views, invocations, and management actions, feeding a normalized scoring system that surfaces the right skill for the task.

New capabilities:

- Token budget and max_prompt_skills caps (opt-in, defaults unchanged)
- Pinned skills that survive budget cuts
- Suffix stemming and domain synonym expansion for keyword matching
- Auto-archival of stale skills (background thread, opt-in)
- CLI: hermes skills stats/archive/restore/prune
- Deduplication warnings on skill creation
- Archived skills discoverable via skills_list(include_archived=True)

Benchmark on 98 real skills: correct skill in top 20 improved from 29% to 93%. Verified end-to-end with LLM picking the right skill 6/6 vs 4/6 on alphabetical ordering.
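
A recency-weighted frequency score computed in a single SQL query could look roughly like this. The column names, the hyperbolic decay 1 / (1 + age in days), and the helper names are assumptions; the PR's actual schema v7 and scoring formula are not shown on this page.

```python
import sqlite3
import time

# Hypothetical table layout; only the table name comes from the PR.
SCHEMA = """
CREATE TABLE IF NOT EXISTS skill_usage (
    skill_name TEXT NOT NULL,
    action     TEXT NOT NULL,   -- e.g. 'view', 'invoke', 'slash'
    used_at    REAL NOT NULL    -- Unix timestamp in seconds
)
"""

# Each event contributes 1 / (1 + age_in_days), so recent use counts more.
SCORE_QUERY = """
SELECT skill_name,
       SUM(1.0 / (1.0 + (:now - used_at) / 86400.0)) AS score
FROM skill_usage
GROUP BY skill_name
ORDER BY score DESC, skill_name
"""


def record_usage(conn, skill_name, action, used_at=None):
    """Insert one usage event, defaulting to the current time."""
    stamp = time.time() if used_at is None else used_at
    conn.execute("INSERT INTO skill_usage VALUES (?, ?, ?)",
                 (skill_name, action, stamp))


def usage_scores(conn, now=None):
    """Return {skill_name: score}, highest score first."""
    now = time.time() if now is None else now
    return dict(conn.execute(SCORE_QUERY, {"now": now}).fetchall())
```

Avoiding `exp()` keeps the query portable to SQLite builds without the math functions compiled in.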

alexferrari88 · 2026-04-16T08:16:43Z

nice!

alt-glitch · 2026-05-02T04:07:55Z

Related to #11425 (skills lifecycle management feature request) and RFC #16077 (Curator background skill maintenance).

teknium1 · 2026-05-03T22:13:49Z

Thanks for the thorough write-up and benchmark, @fathah — closing this one, but the problem framing was useful.

Most of what this PR builds has since shipped via the curator (commit bc79e22, `feat(curator): background skill maintenance`):

`skill_usage` SQLite table — shipped (different schema)
Per-skill usage recording on view/invoke — shipped (`tools/skill_usage.py`, `agent/curator.py`)
Pinning — shipped as #17562 (feat(skills): refuse skill_manage writes on pinned skills), refined in #18671 (Curator umbrella-skill consolidation can leave cron jobs with stale skill references) and #18731 (fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671))
Agent-created scanner / classification — shipped (`ce089169d`, with `skills.guard_agent_created` config gate)

Two specific reasons not to salvage the rest:

Keyword-relevance ranking in the system prompt would break prompt caching. Hermes treats the system prompt as immutable across a session (see AGENTS.md: "Prompt Caching Integrity"). Adding `user_message` to `build_skills_system_prompt()` and re-ordering skills per turn would invalidate the cache on every message, materially raising cost for every user. Skill ranking would need a different delivery channel (e.g. a runtime skill selector) — not the system prompt.
Schema/CLI overlap with the shipped curator is now too large to rebase cleanly. A cherry-pick onto current main would conflict across all 11 files touched, and the PR's data model (separate `.archive` directory, its own schema v7) doesn't match the curator's approach.
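
The caching concern in the first point can be illustrated with a toy model. Everything here is hypothetical: the skill list, `build_prompt`, `rank_for`, and the use of a SHA-256 hash as a stand-in for a provider's cache key (real prompt caches typically match on an exact prompt prefix, which a whole-prompt hash only approximates).

```python
import hashlib

SKILLS = ["arxiv", "find-nearby", "xitter"]  # illustrative names only


def build_prompt(skills):
    """Render a tiny system prompt with the skills in the given order."""
    return "You are an agent.\nSkills:\n" + "\n".join(f"- {s}" for s in skills)


def cache_key(prompt):
    """Stand-in for a cache lookup: any byte change yields a new key."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def rank_for(message, skills):
    """Toy per-turn ranking: skills named in the message come first."""
    return sorted(skills, key=lambda s: (s not in message, s))


def keys_per_turn(messages, per_turn_ranking):
    """Return the cache key the system prompt would have on each turn."""
    keys = []
    for message in messages:
        order = rank_for(message, SKILLS) if per_turn_ranking else SKILLS
        keys.append(cache_key(build_prompt(order)))
    return keys
```

With a fixed order the key is identical on every turn; with per-turn ranking it changes whenever the order does, so the cached prompt cannot be reused.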

The `hermes skills stats / archive / restore / prune` CLI surface is the one piece that's genuinely net-new and worth keeping — tracked in #19384, crediting this PR.

fathah added 3 commits April 1, 2026 10:05

Merge branch 'skills-overflow-fix' of https://github.com/fathah/hermes-agent into skills-overflow-fix

ecc0760

fathah changed the title ~~feat(skills): smart ranking, usage tracking, and lifecycle management…~~ feat(skills): smart ranking, usage tracking, and lifecycle management Apr 1, 2026

alt-glitch added labels May 2, 2026: type/feature (New feature or request); P3 (Low: cosmetic, nice to have); comp/agent (Core agent loop, run_agent.py, prompt builder); tool/skills (Skills system (list, view, manage))

teknium1 mentioned this pull request May 3, 2026

feat(curator): user-facing CLI for skill usage stats, archive/restore, and prune #19384

Closed

teknium1 closed this May 3, 2026

elmatadorgh mentioned this pull request May 4, 2026

feat(skills): add stats/archive/restore/prune CLI subcommands #19454

Closed

fathah wants to merge 3 commits into NousResearch:main from fathah:skills-overflow-fix

4 participants
