Skip to content

fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671)#18731

Merged
teknium1 merged 2 commits into
mainfrom
hermes/hermes-78d066c7
May 2, 2026
Merged

fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671)#18731
teknium1 merged 2 commits into
mainfrom
hermes/hermes-78d066c7

Conversation

@teknium1
Copy link
Copy Markdown
Contributor

@teknium1 teknium1 commented May 2, 2026

Summary

Two fixes for the curator + cron-link silent-failure class. Closes #18671.

  1. absorbed_into on skill delete — curator reconciler stops guessing what "archived" means.
  2. Cron skill links are backed up with the snapshot and restored on rollback — rolling back a curator run actually returns cron jobs to their pre-run state.

1. absorbed_into on skill delete

Root cause

_reconcile_classification in agent/curator.py inferred consolidation vs pruning from two brittle signals: the curator's post-hoc YAML summary block, and a substring heuristic scanning sibling tool calls for the removed skill's name. Both miss in real consolidations — models forget the YAML under reasoning pressure, and the heuristic misses when the umbrella's patch content describes the absorbed behavior abstractly instead of literally naming the old slug. When both miss, the skill fell through to "no-evidence fallback" pruned, and #18253's cron-rewriter then dropped the cron ref entirely instead of mapping it to the umbrella. Same observable symptom as pre-#18253: Skill(s) not found and skipped on the next cron run.

Changes

  • tools/skill_manager_tool.pyskill_manage(action='delete') accepts absorbed_into:
    • absorbed_into='<umbrella>' → consolidated; target must exist on disk (validated)
    • absorbed_into='' → explicit prune, no forwarding target
    • missing → legacy path, reconciler falls through to heuristic/YAML (backward compat)
    • rejects absorbed_into=<self> and nonexistent targets
  • agent/curator.py — new _extract_absorbed_into_declarations() pulls declarations off llm_meta.tool_calls. _reconcile_classification accepts absorbed_declarations= and treats it as authoritative — beats YAML block and heuristic. Curator prompt updated to require the arg on every delete.

2. Cron skill links through snapshot + rollback

Root cause

snapshot_skills() captured the skills tree and .curator_backups/…/skills.tar.gz held it safely, but ~/.hermes/cron/jobs.json was never captured. After a rollback, skills bounced back to disk but cron jobs still pointed at whatever umbrellas the curator had rewritten them to. User experience: "I rolled back but my cron jobs still use the merged skills."

Changes

  • agent/curator_backup.pysnapshot_skills() additionally copies cron/jobs.json as cron-jobs.json alongside the tarball. Manifest gains a cron_jobs block (backed_up, jobs_count, optional reason/parse_warning).
  • agent/curator_backup.py — new _restore_cron_skill_links(snapshot_dir) reconciles backed-up skills into the live jobs.json surgically:
    • only skills/skill fields touched; schedule/prompt/timestamps/enabled/etc. are live state and preserved
    • matched by job id; jobs the user deleted after the snapshot are NOT resurrected; jobs the user created after are untouched
    • writes through cron.jobs.save_jobs() under the same _jobs_file_lock the scheduler uses — no race with tick()
    • failures here don't fail the overall rollback (skills tree is the core guarantee)
  • rollback() calls _restore_cron_skill_links after the skills extract succeeds; the returned message summarizes the reconciliation ("cron links: N job(s) had skill links restored, M backed-up job(s) no longer exist").
  • hermes_cli/curator.py — rollback confirm dialog shows cron-backup status from the manifest so the user knows what's about to happen.

Validation

Before After
Model consolidates, emits YAML, heuristic hits
Model consolidates, forgets YAML, heuristic misses ✗ fell through to prune, cron ref dropped absorbed_into declared → cron rewritten
Model truly prunes inferred explicit absorbed_into=""
Rollback restores skills tree
Rollback restores cron skill links ✗ jobs still point at umbrellas ✓ surgical restore; non-skill fields preserved
Rollback with pre-feature snapshot (no cron-jobs.json) n/a ✓ skills tree still restored; cron untouched; no error
User deleted a job after snapshot n/a ✓ not resurrected
User added a job after snapshot n/a ✓ untouched

Test counts

  • Targeted: 484/484 pass (tests/agent/test_curator*.py + tests/cron/ + tests/tools/test_skill_manager_tool.py).
  • 18 new tests for absorbed_into (tool contract + extractor + reconciler + mixed legacy-and-declared runs).
  • 9 new tests for cron-link backup/rollback (snapshot shapes, rollback reconciliation rules, pre-feature-snapshot compatibility, standalone unit on _restore_cron_skill_links).

E2E

  • Curator umbrella-skill consolidation can leave cron jobs with stale skill references #18671 classification repro: umbrella + 3 narrow skills, cron job referencing all 3. Model emits no YAML, heuristic misses. Delete calls carry absorbed_into. Result: PR skills correctly classified consolidated, cron rewritten to ['hermes-agent-dev'], stale-junk pruned via absorbed_into="".
  • Backward-compat classification: delete without absorbed_into, model emits YAML → routed via existing "model" source, cron still rewritten correctly. Legacy path untouched.
  • Full snapshot → curator rewrite → rollback: cron job skills=[pr-review-format, pr-review-checklist, pr-triage-salvage]. Snapshot captures cron. Curator rewrites skills to [hermes-agent-dev]. Rollback restores both the skills tree AND the cron skills list to the original three names. Non-skill cron fields (id, name, prompt) preserved across the round trip.

teknium1 added 2 commits May 2, 2026 01:18
Closes #18671. The classification pipeline that feeds cron-ref rewriting
used to infer consolidation vs pruning from two brittle signals: the
curator model's post-hoc YAML summary block, and a substring heuristic
scanning other tool calls for the removed skill's name. Both miss in
real consolidations — the model forgets the YAML under reasoning
pressure, and the heuristic misses when the umbrella's patch content
describes the absorbed behavior abstractly instead of naming the old
slug. When both miss, the skill falls through to 'no-evidence fallback'
pruned, and #18253's cron rewriter drops the cron ref entirely instead
of mapping it to the umbrella. Same observable symptom as pre-#18253:
'Skill(s) not found and skipped' at the next cron run.

The fix makes the model declare intent at the moment of deletion.
skill_manage(action='delete') now accepts absorbed_into:
  - absorbed_into='<umbrella>'  -> consolidated, target must exist on disk
  - absorbed_into=''            -> explicit prune, no forwarding target
  - missing                     -> legacy path, falls through to heuristic/YAML

The curator reconciler reads these declarations off llm_meta.tool_calls
BEFORE either the YAML block or the substring heuristic. Declaration
wins. Fallback logic stays intact for backward compat with any caller
(human or older curator conversation) that doesn't populate the arg.

Changes
- tools/skill_manager_tool.py: add absorbed_into param to skill_manage
  + _delete_skill. Validate target exists when non-empty. Reject
  absorbed_into=<self>. Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks
  tool calls for skill_manage(delete) with the arg. _reconcile_classification
  accepts absorbed_declarations= and treats them as authoritative. Curator
  prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid
  target, empty string, nonexistent target, self-reference, whitespace,
  backward compat, dispatcher plumbing). 11 new curator tests covering
  the extractor + authoritative reconciler path + mixed-legacy-and-
  declared runs.

Validation
- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing
  all 3. Model emits NO YAML block. Heuristic misses (patch prose
  doesn't name old slugs). Delete calls carry absorbed_into. Result:
  both PR skills correctly classified 'consolidated' + cron rewritten
  ['pr-review-format', 'pr-review-checklist', 'stale-junk'] ->
  ['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML
  -> routed via existing 'model' source, cron still rewritten correctly.
…lback

Before this, rolling back a curator run restored the skills tree but cron
jobs still pointed at the umbrella skills the curator had rewritten them
to. The user would see their old narrow skills back on disk but their
cron jobs still configured with the merged umbrella — not actually 'back
to how it was'.

Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json
alongside the skills tarball, as cron-jobs.json. The manifest gets a new
'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI
confirm dialog) can surface what's in the snapshot. If jobs.json is
missing/unreadable/malformed, snapshot proceeds without cron data — the
skills backup is the core guarantee; cron is additive.

Rollback side: after the skills extract succeeds, the new
_restore_cron_skill_links() reconciles the backed-up jobs into the live
jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and
only on jobs matched by id. Everything else about a cron job — schedule,
last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live
state the user or scheduler has modified since the snapshot; overwriting
it would regress unrelated activity.

Reconciliation rules:
- Job in backup AND live, skills differ  → skills restored.
- Job in backup AND live, skills match   → no-op.
- Job in backup, NOT in live             → skipped (user deleted it
                                              after snapshot; their choice
                                              is later than the snapshot).
- Job in live, NOT in backup             → untouched (user created it
                                              after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds,
                                              reports 'not captured'
                                              (older pre-feature snapshots
                                              keep working).

Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the
scheduler uses, so rollback doesn't race tick().

Also:
- hermes_cli/curator.py: rollback confirm dialog now shows
  'cron jobs: N (will be restored for skill-link fields only)' when the
  snapshot has cron data, or 'not in snapshot (<reason>)' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause
  summarizing the reconciliation outcome.

Tests
- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json
  captured-as-raw, full rollback-restores-skills-and-cron, rollback
  touches only skill fields, rollback skips user-deleted jobs, rollback
  leaves user-created jobs untouched, rollback still works with
  pre-feature snapshot that has no cron-jobs.json, standalone unit test
  on _restore_cron_skill_links exercising the full report shape.

Validation
- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before:
  ['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage'].
  After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format',
  'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id,
  name, prompt) preserved across the round trip.
@teknium1 teknium1 changed the title fix(curator): authoritative absorbed_into declarations on skill delete (#18671) fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) May 2, 2026
@teknium1 teknium1 merged commit 97acd66 into main May 2, 2026
10 of 11 checks passed
@teknium1 teknium1 deleted the hermes/hermes-78d066c7 branch May 2, 2026 08:30
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder tool/skills Skills system (list, view, manage) comp/cron Cron scheduler and job management labels May 2, 2026
nickdlkk pushed a commit to nickdlkk/hermes-agent that referenced this pull request May 11, 2026
…ill links on rollback (NousResearch#18671) (NousResearch#18731)

* fix(curator): authoritative absorbed_into declarations on skill delete

Closes NousResearch#18671. The classification pipeline that feeds cron-ref rewriting
used to infer consolidation vs pruning from two brittle signals: the
curator model's post-hoc YAML summary block, and a substring heuristic
scanning other tool calls for the removed skill's name. Both miss in
real consolidations — the model forgets the YAML under reasoning
pressure, and the heuristic misses when the umbrella's patch content
describes the absorbed behavior abstractly instead of naming the old
slug. When both miss, the skill falls through to 'no-evidence fallback'
pruned, and NousResearch#18253's cron rewriter drops the cron ref entirely instead
of mapping it to the umbrella. Same observable symptom as pre-NousResearch#18253:
'Skill(s) not found and skipped' at the next cron run.

The fix makes the model declare intent at the moment of deletion.
skill_manage(action='delete') now accepts absorbed_into:
  - absorbed_into='<umbrella>'  -> consolidated, target must exist on disk
  - absorbed_into=''            -> explicit prune, no forwarding target
  - missing                     -> legacy path, falls through to heuristic/YAML

The curator reconciler reads these declarations off llm_meta.tool_calls
BEFORE either the YAML block or the substring heuristic. Declaration
wins. Fallback logic stays intact for backward compat with any caller
(human or older curator conversation) that doesn't populate the arg.

Changes
- tools/skill_manager_tool.py: add absorbed_into param to skill_manage
  + _delete_skill. Validate target exists when non-empty. Reject
  absorbed_into=<self>. Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks
  tool calls for skill_manage(delete) with the arg. _reconcile_classification
  accepts absorbed_declarations= and treats them as authoritative. Curator
  prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid
  target, empty string, nonexistent target, self-reference, whitespace,
  backward compat, dispatcher plumbing). 11 new curator tests covering
  the extractor + authoritative reconciler path + mixed-legacy-and-
  declared runs.

Validation
- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E NousResearch#18671 repro: 3 narrow skills, 1 umbrella, cron job referencing
  all 3. Model emits NO YAML block. Heuristic misses (patch prose
  doesn't name old slugs). Delete calls carry absorbed_into. Result:
  both PR skills correctly classified 'consolidated' + cron rewritten
  ['pr-review-format', 'pr-review-checklist', 'stale-junk'] ->
  ['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML
  -> routed via existing 'model' source, cron still rewritten correctly.

* feat(curator): capture + restore cron skill links across snapshot/rollback

Before this, rolling back a curator run restored the skills tree but cron
jobs still pointed at the umbrella skills the curator had rewritten them
to. The user would see their old narrow skills back on disk but their
cron jobs still configured with the merged umbrella — not actually 'back
to how it was'.

Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json
alongside the skills tarball, as cron-jobs.json. The manifest gets a new
'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI
confirm dialog) can surface what's in the snapshot. If jobs.json is
missing/unreadable/malformed, snapshot proceeds without cron data — the
skills backup is the core guarantee; cron is additive.

Rollback side: after the skills extract succeeds, the new
_restore_cron_skill_links() reconciles the backed-up jobs into the live
jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and
only on jobs matched by id. Everything else about a cron job — schedule,
last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live
state the user or scheduler has modified since the snapshot; overwriting
it would regress unrelated activity.

Reconciliation rules:
- Job in backup AND live, skills differ  → skills restored.
- Job in backup AND live, skills match   → no-op.
- Job in backup, NOT in live             → skipped (user deleted it
                                              after snapshot; their choice
                                              is later than the snapshot).
- Job in live, NOT in backup             → untouched (user created it
                                              after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds,
                                              reports 'not captured'
                                              (older pre-feature snapshots
                                              keep working).

Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the
scheduler uses, so rollback doesn't race tick().

Also:
- hermes_cli/curator.py: rollback confirm dialog now shows
  'cron jobs: N (will be restored for skill-link fields only)' when the
  snapshot has cron data, or 'not in snapshot (<reason>)' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause
  summarizing the reconciliation outcome.

Tests
- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json
  captured-as-raw, full rollback-restores-skills-and-cron, rollback
  touches only skill fields, rollback skips user-deleted jobs, rollback
  leaves user-created jobs untouched, rollback still works with
  pre-feature snapshot that has no cron-jobs.json, standalone unit test
  on _restore_cron_skill_links exercising the full report shape.

Validation
- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before:
  ['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage'].
  After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format',
  'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id,
  name, prompt) preserved across the round trip.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/cron Cron scheduler and job management P2 Medium — degraded but workaround exists tool/skills Skills system (list, view, manage) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Curator umbrella-skill consolidation can leave cron jobs with stale skill references

2 participants