Skip to content

fix(skills): key skill-command cache per platform to prevent cross-platform leaks#14594

Closed
draix wants to merge 1 commit into
NousResearch:mainfrom
draix:fix/14536-skill-commands-platform-cache
Closed

fix(skills): key skill-command cache per platform to prevent cross-platform leaks#14594
draix wants to merge 1 commit into
NousResearch:mainfrom
draix:fix/14536-skill-commands-platform-cache

Conversation

@draix
Copy link
Copy Markdown

@draix draix commented Apr 23, 2026

Problem

Closes #14536

agent/skill_commands.py maintains a process-global _skill_commands dict. The first platform to call scan_skill_commands() seeds this cache with its own platform-specific disabled-skill view. Subsequent platforms (e.g. Discord after Telegram) hit get_skill_commands(), find the cache non-empty, and return the wrong skill set — inheriting the first platform's disabled list.

Root Cause

# Before
_skill_commands: Dict[str, Dict[str, Any]] = {}  # single global cache

def get_skill_commands():
    if not _skill_commands:   # any non-empty cache → returned as-is for ALL platforms
        scan_skill_commands()
    return _skill_commands

_get_disabled_skill_names() correctly resolves per-platform disabled lists via HERMES_SESSION_PLATFORM, but the cached result was shared across all platforms.

Fix

Before After
Cache structure Dict[str, Any] (flat) Dict[str | None, Dict[str, Any]] (per-platform)
scan_skill_commands() no platform arg accepts platform=None, resolves and uses as cache key
get_skill_commands() no platform arg accepts platform=None, checks per-platform slot
Platform resolution always from env explicit arg → HERMES_SESSION_PLATFORMHERMES_PLATFORMNone

All existing call-sites use the no-arg form and continue to work — they now automatically resolve platform from session context.

Tests Added (TestPlatformAwareCache)

  • test_different_platforms_get_independent_caches — Telegram and Discord see their own disabled lists
  • test_platform_none_uses_global_disabled_list — global disabled list respected when platform is None
  • test_rescan_updates_platform_cache_entry — rescan replaces the per-platform slot

…atform leaks

The process-global `_skill_commands` dict was seeded by whichever
platform executed `scan_skill_commands()` first.  Subsequent platforms
then got the wrong enabled/disabled skill set from `get_skill_commands()`
without re-scanning, because the guard `if not _skill_commands` treated
any non-empty cache as valid for all platforms.

Changes:
- Change `_skill_commands` from a flat dict to a platform-keyed dict
  `Dict[str | None, Dict[str, Dict[str, Any]]]`.
- Add `_resolve_platform()` helper that follows the same priority order
  as `get_disabled_skill_names`: explicit arg → HERMES_SESSION_PLATFORM
  context var → HERMES_PLATFORM env var → None.
- `scan_skill_commands(platform=None)` now accepts an optional platform,
  resolves it, and stores/returns only the per-platform slice.
- `get_skill_commands(platform=None)` resolves platform and checks only
  the per-platform key, triggering a fresh scan on a cache miss.
- All existing call-sites use the no-arg form and continue to work: they
  now automatically resolve from session context.

Fixes NousResearch#14536

Tests added (TestPlatformAwareCache):
- test_different_platforms_get_independent_caches
- test_platform_none_uses_global_disabled_list
- test_rescan_updates_platform_cache_entry
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder tool/skills Skills system (list, view, manage) labels Apr 23, 2026
@alt-glitch
Copy link
Copy Markdown
Collaborator

Competing fix with #14570 for #14536 — this PR keys cache per-platform rather than invalidating on change. Maintainer should pick one.

@teknium1
Copy link
Copy Markdown
Contributor

teknium1 commented May 2, 2026

Thanks for working on this. Closing in favor of #18739 which salvages #14570.

Applying this PR to main and running the issue's repro returns an empty result for both platforms — scan_skill_commands() now calls _get_disabled_skill_names(resolved_platform), but that function in tools/skills_tool.py takes no argument. The resulting TypeError gets swallowed by the outer except Exception in scan_skill_commands(), so scanning silently returns empty:

telegram: []
discord: []

The architectural approach (per-platform keyed cache, passing platform through explicitly) is cleaner than the rescan strategy in #18739, but getting there would require updating _get_disabled_skill_names() to accept a platform argument in tools/skills_tool.py AND agent/skill_utils.py. Happy to revisit if you want to open a follow-up with that full change — for now shipping the less-invasive rescan fix.

@teknium1 teknium1 closed this May 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists tool/skills Skills system (list, view, manage) type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

skill_commands cache can leak one platform's disabled-skill view into another

4 participants