fix(skills): key skill-command cache per platform to prevent cross-platform leaks #14594
draix wants to merge 1 commit into
…atform leaks

The process-global `_skill_commands` dict was seeded by whichever platform executed `scan_skill_commands()` first. Subsequent platforms then got the wrong enabled/disabled skill set from `get_skill_commands()` without re-scanning, because the guard `if not _skill_commands` treated any non-empty cache as valid for all platforms.

Changes:
- Change `_skill_commands` from a flat dict to a platform-keyed dict `Dict[str | None, Dict[str, Dict[str, Any]]]`.
- Add `_resolve_platform()` helper that follows the same priority order as `get_disabled_skill_names`: explicit arg → `HERMES_SESSION_PLATFORM` context var → `HERMES_PLATFORM` env var → `None`.
- `scan_skill_commands(platform=None)` now accepts an optional platform, resolves it, and stores/returns only the per-platform slice.
- `get_skill_commands(platform=None)` resolves the platform and checks only the per-platform key, triggering a fresh scan on a cache miss.
- All existing call-sites use the no-arg form and continue to work: they now automatically resolve from session context.

Fixes NousResearch#14536

Tests added (`TestPlatformAwareCache`):
- `test_different_platforms_get_independent_caches`
- `test_platform_none_uses_global_disabled_list`
- `test_rescan_updates_platform_cache_entry`
Thanks for working on this. Closing in favor of #18739 which salvages #14570. Applying this PR to main and running the issue's repro returns an empty result for both platforms — The architectural approach (per-platform keyed cache, passing platform through explicitly) is cleaner than the rescan strategy in #18739, but getting there would require updating |
Problem
Closes #14536
`agent/skill_commands.py` maintains a process-global `_skill_commands` dict. The first platform to call `scan_skill_commands()` seeds this cache with its own platform-specific disabled-skill view. Subsequent platforms (e.g. Discord after Telegram) hit `get_skill_commands()`, find the cache non-empty, and return the wrong skill set, inheriting the first platform's disabled list.

Root Cause
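The leak can be reproduced in miniature. The following is a sketch under stated assumptions, not the project's code: `DISABLED_BY_PLATFORM` and `ALL_SKILLS` are hypothetical stand-ins for the real configuration and skill discovery, the command-entry shape is invented, and the platform is passed explicitly here rather than read from session context. Only the flat cache and the `if not _skill_commands` guard mirror what the commit message describes.

```python
from typing import Any, Dict, Optional, Set

# Hypothetical per-platform disabled lists (the real config is not shown in this PR).
DISABLED_BY_PLATFORM: Dict[Optional[str], Set[str]] = {
    "telegram": {"deploy"},
    "discord": {"search"},
    None: set(),
}
ALL_SKILLS = ["deploy", "search", "summarize"]  # stand-in for skill discovery

# Flat, process-global cache: one slot shared by every platform.
_skill_commands: Dict[str, Dict[str, Any]] = {}


def scan_skill_commands(platform: Optional[str]) -> Dict[str, Dict[str, Any]]:
    """Rebuild the flat cache using the calling platform's disabled list."""
    disabled = DISABLED_BY_PLATFORM.get(platform, set())
    _skill_commands.clear()
    for name in ALL_SKILLS:
        if name not in disabled:
            _skill_commands[f"/{name}"] = {"name": name}
    return _skill_commands


def get_skill_commands(platform: Optional[str]) -> Dict[str, Dict[str, Any]]:
    # The guard treats any non-empty cache as valid, whoever filled it.
    if not _skill_commands:
        scan_skill_commands(platform)
    return _skill_commands


telegram_view = set(get_skill_commands("telegram"))  # seeds the cache
discord_view = set(get_skill_commands("discord"))    # served Telegram's view
```

In this sketch Discord never triggers a scan, so it still sees `/search` (which it disabled) and is missing `/deploy` (which only Telegram disabled).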
`_get_disabled_skill_names()` correctly resolves per-platform disabled lists via `HERMES_SESSION_PLATFORM`, but the cached result was shared across all platforms.

Fix
- `_skill_commands`: `Dict[str, Any]` (flat) → `Dict[str | None, Dict[str, Any]]` (per-platform)
- `scan_skill_commands()`: accepts `platform=None`, resolves it, and uses the result as the cache key
- `get_skill_commands()`: accepts `platform=None` and checks only the per-platform slot
- Platform resolution order: `HERMES_SESSION_PLATFORM` → `HERMES_PLATFORM` → `None`

All existing call-sites use the no-arg form and continue to work: they now automatically resolve the platform from session context.
Tests Added (`TestPlatformAwareCache`)
- `test_different_platforms_get_independent_caches`: Telegram and Discord see their own disabled lists
- `test_platform_none_uses_global_disabled_list`: the global disabled list is respected when platform is `None`
- `test_rescan_updates_platform_cache_entry`: a rescan replaces the per-platform slot
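As a rough idea of what these three cases assert, here is a self-contained `unittest` sketch. It is not the PR's test code: `scan`, `get`, `DISABLED`, and `SKILLS` are hypothetical stand-ins for the real module, and only the test class and method names come from the PR description.

```python
import unittest
from typing import Dict, Optional, Set

# Hypothetical stand-ins, not the project's real module.
DISABLED: Dict[Optional[str], Set[str]] = {
    None: {"legacy"},        # global disabled list
    "telegram": {"deploy"},
    "discord": {"search"},
}
SKILLS = ["deploy", "legacy", "search"]
_cache: Dict[Optional[str], Set[str]] = {}


def scan(platform: Optional[str] = None) -> Set[str]:
    """Recompute and store the enabled skills for one platform slot."""
    disabled = DISABLED.get(platform, DISABLED[None])
    _cache[platform] = {s for s in SKILLS if s not in disabled}
    return _cache[platform]


def get(platform: Optional[str] = None) -> Set[str]:
    """Return the cached slot, scanning only on a miss."""
    return _cache[platform] if platform in _cache else scan(platform)


class TestPlatformAwareCache(unittest.TestCase):
    def setUp(self) -> None:
        # Reset shared state so tests do not leak into each other.
        _cache.clear()
        DISABLED["telegram"] = {"deploy"}

    def test_different_platforms_get_independent_caches(self) -> None:
        self.assertEqual(get("telegram"), {"legacy", "search"})
        self.assertEqual(get("discord"), {"deploy", "legacy"})

    def test_platform_none_uses_global_disabled_list(self) -> None:
        self.assertEqual(get(None), {"deploy", "search"})

    def test_rescan_updates_platform_cache_entry(self) -> None:
        get("telegram")
        DISABLED["telegram"] = {"search"}
        # Cached slot is unchanged until an explicit rescan.
        self.assertEqual(get("telegram"), {"legacy", "search"})
        scan("telegram")
        self.assertEqual(get("telegram"), {"deploy", "legacy"})
```

Saved as a module, the sketch can be run with `python -m unittest`.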