feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709)
Merged
Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.
Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:
1. Each working directory got its own full shadow git repo — no object
dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
/rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
without ever asking for /rollback.
Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.
Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
per-project metadata (store/projects/<hash>.json with workdir +
created_at + last_touch). On first v2 init, any pre-v2 per-directory
shadow repos are auto-migrated into legacy-<timestamp>/ so the new
store starts clean. _prune() now actually rewrites the per-project ref
to the last max_snapshots commits and runs git gc --prune=now. New
_enforce_size_cap() drops oldest commits round-robin across projects
when the store exceeds max_total_size_mb. _drop_oversize_from_index()
filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
(status / list / prune / clear / clear-legacy) for managing the store
outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
*.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
coverage for the pre-v2 prune path (per-directory shadow repos under
base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
shared store, new defaults, migration, and the new CLI;
reference/cli-commands.md documents 'hermes checkpoints'.
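For reviewers skimming the prune semantics, the ref-rewrite and round-robin size cap described above can be sketched as pure list operations. Function names mirror the description only; the real _prune()/_enforce_size_cap() operate on git refs, measure actual store size, and finish with gc:

```python
def keep_last(commits, max_snapshots):
    """Pruning shape: retain only the newest max_snapshots commits.

    `commits` is oldest-first; the real _prune() rewrites the per-project
    ref to the surviving tip and then runs `git gc --prune=now`.
    """
    return commits[-max_snapshots:] if max_snapshots > 0 else []

def enforce_size_cap(projects, over_budget_steps):
    """Round-robin eviction shape used by _enforce_size_cap().

    `projects` maps project-hash -> oldest-first commit list. Each step
    drops the oldest commit of the next project that still has one,
    cycling until the (externally measured) size budget is satisfied,
    modelled here as a fixed number of eviction steps.
    """
    order = list(projects)
    i = 0
    for _ in range(over_budget_steps):
        # find the next project that still has a commit to drop
        for _ in range(len(order)):
            name = order[i % len(order)]
            i += 1
            if projects[name]:
                projects[name].pop(0)  # evict that project's oldest commit
                break
        else:
            break  # every project is already empty
    return projects
```

Round-robin (rather than globally-oldest-first) matches the goal of the cap: no single chatty project can evict every other project's history.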
E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
--retention-days 0' removes its ref, index, and metadata; GC reclaims
the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
CLI without an agent running.
Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.
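For users opting back in, a config fragment with the keys this PR introduces or flips (the exact nesting under a `checkpoints:` block is illustrative; the key names and defaults are the ones listed above):

```yaml
# config.yaml (illustrative layout)
checkpoints:
  enabled: true          # now off by default; opt in for /rollback
  max_snapshots: 20
  auto_prune: true
  max_total_size_mb: 500
  max_file_size_mb: 10
```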
chris-han pushed a commit to chris-han/hermes-agent that referenced this pull request on May 6, 2026.
bot-ted added a commit to bot-ted/hermes-agent that referenced this pull request on May 7, 2026.
* fix(aux): trigger fallback on 429 rate-limit errors in auxiliary client
When a provider returns a 429 rate-limit error (not billing-related),
the auxiliary client's call_llm/async_call_llm previously did NOT trigger
the fallback chain. This caused auxiliary tasks like session_search to
exhaust all 3 retries against the same rate-limited endpoint, losing
session metadata that depended on the summarization completing.
Root cause: `_is_payment_error()` only matched 429s containing billing
keywords ("credits", "insufficient funds", etc.). Provider-specific
rate-limit messages like Nous's "Hold up for a bit, you've exceeded the
rate limit on your API key" didn't match, so `_is_payment_error` returned
False, `_is_connection_error` returned False, and `should_fallback` was
False — all retries hit the same rate-limited provider.
Fix:
- New `_is_rate_limit_error()` function that detects 429 + rate-limit
keywords, generic 429 without billing keywords, and OpenAI SDK
`RateLimitError` class instances (which may omit .status_code).
- Updated `should_fallback` in both `call_llm` and `async_call_llm` to
include `_is_rate_limit_error`.
- Updated the max_tokens retry path to also check for rate-limit errors.
- Updated the reason string to include "rate limit".
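The detection rules above can be sketched as follows. This is a minimal shape, not the actual auxiliary-client code; the keyword sets and attribute names are assumptions:

```python
RATE_LIMIT_KEYWORDS = ("rate limit", "rate-limit", "too many requests")
BILLING_KEYWORDS = ("credits", "insufficient funds", "insufficient_quota", "billing")

def is_rate_limit_error(exc):
    """Sketch of _is_rate_limit_error()'s three match paths:
    OpenAI SDK RateLimitError instances (which may omit .status_code),
    429 + rate-limit keywords, and a generic 429 with no billing wording.
    """
    if type(exc).__name__ == "RateLimitError":
        return True
    msg = str(exc).lower()
    status = getattr(exc, "status_code", None)
    is_429 = status == 429 or "429" in msg
    if not is_429:
        return False
    if any(k in msg for k in RATE_LIMIT_KEYWORDS):
        return True
    # a bare 429 without billing keywords still counts as rate limiting;
    # billing-flavoured 429s stay on the payment-error path
    return not any(k in msg for k in BILLING_KEYWORDS)
```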
This complements the Nous rate guard (PR #10568) which prevents new calls
to Nous when already rate-limited — this fix handles the case where a
request is already in flight when the 429 arrives.
Related: #8023, #12554, #11034
Co-authored-by: Zeejay <zjtan1@gmail.com>
* chore: AUTHOR_MAP entry for zeejaytan
* fix(acp): preserve assistant reasoning metadata in session persistence
* chore: AUTHOR_MAP entry for Aslaaen
* feat(cli): add list_picker_providers for credential-filtered picker
The Telegram/Discord /model pickers currently call
list_authenticated_providers(), which returns every provider whose
credentials resolve locally and every model in its curated snapshot.
Two failure modes fall out:
- OpenRouter rows can include IDs the live catalog no longer carries.
- Provider rows can surface with zero callable models (e.g. a slug
whose credential pool entry exists but has nothing behind it).
list_picker_providers() wraps the base function and post-processes the
result so the interactive picker only shows models the user can
actually select:
- OpenRouter's models come from fetch_openrouter_models() (live-catalog
filtered against the curated OPENROUTER_MODELS snapshot).
- Rows with an empty models list are dropped, except custom endpoints
(is_user_defined=True with an api_url) where the user may enter
model ids manually.
- All other fields pass through unchanged.
The gateway /model handler switches to the new helper for the
interactive picker payload only. Typed /model <name> and the text
fallback list stay on list_authenticated_providers() so nothing is
hidden from power users or platforms without a picker.
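The post-processing contract reads roughly like this (row shape and helper name are illustrative; the real OpenRouter list comes from fetch_openrouter_models(), approximated here as a set intersection):

```python
def filter_picker_rows(rows, live_openrouter_models):
    """Sketch of list_picker_providers() post-processing.

    `rows` come from the base list_authenticated_providers(); each is
    assumed to be a dict with 'name', 'models', and optional
    'is_user_defined'/'api_url'. All other fields pass through unchanged.
    """
    out = []
    for row in rows:
        row = dict(row)  # don't mutate the caller's data
        if row["name"] == "openrouter":
            # drop snapshot IDs the live catalog no longer carries
            row["models"] = [m for m in row["models"] if m in live_openrouter_models]
        custom = row.get("is_user_defined") and row.get("api_url")
        if not row["models"] and not custom:
            continue  # zero callable models and no manual-entry endpoint
        out.append(row)
    return out
```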
Covered by nine focused unit tests in
tests/hermes_cli/test_list_picker_providers.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: AUTHOR_MAP entry for Tkander1715
* feat(tui): remove /provider alias for /model (#20358)
/model is the canonical command; /provider was a redundant alias that
dispatched to the same ModelPicker overlay. Drop the alias, the regex
branch in useCompletion, and the alias-coverage test.
* fix: resolve lazy session creation regressions (#18370 fallout) (#20363)
Fix five regressions introduced by PR #18370 (lazy session creation):
1. _finalize_session() uses stale session_key after compression (#20001)
2. session_key not synced after auto-compression in run_conversation (#20001)
3. pending_title ValueError leaves title wedged forever (#19029)
4. Gateway silently swallows null responses when agent did work (#18765)
5. One-time cleanup for accumulated ghost compression continuations (#20001)
Changes:
- tui_gateway/server.py: _finalize_session() now uses agent.session_id
(falls back to session_key when agent is None). Refactor
_sync_session_key_after_compress() with clear_pending_title and
restart_slash_worker policy flags. Call it post-run_conversation()
to sync session_key after auto-compression. Add ValueError handler
to pending_title flush.
- gateway/run.py: Extract _normalize_empty_agent_response() helper that
consolidates failed/partial/null response handling. Surfaces user-facing
error when agent did work (api_calls > 0) but returned no text.
- hermes_state.py: Add finalize_orphaned_compression_sessions() — marks
ghost continuation sessions as ended (non-destructive, preserves data).
- cli.py: One-time startup migration for orphaned compression sessions.
Test changes:
- tests/test_tui_gateway_server.py: Update pending_title ValueError test
for post-#18370 architecture (title applied post-message, not at create).
- tests/test_lazy_session_regressions.py: 14 new regression tests covering
all fixed paths.
* docs(web_tools): correct web_extract summarizer timeout comment
The comment at tools/web_tools.py:700-702 stated the runtime default for
auxiliary.web_extract.timeout is 360s. The actual runtime default is 30s
(_DEFAULT_AUX_TIMEOUT in agent/auxiliary_client.py:3140), used by
_get_task_timeout when no auxiliary.web_extract.timeout key is present in
config.yaml.
The 360s figure is the config template default written by
hermes_cli/config.py:697 into freshly-generated config.yaml files. It only
takes effect when that key exists in the user's config — not as a fallback.
Users on configs that predate commit 20b4060d (Apr 5, 2026), or who removed
the key, fall through to the 30s _DEFAULT_AUX_TIMEOUT runtime default.
The comment was introduced in 20b4060d alongside the template-default bump
from 30 to 360. The runtime default in auxiliary_client.py was not changed
in that commit and has remained 30s since 839d9d74 (Mar 28, 2026).
* docs(config): fix fallback provider config paths
* docs(prompt): clarify supported customization surfaces
* chore: AUTHOR_MAP entry for Beandon13
* docs: remove dead reference links in flash-attention skill
* docs: remove dead papers.md link from saelens references
* docs: fix broken nix-setup anchor for container-aware CLI
* fix(telegram): keep DM topic typing scoped
* refactor(telegram): make typing thread-id resolver symmetric with send
Mirror _message_thread_id_for_typing() with _message_thread_id_for_send():
both now map the General forum topic (thread id "1") to None upfront.
That removes the need for the retry-without-thread fallback in send_typing()
entirely — if _message_thread_id_for_typing() returns a non-None value, it's
a real user-created topic and falling back to the root chat is never correct.
If Telegram rejects the typing action (e.g. topic deleted mid-session), we
swallow it at debug level instead of bleeding the indicator into All Messages.
Updates the General-topic typing regression test to assert the new single-call
contract.
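The shared resolver shape both helpers now follow (a sketch; the actual Telegram-client methods take richer message objects):

```python
GENERAL_TOPIC_ID = "1"  # Telegram's General forum topic

def resolve_thread_id(raw_thread_id):
    """Common shape of _message_thread_id_for_typing/_send.

    The General topic maps to None up front, so both typing and send
    target the root chat directly; any other non-None value is a real
    user-created topic, and falling back to the root chat for it is
    never correct.
    """
    if raw_thread_id is None or str(raw_thread_id) == GENERAL_TOPIC_ID:
        return None
    return int(raw_thread_id)
```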
* docs(tts): document per-provider max_text_length caps
PR #13743 replaced the global MAX_TEXT_LENGTH=4000 with a per-provider
table and a user-override 'max_text_length:' key, but the user-guide
TTS page documented no length behaviour at all. Users hitting truncation
had no way to discover the new caps or the override.
Add an 'Input length limits' subsection after the existing Configuration
YAML block: provider default caps (Edge 5000 / OpenAI 4096 / xAI 15000 /
MiniMax 10000 / Mistral 4000 / Gemini 5000 / ElevenLabs model-aware /
NeuTTS,KittenTTS 2000), ElevenLabs model_id -> cap table (5k-40k), an
override example, and the validation rules (non-positive / non-integer /
boolean values fall through to the provider default).
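The cap-resolution logic being documented can be sketched like this. The table values are the ones listed above; the function name and the fallback for unknown providers are assumptions (ElevenLabs is model-aware and omitted here):

```python
# Provider default caps from the docs table above.
PROVIDER_CAPS = {
    "edge": 5000, "openai": 4096, "xai": 15000, "minimax": 10000,
    "mistral": 4000, "gemini": 5000, "neutts": 2000, "kittentts": 2000,
}

def effective_cap(provider, override=None):
    """A user max_text_length override wins only when it is a positive
    integer; non-positive, non-integer, or boolean values fall through
    to the provider default (bool checked first, since bool is an int
    subclass in Python).
    """
    default = PROVIDER_CAPS.get(provider, 4000)  # fallback is illustrative
    if isinstance(override, bool) or not isinstance(override, int):
        return default
    if override <= 0:
        return default
    return override
```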
* docs(skill/hermes-agent): sync slash commands + add durable-systems section
Mirrors the AGENTS.md #20226 additions (Toolsets / Delegation / Curator /
Cron / Kanban) into the user-facing hermes-agent skill, and closes the
drift in the in-session slash command list.
User report (wxrrior in Discord): the skill did not mention /goal, so a
brand-new session answering "/hermes-agent do you have any info on /goal"
confidently said it did not exist. Cross-check against the CommandDef
registry found 16 commands missing from the static list: /goal, /agents,
/busy, /copy, /curator, /debug, /footer, /gquota, /indicator, /kanban,
/redraw, /reload, /reload-skills, /snapshot, /steer, /topic.
Changes:
- Slash Commands header now tells the reader to run /help or check the
live docs reference as the source of truth, and names the registry
of record (hermes_cli/commands.py) so future drift gets flagged
honestly instead of answered confidently wrong.
- Added all 16 missing commands, slotted into existing subsections
(/goal and /steer in Session; /busy + /indicator + /footer in
Configuration; /curator + /kanban + /reload-skills + /reload in
Tools & Skills; /topic in Gateway; /copy in Utility; /gquota +
/debug in Info).
- Toolsets table updated to the authoritative 30-key list from
toolsets.py (added kanban, yuanbao, spotify, safe, debugging, video,
feishu_doc, feishu_drive, discord, discord_admin, clarify; previously
stopped at 20 keys).
- New "Durable & Background Systems" section before Troubleshooting
covers Delegation, Cron, Curator, Kanban - each with a short rundown
of CLI verbs, key invariants, and a pointer to the user-facing docs.
Mirrors AGENTS.md #20226 but in the skill's user-facing register.
- Bumped version 2.0.0 -> 2.1.0.
* docs(cli): add --deliver-only flag to hermes webhook subscribe
PR #12473 (merged 2026-04-19) added a new --deliver-only flag to
`hermes webhook subscribe` for zero-LLM direct delivery, but
website/docs/reference/cli-commands.md options table did not
reference it. Add the row so CLI users can discover the flag from
the reference page instead of having to read the source.
* perf(ui-tui): narrow overlay subscriptions to focused selectors
Subscribe overlay components to computed theme/session selectors instead of the full UI store so unrelated UI state updates trigger fewer overlay renders.
* docs(cli): add skills reset subcommand to CLI reference
PR #11468 added `hermes skills reset` but cli-commands.md was not
updated. Adds the subcommand to the table and usage examples.
Closes #11543
* feat(kanban): generic diagnostics engine for task distress signals (#20332)
* feat(kanban): generic diagnostics engine for task distress signals
Replaces the hallucination-specific ``warnings`` / ``RecoverySection``
surface (shipped in PR #20232) with a reusable diagnostic-rule engine
that covers five distress kinds in v1 and can be extended without
touching UI code. The "something's wrong with this task" signal is
no longer limited to phantom card ids.
Closes the follow-up from #20232 discussion.
New module
----------
``hermes_cli/kanban_diagnostics.py`` — stateless, no-side-effect rule
engine. Each rule is a pure function of
``(task, events, runs, now, config) -> list[Diagnostic]``. Registry
is a simple list; adding a new distress kind is one function + one
import, no UI or API changes required.
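The registry shape described above, reduced to a sketch (one rule shown; field names on `task`/`config` are assumptions, and the real Diagnostic carries structured actions):

```python
from dataclasses import dataclass

@dataclass
class Diagnostic:
    kind: str
    severity: str  # warning | error | critical
    title: str

# Registry is a plain list of pure rule functions
# (task, events, runs, now, config) -> list[Diagnostic].
RULES = []

def rule(fn):
    RULES.append(fn)
    return fn

@rule
def repeated_spawn_failures(task, events, runs, now, config):
    n = task.get("spawn_failures", 0)
    threshold = config.get("spawn_failure_threshold", 3)
    if n < threshold:
        return []
    sev = "critical" if n >= 2 * threshold else "error"
    return [Diagnostic("repeated_spawn_failures", sev,
                       f"Worker failed to spawn {n} times")]

def run_diagnostics(task, events, runs, now, config):
    """Broken rules are isolated so one bad rule can't blank the surface."""
    out = []
    for fn in RULES:
        try:
            out.extend(fn(task, events, runs, now, config))
        except Exception:
            continue
    order = {"critical": 0, "error": 1, "warning": 2}
    return sorted(out, key=lambda d: order[d.severity])
```

Adding a distress kind is exactly one decorated function, which is what keeps the UI untouched when the rule set grows.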
v1 rule set
-----------
* ``hallucinated_cards`` (error) — folds the existing
``completion_blocked_hallucination`` event into the new surface.
* ``prose_phantom_refs`` (warning) — folds
``suspected_hallucinated_references``.
* ``repeated_spawn_failures`` (error → critical at 2x threshold) —
fires when ``tasks.spawn_failures >= 3``; suggests
``hermes -p <profile> doctor`` / ``auth``.
* ``repeated_crashes`` (error → critical) — fires after N consecutive
``crashed`` run outcomes with no successful completion between;
suggests ``hermes kanban log <id>``.
* ``stuck_in_blocked`` (warning) — fires after 24h in ``blocked``
state with no comments / unblock attempts; suggests commenting.
Every diagnostic carries structured ``actions`` (reclaim, reassign,
unblock, cli_hint, comment, open_docs) that render consistently in
both CLI and dashboard. Suggested actions are highlighted; generic
recovery actions (reclaim / reassign) are available on every kind as
fallbacks.
Diagnostics auto-clear when the underlying failure resolves — a
clean ``completed``/``edited`` event drops hallucination diagnostics,
a successful run drops crash diagnostics, a comment drops
stuck-blocked diagnostics. Audit events persist; the badge goes away.
API
---
``plugin_api.py``:
* ``/board`` now attaches ``diagnostics`` (full list) and
``warnings`` (compact summary with ``highest_severity``) per task.
* ``/tasks/{id}`` attaches diagnostics so the drawer's Diagnostics
section auto-opens on flagged tasks.
* NEW ``/diagnostics`` endpoint — fleet-wide listing, filterable by
severity, sorted critical-first.
CLI
---
* NEW ``hermes kanban diagnostics [--severity X] [--task id]
[--json]`` — fleet view or single-task view, matches dashboard rule
output so CLI users see the same picture.
* ``hermes kanban show <id>`` now renders a Diagnostics section near
the top with severity markers + suggested actions.
Dashboard
---------
* Card badge is severity-coloured (⚠ amber warning, !! orange error,
!!! red critical) using ``warnings.highest_severity``.
* Attention strip above the toolbar counts EVERY task with active
diagnostics (not just hallucinations), severity-coloured, lists
affected tasks with Open buttons when expanded.
* Drawer's old ``RecoverySection`` replaced with generic
``DiagnosticsSection`` rendering a card per active diagnostic:
title + detail + structured data (task-id chips when payload keys
look like id lists) + action buttons. Reassign profile picker is
inline per-diagnostic. Clipboard fallback uses ``.catch()`` for
environments where writeText rejects.
* Three-rung severity palette; amber for warning, orange for error,
red for critical. Uses CSS variables so theming is straightforward.
Tests
-----
* NEW ``tests/hermes_cli/test_kanban_diagnostics.py`` — 14 unit tests
covering each rule's positive/negative/threshold paths, severity
sorting, broken-rule isolation, and sqlite3.Row integration.
* Dashboard plugin tests extended: ``/diagnostics`` endpoint (empty,
populated, severity-filtered), ``/board`` exposes both diagnostic
list and compact summary with ``highest_severity``.
* Existing hallucination-specific test
(``test_board_surfaces_warnings_field_for_hallucinated_completions``)
updated to reflect the new contract: warning summary keys by diagnostic
kind (``hallucinated_cards``), not event kind.
379 kanban-suite tests pass (+16 net from this PR).
Live verification
-----------------
Seeded all 5 diagnostic kinds + one clean + one plain-running task
(7 total) into an isolated HERMES_HOME, spun up the dashboard, and
verified:
* Attention strip: shows ``!! 5 tasks need attention`` in the
error-severity orange; Show expands to a list of 5 rows ordered
critical > error > warning.
* Card badges: error tasks render ``!!`` orange, warning tasks
render ``⚠`` amber, clean and plain-running tasks render no badge.
* Each of the 5 rules opens a correctly-coloured, correctly-styled
diagnostic card in the drawer with its specific suggested action.
* Live reassign from a diagnostic card flipped
``broken-ml-worker → alice`` and the drawer refreshed with the
new assignee + the same diagnostic still firing (correct:
spawn_failures counter hasn't reset yet).
* CLI ``hermes kanban diagnostics`` prints all 5 in severity order;
``--severity error`` narrows to 3; ``kanban show <id>`` includes
the Diagnostics block at the top with suggested action hint.
Migration note
--------------
The old ``warnings`` shape (``{count, kinds, latest_at}``) is
preserved on the API but ``kinds`` now keys by diagnostic kind
(``hallucinated_cards``) instead of event kind
(``completion_blocked_hallucination``). ``highest_severity`` is a
new required field. The dashboard was the only consumer and has
been updated in the same commit; external API consumers of the
``warnings`` field will need to update their kind-match logic.
* feat(kanban/diagnostics): lead titles with the actual error text
The generic 'Worker crashed N runs in a row' / 'Worker failed to spawn
N times' titles buried the actual cause in the data section. Operators
had to open logs or expand the diagnostic to see WHY the worker is
stuck — rate-limit vs insufficient quota vs bad auth vs context
overflow vs network blip all looked identical at a glance.
New titles:
Agent crashed 3x: openai: 429 Too Many Requests - rate limit reached
Agent crashed 3x: anthropic: 402 insufficient_quota - credit balance
Agent crashed 3x: provider auth error: 401 Unauthorized
Agent spawn failed 4x: insufficient_quota: You exceeded your current
Detail keeps the full error snippet (capped at 500 chars + ellipsis
for tracebacks). Title takes the first line capped at 160 chars.
Fallback title if no error recorded stays honest ('no error recorded').
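The truncation rules above, as a sketch (helper name and exact cap handling are assumptions):

```python
def lead_title(prefix, error_text, title_cap=160, detail_cap=500):
    """Title takes the first line of the recorded error, capped at
    title_cap chars; detail keeps the full snippet, capped at detail_cap
    with an ellipsis for long tracebacks. With nothing recorded the
    title stays honest about it.
    """
    if not error_text:
        return f"{prefix}: no error recorded", ""
    first_line = error_text.strip().splitlines()[0]
    title = f"{prefix}: {first_line}"[:title_cap]
    detail = error_text if len(error_text) <= detail_cap else error_text[:detail_cap] + "…"
    return title, detail
```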
Tests: 4 new cases covering 429/billing/spawn/truncation. 383 total
pass (+4).
Live-verified on dashboard with 6 seeded scenarios
(rate-limit, billing, auth, context, network, spawn-billing) —
each card title leads with the actionable error text.
* docs(agent): remove stale BuiltinMemoryProvider references from memory module docstrings
The BuiltinMemoryProvider class was removed from the codebase but its
name lingered in the module-level docstrings of memory_manager.py and
memory_provider.py, creating false expectations:
- memory_manager.py docstring showed example code calling
add_provider(BuiltinMemoryProvider(...)), which would raise ImportError at runtime
- memory_provider.py docstring listed BuiltinMemoryProvider as
'always present, not removable' — misleading for new contributors
The regression test (test_memory_user_id.py) already passes without
any reference to BuiltinMemoryProvider; it uses RecordingProvider
instances directly. The stale references were docs-only drift.
Update both docstrings to reflect the actual current architecture:
MemoryManager accepts external plugin providers only (one at a time).
Closes #14402
* docs(plugins): document ctx.dispatch_tool() in plugin capabilities table
* docs(guide): add Dispatch tools from slash commands section
* docs(cron): add context_from chaining section
Resolved merge against current main (new No-agent mode section added in parallel).
Co-authored-by: Tony Simons <tony@tonysimons.dev>
* chore: AUTHOR_MAP entry for asimons81
* feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path
Introduces providers/ package — single source of truth for every
inference provider. Adding a simple api-key provider now requires one
providers/<name>.py file with zero edits anywhere else.
What this PR ships:
- providers/ package (ProviderProfile ABC + 33 profiles across 4 api_modes)
- ProviderProfile declarative fields: name, api_mode, aliases, display_name,
env_vars, base_url, models_url, auth_type, fallback_models, hostname,
default_headers, fixed_temperature, default_max_tokens, default_aux_model
- 4 overridable hooks: prepare_messages, build_extra_body,
build_api_kwargs_extras, fetch_models
- chat_completions.build_kwargs: profile path via _build_kwargs_from_profile,
legacy flag path retained for lmstudio/tencent-tokenhub (which have
session-aware reasoning probing that doesn't map cleanly to hooks yet)
- run_agent.py: profile path for all registered providers; legacy path
variable scoping fixed (all flags defined before branching)
- Auto-wires: auth.PROVIDER_REGISTRY, models.CANONICAL_PROVIDERS,
doctor health checks, config.OPTIONAL_ENV_VARS, model_metadata._URL_TO_PROVIDER
- GeminiProfile: thinking_config translation (native + openai-compat nested)
- New tests/providers/ (79 tests covering profile declarations, transport
parity, hook overrides, e2e kwargs assembly)
Deltas vs original PR (salvaged onto current main):
- Added profiles: alibaba-coding-plan, azure-foundry, minimax-oauth
(were added to main since original PR)
- Skipped profiles: lmstudio, tencent-tokenhub stay on legacy path (their
reasoning_effort probing has no clean hook equivalent yet)
- Removed lmstudio alias from custom profile (it's a separate provider now)
- Skipped openrouter/custom from PROVIDER_REGISTRY auto-extension
(resolve_provider special-cases them; adding breaks runtime resolution)
- runtime_provider: profile.api_mode only as fallback when URL detection
finds nothing (was breaking minimax /v1 override)
- Preserved main's legacy-path improvements: deepseek reasoning_content
preserve, gemini Gemma skip, OpenRouter response caching, Anthropic 1M
beta recovery, etc.
- Kept agent/copilot_acp_client.py in place (rejected PR's relocation —
main has 7 fixes landed since; relocation would revert them)
- _API_KEY_PROVIDER_AUX_MODELS alias kept for backward compat with existing
test imports
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #14418
* feat(providers): make all 33 providers pluggable under plugins/model-providers/
Every provider profile is now a self-contained plugin under
plugins/model-providers/<name>/, mirroring the plugins/platforms/
pattern established for IRC and Teams. The ProviderProfile ABC
stays in providers/; the per-provider profile data moves out.
- plugins/model-providers/<name>/__init__.py calls register_provider()
- plugins/model-providers/<name>/plugin.yaml declares kind: model-provider
- providers/__init__.py._discover_providers() lazily scans bundled plugins
then $HERMES_HOME/plugins/model-providers/<name>/ (user override path)
- User plugins with the same name override bundled ones (last-writer-wins
in register_provider)
- Legacy providers/<name>.py layout still supported for back-compat with
out-of-tree editable installs
- Hermes PluginManager: new kind=model-provider; skipped like memory
plugins (providers/ discovery owns them); standalone plugins with
register_provider+ProviderProfile in their __init__.py auto-coerce to
this kind (same heuristic as memory providers)
- skip_names extended to include 'model-providers' so the general
PluginManager doesn't double-scan the category
- 4 new tests in tests/providers/test_plugin_discovery.py covering
bundled discovery, user override, and general-loader isolation
- Docs updated: website/docs/developer-guide/adding-providers.md,
provider-runtime.md, providers/README.md, plugins/model-providers/README.md
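The discovery-order and override semantics above reduce to last-writer-wins registration. A sketch (the real _discover_providers() scans plugin directories and imports __init__.py files, not dicts):

```python
_REGISTRY = {}

def register_provider(profile):
    """Last-writer-wins: a later registration under the same name
    replaces an earlier one, which is how user plugins override
    bundled ones."""
    _REGISTRY[profile["name"]] = profile

def discover_providers(bundled, user):
    """Discovery-order shape: bundled plugins are registered first,
    then profiles found under $HERMES_HOME/plugins/model-providers/,
    so user entries win name collisions."""
    for profile in list(bundled) + list(user):
        register_provider(profile)
    return _REGISTRY
```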
No API break: auth.py / config.py / doctor.py / models.py / runtime_provider.py /
model_metadata.py / auxiliary_client.py / chat_completions.py / run_agent.py
all still consume providers via get_provider_profile() / list_providers() —
they just now see plugin-discovered entries instead of pkgutil-iterated ones.
Third parties can now drop a single directory into
~/.hermes/plugins/model-providers/<name>/ to add or override an inference
provider without touching the repo.
* docs(cli): expand hermes import reference — add description, warning, and examples
* docs(bedrock): fix IAM permissions, add quickstart entry, add fallback provider, fix deployment section
* docs: fix Camofox Docker setup instructions
* docs(providers): Together/Groq/Perplexity cookbook via custom_providers
Three worked recipes for OpenAI-compatible cloud providers, plus the
Copilot HTTP 401 auto-recovery info block and the GMI Cloud row in the
compatible providers table. All three additions were on the original
docs/custom-providers-cookbook branch but its merge base predated 1186
main commits, making the rebase impractical (84k+ line conflict).
Replays just the providers.md additions onto current main.
* fix(tui): close slash parity gaps with CLI (#20339)
* fix(tui): close slash parity gaps with CLI
Route unsupported /skills subcommands through slash.exec, support /new <name>
titles, and handle /redraw natively so TUI behavior matches classic CLI. Also
filter gateway-only commands out of the TUI catalog while keeping /status
discoverable.
* fix(tui): run remaining CLI parity paths natively
Forward chat launch flags into the TUI runtime and handle live-session status
and skill reloads in the gateway process so TUI state no longer depends on the
slash worker's stale CLI instance.
* fix(tui): block stale snapshot restores
Prevent snapshot restore from running through the isolated slash worker because
it mutates disk state without refreshing the live TUI agent.
* chore: uptick
* fix(tui): guard async session title updates
Handle failures from the fire-and-forget session.title RPC so title-setting errors do not surface as unhandled promise rejections while preserving session-scoped messaging.
* docs(gemini): add Google Gemini guide
* chore: AUTHOR_MAP entry for jethac
* docs: align terminal-backend count and naming across docs and code
README:24 claimed "Six terminal backends" while tools/environments/ exposes
seven top-level backend choices through TERMINAL_ENV: local, docker, ssh,
singularity, modal, daytona, vercel_sandbox. Modal additionally has direct
and Nous-managed modes selected via terminal.modal_mode (the
ManagedModalEnvironment class is a Modal sub-mode, not a separate top-level
backend).
The same drift appeared in five other doc and code-comment sites with
inconsistent counts (six, seven, or implicit) and varying lists. Updated
all sites to a consistent seven-backend list in canonical order. The
configuration guide also clarifies how Modal's two modes are selected so
operators do not go looking for a non-existent managed_modal backend value.
CONTRIBUTING.md:160 lists six backend filenames in a code tree but does
not carry the "Six terminal" prose; left out of scope per cohesion sweep
guidance to bundle only identical wording.
Files updated:
- README.md (line 24, marketing copy)
- website/docs/index.md (line 49, landing page)
- website/docs/user-guide/configuration.md (line 86, config guide)
- tools/environments/__init__.py (lines 3-6, package docstring)
- tools/file_operations.py (line 6, module docstring)
- environments/README.md (line 43, RL training docs — TERMINAL_ENV list)
* chore: AUTHOR_MAP entry for deep-name
* docs: refresh stale platform/LOC/test counts; clarify gateway vs plugin platforms
AGENTS.md is the AI-assistant entry doc, so its counts get used as ground
truth. Several values had drifted, and the same drift had spread to a few
user-facing surfaces. Fixing all of them in one commit so the count claims
agree and clearly distinguish gateway-core from plugin-shipped platforms.
AGENTS.md:
- run_agent.py "~12k LOC" → "~14k LOC as of 2026-05-03" (actual 14,097)
- cli.py "~11k LOC" → "~12k LOC as of 2026-05-03" (actual 12,043)
- tools/environments/ list now lists all 7 user-selectable terminal backends
in canonical order, matching tools/terminal_tool.py:2214-2215
- gateway/platforms/ list adds yuanbao and wecom_callback; the 19 names
match the user-facing list at website/docs/integrations/index.md
- plugins/ tree now mentions plugins/platforms/ (irc, teams)
- tests/ snapshot "~15k tests across ~700 files as of Apr 2026" →
"~19k tests across ~890 files as of 2026-05-03"
User-facing count claims:
- hermes_cli/tips.py:195 — "19 platforms" → "21 messaging platforms" with
IRC and Microsoft Teams added to the named list
- website/docs/index.md:49 — "6 terminal backends" → "7 terminal backends:
..., Vercel Sandbox" (also corrected by PR #19044; same edit content)
- website/docs/index.md:50 — "15+ platforms from one gateway" → "21+ messaging
platforms (19 in the gateway, plus IRC and Microsoft Teams via plugins)"
- website/docs/integrations/index.md:83-85 — "15+ messaging platforms" → "19+",
added yuanbao to the linked list. The surrounding text scopes it to "configured
through the same gateway subsystem", so plugin platforms (IRC, Teams) are
intentionally not in this list
- website/scripts/generate-llms-txt.py:205 — "15+ platforms" → "21+ messaging
platforms — 19 native to the gateway plus IRC and Microsoft Teams via plugins"
LOC and date stamps follow the existing AGENTS.md "as of <date>" convention
(line 56 already used this pattern). Source of truth for the gateway count is
gateway/config.py:130-148 (PlatformID enum); plugin platforms live in
plugins/platforms/.
Out of scope:
- RELEASE_v0.9.0.md historical "16 platforms" claim (immutable history)
- userStories.json verbatim user quotes
- Programmatic count generation from gateway/config.py + plugin manifests
is a worthwhile build-system change but separate from these content fixes
* docs(skills): explain restoring bundled skills
* docs(docker): add section on connecting to local inference servers (vLLM, Ollama)
Adds a comprehensive guide for connecting Dockerized Hermes to local
inference servers like vLLM and Ollama, covering:
- Docker Compose networking (recommended)
- Standalone Docker run with host.docker.internal / --network host
- Connectivity verification steps
- Ollama-specific example
Closes #12308
* docs(docker): document API_SERVER_* env vars for exposing the OpenAI-compatible endpoint
Salvage of #11758. The PR's original diff was stale (the Docker Compose section on main has been heavily refactored — dashboard is now an embedded side-process, not a separate service), so the useful bit (API server env var requirements) is applied as a note on the basic `docker run` example.
Co-authored-by: xiangyong <xiangyong@zspace.cn>
* chore: AUTHOR_MAP entry for CES4751
* docs(discord): fix Server Members Intent + SSRC-mapping drift; add /voice join slash Choice
Salvage of #11350. Kept:
- Code: add an explicit /voice join Choice in the slash UI (runner accepts both 'join' and 'channel' but only 'channel' was in autocomplete).
- Docs: Server Members Intent is conditional (only needed if DISCORD_ALLOWED_USERS contains usernames); SSRC → user_id mapping uses the voice websocket SPEAKING opcode, not the Members intent.
Dropped from the original PR:
- HERMES_DISCORD_VOICE_PACKET_DUMP — this env var doesn't exist on main (it was in a different PR that isn't merged).
- DISCORD_PROXY docs — already documented on current main.
- DISCORD_ALLOW_MENTION_* docs — already on main.
- "barge-in mode" rewrite — current main actually does pause the listener during TTS (VoiceReceiver.pause() at discord.py:192); there is no barge_in_guard/barge_in_rms on main.
Co-authored-by: Michel Belleau <michel.belleau@malaiwah.com>
* docs(skills): modernize Obsidian file workflows
* chore: AUTHOR_MAP entry for counterposition
* docs(kanban): document handoff evidence metadata
* chore: AUTHOR_MAP entry for Fearvox
* docs: clarify Telegram group chat troubleshooting
* docs(codex): clarify OAuth auth prerequisite
* docs(voice): add Doubao speech integration examples (TTS + STT)
* chore: AUTHOR_MAP entry for Hypnus-Yuan
* docs(faq): use messaging extra for gateway deps
* chore: AUTHOR_MAP entry for xsfX20
* fix(kanban): unify failure counter across spawn/timeout/crash outcomes (#20410)
The dispatcher's circuit breaker only protected against spawn-side
failures (profile missing, workspace mount error, exec failure).
Workers that successfully spawned but then timed out or crashed
re-queued to ``ready`` with no counter increment, so the next tick
re-spawned them, looping forever until someone noticed. Reported
externally on Twitter (Forbidden Seeds) and confirmed by walking the
kernel: ``enforce_max_runtime`` flipped the task back to ready, emitted
a ``timed_out`` event, and never touched ``spawn_failures``; same for
``detect_crashed_workers``.
Fix: unify the counter across all non-success outcomes.
Schema
------
* ``tasks.spawn_failures`` → ``tasks.consecutive_failures``
* ``tasks.last_spawn_error`` → ``tasks.last_failure_error``
* Migration renames the columns in-place on existing DBs (``ALTER
TABLE RENAME COLUMN`` — SQLite >= 3.25) so historical counter
values are preserved. Row mappers fall through to the legacy names
if both column renames and a migration somehow got out of sync.
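The rename-plus-fallback pattern above can be sketched in miniature (helper names here are illustrative, not the actual kanban_db API):

```python
import sqlite3

def migrate_failure_columns(conn: sqlite3.Connection) -> None:
    # In-place rename (SQLite >= 3.25) so historical counter values survive.
    cols = {row[1] for row in conn.execute("PRAGMA table_info(tasks)")}
    if "spawn_failures" in cols:
        conn.execute(
            "ALTER TABLE tasks RENAME COLUMN spawn_failures TO consecutive_failures"
        )
    if "last_spawn_error" in cols:
        conn.execute(
            "ALTER TABLE tasks RENAME COLUMN last_spawn_error TO last_failure_error"
        )

def read_failures(row: dict) -> int:
    # Row mapper falls through to the legacy name if the rename lagged.
    return row.get("consecutive_failures", row.get("spawn_failures", 0))
```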
Counter lifecycle
-----------------
New helper ``_record_task_failure(conn, task_id, error, *, outcome,
release_claim, end_run, event_payload_extra)`` is the single point
every non-success outcome funnels through:
* ``spawn_failed`` → ``_record_spawn_failure`` (kept as alias)
calls it with ``release_claim=True, end_run=True`` — transitions
running→ready, clears claim, closes run.
* ``timed_out`` → ``enforce_max_runtime`` already does the status
transition + run close + event emission, then calls
``_record_task_failure`` with ``release_claim=False, end_run=False``
just to bump the counter (and trip the breaker if needed).
* ``crashed`` → ``detect_crashed_workers`` same pattern, but the
counter increment runs after the main write_txn closes (SQLite
doesn't nest write transactions).
If the counter hits the breaker threshold (``DEFAULT_FAILURE_LIMIT=5``,
same as before), the task transitions to ``blocked`` with a ``gave_up``
event on top of whatever outcome-specific event was already emitted.
Reset semantics changed: the counter now clears only on successful
``complete_task`` (and operator ``reclaim_task`` — an explicit "I've
looked at this, try again with a fresh budget"). Previously
``_clear_spawn_failures`` ran on every successful spawn, which would
have wiped the counter before a timeout could accumulate past threshold
— exactly the loop this fix prevents.
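The counter lifecycle above can be modeled as a toy (field and helper names follow the commit message, but the class stands in for the real task row and event plumbing):

```python
DEFAULT_FAILURE_LIMIT = 5  # same breaker threshold as before the rewrite

class Task:
    """Toy stand-in for the tasks row."""
    def __init__(self):
        self.status = "running"
        self.consecutive_failures = 0
        self.last_failure_error = None

def record_task_failure(task, error, *, outcome, release_claim=True):
    # Single funnel for spawn_failed / timed_out / crashed; `outcome`
    # feeds the event payload in the real helper.
    task.consecutive_failures += 1
    task.last_failure_error = error
    if release_claim:
        task.status = "ready"  # spawn_failed path: running -> ready
    if task.consecutive_failures >= DEFAULT_FAILURE_LIMIT:
        task.status = "blocked"  # breaker trips: gave_up on top of the outcome event
    return task.status
```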
Diagnostics
-----------
* ``_rule_repeated_spawn_failures`` → ``_rule_repeated_failures``. Now
fires regardless of which outcome is at fault. Classifies the most
recent failure (spawn_failed / timed_out / crashed) from the run
history so the title ("Agent timeout x3", "Agent crash x4", "Agent
spawn x5") and suggested action (``doctor`` for spawn, ``log`` for
timeout/crash) stay outcome-specific without N duplicate rules.
* ``_rule_repeated_crashes`` kept as a narrower early-warning at
threshold 2 (vs 3 for the unified rule), but now suppresses itself
when the unified rule would also fire — avoids double-flagging.
* Diagnostic ``data`` payload now carries
``{consecutive_failures, most_recent_outcome, last_error}`` instead
of spawn-specific keys.
CLI
---
* ``Task.consecutive_failures`` / ``Task.last_failure_error`` are the
public fields now. Existing callers that referenced the old names
get migrated (tests updated in this commit).
* Backward-compat: ``DEFAULT_SPAWN_FAILURE_LIMIT``,
``_clear_spawn_failures``, ``_record_spawn_failure`` stay as aliases.
Tests
-----
* 6 new kernel tests: timeout increments counter, 3 consecutive
timeouts trip the breaker (was the reported gap), crash increments
counter, reclaim clears counter, completion clears counter, spawn
success does NOT clear counter.
* Diagnostic tests: updated ``repeated_spawn_failures`` cases to use
the new kind name and add a timeout-loop test.
* Dashboard API test: spawn_failures column update → consecutive_failures.
389/389 kanban-suite tests pass.
Live verification
-----------------
Seeded 4 tasks in an isolated HERMES_HOME: 3 timeouts, 4 crashes,
2-spawn-failed + 2-timed-out, and a task that had prior failures but
completed successfully. Board correctly shows "!! 3 tasks need
attention" (the successful one has no badge because the counter
reset). Drawer for the timeout-loop task renders "Agent timeout x3"
with most_recent_outcome=timed_out and the "Check logs" suggested
action (not the spawn-flavoured "Verify profile"). The successful
task has zero diagnostics.
Closes the Forbidden-Seeds-reported gap.
* docs(guides): add guide for running Hermes locally with Ollama
Step-by-step guide covering Ollama installation, model selection,
Hermes configuration, speed optimization, and optional gateway bot
setup — all running on local hardware with zero API cost.
Includes hardware requirements, model comparison table with tool-call
support status, context window tuning, GPU offloading tips, fallback
provider setup, troubleshooting, and cost comparison.
* chore: AUTHOR_MAP entry for binhnt92
* docs: add Open WebUI bootstrap script
* chore: AUTHOR_MAP entry for acesjohnny
* docs(browser): document WSL-to-Windows Chrome MCP bridge
* chore: AUTHOR_MAP entry for liu-collab
* docs(i18n): add zh-Hans Tool Gateway, image gen, and Windows WSL guide
Made-with: Cursor
* docs: add Chinese (zh-CN) README translation
Closes #12954
- Add README.zh-CN.md with complete Simplified Chinese translation
- Add language switcher badge in README.md linking to Chinese version
- Add language switcher badge in README.zh-CN.md linking to English version
* chore: AUTHOR_MAP entry for zhanggttry
* docs: update VS Code setup instructions for ACP Client integration
* chore: AUTHOR_MAP entry for formulahendry
* test(kanban): cover metadata handoff round-trip
* feat(gateway): respect kanban.max_spawn config to limit concurrent tasks
The dispatch_once function already accepts a max_spawn parameter but the
gateway was calling it without passing any value, effectively ignoring
the configuration. This change reads kanban.max_spawn from config.yaml
and passes it through, allowing users to limit concurrent kanban tasks.
This prevents resource exhaustion scenarios where kanban dispatcher
spawns too many parallel workers on constrained hardware.
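The pass-through can be sketched like this, assuming config.yaml has already been parsed into a dict (dispatch_once here is a stub illustrating the contract, not the gateway's implementation; the default value is hypothetical):

```python
def read_max_spawn(config: dict, default: int = 4) -> int:
    # Read kanban.max_spawn from the parsed config; fall back to a default.
    value = config.get("kanban", {}).get("max_spawn", default)
    return max(1, int(value))

def dispatch_once(ready_tasks, max_spawn):
    # Spawn at most max_spawn workers this tick.
    return ready_tasks[:max_spawn]
```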
* guard kanban worker lifecycle by run id
* chore(release): AUTHOR_MAP entries for momowind and misery-hl
* feat(hindsight): probe API for update_mode='append' support, dedupe across processes
Mirrors the pattern already shipping in hindsight-integrations/openclaw:
probe `<api_url>/version` once per process, gate on Hindsight ≥ 0.5.0.
When supported, retains use a stable session-scoped `document_id`
(`session_id`) plus `update_mode='append'` so cross-process retains for
the same session merge into one document instead of producing
N-different-process-stamped duplicates. When unsupported (or probe
fails), fall back to the existing per-process unique
`f"{session_id}-{start_ts}"` document_id with no `update_mode` — the
resume-overwrite fix (#6654) keeps working unchanged on legacy servers.
Closes the dedup half of #20115. The proposed `document_id_strategy`
config knob isn't needed: auto-detection via the same /version probe
the OpenClaw plugin already uses gives the same outcome with no extra
config burden, and the choice is purely a function of what the server
can do.
Plumbing
--------
- Module-level helpers (`_meets_minimum_version`, `_fetch_hindsight_api_version`,
`_check_api_supports_update_mode_append`) cache the result per api_url
so every provider in the process gets one /version round-trip.
- One-time WARN logged when the API is older than 0.5.0, telling the
user to upgrade for cross-session deduplication.
- New instance helper `_resolve_retain_target(fallback_doc_id)` returns
`(document_id, update_mode)` based on cached capability. Wired into
`sync_turn` and the `on_session_switch` flush path.
- For local_embedded mode, the probe URL is taken from the running
client (`client.url`) so we hit the actual daemon port rather than
the configured default.
- `update_mode` is set on the per-item dict; `aretain_batch` already
threads `item['update_mode']` into the API call.
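A simplified model of the probe-and-cache plumbing described above (the per-url cache, version gate, and fallback doc-id shape follow the commit message; function names are illustrative):

```python
_capability_cache = {}  # api_url -> bool, one probe per process

def meets_minimum_version(version: str, minimum: str = "0.5.0") -> bool:
    parse = lambda v: tuple(int(p) for p in v.split(".")[:3])
    return parse(version) >= parse(minimum)

def supports_update_mode_append(api_url, fetch_version) -> bool:
    # Probe once per api_url; any failure falls back to legacy behavior.
    if api_url not in _capability_cache:
        try:
            _capability_cache[api_url] = meets_minimum_version(fetch_version(api_url))
        except Exception:
            _capability_cache[api_url] = False
    return _capability_cache[api_url]

def resolve_retain_target(session_id, start_ts, supported):
    # Modern: stable doc id + append; legacy: per-process unique id, no update_mode.
    if supported:
        return session_id, "append"
    return f"{session_id}-{start_ts}", None
```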
Tests
-----
- `TestUpdateModeAppendCapability` (5 cases): legacy fallback, modern
stable+append, per-url cache, one-time warn, flush-on-switch resolves
against the OLD session.
- Existing `_make_hindsight_provider` factory in the manager-side test
file extended to seed `_mode`/`_api_url`/`_api_key`/`_client` and stub
`_resolve_retain_target` so the bypass-init pattern keeps working.
E2E verified against installed `~/.hermes/hermes-agent`:
- Legacy probe (unreachable host) → `legacy-session-<ts>` doc_id,
no `update_mode`.
- Modern probe (live local_embedded 0.5.6 daemon) → stable
`modern-session` doc_id + `update_mode='append'`.
- `test_hermes_embedded_smoke.py` passes (90s).
* fix(api_server): SSE token batching + error handling for Open WebUI performance
Reduces SSE event rate ~500/turn → ~20/turn via 50ms text-delta batching in
_dispatch(), which eliminates markdown re-render storms on Open WebUI. Also:
- Trim tool_call.arguments in the response.completed event to 100KB
(prevents silent hangs on 848KB+ single-line SSE events).
- Catch-all exception handlers in _write_sse_responses() + _write_sse_chat_completion()
emit a proper error chunk instead of TransferEncodingError from incomplete
chunked encoding when the agent crashes mid-stream.
- MAX_REQUEST_BYTES 1MB → 10MB; pass client_max_size to aiohttp Application to
avoid silent 400s on truncated request bodies for long conversations.
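The 50ms batching idea can be sketched independently of aiohttp (a toy coalescer, not the _dispatch() implementation; emit stands in for writing one SSE event):

```python
import time

class DeltaBatcher:
    """Coalesce many tiny text deltas into one SSE event per flush window."""

    def __init__(self, emit, window_s: float = 0.05):
        self.emit = emit
        self.window_s = window_s
        self.buf = []
        self.last_flush = time.monotonic()

    def feed(self, delta: str) -> None:
        self.buf.append(delta)
        if time.monotonic() - self.last_flush >= self.window_s:
            self.flush()

    def flush(self) -> None:
        # Call once more at end-of-turn so the tail is never dropped.
        if self.buf:
            self.emit("".join(self.buf))
            self.buf.clear()
        self.last_flush = time.monotonic()
```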
Salvage of #17552 (api_server portion only). The contrib/openwebui-filter/
payload from that PR — Open WebUI Filter Function + benchmark writeup — is
a client-side user-installable add-on and doesn't need to live in the repo;
dropped here. Closes #17537.
Co-authored-by: bogerman1 <93757150+bogerman1@users.noreply.github.com>
* chore: AUTHOR_MAP entry for bogerman1
* feat(i18n): add French (fr) locale support
- Add fr.yaml with French translations for approval prompts and gateway messages
- Register 'fr' in SUPPORTED_LANGUAGES
- Add French aliases: french, français, fr-fr, fr-be, fr-ca, fr-ch
- Update locale sync comment in en.yaml
* feat(i18n): add Ukrainian locale
* chore: AUTHOR_MAP entry for olisikh
* arcee temperature + compression
* test(arcee): cover Trinity Large Thinking temperature + compression overrides
Salvage follow-up for PR #20344:
- AUTHOR_MAP entry for rob-maron (required by CI)
- 17 parametrized tests covering _is_arcee_trinity_thinking,
_fixed_temperature_for_model Trinity override, and
_compression_threshold_for_model, including sibling-model negatives
(trinity-large-preview, trinity-mini) and the OpenRouter slug form.
* fix(doctor): report Kanban worker tools as runtime-gated
* fix(kanban): accept created_cards linked as child of completing task
Widens _verify_created_cards to also accept ids that are children of the
completing task in task_links. Previously we only accepted cards where
created_by matched the completing task's assignee, which was too strict
for legitimate orchestrator flows: a specifier creates a card (so
created_by=specifier, not worker), then a worker picks it up and passes
parents=[current_task] to kanban_create. The explicit link proves the
relationship and should be trusted.
Salvaged from #20022 @LeonSGP43 (full PR superseded by #20232 +
this patch; the linked-children relaxation was the portable
improvement).
* fix(kanban): measure max runtime from current run
* test(kanban): backdate task_runs.started_at alongside tasks.started_at
After #19473 landed (enforce_max_runtime reads from task_runs.started_at
rather than tasks.started_at), a regression test added earlier still
only backdated the tasks column. Backdate both so the test is robust
regardless of which column the enforcer reads from.
* fix(kanban): prevent child task dispatch when parent is not done
Add parent dependency guard to _set_status_direct so dragging
a task to the ready column is rejected (409) when its parents
are not all done. Previously the guard only existed in
recompute_ready, allowing direct status writes via the
dashboard API to bypass the dependency engine.
Root cause: after reclaiming stale workers, both T3 and T4
were set to ready via dashboard status writes in quick
succession, causing the writer to be spawned while the analyst
was blocked — upstream work wasn't done yet.
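A minimal sketch of the guard, assuming a tasks(id, status) table and a task_links(parent_id, child_id) table (the real schema and HTTP error plumbing may differ):

```python
import sqlite3

def guard_ready_transition(conn: sqlite3.Connection, task_id: str) -> None:
    # Reject a direct status write to 'ready' while any parent is unfinished.
    (unfinished,) = conn.execute(
        """SELECT COUNT(*) FROM task_links l
           JOIN tasks p ON p.id = l.parent_id
           WHERE l.child_id = ? AND p.status != 'done'""",
        (task_id,),
    ).fetchone()
    if unfinished:
        raise PermissionError("409: parent tasks not done")  # dashboard maps to HTTP 409
```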
* feat(kanban): surface task_runs.summary on dashboard cards + ``kanban show``
The kanban-worker skill (built into the gateway dispatcher's spawn
prompt) instructs every worker to hand off via
``kanban_complete(summary=..., metadata=...)``. That writes the summary
onto the closing ``task_runs`` row, NOT onto ``tasks.result`` — the
latter is left NULL unless the caller passes ``result=`` explicitly.
Result: a glance at the dashboard or ``hermes kanban show <id>`` shows
a blank "Result:" section even when the worker did real work, which
on 2026-05-05 caused a Mac false-alarm ("Hermes did nothing") on a
task that had a 10-line completion summary on its run.
This patch surfaces the latest non-null run summary as
``latest_summary`` so the worker's actual handoff lands in front of
operators.
* New helpers ``kanban_db.latest_summary(conn, task_id)`` and
``kanban_db.latest_summaries(conn, task_ids)``. The batch variant
uses a single window-function SELECT so the dashboard board endpoint
doesn't pay an N+1 cost on multi-hundred-task boards.
* CLI ``hermes kanban show <id>`` prints a "Latest summary:" block
when ``tasks.result`` is empty but a run has produced a summary
(the existing "Result:" section still wins when populated, so the
back-compat path for hand-edited results is untouched). JSON output
gains a top-level ``latest_summary`` field.
* Dashboard ``/board`` and ``/tasks/{id}`` now include a
``latest_summary`` field on every task. Cards on /board carry a
200-character preview (cheap to render, plenty for "what did this
worker do?" at a glance); the drawer/detail endpoint returns the
full summary.
* Five new tests cover: empty-runs case, post-complete surface,
newest-of-multiple selection, empty-string skip, batch with
missing tasks + empty input.
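The batch variant's single window-function SELECT can be sketched like this (column names follow the commit message; the exact task_runs schema is an assumption):

```python
import sqlite3

def latest_summaries(conn: sqlite3.Connection, task_ids):
    # One windowed query instead of N per-task queries (avoids N+1 on big boards).
    if not task_ids:
        return {}
    placeholders = ",".join("?" * len(task_ids))
    rows = conn.execute(
        f"""SELECT task_id, summary FROM (
              SELECT task_id, summary,
                     ROW_NUMBER() OVER (
                       PARTITION BY task_id ORDER BY started_at DESC
                     ) AS rn
              FROM task_runs
              WHERE task_id IN ({placeholders})
                AND summary IS NOT NULL AND summary != ''
            ) WHERE rn = 1""",
        list(task_ids),
    ).fetchall()
    return dict(rows)
```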
Smoke-tested locally against the live profile DB on the three
acceptance-criterion targets (t_f08fef91 cron-hygiene-audit,
t_007b7f1c EMA-analysis, t_05746fa4 self-assessment) — all three now
return their populated summaries via both ``latest_summary`` and
``latest_summaries``.
Test plan: 255/255 kanban tests pass + 91/91 dashboard plugin tests
pass. No regression on tasks where ``tasks.result`` is explicitly
populated (the existing "Result:" branch is preserved).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(kanban): wire dependency selects
* chore(release): AUTHOR_MAP entries for suncokret12 and mioimotoai-lgtm
* feat(i18n): add Turkish (tr) locale
- Add locales/tr.yaml with Turkish translations for all approval.* and gateway.* keys
- Register 'tr' in SUPPORTED_LANGUAGES
- Add Turkish aliases: turkish, türkçe, tr-tr
* fix: add Turkish locale references in config, tests, and docs
- hermes_cli/config.py: add tr to supported languages comment
- locales/en.yaml: add tr to locale file list comment
- tests/agent/test_i18n.py: add Turkish alias tests + explicit lang test
- website/docs/user-guide/configuration.md: add tr to supported values
* docs: document custom model aliases for /model command (#20475)
User-defined model aliases (config.yaml model_aliases: and
model.aliases.*) have worked since early versions but were entirely
undocumented. Add a dedicated 'Custom model aliases' section to
slash-commands.md covering both YAML config formats and the
'hermes config set' shell form, mirror a shorter version into the
configuring-models 'Alternative methods' section, and cross-link from
the two /model table rows.
Flagged by @weehowe on Twitter — he wasn't aware the feature existed.
* feat(models): add deepseek/deepseek-v4-pro to OpenRouter + Nous Portal curated lists (#20495)
Endpoint re-tested over 6 conversational turns (9 API calls, 3 tool calls)
and an 8-request burst — no rate limits, no errors, ~2-3s latency. The
historical rate-limit issues that caused its removal are gone.
- hermes_cli/models.py: add to OPENROUTER_MODELS and _PROVIDER_MODELS['nous']
- website/static/api/model-catalog.json: regenerated via build_model_catalog.py
* feat(models): add x-ai/grok-4.3 to OpenRouter + Nous Portal curated lists (#20497)
Endpoint validated over 6 conversational turns with tool calls (9 API
calls, 3 tool calls, 0 failures) and an 8-request burst (8/8 ok,
0 rate limits). Latency ~5-10s/call — slower than grok-4.20 but
expected for a reasoning model.
- hermes_cli/models.py: add to OPENROUTER_MODELS and _PROVIDER_MODELS['nous']
- website/static/api/model-catalog.json: regenerated
* fix: salvage batch — compaction guidance, memory authority, cache eviction after compression
- Fix /compact → /compress in context-overflow tips (closes #20020)
- Evict cached agent after session hygiene and /compress so system
prompt refreshes with current SOUL.md, memory, and skills
- Restore memory authority across compaction: change 'informational
background data' to 'authoritative reference data' in memory block
and SUMMARY_PREFIX, with backward-compatible regex
Based on:
- PR #20027 by @LeonSGP43
- PR #18767 by @MacroAnarchy
- PR #17380 by @vominh1919
PR #17121 boundary marker fix already merged to main (2eef395e1).
PR #9262 user-message anchoring already on main via _ensure_last_user_message_in_tail().
* feat(browser): add Lightpanda engine support with automatic Chrome fallback
Add Lightpanda as an optional browser engine for local mode.
Lightpanda is a headless browser built from scratch in Zig -- faster
navigation than Chrome with significantly less memory.
One config line to enable:
browser:
engine: lightpanda
New functions in browser_tool.py:
- _get_browser_engine() -- config/env reader with validation + caching
- _should_inject_engine() -- only inject in local non-cloud mode
- _needs_lightpanda_fallback() -- detect empty/failed LP results
- _chrome_fallback_screenshot() -- temporary Chrome session for screenshots
- Engine injection in _run_browser_command (--engine flag)
- browser_vision pre-routes screenshots to Chrome when engine=lightpanda
Config:
- browser.engine in DEFAULT_CONFIG (auto/lightpanda/chrome)
- AGENT_BROWSER_ENGINE in OPTIONAL_ENV_VARS
- /browser status shows engine info in local mode
Rebased from PR #7144 onto current main. All existing code preserved --
pure additions only (+520/-2).
25 new tests + 81 total browser tests pass (0 failures).
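The fallback detection is heuristic by nature; a hypothetical sketch of the kind of check _needs_lightpanda_fallback performs (not the actual implementation, and the result-dict shape is assumed):

```python
def needs_lightpanda_fallback(result: dict) -> bool:
    # Treat an engine-level error or an empty rendered page as a signal that
    # Lightpanda could not handle the site; retry the command through Chrome.
    if result.get("error"):
        return True
    text = (result.get("text") or "").strip()
    return len(text) == 0
```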
* fix(browser): surface Lightpanda Chrome fallback warnings
* feat(tui): collapsible sections in startup banner (skills, system prompt, MCP)
The TUI SessionPanel banner now uses collapsible ▸/▾ toggle
sections matching the existing chevron convention used for runtime
agent details. Skills, system prompt, and MCP server lists are
collapsed by default; tools remain expanded as the most actionable
info.
- tui_gateway/server.py: _session_info() now passes agent._cached_system_prompt
through to the TUI frontend
- ui-tui/src/types.ts: added system_prompt?: string to SessionInfo
- ui-tui/src/components/branding.tsx: rewrote SessionPanel with
CollapseToggle helper + per-section useState toggles
Default states: tools=open, skills=collapsed, system=collapsed,
mcp=collapsed. Clicking any ▸/▾ header toggles that section.
* fix(tui): collapse long system messages in transcript with expand toggle
System messages over 400 chars (system prompt, AGENTS.md, etc.) now
render as a collapsed ▸/▾ toggle line in the transcript, matching
the chevron convention used for runtime details. The summary shows
the first line + char count; clicking expands to full content.
* fix(browser): tighten Lightpanda fallback edge cases
* fix(gateway): preserve model picker current context
* fix(update): drop pip --quiet so slow installs don't look hung (#20679)
On Termux/Android aarch64 (and other platforms without prebuilt wheels
for some optional extras), 'pip install -e .[all]' compiles C/Rust
extensions from source. This can run for several minutes with zero
network activity and — with --quiet — zero stdout. Users report
'hermes update hangs at Updating Python dependencies', Ctrl+C it, then
re-run and see 'up to date' (because git pull already succeeded and the
pip step was still working when they interrupted).
Pip's default output is proportional to actual work (one line per
Collecting / Building wheel for X / Installing), so removing --quiet
costs nothing on fast hardware and prevents the false-hang interrupt
loop on slow hardware.
Reported via Discord on Termux/Android. Supersedes #20466 which
misdiagnosed the hang as PYTHONPATH shadowing (install.sh doesn't run
during 'hermes update', and terminal() doesn't inherit PYTHONPATH).
* fix(cli): guard logger.debug in signal handler (#13710 regression) (#20673)
CPython's logging module is not reentrant-safe. `Logger.isEnabledFor`
caches level results in `Logger._cache`; under shutdown races the cache
can be cleared (`Logger._clear_cache`, triggered by logging config changes
from another thread) or mid-mutation when a signal fires, raising
`KeyError: <level_int>` (e.g. `KeyError: 10` for DEBUG) inside the signal
handler.
When that happens, the KeyError escapes before the `raise KeyboardInterrupt()`
on the next line can fire, which bypasses prompt_toolkit's normal interrupt
unwind and surfaces as the EIO cascade originally reported in #13710.
Issue #13710 shipped two defenses (asyncio exception handler + outer
`except (KeyError, OSError)` with EIO suppression) that cover the EIO
unwind path. This patch closes the remaining escape hatch: the
`logger.debug` call at the top of `_signal_handler` itself. Wrap it in a
bare `try/except Exception: pass` so logging can never raise through a
signal handler.
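The guard can be sketched as a handler factory (names are illustrative; the point is the narrow `except Exception` around logging so KeyboardInterrupt and SystemExit still propagate):

```python
def make_signal_handler(logger, agent):
    def _signal_handler(signum, frame):
        try:
            # logging is not reentrant-safe; never let it raise from a handler
            logger.debug("interrupt received: %s", signum)
        except Exception:
            pass  # deliberately not BaseException: real shutdown must propagate
        try:
            agent.interrupt()
        except Exception:
            pass
        raise KeyboardInterrupt()
    return _signal_handler
```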
Observed in the wild: debug report on 0.12.0 (commit 8163d371) shows the
exact stack — KeyError: 10 at logging/__init__.py:1742 inside the
signal handler's `logger.debug`, followed by the EIO cascade from
prompt_toolkit's emergency flush.
Tests: adds `TestSignalHandlerLoggingRace` to
`tests/hermes_cli/test_suppress_eio_on_interrupt.py` with 6 new cases:
- normal path still raises KeyboardInterrupt
- KeyError(10) from logger.debug does not escape
- any Exception from logger.debug is swallowed
- agent.interrupt still fires when logger.debug raises
- agent.interrupt raising also does not escape
- BaseException (SystemExit) is NOT swallowed — guard uses `except Exception`
deliberately so real shutdown signals still propagate
Closes #13710 regression.
* fix: harden install.sh against inherited Python env leakage
* chore: AUTHOR_MAP entry for adybag14-cyber
* fix(ui): reduce status-line jitter while scrolling
* fix(tui): stabilize FaceTicker elapsed width to prevent composer drift
* fix(tui): restore gap before duration when verb segment is hidden
The verb-padding change dropped the leading space in durationSegment on
the assumption that the verb's trailing pad always supplies the gap. But
the unicode spinner style sets showVerb=false, making verbSegment an
empty string — in that mode the output would become `{frame}· {duration}`
with no separator. Add the space back; harmless when the verb segment
is shown (its trailing pad still provides the gap).
* chore(release): map liuguangyong@hellobike -> liuguangyong93
* fix(kanban): reset code element background inside board
The Nous DS globals.css applies a global rule:
code { background: var(--midground); color: var(--background); }
This paints an opaque cream/yellow fill on every <code> element,
which hides text in the kanban drawer's event-payload, run-meta,
and worker-log panes (all rendered as <code>).
Fix: scope a reset inside .hermes-kanban so <code> elements inherit
their parent's color and stay transparent.
* fix(cli): recover classic CLI output after resize
* feat(skills): add shop-app personal shopping assistant (optional) (#20702)
Port Shop.app's upstream SKILL.md (https://shop.app/SKILL.md) into
optional-skills/productivity/shop-app/ with Hermes-native adaptations:
- Proper Hermes frontmatter (name, description<=60 chars, version,
author, license, prerequisites, metadata.hermes tags + related_skills
+ homepage + upstream)
- Swap Shop.app's bespoke 'message()' tool references for Hermes
conventions: gateway adapters handle platform formatting, so the
skill just writes markdown (no Telegram/WhatsApp/iMessage sections
referencing a tool Hermes doesn't ship)
- Name Hermes tools where relevant: curl via 'terminal', HTML policy
pages via 'web_extract', try-on via 'image_generate'
- Reframe session state as 'hold in your reasoning context for this
conversation only' and forbid writing tokens to .env / disk — matches
Hermes ephemeral-memory discipline
- Drop NO_REPLY convention (Shop-app-runtime specific)
- Trigger-first description so the skill loader picks it up when the
user wants to search products, track orders, returns, or reorder
* feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709)
Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.
Why
---
The pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:
1. Each working directory got its own full shadow git repo — no object
dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
/rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
without ever asking for /rollback.
Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.
Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
per-project metadata (store/projects/<hash>.json with workdir +
created_at + last_touch). On first v2 init, any pre-v2 per-directory
shadow repos are auto-migrated into legacy-<timestamp>/ so the new
store starts clean. _prune() now actually rewrites the per-project ref
to the last max_snapshots commits and runs git gc --prune=now. New
_enforce_size_cap() drops oldest commits round-robin across projects
when the store exceeds max_total_size_mb. _drop_oversize_from_index()
filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
(status / list / prune / clear / clear-legacy) for managing the store
outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
*.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
coverage for the pre-v2 prune path (per-directory shadow repos under
base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
shared store, new defaults, migration, and the new CLI;
reference/cli-commands.md documents 'hermes checkpoints'.
E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
--retention-days 0' removes its ref, index, and metadata; GC reclaims
the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
CLI without an agent running.
Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.
* fix(cli): catch OSError in _resolve_attachment_path to prevent ENAMETOOLONG dropping long slash commands
When the user pastes a long slash command like \`/goal <long prose>\` into
\`hermes chat\`, the input flows into \`_detect_file_drop()\`, whose
\`starts_like_path\` prefilter accepts anything starting with \`/\` and
forwards it to \`_resolve_attachment_path()\`. That helper calls
\`Path.exists()\` which invokes \`os.stat()\`, which raises
\`OSError(errno=ENAMETOOLONG)\` — 63 on macOS, 36 on Linux — when the
candidate exceeds NAME_MAX (typically 255 bytes).
The OSError propagates up to the broad \`except Exception\` in
\`process_loop\` (cli.py:11798), gets logged at WARNING level, and the
user's input is silently dropped. From the user's POV the chat prompt
hangs — the only signal is in agent.log:
WARNING cli: process_loop unhandled error (msg may be lost):
[Errno 63] File name too long: "/goal Drive the space board..."
This affects any slash command with prose-length arguments — \`/goal\`
in particular but also \`/skill\`, \`/cron\`, custom user commands.
Fix: wrap the \`exists()\`/\`is_file()\` calls in try/except OSError so
structurally-invalid path candidates cleanly return None. The slash-
command dispatch path downstream (cli.py:11718) then handles the
input correctly.
Tests: two new regression cases in test_cli_file_drop.py cover the
original \`/goal\` reproducer and a synthetic long path. All 35 file-
drop tests pass.
Reproducer (without the fix):
python -c "from cli import _detect_file_drop;
_detect_file_drop('/goal ' + 'a'*300)"
→ OSError: [Errno 63] File name too long
* chore(release): map cleo@edaphic.xyz → curiouscleo
Follow-up to the salvaged fix for /goal ENAMETOOLONG drop — adds
AUTHOR_MAP entry so the release script resolves the commit author to
the correct GitHub user.
* docs(wsl2): expand Windows (WSL2) guide — filesystem, networking, services, pitfalls (#20748)
Replaces the 22-line stub with a ~320-line guide covering the parts of the
Windows/WSL2 split that specifically affect Hermes users:
- Why WSL2 (and not native Windows)
- Install: distro choice, WSL1→2, systemd via /etc/wsl.conf
- Filesystem boundary: /mnt/c vs \\wsl$, perf/perms/watchers/case,
wslpath/wslview, CRLF + git core.autocrlf, clone-where guidance
- Networking in both directions:
- WSL → Windows services: links to the canonical WSL2 Networking section
in integrations/providers.md (mirrored mode, NAT + host IP, bind addr,
firewall) instead of duplicating
- Windows/LAN → Hermes in WSL: mirrored vs NAT, netsh portproxy one-liner,
firewall rule, webhook tunneling pointer
- Long-running services: systemd gateway + Task Scheduler wsl.exe --exec
'sleep infinity' to keep the VM alive at login
- GPU passthrough: NVIDIA works, AMD/Intel out of matrix
- Common pitfalls: connection refused, /mnt/c slowness, CRLF ^M,
UNC warnings, post-sleep clock drift, mirrored-mode DNS with VPN,
PATH, Defender scanning, VHDX disk reclaim
All internal links use site-absolute /docs/... form (matches the rest of
user-guide/); all seven link targets verified to exist.
* docs: pluggable surfaces coverage — model-provider guide, full plugin map, opt-in fix (#20749)
* docs(providers): add model-provider-plugin authoring guide + fix stale refs
New docs:
- website/docs/developer-guide/model-provider-plugin.md — full authoring
guide (directory layout, minimal example, ProviderProfile fields,
overridable hooks, user overrides, api_mode selection, auth types,
testing, pip distribution)
- Wired into website/sidebars.ts under 'Extending'
- Cross-references added in:
- guides/build-a-hermes-plugin.md (tip block)
- developer-guide/adding-providers.md
- developer-guide/provider-runtime.md
User guide:
- user-guide/features/plugins.md: Plugin types table grows from 3 to 4
with 'Model providers' row
Stale comment cleanup (providers/*.py → plugins/model-providers/<name>/):
- hermes_cli/main.py:_is_profile_api_key_provider docstring
- hermes_cli/doctor.py:_build_apikey_providers_list docstring
- hermes_cli/auth.py: PROVIDER_REGISTRY + alias auto-extension comments
- hermes_cli/models.py: CANONICAL_PROVIDERS auto-extension comment
AGENTS.md:
- Project-structure tree: added plugins/model-providers/ row
- New section: 'Model-provider plugins' explaining discovery, override
semantics, PluginManager integration, kind auto-coerce heuristic
Verified: docusaurus build succeeds, new page renders, all 3 cross-links
resolve. 347/347 targeted tests pass (tests/providers/,
tests/hermes_cli/test_plugins.py, tests/hermes_cli/test_runtime_provider_resolution.py,
tests/run_agent/test_provider_parity.py).
* docs(plugins): add 'pluggable interfaces at a glance' maps to plugins.md + build-a-hermes-plugin
Devs landing on either the user-guide plugin page or the build-a-plugin
guide now get an upfront table of every distinct pluggable surface with
a link to the right authoring doc. Previously they'd have to read the
full general-plugin guide to discover that model providers / platforms
/ memory / context engines are separate systems.
user-guide/features/plugins.md:
- New 'Pluggable interfaces — where to go for each' section below the
existing…
bot-ted added a commit to bot-ted/hermes-agent that referenced this pull request (May 7, 2026):
* feat(browser): add Lightpanda engine support with automatic Chrome fallback
Add Lightpanda as an optional browser engine for local mode.
Lightpanda is a headless browser built from scratch in Zig -- faster
navigation than Chrome with significantly less memory.
One config line to enable:
browser:
engine: lightpanda
New functions in browser_tool.py:
- _get_browser_engine() -- config/env reader with validation + caching
- _should_inject_engine() -- only inject in local non-cloud mode
- _needs_lightpanda_fallback() -- detect empty/failed LP results
- _chrome_fallback_screenshot() -- temporary Chrome session for screenshots
- Engine injection in _run_browser_command (--engine flag)
- browser_vision pre-routes screenshots to Chrome when engine=lightpanda
Config:
- browser.engine in DEFAULT_CONFIG (auto/lightpanda/chrome)
- AGENT_BROWSER_ENGINE in OPTIONAL_ENV_VARS
- /browser status shows engine info in local mode
Rebased from PR #7144 onto current main. All existing code preserved --
pure additions only (+520/-2).
25 new tests + 81 total browser tests pass (0 failures).
* fix(browser): surface Lightpanda Chrome fallback warnings
* feat(tui): collapsible sections in startup banner (skills, system prompt, MCP)
The TUI SessionPanel banner now uses collapsible ▸/▾ toggle
sections matching the existing Chevron convention used for runtime
agent details. Skills, system prompt, and MCP server lists are
collapsed by default; tools remain expanded as the most actionable
info.
- tui_gateway/server.py: _session_info() now passes agent._cached_system_prompt
through to the TUI frontend
- ui-tui/src/types.ts: added system_prompt?: string to SessionInfo
- ui-tui/src/components/branding.tsx: rewrote SessionPanel with
CollapseToggle helper + per-section useState toggles
Default states: tools=open, skills=collapsed, system=collapsed,
mcp=collapsed. Clicking any ▸/▾ header toggles that section.
* fix(tui): collapse long system messages in transcript with expand toggle
System messages over 400 chars (system prompt, AGENTS.md, etc.) now
render as a collapsed ▸/▾ toggle line in the transcript, matching
the Chevron convention used for runtime details. The summary shows
the first line + char count; clicking expands to full content.
* fix(browser): tighten Lightpanda fallback edge cases
* fix(gateway): preserve model picker current context
* fix(update): drop pip --quiet so slow installs don't look hung (#20679)
On Termux/Android aarch64 (and other platforms without prebuilt wheels
for some optional extras), 'pip install -e .[all]' compiles C/Rust
extensions from source. This can run for several minutes with zero
network activity and — with --quiet — zero stdout. Users report
'hermes update hangs at Updating Python dependencies', Ctrl+C it, then
re-run and see 'up to date' (because git pull already succeeded and the
pip step was still working when they interrupted).
Pip's default output is proportional to actual work (one line per
Collecting / Building wheel for X / Installing), so removing --quiet
costs nothing on fast hardware and prevents the false-hang interrupt
loop on slow hardware.
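A minimal sketch of the invocation this change implies. The helper name is hypothetical; the point is what is absent (`--quiet`), so pip's default per-package lines reach the terminal during long source builds.

```python
import sys

def build_pip_command(extra_args: list[str]) -> list[str]:
    """Build the update-step pip invocation (hypothetical helper).

    Deliberately no --quiet: pip's default output emits one line per
    'Collecting' / 'Building wheel for X' / 'Installing' step, so slow
    from-source builds still show visible progress.
    """
    return [sys.executable, "-m", "pip", "install", *extra_args]
```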
Reported via Discord on Termux/Android. Supersedes #20466 which
misdiagnosed the hang as PYTHONPATH shadowing (install.sh doesn't run
during 'hermes update', and terminal() doesn't inherit PYTHONPATH).
* fix(cli): guard logger.debug in signal handler (#13710 regression) (#20673)
CPython's logging module is not reentrant-safe. `Logger.isEnabledFor`
caches level results in `Logger._cache`; under shutdown races the cache
can be cleared (`Logger._clear_cache`, triggered by logging config changes
from another thread) or mid-mutation when a signal fires, raising
`KeyError: <level_int>` (e.g. `KeyError: 10` for DEBUG) inside the signal
handler.
When that happens, the KeyError escapes before the `raise KeyboardInterrupt()`
on the next line can fire, which bypasses prompt_toolkit's normal interrupt
unwind and surfaces as the EIO cascade originally reported in #13710.
Issue #13710 shipped two defenses (asyncio exception handler + outer
`except (KeyError, OSError)` with EIO suppression) that cover the EIO
unwind path. This patch closes the remaining escape hatch: the
`logger.debug` call at the top of `_signal_handler` itself. Wrap it in a
bare `try/except Exception: pass` so logging can never raise through a
signal handler.
Observed in the wild: debug report on 0.12.0 (commit 8163d371) shows the
exact stack — KeyError: 10 at logging/__init__.py:1742 inside the
signal handler's `logger.debug`, followed by the EIO cascade from
prompt_toolkit's emergency flush.
Tests: adds `TestSignalHandlerLoggingRace` to
`tests/hermes_cli/test_suppress_eio_on_interrupt.py` with 6 new cases:
- normal path still raises KeyboardInterrupt
- KeyError(10) from logger.debug does not escape
- any Exception from logger.debug is swallowed
- agent.interrupt still fires when logger.debug raises
- agent.interrupt raising also does not escape
- BaseException (SystemExit) is NOT swallowed — guard uses `except Exception`
deliberately so real shutdown signals still propagate
Closes #13710 regression.
* fix: harden install.sh against inherited Python env leakage
* chore: AUTHOR_MAP entry for adybag14-cyber
* fix(ui): reduce status-line jitter while scrolling
* fix(tui): stabilize FaceTicker elapsed width to prevent composer drift
* fix(tui): restore gap before duration when verb segment is hidden
The verb-padding change dropped the leading space in durationSegment on
the assumption that the verb's trailing pad always supplies the gap. But
the unicode spinner style sets showVerb=false, making verbSegment an
empty string — in that mode the output would become `{frame}· {duration}`
with no separator. Add the space back; harmless when the verb segment
is shown (its trailing pad still provides the gap).
* chore(release): map liuguangyong@hellobike -> liuguangyong93
* fix(kanban): reset code element background inside board
The Nous DS globals.css applies a global rule:
code { background: var(--midground); color: var(--background); }
This paints an opaque cream/yellow fill on every <code> element,
which hides text in the kanban drawer's event-payload, run-meta,
and worker-log panes (all rendered as <code>).
Fix: scope a reset inside .hermes-kanban so <code> elements inherit
their parent's color and stay transparent.
* fix(cli): recover classic CLI output after resize
* feat(skills): add shop-app personal shopping assistant (optional) (#20702)
Port Shop.app's upstream SKILL.md (https://shop.app/SKILL.md) into
optional-skills/productivity/shop-app/ with Hermes-native adaptations:
- Proper Hermes frontmatter (name, description<=60 chars, version,
author, license, prerequisites, metadata.hermes tags + related_skills
+ homepage + upstream)
- Swap Shop.app's bespoke 'message()' tool references for Hermes
conventions: gateway adapters handle platform formatting, so the
skill just writes markdown (no Telegram/WhatsApp/iMessage sections
referencing a tool Hermes doesn't ship)
- Name Hermes tools where relevant: curl via 'terminal', HTML policy
pages via 'web_extract', try-on via 'image_generate'
- Reframe session state as 'hold in your reasoning context for this
conversation only' and forbid writing tokens to .env / disk — matches
Hermes ephemeral-memory discipline
- Drop NO_REPLY convention (Shop-app-runtime specific)
- Trigger-first description so the skill loader picks it up when the
user wants to search products, track orders, returns, or reorder
* feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709)
Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.
Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:
1. Each working directory got its own full shadow git repo — no object
dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
/rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
without ever asking for /rollback.
Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.
Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
per-project metadata (store/projects/<hash>.json with workdir +
created_at + last_touch). On first v2 init, any pre-v2 per-directory
shadow repos are auto-migrated into legacy-<timestamp>/ so the new
store starts clean. _prune() now actually rewrites the per-project ref
to the last max_snapshots commits and runs git gc --prune=now. New
_enforce_size_cap() drops oldest commits round-robin across projects
when the store exceeds max_total_size_mb. _drop_oversize_from_index()
filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
(status / list / prune / clear / clear-legacy) for managing the store
outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
*.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
coverage for the pre-v2 prune path (per-directory shadow repos under
base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
shared store, new defaults, migration, and the new CLI;
reference/cli-commands.md documents 'hermes checkpoints'.
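The round-robin policy in _enforce_size_cap() can be sketched as pure selection logic. The oldest-first, one-per-project rotation is from the commit message; the per-commit size estimate, data shapes, and function name are assumptions for illustration (the real code measures the store and drives git).

```python
def plan_size_cap_drops(projects: dict[str, list[str]],
                        store_size_mb: float,
                        max_total_size_mb: float,
                        est_commit_mb: float = 1.0) -> list[tuple[str, str]]:
    """Plan which commits to drop when the store exceeds the cap.

    projects maps project-hash -> commit ids, oldest first. Drop one
    oldest commit per project in rotation until the estimated size
    fits under max_total_size_mb.
    """
    queues = {proj: list(commits) for proj, commits in projects.items()}
    drops: list[tuple[str, str]] = []
    size = store_size_mb
    while size > max_total_size_mb:
        progressed = False
        for proj, commits in queues.items():
            if size <= max_total_size_mb:
                break
            if commits:
                drops.append((proj, commits.pop(0)))
                size -= est_commit_mb
                progressed = True
        if not progressed:
            break  # nothing left to drop; gc handles the rest
    return drops
```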
E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
--retention-days 0' removes its ref, index, and metadata; GC reclaims
the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
CLI without an agent running.
Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.
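For users opting back in, a minimal config sketch with the new defaults spelled out. The key names come from this PR; the exact nesting under a `checkpoints:` block is an assumption.

```
checkpoints:
  enabled: true          # default is now false; required for /rollback
  max_snapshots: 20      # per-project ref is pruned to this many commits
  auto_prune: true       # default flipped to true
  max_total_size_mb: 500 # store-wide cap enforced round-robin
  max_file_size_mb: 10   # larger files are dropped from snapshots
```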
* fix(cli): catch OSError in _resolve_attachment_path to prevent ENAMETOOLONG dropping long slash commands
When the user pastes a long slash command like `/goal <long prose>` into
`hermes chat`, the input flows into `_detect_file_drop()`, whose
`starts_like_path` prefilter accepts anything starting with `/` and
forwards it to `_resolve_attachment_path()`. That helper calls
`Path.exists()`, which invokes `os.stat()`, which raises
`OSError(errno=ENAMETOOLONG)` — 63 on macOS, 36 on Linux — when the
candidate exceeds NAME_MAX (typically 255 bytes).
The OSError propagates up to the broad `except Exception` in
`process_loop` (cli.py:11798), gets logged at WARNING level, and the
user's input is silently dropped. From the user's POV the chat prompt
hangs — the only signal is in agent.log:
WARNING cli: process_loop unhandled error (msg may be lost):
[Errno 63] File name too long: "/goal Drive the space board..."
This affects any slash command with prose-length arguments — `/goal`
in particular, but also `/skill`, `/cron`, and custom user commands.
Fix: wrap the `exists()`/`is_file()` calls in try/except OSError so
structurally invalid path candidates cleanly return None. The slash-
command dispatch path downstream (cli.py:11718) then handles the
input correctly.
Tests: two new regression cases in test_cli_file_drop.py cover the
original `/goal` reproducer and a synthetic long path. All 35 file-
drop tests pass.
Reproducer (without the fix):
python -c "from cli import _detect_file_drop;
_detect_file_drop('/goal ' + 'a'*300)"
→ OSError: [Errno 63] File name too long
* chore(release): map cleo@edaphic.xyz → curiouscleo
Follow-up to the salvaged fix for /goal ENAMETOOLONG drop — adds
AUTHOR_MAP entry so the release script resolves the commit author to
the correct GitHub user.
* docs(wsl2): expand Windows (WSL2) guide — filesystem, networking, services, pitfalls (#20748)
Replaces the 22-line stub with a ~320-line guide covering the parts of the
Windows/WSL2 split that specifically affect Hermes users:
- Why WSL2 (and not native Windows)
- Install: distro choice, WSL1→2, systemd via /etc/wsl.conf
- Filesystem boundary: /mnt/c vs \\wsl$, perf/perms/watchers/case,
wslpath/wslview, CRLF + git core.autocrlf, clone-where guidance
- Networking in both directions:
- WSL → Windows services: links to the canonical WSL2 Networking section
in integrations/providers.md (mirrored mode, NAT + host IP, bind addr,
firewall) instead of duplicating
- Windows/LAN → Hermes in WSL: mirrored vs NAT, netsh portproxy one-liner,
firewall rule, webhook tunneling pointer
- Long-running services: systemd gateway + Task Scheduler wsl.exe --exec
'sleep infinity' to keep the VM alive at login
- GPU passthrough: NVIDIA works, AMD/Intel out of matrix
- Common pitfalls: connection refused, /mnt/c slowness, CRLF ^M,
UNC warnings, post-sleep clock drift, mirrored-mode DNS with VPN,
PATH, Defender scanning, VHDX disk reclaim
All internal links use site-absolute /docs/... form (matches the rest of
user-guide/); all seven link targets verified to exist.
* docs: pluggable surfaces coverage — model-provider guide, full plugin map, opt-in fix (#20749)
* docs(providers): add model-provider-plugin authoring guide + fix stale refs
New docs:
- website/docs/developer-guide/model-provider-plugin.md — full authoring
guide (directory layout, minimal example, ProviderProfile fields,
overridable hooks, user overrides, api_mode selection, auth types,
testing, pip distribution)
- Wired into website/sidebars.ts under 'Extending'
- Cross-references added in:
- guides/build-a-hermes-plugin.md (tip block)
- developer-guide/adding-providers.md
- developer-guide/provider-runtime.md
User guide:
- user-guide/features/plugins.md: Plugin types table grows from 3 to 4
with 'Model providers' row
Stale comment cleanup (providers/*.py → plugins/model-providers/<name>/):
- hermes_cli/main.py:_is_profile_api_key_provider docstring
- hermes_cli/doctor.py:_build_apikey_providers_list docstring
- hermes_cli/auth.py: PROVIDER_REGISTRY + alias auto-extension comments
- hermes_cli/models.py: CANONICAL_PROVIDERS auto-extension comment
AGENTS.md:
- Project-structure tree: added plugins/model-providers/ row
- New section: 'Model-provider plugins' explaining discovery, override
semantics, PluginManager integration, kind auto-coerce heuristic
Verified: docusaurus build succeeds, new page renders, all 3 cross-links
resolve. 347/347 targeted tests pass (tests/providers/,
tests/hermes_cli/test_plugins.py, tests/hermes_cli/test_runtime_provider_resolution.py,
tests/run_agent/test_provider_parity.py).
* docs(plugins): add 'pluggable interfaces at a glance' maps to plugins.md + build-a-hermes-plugin
Devs landing on either the user-guide plugin page or the build-a-plugin
guide now get an upfront table of every distinct pluggable surface with
a link to the right authoring doc. Previously they'd have to read the
full general-plugin guide to discover that model providers / platforms
/ memory / context engines are separate systems.
user-guide/features/plugins.md:
- New 'Pluggable interfaces — where to go for each' section below the
existing 4-kinds table
- 10 rows covering every register_* surface (tool, hook, slash command,
CLI subcommand, skill, model provider, platform, memory, context
engine, image-gen)
- Explicit note: TTS/STT are NOT plugin-extensible yet — documented
with a pointer to the current config.yaml 'command providers' pattern
and a note that register_tts_provider()/register_stt_provider() may
come later
guides/build-a-hermes-plugin.md:
- New :::info 'Not sure which guide you need?' map at the top so devs
see all pluggable interfaces before investing in this 737-line
general-plugin walkthrough
- Existing bottom :::tip expanded to include platform adapters alongside
model/memory/context plugins
Verified:
- All 8 cross-doc links in the new plugins.md table resolve in a
docusaurus build (SUCCESS, no new broken links)
- TTS link corrected (features/voice → features/tts; latter exists)
- Pre-existing broken links/anchors (cron-script-only, llms.txt,
adding-platform-adapters#step-by-step-checklist) are unchanged
* docs(plugins): correct TTS/STT pluggability — they ARE plugins (command-providers)
Previous commit incorrectly said TTS/STT 'aren't plugin-extensible'. They
are, via the config-driven command-provider pattern — any CLI that reads
text and writes audio (or vice versa for STT) is automatically a plugin
with zero Python. The tts.md docs cover this extensively and I missed it.
plugins.md:
- TTS row: 'Config-driven (not a Python plugin)', points at
tts.md#custom-command-providers
- STT row: points at tts.md#voice-message-transcription-stt (STT docs
live in tts.md despite the filename)
- Expanded note: TTS/STT use config-driven shell-command templates as
their plugin surface (full tts.providers.<name> registry for TTS;
HERMES_LOCAL_STT_COMMAND escape hatch for STT)
- Any CLI that reads/writes files is automatically a plugin — no Python
register_* API needed
- Future register_tts_provider()/register_stt_provider() hooks mentioned
as nice-to-have for SDK/streaming cases, not as the primary story
build-a-hermes-plugin.md:
- Same map update: TTS/STT rows explicit, footer note corrected
Verified:
- tts.md anchors (custom-command-providers, voice-message-transcription-stt)
exist and resolve in docusaurus build (SUCCESS, no new broken links)
* docs(plugins): expand pluggable interfaces table with MCP / event hooks / shell hooks / skill taps
Broadened the scope beyond Python register_* hooks. Hermes has MULTIPLE
plugin-style extension surfaces; they're now all in one table instead of
being scattered across feature docs.
Added rows for:
- **MCP servers** — config.yaml mcp_servers.<name> auto-registers external
tools from any MCP server. Huge extensibility surface, previously not
linked from the plugin map.
- **Gateway event hooks** — drop HOOK.yaml + handler.py into
~/.hermes/hooks/<name>/ to fire on gateway:startup, session:*, agent:*,
command:* events. Separate from Python plugin hooks.
- **Shell hooks** — hooks: block in config.yaml runs shell commands on
events (notifications, auditing, etc.).
- **Skill sources (taps)** — hermes skills tap add <repo> to pull in new
skill registries beyond the built-in sources.
Both docs updated:
- user-guide/features/plugins.md: table column renamed to 'How' (mixes
Python API + config-driven + drop-in-dir surfaces accurately)
- guides/build-a-hermes-plugin.md: :::info map at top mirrors the new
surfaces with a forward-link to the consolidated table
Note block rewritten: instead of singling out TTS/STT as the 'different
style' exception, now honestly describes that Hermes deliberately
supports three plugin styles — Python APIs, config-driven commands, and
drop-in manifest directories — and devs should pick the one that fits
their integration.
Not included (considered and rejected):
- Transport layer (register_transport) — internal, not user-facing
- Tool-call parsers — internal, VLLM phase-2 thing
- Cloud browser providers — hardcoded registry, not drop-in yet
- Terminal backends — hardcoded if/elif, not drop-in yet
- Skill sources (the ABC) — hardcoded list, only taps are user-extensible
Verified:
- All 5 new anchors resolve (gateway-event-hooks, shell-hooks, skills-hub,
custom-command-providers, voice-message-transcription-stt)
- Docusaurus build SUCCESS, zero new broken links
- Same 3 pre-existing broken links on main (cron-script-only, llms.txt,
adding-platform-adapters#step-by-step-checklist)
* docs(plugins): cover every pluggable surface in both the overview and how-to
Both plugins.md and build-a-hermes-plugin.md now cover every extension
surface end-to-end — general plugin APIs, specialized plugin types,
config-driven surfaces \u2014 with concrete authoring patterns for each.
plugins.md:
- 'What plugins can do' table grows from 9 rows (general ctx.register_*
only) to 14 rows covering register_platform, register_image_gen_provider,
register_context_engine, MemoryProvider subclass, register_provider
(model). Each row links to its full authoring guide.
- New 'Plugin sub-categories' section under Plugin Discovery explains
how plugins/platforms/, plugins/image_gen/, plugins/memory/,
plugins/context_engine/, plugins/model-providers/ are routed to
different loaders — PluginManager vs the per-category own-loader
systems.
- Explicit mention of user-override semantics at
~/.hermes/plugins/model-providers/ and ~/.hermes/plugins/memory/.
build-a-hermes-plugin.md:
- New '## Specialized plugin types' section (5 sub-sections):
- Model provider plugins — ProviderProfile + plugin.yaml example,
auto-wiring summary, link to full guide
- Platform plugins — BasePlatformAdapter + register_platform() skeleton
- Memory provider plugins — MemoryProvider subclass example
- Context engine plugins — ContextEngine subclass example
- Image-generation backends — ImageGenProvider + kind: backend example
- New '## Non-Python extension surfaces' section (5 sub-sections):
- MCP servers — config.yaml mcp_servers.<name> example
- Gateway event hooks — HOOK.yaml + handler.py example
- Shell hooks — hooks: block in config.yaml example
- Skill sources (taps) — hermes skills tap add example
- TTS / STT command templates — tts.providers.<name> with type: command
- Distribute via pip / NixOS promoted from ### to ## (they were orphaned
after the reorganization)
Each specialized / non-Python section has a concrete, copy-pasteable
example plus a 'Full guide:' link to the authoritative doc. Devs arriving
at the build-a-hermes-plugin guide now see every extension surface at
their disposal, not just the general tool/hook/slash-command surface.
Verified:
- Docusaurus build SUCCESS, zero new broken links
- All new cross-links (developer-guide/model-provider-plugin,
adding-platform-adapters, memory-provider-plugin, context-engine-plugin,
user-guide/features/mcp, skills#skills-hub, hooks#gateway-event-hooks,
hooks#shell-hooks, tts#custom-command-providers,
tts#voice-message-transcription-stt) resolve
- Same 3 pre-existing broken links on main (cron-script-only, llms.txt,
adding-platform-adapters#step-by-step-checklist)
* docs(plugins): fix opt-in inconsistency — not every plugin is gated
The 'Every plugin is disabled by default' statement was wrong. Several
plugin categories intentionally bypass plugins.enabled:
- Bundled platform plugins (IRC, Teams) auto-load so shipped gateway
channels are available out of the box. Activation per channel is via
gateway.platforms.<name>.enabled.
- Bundled backends (plugins/image_gen/*) auto-load so the default
backend 'just works'. Selection via <category>.provider config.
- Memory providers are all discovered; one is active via memory.provider.
- Context engines are all discovered; one is active via context.engine.
- Model providers: all 33 discovered at first get_provider_profile();
user picks via --provider / config.
The plugins.enabled allow-list specifically gates:
- Standalone plugins (general tools/hooks/slash commands)
- User-installed backends
- User-installed platforms (third-party gateway adapters)
- Pip entry-point backends
Which matches the actual code in hermes_cli/plugins.py:737 where the
bundled+backend/platform check bypasses the allow-list.
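The gating rule above can be sketched as a small predicate. This is an illustrative sketch with hypothetical function and parameter names; the authoritative check is the bundled+backend/platform branch in hermes_cli/plugins.py:

```python
def is_gated_by_allow_list(kind: str, bundled: bool) -> bool:
    """Return True if the plugin must appear in plugins.enabled to load."""
    # Bundled backends and platforms bypass the gate; they are activated
    # via <category>.provider or gateway.platforms.<name>.enabled instead.
    if bundled and kind in {"backend", "platform"}:
        return False
    # Memory providers, context engines, and model providers are all
    # discovered; exactly one is activated via its own config key.
    if kind in {"memory", "context_engine", "model_provider"}:
        return False
    # Standalone plugins, user-installed backends/platforms, and pip
    # entry-point backends all need an explicit plugins.enabled entry.
    return True
```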
Rewrote '## Plugins are opt-in' to:
- Retitle to 'Plugins are opt-in (with a few exceptions)'
- Narrow opening claim to 'General plugins and user-installed backends
are disabled by default'
- Added 'What the allow-list does NOT gate' subsection with a full
table of which bypass the gate and how they're activated instead
- Fixed migration section wording (bundled platform/backend plugins
never needed grandfathering)
Verified: docusaurus build SUCCESS, zero new broken links.
* change: enable ruff/ty
* feat(ci): add typecheck (warnings only in CI)
* feat(skills/linear): add Documents support + Python helper script (#20752)
* feat(skills/linear): add Documents support + Python helper script
The bundled Linear skill (PR #1230) covered issues, projects, teams, and
workflow states via curl. It had no coverage for Linear's Documents API,
so fetching an RFC/doc from a linear.app URL required hand-writing
GraphQL against an underdocumented schema.
Adds:
- Documents section in SKILL.md explaining slugId extraction from URLs,
the content (markdown) vs contentState (ProseMirror) split, and
four canonical curl examples (fetch by slugId, fetch by UUID, list
recent, title-search).
- scripts/linear_api.py — stdlib-only Python CLI wrapping the most
common operations (whoami, list-teams, list/get/search/create/update
issues, add-comment, update-status, list/get/search documents, raw
GraphQL passthrough). Zero deps, reads LINEAR_API_KEY from env.
Auth header quirk (personal key takes bare $LINEAR_API_KEY, no Bearer
prefix) is already documented in the skill.
Found during RFC review: the existing skill's lack of document support
forced falling back to the browser (which hit Linear's login wall).
Also fixes a schema gotcha — the Document field is `contentState`, not
`contentData` (which returns 400).
Tested end-to-end against the production API:
python3 linear_api.py whoami
python3 linear_api.py get-document 38359beef67c
Both return expected payloads.
* fix(skills/linear): point LINEAR_API_KEY setup to the correct page
The org-level Settings > API page (/settings/api) only shows OAuth apps
and workspace-member keys. Personal API keys live under Account,
Security, access (/settings/account/security). Update both the setup
link in config.py (shown during hermes setup) and the setup step in
SKILL.md so users land on the page that can create a personal key.
* docs(plugins): close the gaps — image-gen-provider-plugin guide + publishing a skill tap (#20800)
Two pluggable surfaces were mentioned in the interfaces map without a
real authoring guide behind them:
1. **Image-gen backends** — only had 'See bundled examples' pointers.
Now a full developer-guide/image-gen-provider-plugin.md (270 lines)
mirroring the memory/context/model provider docs:
- How discovery works, directory structure, plugin.yaml
- ImageGenProvider ABC with every overridable method
(name, display_name, is_available, list_models, default_model,
get_setup_schema, generate)
- Full authoring walkthrough with a working MyBackendImageGenProvider
- Response-format reference (success_response / error_response)
- Handling b64 vs URL output (save_b64_image helper)
- User overrides at ~/.hermes/plugins/image_gen/<name>/
- Testing recipe + pip distribution
- Reference examples (openai, openai-codex, xai)
2. **Skill taps** — features/skills.md mentioned the CLI commands but
never explained the repo contract for publishing a tap. Added
'Publishing a custom skill tap' section under Skills Hub covering:
- Repo layout (skills/<name>/SKILL.md by default)
- Minimal working example
- Non-default path configuration (taps.json)
- Installing individual skills without subscribing
- Trust-level handling
- Full tap management CLI + in-session /skills tap commands
Wired into:
- website/sidebars.ts: image-gen-provider-plugin added to Extending group
- website/docs/user-guide/features/plugins.md: pluggable interfaces
table + 'What plugins can do' table now link to the real guides
instead of 'See bundled examples'
- website/docs/guides/build-a-hermes-plugin.md: top info map and
inline sub-sections updated, 'Full guide:' line added to
image-gen block, tap section mentions publishing
Verified: docusaurus build SUCCESS, new page renders at
/docs/developer-guide/image-gen-provider-plugin, anchor
#publishing-a-custom-skill-tap resolves from plugins.md +
build-a-hermes-plugin.md. Pre-existing zh-Hans broken links unchanged.
* fix(opencode-go): keep users on opencode-go instead of hijacking to native providers (#20802)
OpenCode Go and OpenCode Zen are flat-namespace model resellers — their
/v1/models returns bare IDs (deepseek-v4-flash, minimax-m2.7), and the
inference API rejects vendor-prefixed names with HTTP 401 'Model not
supported'. Two bugs fixed:
1. `switch_model` in hermes_cli/model_switch.py was silently switching the
user off opencode-go to native deepseek when they typed
`/model deepseek-v4-flash`. Step d found the model in opencode-go's live
catalog, but step e (detect_provider_for_model) still ran and matched
the bare name against deepseek's static catalog. Fix: track whether
the live catalog resolved it; skip step e when it did.
2. `normalize_model_for_provider` in hermes_cli/model_normalize.py only
stripped the exact `opencode-zen/` prefix, leaving arbitrary vendor
prefixes like `minimax/minimax-m2.7` (commonly copied from aggregator
slugs into fallback_model configs) intact — causing HTTP 401s when
the fallback chain activated. Fix: opencode-go/opencode-zen strip ANY
leading vendor prefix because their APIs are flat-namespace.
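The new stripping rule can be sketched as follows, under the assumption that a vendor prefix is everything before the first "/". The function name is illustrative; the real logic lives in normalize_model_for_provider:

```python
FLAT_NAMESPACE_PROVIDERS = {"opencode-go", "opencode-zen"}

def strip_vendor_prefix(model: str, provider: str) -> str:
    """Flat-namespace resellers reject vendor-prefixed IDs with HTTP 401,
    so drop any leading "<vendor>/" segment before sending the name."""
    if provider in FLAT_NAMESPACE_PROVIDERS and "/" in model:
        return model.split("/", 1)[1]
    return model
```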
Tests: 11 new cases in tests/hermes_cli/test_opencode_go_flat_namespace.py
covering both normalization (prefix stripping, regression guards for
opencode-zen Claude hyphenation and openrouter vendor-prepending) and
switch_model (bare-name resolution on opencode-go's live catalog must
not trigger cross-provider hijack).
Reported by @Ufonik via Discord; Kimi K2.6 always worked because moonshotai
has no overlapping entry in a native provider's static catalog. Deepseek
and minimax failed because their v4/v2.7 names existed in the native
deepseek/minimax catalogs.
* feat(dashboard): add 'default-large' built-in theme with 18px base size (#20820)
Same Hermes Teal palette as the default theme, but with baseSize 18px,
lineHeight 1.65, and spacious density so the whole dashboard scales up.
Gives users a one-click bigger-text preset and a copyable reference for
authoring custom YAML themes with their own typography settings.
* refactor(web): per-capability backend selection for search/extract split
Introduce the foundation for independently selecting web search and
extract backends — enabling future combinations like SearXNG for
search + Firecrawl for extract.
Architecture:
- tools/web_providers/base.py: WebSearchProvider and WebExtractProvider
ABCs with normalized result contracts (mirrors CloudBrowserProvider)
- tools/web_tools.py: _get_search_backend() and _get_extract_backend()
read per-capability config keys, fall through to shared web.backend
- hermes_cli/config.py: web.search_backend and web.extract_backend in
DEFAULT_CONFIG (empty = inherit from web.backend)
Behavioral change:
- web_search_tool() now dispatches via _get_search_backend()
- web_extract_tool() now dispatches via _get_extract_backend()
- When per-capability keys are empty (default), behavior is identical
to before — _get_search_backend() falls through to _get_backend()
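The fall-through can be sketched as standalone functions. These are illustrative; the real helpers are module-private in tools/web_tools.py and consult the full config machinery:

```python
def get_backend(config: dict) -> str:
    # Shared backend; "auto" stands in for the auto-detect candidate walk.
    return config.get("web", {}).get("backend") or "auto"

def get_search_backend(config: dict) -> str:
    # Empty per-capability key (the default) inherits the shared backend.
    return config.get("web", {}).get("search_backend") or get_backend(config)

def get_extract_backend(config: dict) -> str:
    return config.get("web", {}).get("extract_backend") or get_backend(config)
```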
This is purely structural — no new backends are added. SearXNG and
other search-only/extract-only providers can now be added as simple
drop-in modules in follow-up PRs.
12 new tests, 49 existing tests pass with zero regressions.
Ref: #19198
* feat(web): add SearXNG as a native search-only backend
Adds SearXNG as a free, self-hosted web search provider. SearXNG is a
privacy-respecting metasearch engine that requires no API key — just a
running instance and SEARXNG_URL pointing at it.
## What this adds
- `tools/web_providers/searxng.py` — `SearXNGSearchProvider` implementing
`WebSearchProvider` (search only; no extract capability)
- `_is_backend_available("searxng")` — gates on SEARXNG_URL
- `_get_backend()` — accepts "searxng" as a configured value; adds it to
auto-detect candidates (lower priority than paid services)
- `web_search_tool` — dispatches to SearXNG when it is the active backend
- `check_web_api_key()` — includes SearXNG in availability check
- `OPTIONAL_ENV_VARS["SEARXNG_URL"]` — registered with tools=["web_search"]
- `tools_config.py` — SearXNG appears in the `hermes tools` provider picker
- `nous_subscription.py` — `direct_searxng` detection, web_active / web_available
- `setup.py` — SEARXNG_URL listed in the missing-credential hint
- 23 tests covering: is_configured, happy-path search, score sorting, limit,
HTTP/request errors, _is_backend_available, _get_backend, check_web_api_key
## Config
```yaml
# Use SearXNG for search, any paid provider for extract
web:
  search_backend: "searxng"
  extract_backend: "firecrawl"

# Or: SearXNG as the sole backend (web_extract will use the next available)
web:
  backend: "searxng"
```
SearXNG is search-only — it does not implement WebExtractProvider. Users
who only configure SEARXNG_URL get web_search available; web_extract falls
back to the next available extract provider (or is unavailable if none).
Closes #19198 (Phase 2 Task 4 — SearXNG provider)
Ref: #11562 (original SearXNG PR)
* docs+skill: add searxng-search optional skill and documentation
Closes the remaining gaps from PR #11562 that weren't covered by the
core SearXNG integration landed in #20823.
- optional-skills/research/searxng-search/ — installable skill with
SKILL.md (curl-based usage, category support, Python example) and
searxng.sh helper script for health checks and instance queries
- website/docs/user-guide/configuration.md — SearXNG added to the
Web Search Backends section (5 backends, backend table, per-capability
split config example, correct search-only note)
- website/docs/reference/environment-variables.md — SEARXNG_URL row
- website/docs/reference/optional-skills-catalog.md — searxng-search entry
The core SearXNG code, OPTIONAL_ENV_VARS, hermes tools picker, and tests
were already on main via #20823. This commit is purely additive docs +
the optional skill scaffold.
Credits from #11562 salvage:
@w4rum — original _searxng_search structure
@nathansdev — tools_config.py integration
@moyomartin — category support and result formatting
@0xMihai — config/env var approach
@nicobailon — skill and documentation structure
@searxng-fan — error handling patterns
@local-first — self-hosted-first philosophy and docs
* docs: add Web Search + Extract feature page with SearXNG setup guide
* fix(feishu): keep topic replies in threads
Route Feishu topic progress, status, approval, stream, and fallback messages through threaded replies by preserving the originating message id as the reply target. Add regression tests for tool progress topic metadata and Feishu metadata-driven reply routing.
* chore: follow-up cleanup for Feishu topic thread fix
- Remove dead metadata.get('reply_to') fallback in _send_raw_message;
nothing in the codebase ever sets 'reply_to' inside a metadata dict —
the key only appears as a top-level send_voice() keyword argument
- Simplify _status_thread_metadata construction in run.py to use a
single dict literal instead of create-then-mutate pattern; the
or-{} guard was dead since source.thread_id implies _progress_thread_id
is also set for Feishu
- Add yuqian@zmetasoft.com to AUTHOR_MAP for contributor attribution
* fix(kanban): avoid fragile failure-column renames
* chore: follow-up cleanup for Kanban migration fix
- Expand migration comment to name the primary failure mode (missing
column OperationalError from #20842) ahead of the secondary SQLite
schema-reparse concern; also document the stale-cols-snapshot invariant
- Add clarifying comments on from_row() legacy fallback branches noting
they are belt-and-suspenders dead code post-migration
- Add task_events comment in existing test explaining why the table is
required by the migrator
- Add test_legacy_migration_no_legacy_columns_at_all: Scenario A —
explicitly asserts the exact #20842 crash no longer occurs and that
consecutive_failures defaults to 0 on a DB that never had spawn_failures
- Add test_legacy_migration_both_columns_already_present: Scenario D —
asserts the migration is a no-op when both columns already exist,
preserving the existing counter value
* fix(tui): bound virtual history offset searches
* ci(docker): don't cancel overlapping builds, guard :latest
Switch top-level concurrency to cancel-in-progress=false so every push
to main gets its own SHA-tagged image published — no more discarded
builds when commits land back-to-back.
Guard the :latest tag with a second job that has its own concurrency
group with cancel-in-progress=true plus a git-ancestor check against
the revision label on the current :latest. Together these guarantee
:latest only ever moves forward in history: a slower run whose commit
isn't a descendant of the current :latest refuses to clobber it, and
a newer push mid-way through the move-latest job preempts the older
one before it can retag.
- Every main push publishes nousresearch/hermes-agent:sha-<commit>
with an org.opencontainers.image.revision label embedded.
- move-latest job reads that label off :latest, runs merge-base
--is-ancestor, and only retags (via buildx imagetools create,
registry-side, no rebuild) if our commit strictly descends.
- fetch-depth bumped to 1000 so merge-base has the history it needs.
- Release tag flow unchanged (unique tag, no race).
* docs(tool-gateway): rewrite as pitch-first marketing page (#20827)
Previous version read like internal API docs — leading with env var tables,
config YAML, and 'precedence' rules before ever explaining the product.
Complete rewrite inverts the structure so readers see value first,
mechanics last.
Structure now:
- Lede: 'One subscription. Every tool built in.' + pitch paragraph
- CTA: subscribe/manage button styled as a real call-to-action
- What's included: emoji-led table with expanded descriptions per tool.
Image gen lists all 9 models by name (FLUX 2 Klein/Pro, Z-Image Turbo,
Nano Banana Pro, GPT Image 1.5/2, Ideogram V3, Recraft V4 Pro, Qwen)
- Why it's here: value bullets — one bill, one signup, one key, same
quality, bring-your-own anytime
- Get started: two-command flow (hermes model → hermes status)
- Eligibility: paid-tier note with upgrade link
- Mix and match: three realistic usage patterns
- Using individual image models: ID reference table for power users
- --- separator ---
- Configuration reference (demoted): use_gateway flag, disabling,
self-hosted gateway env vars moved below the fold where they belong
- FAQ: streamlined, removed redundant content
Fact-checked against code:
- 9 FAL models confirmed from tools/image_generation_tool.py FAL_MODELS
- Status section output verified against hermes_cli/status.py
- Portal subscription URL preserved
- Self-hosted env vars (TOOL_GATEWAY_DOMAIN etc.) kept accurate
Verified: docusaurus build SUCCESS, page renders, no new broken links.
* fix(auth): fall back to global-root auth.json for providers missing in profile
Profile processes (kanban workers, cron subprocesses, delegated subagents)
read the profile's auth.json only. If a provider was authenticated at the
global root but not inside the profile, the profile's credential_pool
comes back empty and the process fails with 'No LLM provider configured'
— even though the credentials are sitting in ~/.hermes/auth.json. #18594
propagated HERMES_HOME correctly, which is what surfaced this: workers
now land in the right profile, and the profile turns out to shadow global
with no fallback.
Semantics (read-only, per-provider shadowing):
* Profile has any entries for provider X → use profile only (global ignored).
* Profile has zero entries for provider X → fall back to global.
* Writes (write_credential_pool, _save_auth_store) still target the profile.
* Classic mode (HERMES_HOME == global root) skips the fallback entirely —
_global_auth_file_path() returns None.
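The per-provider shadowing rule reduces to a small resolver. The signature is hypothetical, sketched from the semantics above:

```python
def resolve_provider_entries(provider, profile_auth: dict, global_auth) -> list:
    """Read-only, per-provider shadowing: any profile entries for the
    provider win outright; an empty profile falls back to global.
    Classic mode (no separate global root) passes global_auth=None."""
    entries = profile_auth.get(provider, [])
    if entries or global_auth is None:
        return entries
    return global_auth.get(provider, [])
```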
Also mirrors the fallback in get_provider_auth_state so OAuth singletons
(nous, minimax-oauth, openai-codex, spotify) inherit cleanly — the Nous
shared-token store (PR #19712) remains the authoritative path for Nous
OAuth rotation, this just makes the read side consistent with it.
Seat belt: _load_global_auth_store() refuses to read the real user's
~/.hermes/auth.json under PYTEST_CURRENT_TEST even when HERMES_HOME points
to a profile-shaped path. Guard uses $HOME (stable across fixtures) rather
than Path.home() (which fixtures often monkeypatch to a tmp root).
Reported by @SeedsForbidden on Twitter as the credential_pool shadowing
follow-up to the #18594 fix.
* feat(gateway): per-platform gateway_restart_notification flag
Adds an opt-out toggle on PlatformConfig that gates both restart
lifecycle pings: the "♻ Gateway restarted" message sent to the chat
that issued /restart, and the "♻️ Gateway online" home-channel
startup notification. Defaults to True so existing deployments are
unaffected.
The motivating split is operator vs. end-user surfaces: a back-channel
like Telegram should keep these pings, while a Slack workspace shared
with end users should not surface gateway lifecycle noise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(gateway): also gate pre-restart "Gateway restarting" notification
Extend the gateway_restart_notification flag to cover
_notify_active_sessions_of_shutdown — the message that fires just
before drain ("⚠️ Gateway restarting — Your current task will be
interrupted. Send any message after restart and I'll try to resume
where you left off.") sent to active sessions and home channels.
Same operator/end-user reasoning: on a Slack workspace shared with
end users, "Gateway restarting" reads as "the bot is broken" — the
operator should be able to suppress it consistently with the other
two lifecycle pings rather than having a partial opt-out.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore: add guillaumemeyer to AUTHOR_MAP
For cherry-picked commits in PR #20801.
* fix(cli): submit LF enter in thin PTYs (#20896)
* fix(tui): refresh virtual offsets after row resize (#20898)
* fix(tui): honor skin highlight colors (#20895)
* fix(tui): steady transcript scrollbar (#20917)
* fix(tui): steady transcript scrollbar
Keep the visible scrollbar tied to committed viewport position while virtual history can still prefetch against pending scroll targets, and preserve drag grab offset synchronously for native-feeling scrollbar drags.
* fix(tui): smooth precision wheel scroll
Replace the opt-scroll throttle with frame-sized coalescing so modifier wheel gestures stay line-precise without stepping.
* fix(tui): restore voice push-to-talk parity (#20897)
* fix(tui): restore classic CLI voice push-to-talk parity
(cherry picked from commit 93b9ae301bb89f5b5e01b4b9f8ac91ffa74fbd9d)
* fix(tui): harden voice push-to-talk stop flow
Address review feedback from PR #16189 by stopping the active recorder before background transcription, documenting single-shot voice capture, and covering the TUI gateway flags with regression tests.
* fix(tui): preserve silent voice strike tracking
Keep single-shot voice recording's no-speech counter alive across starts so the TUI can still emit the three-strikes auto-disable event, and bind the auto-restart state at module scope for type checking.
* fix(tui): clean up voice stop failure path
Address follow-up review by naming the TUI flow as single-shot push-to-talk and cancelling the recorder when forced stop cannot produce a WAV.
* fix(tui): report busy voice capture starts
Return explicit start state from the voice wrapper so the TUI gateway does not report recording while forced-stop transcription is still cleaning up.
* fix(tui): handle busy voice record responses
Apply the gateway busy status immediately in the TUI and route forced-stop voice events to the session that sent the stop request.
* fix(tui): clear voice recording on null response
Treat a null voice.record RPC result as a failed optimistic start so the REC badge cannot stick after gateway-side errors.
* fix(tui): count silent manual voice stops
Preserve single-shot voice no-speech strikes through forced stop transcription so empty push-to-talk captures still trigger the three-strikes guard.
---------
Co-authored-by: Montbra <montbra@gmail.com>
* fix(gateway): don't dead-end setup wizard when only system-scope unit is installed
The setup wizard dropped non-root users at a bare shell prompt when
trying to start a system-scope gateway service. Previously
_require_root_for_system_service called sys.exit(1), which the
wizard's `except Exception` guards cannot catch (SystemExit is a
BaseException). Users with a pre-existing /etc/systemd/system unit
(e.g. from an earlier `sudo hermes setup` run) hit this whenever
they re-ran `hermes setup` as a regular user.
- Convert _require_root_for_system_service to raise a typed
SystemScopeRequiresRootError (RuntimeError subclass) instead of
sys.exit(1). The direct CLI path (`hermes gateway install|start|stop|
restart|uninstall` without sudo) still exits 1 cleanly via a new
catch at the top of gateway_command, matching the existing
UserSystemdUnavailableError pattern.
- Add _system_scope_wizard_would_need_root() pre-check and
_print_system_scope_remediation() helper. Both setup wizards
(hermes_cli/setup.py and hermes_cli/gateway.py::gateway_setup) now
detect the dead-end before prompting and print actionable guidance:
either `sudo systemctl start <service>` this time, or uninstall the
system unit and install a per-user one.
- Defense-in-depth: all 5 wizard prompt sites also catch
SystemScopeRequiresRootError and fall back to the remediation
helper if the pre-check is bypassed (race, etc.).
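The exception contract can be sketched as follows. The key point is that a RuntimeError subclass is catchable by ordinary `except Exception` guards, unlike SystemExit; everything besides the exception class name is an illustrative stand-in for the real flow:

```python
class SystemScopeRequiresRootError(RuntimeError):
    """Catchable by ordinary `except Exception` guards, unlike SystemExit."""

def require_root_for_system_service(is_root: bool) -> None:
    if not is_root:
        raise SystemScopeRequiresRootError(
            "Starting a system-scope unit requires root; re-run with sudo, "
            "or uninstall the system unit and install a per-user one."
        )

def gateway_command_entry(is_root: bool) -> int:
    # Direct CLI path: convert the typed error back into a clean exit code 1.
    try:
        require_root_for_system_service(is_root)
    except SystemScopeRequiresRootError as exc:
        print(exc)
        return 1
    return 0
```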
Tests: 12 new tests in TestSystemScopeRequiresRootError,
TestSystemScopeWizardPreCheck, TestSystemScopeRemediationOutput, and
TestGatewayCommandCatchesSystemScopeError covering the exception
contract, pre-check matrix (root vs non-root, system-only vs
user-present vs none vs explicit system=True), remediation output
for each action, and the direct-CLI exit-1 path.
* fix(gateway): wait for systemd restart readiness
* fix(discord): narrow rate-limit catch and move sync state under gateway/
Two follow-ups on top of helix4u's slash-command sync hardening:
- Only suppress exceptions that are actually Discord 429 rate limits
(discord.RateLimited, HTTPException with status 429, or a clearly
rate-limit-named duck type). Arbitrary failures that happen to expose
a retry_after attribute now re-raise to the outer handler instead of
silently swallowing a cooldown.
- Move the sync-state JSON under $HERMES_HOME/gateway/ so the home root
stops collecting ad-hoc runtime files.
Added a test verifying unrelated exceptions don't get misclassified as
rate limits.
* docs(kanban): fix orchestrator skill setup instructions (#20958)
* docs(kanban): fix worker skill setup instructions too (#20960)
Follow-up to #20958. The worker skill section had the same stale
'hermes skills install devops/kanban-worker' command — kanban-worker
is also bundled, so that command fails with 'Could not fetch from any
source.'
Replace with bundled-skill verification + restore pattern, matching
the orchestrator section. Uses <your-worker-profile> placeholder since
assignees vary (researcher, writer, ops, linguist, reviewer, etc.)
rather than a single fixed 'worker' profile.
* feat(profiles): --no-skills flag for empty profile creation (#20986)
Adds `hermes profile create <name> --no-skills` to create a profile with
zero bundled skills. Writes a `.no-bundled-skills` marker file in the
profile root so `hermes update`'s all-profile skill sync loop also skips
the profile — without the marker, every update would re-seed skills and
the user would have to delete them again.
Use case (from @hiut1u): orchestrator profiles and narrow-task profiles
don't need 100+ bundled skills polluting their system prompt.
- create_profile() gains a `no_skills` param, mutually exclusive with
`--clone` / `--clone-all` (cloning explicitly copies skills).
- seed_profile_skills() no-ops on opted-out profiles and returns
`{skipped_opt_out: True}` so callers can report cleanly.
- Web API (POST /api/profiles) accepts `no_skills: bool`.
- Delete `.no-bundled-skills` to opt back in — next `hermes update`
re-seeds normally.
6 new tests in TestNoSkillsOptOut cover marker write, mutual exclusion
with clone, seed_profile_skills opt-out, fresh profile unaffected, and
delete-marker-re-enables-seeding.
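A sketch of the marker check (illustrative; the real seed_profile_skills does the actual bundled-skill copying):

```python
from pathlib import Path

MARKER = ".no-bundled-skills"

def seed_profile_skills(profile_root: Path) -> dict:
    """No-op on opted-out profiles so callers can report cleanly."""
    if (profile_root / MARKER).exists():
        return {"skipped_opt_out": True}
    # ... normal bundled-skill seeding would run here ...
    return {"seeded": True}
```

Deleting the marker file re-enables seeding on the next update, matching the opt-back-in behavior described above.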
* fix: route Telegram image documents through photo handling
* chore: AUTHOR_MAP entry for mrcoferland
* test(docker): align Dockerfile contract tests with simplified TUI flow
The Dockerfile dropped the manual `@hermes/ink` materialisation gymnastics
in favour of letting npm workspaces resolve the bundled package
naturally. Two contract tests still asserted the older flow:
`test_dockerfile_installs_tui_dependencies` required:
'ui-tui/packages/hermes-ink/package-lock.json' in dockerfile_text
…but the lockfile is no longer COPIED individually — the entire
`ui-tui/packages/hermes-ink/` tree is COPIED instead (the workspace
reference from `ui-tui/package.json` is `file:` so npm needs the
real source, not just a manifest stub).
`test_dockerfile_materializes_local_tui_ink_package` required a 7-clause
conjunction matching specific `rm -rf` / `npm install --omit=dev`
`--prefix node_modules/@hermes/ink` / `rm -rf .../react` invocations
that were stripped out when the workspace resolution was simplified.
Update the assertions to pin the *contract* the image actually has to
carry rather than the *exact shell incantations* the old flow used:
* TUI deps install: ui-tui/package.json + ui-tui/package-lock.json +
ui-tui/packages/hermes-ink/ tree are all COPIED, and an npm
install/ci step runs in ui-tui.
* Bundled hermes-ink: the workspace package source is COPIED (so
`await import('@hermes/ink')` resolves at runtime).
This keeps the spirit of #15012 / #16690 (zombie reaping + bundled
workspace materialisation must continue to work) without locking the
Dockerfile into one specific implementation flavour.
Validation:
$ pytest tests/tools/test_dockerfile_pid1_reaping.py -q
6 passed in 1.43s
No production code change. Fixes the two failures observed on `main`
(run 25250051126):
`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_installs_tui_dependencies`
`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_materializes_local_tui_ink_package`
* test(update): patch isatty on real streams to fix xdist-flaky --yes tests
Two CI tests for the new `--yes` update flag (#18261) flaked under
`pytest-xdist` on Linux/Python 3.11 even though they passed every
local run on macOS/Python 3.14.4:
FAILED tests/hermes_cli/test_update_yes_flag.py
::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty
`AssertionError: assert <MagicMock 'input'>.called is False`
FAILED tests/hermes_cli/test_update_yes_flag.py
::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting
`AssertionError: assert <MagicMock '_restore_stashed_changes'>.called is False`
Captured stdout for the first failure shows `cmd_update` taking the
"Non-interactive session — skipping config migration prompt." branch
— i.e. the `sys.stdin.isatty() and sys.stdout.isatty()` check at
`hermes_cli/main.py:7118` evaluated to `False` despite the test doing:
with patch("hermes_cli.main.sys") as mock_sys:
mock_sys.stdin.isatty.return_value = True
mock_sys.stdout.isatty.return_value = True
The whole-module mock is fragile under xdist worker reuse: a sibling
test that imports `hermes_cli.main` first can leave another `sys`
reference resolved inside the function (re-import in a helper, etc.),
and the wholesale module replacement never gets consulted.
Switch to `patch.object(_sys.stdin, "isatty", return_value=True)` (and
the same for `stdout`). That patches the *attribute on the real stream
object* — every call site, no matter how it reached `sys.stdin`,
hits the patched method. Same fix applied to the stash-restore test
(it took the "non-TTY → skip restore prompt" branch for the same reason).
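A minimal reproduction of the robust pattern using standard unittest.mock, patching the attribute on the real stream object instead of replacing the whole module:

```python
import sys
from unittest.mock import patch

def session_is_interactive() -> bool:
    # Stand-in for the TTY check at hermes_cli/main.py:7118
    return sys.stdin.isatty() and sys.stdout.isatty()

# Patching the attribute on the real stream object means every call site
# sees the patched method, however it resolved its reference to sys.stdin.
with patch.object(sys.stdin, "isatty", return_value=True), \
     patch.object(sys.stdout, "isatty", return_value=True):
    assert session_is_interactive() is True
```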
Validation:
$ pytest tests/hermes_cli/test_update_yes_flag.py -q
3 passed in 5.47s
No production code change. Fixes the two failures observed on `main`
(run 25250051126):
`tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty`
`tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting`
Refs: #18261 (added the `--yes` flag + these tests).
* fix(web): force light color-scheme on docs iframe
The Documentation tab embeds the public Hermes Agent docs site via an
<iframe>. On any system where the browser's prefers-color-scheme
resolves to dark — the default on macOS with system dark mode, and
common on Linux/Windows too — the docs body text rendered nearly
invisible against its own background.
Cause: Docusaurus intentionally leaves <html> and <body> transparent
and relies on the browser's Canvas color to fill the viewport. Inside
our iframe, the iframe element had bg-background (the dashboard's dark
canvas) AND inherited the dashboard's dark color-scheme, so the
browser set the iframe's Canvas to a dark value. Docusaurus's
transparent body exposed that dark Canvas, and the docs body text
(tuned for a light Canvas) became near-illegible. Affects every
built-in dashboard theme.
Fix: replace bg-background on the iframe with [color-scheme:light]
(spec-blessed cross-origin override of the inherited color-scheme;
forces the iframe's Canvas to light) and bg-white (belt-and-suspenders
fallback during the brief paint window before content loads). The
docs site's own theme toggle keeps working — Docusaurus stores its
choice in localStorage and applies opaque dark backgrounds to its
layout elements that cover the white Canvas we forced.
* fix(security): close TOCTOU window when saving MCP OAuth credentials
_write_json (the persistence helper used by HermesTokenStorage for both
tokens and client_info) created the temp file via Path.write_text and
only chmod'd it to 0o600 afterward. Between create and chmod the file
existed on disk at the process umask (commonly 0o644 = world-readable),
briefly exposing MCP OAuth access/refresh tokens to other local users.
Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR
mode so the file is created atomically at 0o600, plus tighten the parent
dir to 0o700 so siblings can't traverse to the creds file. The temp name
also gains a per-process random suffix to avoid collisions between
concurrent writers and stale leftovers from a crashed prior write.
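A sketch of the hardened write path under the description above. `write_private_json` is an illustrative name; the real helper is _write_json and also handles the client_info payloads:

```python
import json, os, stat
from pathlib import Path

def write_private_json(path: Path, payload: dict) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    os.chmod(path.parent, 0o700)  # siblings can't traverse to the creds file
    # Per-process random suffix avoids collisions between concurrent writers.
    tmp = path.parent / f"{path.name}.tmp.{os.getpid()}.{os.urandom(4).hex()}"
    # O_EXCL + explicit 0o600 mode: the file never exists at the umask default.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL,
                 stat.S_IRUSR | stat.S_IWUSR)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.replace(tmp, path)  # atomic swap into place
    finally:
        if tmp.exists():  # only on the failure path; replace already moved it
            tmp.unlink()
```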
Mirrors the fix shipped for agent/google_oauth.py in #19673.
Adds a regression test asserting the resulting file mode is 0o600 and
the parent directory is 0o700 (skipped on Windows where POSIX mode bits
aren't enforced).
* chore(release): add Gutslabs to AUTHOR_MAP for PR #21148 salvage
* test(update): teach restart-mocks about the post-update survivor sweep
Issue #17648 added a post-update SIGTERM-survivor sweep to `cmd_update`:
~3s after issuing graceful/SIGTERM restarts, the code re-queries
`find_gateway_pids` and SIGKILLs anything still alive. That's the
right fix for stuck-drain gateways in production, but it broke three
unit tests that assumed `find_gateway_pids` would keep returning the
same PIDs forever:
FAILED ::TestCmdUpdateLaunchdRestart::test_update_restarts_profile_manual_gateways
AssertionError: Expected 'kill' to not have been called. Called 1 times.
Calls: [call(12345, <Signals.SIGKILL: 9>)].
FAILED ::TestCmdUpdateLaunchdRestart::test_update_profile_manual_gateway_falls_back_to_sigterm
AssertionError: Expected 'kill' to have been called once. Called 2 times.
Calls: [call(12345, SIGTERM), call(12345, SIGKILL)].
FAILED ::TestServicePidExclusion::test_update_kills_manual_pid_but_not_service_pid
assert 2 == 1
manual_kills = [call(42999, SIGTERM), call(42999, SIGKILL)]
In each test `os.kill` is mocked, so the simulated PID never actually
exits — the sweep finds it again and escalates. The production code
is correct; the tests just need to model OS behaviour properly.
Two-test fix (profile-manual restart cases): use
`side_effect=[[12345], []]` so the first `find_gateway_pids` call
returns the live PID and the second (the sweep) returns nothing, as if
the OS had reaped the process.
Service-PID-exclusion fix: track which PIDs got killed in a closure
set, and exclude them on subsequent `fake_find` calls. `os.kill`
gets a `side_effect` that records the kill instead of swallowing it
silently. Now the sweep doesn't re-find the manual PID, no SIGKILL
escalation, `manual_kills == 1`.
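The closure-set technique can be sketched like this; the names below are simplified stand-ins for the real test fixtures, and the sweep is reduced to its two find/kill passes:

```python
import signal
from unittest import mock


def demo_sweep(find_pids, kill):
    """Simplified stand-in for the update flow: SIGTERM everything found,
    then re-query (the ~3s survivor sweep) and SIGKILL anything still alive."""
    for pid in find_pids():
        kill(pid, signal.SIGTERM)
    for pid in find_pids():  # survivor sweep
        kill(pid, signal.SIGKILL)


killed = set()


def fake_find():
    # Model the OS reaping killed processes: once a PID has been killed,
    # subsequent scans no longer report it.
    return [pid for pid in (42999,) if pid not in killed]


# Record the kill instead of swallowing it silently.
fake_kill = mock.Mock(side_effect=lambda pid, sig: killed.add(pid))

demo_sweep(fake_find, fake_kill)

# One SIGTERM, no SIGKILL escalation: the sweep did not re-find the PID.
assert fake_kill.call_args_list == [mock.call(42999, signal.SIGTERM)]
```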
Validation:
$ pytest tests/hermes_cli/test_update_gateway_restart.py -q
43 passed in 4.13s
No production code change. Fixes the three failures observed on `main`
(run 25250051126):
test_update_restarts_profile_manual_gateways
test_update_profile_manual_gateway_falls_back_to_sigterm
test_update_kills_manual_pid_but_not_service_pid
Refs: #17648 (post-update survivor sweep that the tests didn't model).
* fix(image-routing): expose attached image paths in native multimodal text part
In native image mode (vision-capable models like gpt-4o, claude-sonnet-4),
build_native_content_parts() previously emitted only the user's caption
plus image_url parts. The local file path of each attached image never
appeared in the conversation text, so the model could see the pixels but
had no string handle for tools that take image_url: str (custom MCP
tools, vision_analyze on a re-look, attach-to-tracker workflows).
The text-mode path already injects an equivalent hint via
Runner._enrich_message_with_vision ("...vision_analyze using image_url:
<path>..."). This brings native mode to parity by appending one
"[Image attached at: <path>]" line per successfully attached image to
the user-text part of the multimodal turn. Skipped (unreadable) paths
are NOT advertised, so the model is never told a non-existent file is
attached.
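A minimal sketch of the parity behaviour described above; the function signature and part dict shapes are illustrative, not the actual build_native_content_parts API:

```python
def build_parts(caption: str, attached: list, skipped: list) -> list:
    """Build a multimodal turn: one text part plus one image part per
    successfully attached image. Skipped (unreadable) paths get no hint
    line, so the model is never told a non-existent file is attached."""
    text = caption
    for path in attached:
        text += f"\n[Image attached at: {path}]"
    parts = [{"type": "text", "text": text}]
    parts += [{"type": "image_url", "image_url": {"url": p}} for p in attached]
    return parts
```

The key invariant: the path hint is appended only for images that also get an image_url part, so the text handle and the pixels always agree.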
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(optional-skills): port Anthropic financial-services skills as optional finance bundle (#21180)
Adds 7 optional skills under optional-skills/finance/ adapted from
anthropics/financial-services (Apache-2.0):
excel-author — openpyxl conventions: blue/black/green cells,
formulas over hardcodes, named ranges, balance
checks, sensitivity tables. Ships recalc.py.
pptx-author — python-pptx for model-backed decks (pitch,
IC memo, earnings note) that bind every number
to a source workbook cell.
dcf-model — institutional DCF (49KB skill): projections,
WACC, terminal value, Bear/Base/Bull scenarios,
5x5 sensitivity tables. Ships validate_dcf.py.
comps-analysis — comparable company analysis: operating metrics,
multiples, statistical benchmarking.
lbo-model — leveraged buyout: S&U, debt schedule, cash
sweep, exit multiple, IRR/MOIC sensitivity.
3-statement-model — fully-integrated IS/BS/CF with balance-check
plugs. Ships references/ for formatting,
formulas, SEC filings.
merger-model — accretion/dilution analysis for M&A.
All seven are optional (not active by default). Users install via
'hermes skills install official/finance/<skill>'.
Hermesification:
- Stripped every Office JS / Office Add-in / mcp__office__*
branch — skills assume headless openpyxl only.
- Replaced Cowork MCP data-source instructions with 'MCP first (via
native-mcp), fall back to web_search/web_extract against SEC EDGAR
and user-provided data'.
- Swapped Claude tool references (Bash, Read, Write, Edit, mcp__*)
for Hermes-native equivalents and Python library calls.
- Canonical Hermes frontmatter (name/description/version/author/
license/metadata.hermes.{tags,related_skills}).
- Descriptions tightened to 187-238 chars, trigger-first.
- Attribution preserved: author field credits 'Anthropic (adapted by
Nous Research)', license: Apache-2.0, each SKILL.md links back to
the upstream source directory.
Verification:
- All 7 discovered by OptionalSkillSource with source_id='official'
- Bundle fetch includes support files (scripts, references, troubleshooting)
- related_skills cross-refs all resolve within the bundle
- No Claude product / Cowork / Office JS / /mnt/skills leakage
remains in body text (bounded mentions only in attribution blocks)
Source: https://github.com/anthropics/financial-services (Apache-2.0)
* test(skills): cover additional rescan paths in skill_commands cache (#14536)
The rescan-on-platform-change fix that landed in #18739 ships one regression
test that exercises the HERMES_PLATFORM env-var path. Three other code
paths in get_skill_commands / _resolve_skill_commands_platform have no
direct coverage; this commit adds a regression test for each.
- Gateway session context (HERMES_SESSION_PLATFORM via ContextVar): the
resolver consults get_session_env after HERMES_PLATFORM, and the
gateway sets that variable through set_session_vars (a ContextVar),
not os.environ. The test uses set_session_vars / clear_session_vars
to drive the actual gateway signal, and the disabled-skill stub reads
the same value via get_session_env. A regression that swapped
get_session_env for plain os.getenv would still pass an env-var-based
test but break concurrent gateway sessions, which is the bug the
ContextVar plumbing exists to prevent.
- Returning to no-platform-scope (CLI / cron / RL rollouts after a
gateway session): the cached telegram view must be dropped and the
unfiltered scan repopulated when HERMES_PLATFORM is unset again.
- Same-platform cache hit: consecutive calls under the same platform
scope must NOT rescan. The rescan trigger is change in scope, not
"always re-resolve" — a gateway serving many consecutive telegram
requests should pay the scan cost once, not per request.
The third test wraps scan_skill_commands with a spy after the cache is
primed, so the assertion is on call_count == 0 across three subsequent
get_skill_commands() calls.
All 39 tests in tests/agent/test_skill_commands.py pass under
scripts/run_tests.sh.
* fix(gateway): translate inbound document host paths to container paths for Docker backend
When terminal.backend is docker, inbound documents uploaded via messaging
platforms (Telegram, Slack, Discord, Feishu, Email, etc.) are cached at a host
path under ~/.hermes/cache/documents, but the container sandbox only sees them
at the auto-mounted /root/.hermes/cache/documents path.
This PR adds to_agent_visible_cache_path() in tools/credential_files.py (the
natural sibling to get_cache_directory_mounts()) and calls it at the
document-context-injection site in gateway/run.py so the agent always receives
a path it can open directly, matching the mount layout already established
by get_cache_directory_mounts() (#4846).
Scope: only Docker backend for now; other backends use different mount
semantics and are left unchanged until verified.
Fixes #18787
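The translation is a simple prefix rewrite under the mount layout described above. This standalone sketch mirrors what to_agent_visible_cache_path does conceptually; the host home directory and container mount point here are illustrative assumptions:

```python
from pathlib import Path, PurePosixPath


def to_container_cache_path(host_path: str, host_home: str = "/home/alice") -> str:
    """Map a host path under ~/.hermes/cache to the container's
    auto-mounted /root/.hermes/cache; leave other paths untouched."""
    host_cache = Path(host_home) / ".hermes" / "cache"
    p = Path(host_path)
    try:
        rel = p.relative_to(host_cache)
    except ValueError:
        return host_path  # not under the cache mount: pass through unchanged
    return str(PurePosixPath("/root/.hermes/cache") / rel)
```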
* feat(gateway): opt-in cleanup of temporary progress bubbles (#21186)
When display.cleanup_progress (or display.platforms.<plat>.cleanup_progress)
is true, the gateway deletes tool-progress bubbles, long-running '⏳ Still
working...' notices, and status-callback messages after the final response
is delivered successfully. Currently effective on adapters that implement
delete_message (Tel…
RationallyPrime pushed a commit to RationallyPrime/hermes-agent that referenced this pull request on May 8, 2026:
…uardrails (NousResearch#20709)

Replaces the per-directory shadow-repo design with a single shared shadow git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated across every working directory the agent has ever touched; a dozen worktrees of the same project cost near-zero in additional disk.

Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/ grow to multi-GB on active machines:
1. Each working directory got its own full shadow git repo — no object dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the /rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost without ever asking for /rollback.
Field report on a single workstation: 847 MB across 47 shadow repos, mostly redundant clones of the hermes-agent source tree.

Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>), per-project metadata (store/projects/<hash>.json with workdir + created_at + last_touch). On first v2 init, any pre-v2 per-directory shadow repos are auto-migrated into legacy-<timestamp>/ so the new store starts clean. _prune() now actually rewrites the per-project ref to the last max_snapshots commits and runs git gc --prune=now. New _enforce_size_cap() drops oldest commits round-robin across projects when the store exceeds max_total_size_mb. _drop_oversize_from_index() filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI (status / list / prune / clear / clear-legacy) for managing the store outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20, auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10. Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll, *.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat coverage for the pre-v2 prune path (per-directory shadow repos under base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the shared store, new defaults, migration, and the new CLI; reference/cli-commands.md documents 'hermes checkpoints'.

E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to 7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune --retention-days 0' removes its ref, index, and metadata; GC reclaims the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the CLI without an agent running.

Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved into legacy-<timestamp>/ on first run. Old /rollback history is still accessible by inspecting the archive with git; run 'hermes checkpoints clear-legacy' to reclaim the space when ready. Users relying on /rollback must now set checkpoints.enabled=true (or pass --checkpoints) explicitly.
gzsiang pushed a commit to gzsiang/hermes-agent that referenced this pull request on May 10, 2026.
nickdlkk pushed a commit to nickdlkk/hermes-agent that referenced this pull request on May 11, 2026.
rmulligan pushed a commit to rmulligan/hermes-agent that referenced this pull request on May 11, 2026.
JinyuID pushed a commit to JinyuID/hermes-agent that referenced this pull request on May 11, 2026.
Summary
Fixes the open-ended growth of ~/.hermes/checkpoints/. A single shared shadow git store replaces per-directory repos, _prune() actually prunes, and a new 'hermes checkpoints' CLI surfaces size + cleanup.
On one active machine: 847 MB across 47 shadow repos, mostly redundant clones of the same source tree. v2 dedups objects across projects; after a size-cap pass, that collapses into a fraction of the old footprint.
Root cause (three compounding problems)
1. Each working directory got its own full shadow git repo, with no object dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only capped the /rollback listing. Loose objects accumulated forever.
3. enabled: True + auto_prune: False meant users paid the disk cost without ever opting into /rollback.
Architecture (v2)
Migration: on first v2 init, any pre-existing per-directory shadow repos are renamed into legacy-<timestamp>/ so the new store starts clean. No in-place data migration. Users can still inspect the archive with git if they need old /rollback history; hermes checkpoints clear-legacy reclaims the space when ready.
Changes
- _prune() actually rewrites the ref to the last max_snapshots commits and runs git gc --prune=now. New _enforce_size_cap() drops oldest commits round-robin across projects when the store exceeds max_total_size_mb. New _drop_oversize_from_index() filters files larger than max_file_size_mb out of snapshots. New helpers for the CLI: store_status(), clear_all(), clear_legacy().
- hermes checkpoints CLI (status / list / prune / clear / clear-legacy) for managing the store outside a session.
- Defaults flipped: enabled: True → False, max_snapshots: 50 → 20, auto_prune: False → True. Added max_total_size_mb: 500 and max_file_size_mb: 10. Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll, *.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, .ruff_cache/, etc.).
- Threaded the new kwargs through AIAgent and the startup auto-prune hooks.
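The round-robin eviction that _enforce_size_cap() performs can be sketched as pure selection logic. This is an illustrative sketch only, assuming each project exposes its commits oldest-first; the helper name and data shapes are not the real implementation:

```python
def select_evictions(projects: dict, n_to_drop: int) -> list:
    """Pick n commits to drop, oldest-first, one per project per round,
    so no single project bears the whole size-cap cost.

    `projects` maps project hash -> list of commit ids, oldest first.
    """
    queues = {name: list(commits) for name, commits in projects.items()}
    dropped = []
    while len(dropped) < n_to_drop and any(queues.values()):
        for name, q in queues.items():
            if q and len(dropped) < n_to_drop:
                dropped.append((name, q.pop(0)))  # evict that project's oldest
    return dropped
```

With this shape, a store dominated by one large project loses that project's history fastest, while small projects give up at most one snapshot per round.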
Targeted tests: tests/tools/test_checkpoint_manager.py, 73/73 passing.
Adjacent: tests/tools/ and tests/tui_gateway/ clean; tests/hermes_cli/ and tests/run_agent/ failures are all pre-existing on main (confirmed via stash test).
E2E (real git, real file I/O, isolated HERMES_HOME):
- Legacy migration into legacy-<ts>/
- max_snapshots=3 actually enforced after 6 commits
- max_file_size_mb=1 excludes a 2 MB binary while keeping source files
- hermes checkpoints {status,prune,clear,clear-legacy}
Behavior change notice for users
Anyone currently relying on /rollback must now opt in explicitly: set checkpoints.enabled: true in config, or pass hermes chat --checkpoints. The rationale: most users never invoke /rollback, and the v1 store grew unboundedly. The feature is fully functional — just not the default.
Existing ~/.hermes/checkpoints/<hash>/ shadow repos are moved into legacy-<timestamp>/ on first v2 run and will be auto-pruned by auto_prune after retention_days (default 7).
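Opting back in looks something like this in config; the section and key names are taken from the option names above, but check the actual config schema before copying:

```yaml
checkpoints:
  enabled: true          # default is now false
  max_snapshots: 20
  auto_prune: true
  max_total_size_mb: 500
  max_file_size_mb: 10
```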