Arcee temperature + compression by rob-maron · Pull Request #20344 · NousResearch/hermes-agent

rob-maron · 2026-05-05T18:06:14Z

No description provided.

…verrides Salvage follow-up for PR #20344: - AUTHOR_MAP entry for rob-maron (required by CI) - 17 parametrized tests covering _is_arcee_trinity_thinking, _fixed_temperature_for_model Trinity override, and _compression_threshold_for_model, including sibling-model negatives (trinity-large-preview, trinity-mini) and the OpenRouter slug form.

teknium1 · 2026-05-06T00:23:57Z

Merged via #20473. Your commit (2d4eaed) was cherry-picked onto current main with your authorship preserved in git log. Thanks @rob-maron!

* fix(aux): trigger fallback on 429 rate-limit errors in auxiliary client When a provider returns a 429 rate-limit error (not billing-related), the auxiliary client's call_llm/async_call_llm previously did NOT trigger the fallback chain. This caused auxiliary tasks like session_search to exhaust all 3 retries against the same rate-limited endpoint, losing session metadata that depended on the summarization completing. Root cause: `_is_payment_error()` only matched 429s containing billing keywords ("credits", "insufficient funds", etc.). Provider-specific rate-limit messages like Nous's "Hold up for a bit, you've exceeded the rate limit on your API key" didn't match, so `_is_payment_error` returned False, `_is_connection_error` returned False, and `should_fallback` was False — all retries hit the same rate-limited provider. Fix: - New `_is_rate_limit_error()` function that detects 429 + rate-limit keywords, generic 429 without billing keywords, and OpenAI SDK `RateLimitError` class instances (which may omit .status_code). - Updated `should_fallback` in both `call_llm` and `async_call_llm` to include `_is_rate_limit_error`. - Updated the max_tokens retry path to also check for rate-limit errors. - Updated the reason string to include "rate limit". This complements the Nous rate guard (PR #10568) which prevents new calls to Nous when already rate-limited — this fix handles the case where a request is already in flight when the 429 arrives. Related: #8023, #12554, #11034 Co-authored-by: Zeejay <zjtan1@gmail.com> * chore: AUTHOR_MAP entry for zeejaytan * fix(acp): preserve assistant reasoning metadata in session persistence * chore: AUTHOR_MAP entry for Aslaaen * feat(cli): add list_picker_providers for credential-filtered picker The Telegram/Discord /model pickers currently call list_authenticated_providers(), which returns every provider whose credentials resolve locally and every model in its curated snapshot. 
Two failure modes fall out: - OpenRouter rows can include IDs the live catalog no longer carries. - Provider rows can surface with zero callable models (e.g. a slug whose credential pool entry exists but has nothing behind it). list_picker_providers() wraps the base function and post-processes the result so the interactive picker only shows models the user can actually select: - OpenRouter's models come from fetch_openrouter_models() (live-catalog filtered against the curated OPENROUTER_MODELS snapshot). - Rows with an empty models list are dropped, except custom endpoints (is_user_defined=True with an api_url) where the user may enter model ids manually. - All other fields pass through unchanged. The gateway /model handler switches to the new helper for the interactive picker payload only. Typed /model <name> and the text fallback list stay on list_authenticated_providers() so nothing is hidden from power users or platforms without a picker. Covered by nine focused unit tests in tests/hermes_cli/test_list_picker_providers.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: AUTHOR_MAP entry for Tkander1715 * feat(tui): remove /provider alias for /model (#20358) /model is the canonical command; /provider was a redundant alias that dispatched to the same ModelPicker overlay. Drop the alias, the regex branch in useCompletion, and the alias-coverage test. * fix: resolve lazy session creation regressions (#18370 fallout) (#20363) Fix three regressions introduced by PR #18370 (lazy session creation): 1. _finalize_session() uses stale session_key after compression (#20001) 2. session_key not synced after auto-compression in run_conversation (#20001) 3. pending_title ValueError leaves title wedged forever (#19029) 4. Gateway silently swallows null responses when agent did work (#18765) 5. 
One-time cleanup for accumulated ghost compression continuations (#20001) Changes: - tui_gateway/server.py: _finalize_session() now uses agent.session_id (falls back to session_key when agent is None). Refactor _sync_session_key_after_compress() with clear_pending_title and restart_slash_worker policy flags. Call it post-run_conversation() to sync session_key after auto-compression. Add ValueError handler to pending_title flush. - gateway/run.py: Extract _normalize_empty_agent_response() helper that consolidates failed/partial/null response handling. Surfaces user-facing error when agent did work (api_calls > 0) but returned no text. - hermes_state.py: Add finalize_orphaned_compression_sessions() — marks ghost continuation sessions as ended (non-destructive, preserves data). - cli.py: One-time startup migration for orphaned compression sessions. Test changes: - tests/test_tui_gateway_server.py: Update pending_title ValueError test for post-#18370 architecture (title applied post-message, not at create). - tests/test_lazy_session_regressions.py: 14 new regression tests covering all fixed paths. * docs(web_tools): correct web_extract summarizer timeout comment The comment at tools/web_tools.py:700-702 stated the runtime default for auxiliary.web_extract.timeout is 360s. The actual runtime default is 30s (_DEFAULT_AUX_TIMEOUT in agent/auxiliary_client.py:3140), used by _get_task_timeout when no auxiliary.web_extract.timeout key is present in config.yaml. The 360s figure is the config template default written by hermes_cli/config.py:697 into freshly-generated config.yaml files. It only takes effect when that key exists in the user's config — not as a fallback. Users on configs that predate commit 20b4060d (Apr 5, 2026), or who removed the key, fall through to the 30s _DEFAULT_AUX_TIMEOUT runtime default. The comment was introduced in 20b4060d alongside the template-default bump from 30 to 360. 
The runtime default in auxiliary_client.py was not changed in that commit and has remained 30s since 839d9d74 (Mar 28, 2026). * docs(config): fix fallback provider config paths * docs(prompt): clarify supported customization surfaces * chore: AUTHOR_MAP entry for Beandon13 * docs: remove dead reference links in flash-attention skill * docs: remove dead papers.md link from saelens references * docs: fix broken nix-setup anchor for container-aware CLI * fix(telegram): keep DM topic typing scoped * refactor(telegram): make typing thread-id resolver symmetric with send Mirror _message_thread_id_for_typing() with _message_thread_id_for_send(): both now map the General forum topic (thread id "1") to None upfront. That removes the need for the retry-without-thread fallback in send_typing() entirely — if _message_thread_id_for_typing() returns a non-None value, it's a real user-created topic and falling back to the root chat is never correct. If Telegram rejects the typing action (e.g. topic deleted mid-session), we swallow it at debug level instead of bleeding the indicator into All Messages. Updates the General-topic typing regression test to assert the new single-call contract. * docs(tts): document per-provider max_text_length caps PR #13743 replaced the global MAX_TEXT_LENGTH=4000 with a per-provider table and a user-override 'max_text_length:' key, but the user-guide TTS page documented no length behaviour at all. Users hitting truncation had no way to discover the new caps or the override. Add an 'Input length limits' subsection after the existing Configuration YAML block: provider default caps (Edge 5000 / OpenAI 4096 / xAI 15000 / MiniMax 10000 / Mistral 4000 / Gemini 5000 / ElevenLabs model-aware / NeuTTS,KittenTTS 2000), ElevenLabs model_id -> cap table (5k-40k), an override example, and the validation rules (non-positive / non-integer / boolean values fall through to the provider default). 
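The validation rules in the TTS entry above (non-positive, non-integer, and boolean overrides fall through to the provider default) can be sketched roughly as follows. The cap values are the ones quoted in the commit message; the dictionary keys, the function name, and its signature are hypothetical and are not taken from the Hermes codebase. The model-aware ElevenLabs cap is left out.

```python
# Illustrative sketch only. Cap values are quoted from the commit message;
# PROVIDER_CAPS keys and resolve_max_text_length() are hypothetical names.
PROVIDER_CAPS = {
    "edge": 5000,
    "openai": 4096,
    "xai": 15000,
    "minimax": 10000,
    "mistral": 4000,
    "gemini": 5000,
    "neutts": 2000,
    "kittentts": 2000,
}


def resolve_max_text_length(provider, override=None):
    """Return the user override if it is a positive int, else the provider cap."""
    default = PROVIDER_CAPS[provider]
    # bool is a subclass of int in Python, so it has to be rejected explicitly.
    if isinstance(override, bool) or not isinstance(override, int):
        return default
    if override <= 0:
        return default
    return override
```

The explicit `bool` check matters because `isinstance(True, int)` is true in Python, so a stray `max_text_length: true` in YAML would otherwise be read as a cap of 1.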
* docs(skill/hermes-agent): sync slash commands + add durable-systems section Mirrors the AGENTS.md #20226 additions (Toolsets / Delegation / Curator / Cron / Kanban) into the user-facing hermes-agent skill, and closes the drift in the in-session slash command list. User report (wxrrior in Discord): the skill did not mention /goal, so a brand-new session answering "/hermes-agent do you have any info on /goal" confidently said it did not exist. Cross-check against the CommandDef registry found 16 commands missing from the static list: /goal, /agents, /busy, /copy, /curator, /debug, /footer, /gquota, /indicator, /kanban, /redraw, /reload, /reload-skills, /snapshot, /steer, /topic. Changes: - Slash Commands header now tells the reader to run /help or check the live docs reference as the source of truth, and names the registry of record (hermes_cli/commands.py) so future drift gets flagged honestly instead of answered confidently wrong. - Added all 16 missing commands, slotted into existing subsections (/goal and /steer in Session; /busy + /indicator + /footer in Configuration; /curator + /kanban + /reload-skills + /reload in Tools & Skills; /topic in Gateway; /copy in Utility; /gquota + /debug in Info). - Toolsets table updated to the authoritative 30-key list from toolsets.py (added kanban, yuanbao, spotify, safe, debugging, video, feishu_doc, feishu_drive, discord, discord_admin, clarify; previously stopped at 20 keys). - New "Durable & Background Systems" section before Troubleshooting covers Delegation, Cron, Curator, Kanban - each with a short rundown of CLI verbs, key invariants, and a pointer to the user-facing docs. Mirrors AGENTS.md #20226 but in the skill's user-facing register. - Bumped version 2.0.0 -> 2.1.0. 
* docs(cli): add --deliver-only flag to hermes webhook subscribe PR #12473 (merged 2026-04-19) added a new --deliver-only flag to `hermes webhook subscribe` for zero-LLM direct delivery, but website/docs/reference/cli-commands.md options table did not reference it. Add the row so CLI users can discover the flag from the reference page instead of having to read the source. * perf(ui-tui): narrow overlay subscriptions to focused selectors Subscribe overlay components to computed theme/session selectors instead of the full UI store so unrelated UI state updates trigger fewer overlay renders. * docs(cli): add skills reset subcommand to CLI reference PR #11468 added `hermes skills reset` but cli-commands.md was not updated. Adds the subcommand to the table and usage examples. Closes #11543 * feat(kanban): generic diagnostics engine for task distress signals (#20332) * feat(kanban): generic diagnostics engine for task distress signals Replaces the hallucination-specific ``warnings`` / ``RecoverySection`` surface (shipped in PR #20232) with a reusable diagnostic-rule engine that covers five distress kinds in v1 and can be extended without touching UI code. The "something's wrong with this task" signal is no longer limited to phantom card ids. Closes the follow-up from #20232 discussion. New module ---------- ``hermes_cli/kanban_diagnostics.py`` — stateless, no-side-effect rule engine. Each rule is a pure function of ``(task, events, runs, now, config) -> list[Diagnostic]``. Registry is a simple list; adding a new distress kind is one function + one import, no UI or API changes required. v1 rule set ----------- * ``hallucinated_cards`` (error) — folds the existing ``completion_blocked_hallucination`` event into the new surface. * ``prose_phantom_refs`` (warning) — folds ``suspected_hallucinated_references``. * ``repeated_spawn_failures`` (error → critical at 2x threshold) — fires when ``tasks.spawn_failures >= 3``; suggests ``hermes -p <profile> doctor`` / ``auth``. 
* ``repeated_crashes`` (error → critical) — fires after N consecutive ``crashed`` run outcomes with no successful completion between; suggests ``hermes kanban log <id>``. * ``stuck_in_blocked`` (warning) — fires after 24h in ``blocked`` state with no comments / unblock attempts; suggests commenting. Every diagnostic carries structured ``actions`` (reclaim, reassign, unblock, cli_hint, comment, open_docs) that render consistently in both CLI and dashboard. Suggested actions are highlighted; generic recovery actions (reclaim / reassign) are available on every kind as fallbacks. Diagnostics auto-clear when the underlying failure resolves — a clean ``completed``/``edited`` event drops hallucination diagnostics, a successful run drops crash diagnostics, a comment drops stuck-blocked diagnostics. Audit events persist; the badge goes away. API --- ``plugin_api.py``: * ``/board`` now attaches ``diagnostics`` (full list) and ``warnings`` (compact summary with ``highest_severity``) per task. * ``/tasks/{id}`` attaches diagnostics so the drawer's Diagnostics section auto-opens on flagged tasks. * NEW ``/diagnostics`` endpoint — fleet-wide listing, filterable by severity, sorted critical-first. CLI --- * NEW ``hermes kanban diagnostics [--severity X] [--task id] [--json]`` — fleet view or single-task view, matches dashboard rule output so CLI users see the same picture. * ``hermes kanban show <id>`` now renders a Diagnostics section near the top with severity markers + suggested actions. Dashboard --------- * Card badge is severity-coloured (⚠ amber warning, !! orange error, !!! red critical) using ``warnings.highest_severity``. * Attention strip above the toolbar counts EVERY task with active diagnostics (not just hallucinations), severity-coloured, lists affected tasks with Open buttons when expanded. 
* Drawer's old ``RecoverySection`` replaced with generic ``DiagnosticsSection`` rendering a card per active diagnostic: title + detail + structured data (task-id chips when payload keys look like id lists) + action buttons. Reassign profile picker is inline per-diagnostic. Clipboard fallback uses ``.catch()`` for environments where writeText rejects. * Three-rung severity palette; amber for warning, orange for error, red for critical. Uses CSS variables so theming is straightforward. Tests ----- * NEW ``tests/hermes_cli/test_kanban_diagnostics.py`` — 14 unit tests covering each rule's positive/negative/threshold paths, severity sorting, broken-rule isolation, and sqlite3.Row integration. * Dashboard plugin tests extended: ``/diagnostics`` endpoint (empty, populated, severity-filtered), ``/board`` exposes both diagnostic list and compact summary with ``highest_severity``. * Existing hallucination-specific test (``test_board_surfaces_ warnings_field_for_hallucinated_completions``) updated to reflect the new contract: warning summary keys by diagnostic kind (``hallucinated_cards``) not event kind. 379 kanban-suite tests pass (+16 net from this PR). Live verification ----------------- Seeded all 5 diagnostic kinds + one clean + one plain-running task (7 total) into an isolated HERMES_HOME, spun up the dashboard, and verified: * Attention strip: shows ``!! 5 tasks need attention`` in the error-severity orange; Show expands to a list of 5 rows ordered critical > error > warning. * Card badges: error tasks render ``!!`` orange, warning tasks render ``⚠`` amber, clean and plain-running tasks render no badge. * Each of the 5 rules opens a correctly-coloured, correctly-styled diagnostic card in the drawer with its specific suggested action. * Live reassign from a diagnostic card flipped ``broken-ml-worker → alice`` and the drawer refreshed with the new assignee + the same diagnostic still firing (correct: spawn_failures counter hasn't reset yet). 
* CLI ``hermes kanban diagnostics`` prints all 5 in severity order; ``--severity error`` narrows to 3; ``kanban show <id>`` includes the Diagnostics block at the top with suggested action hint. Migration note -------------- The old ``warnings`` shape (``{count, kinds, latest_at}``) is preserved on the API but ``kinds`` now keys by diagnostic kind (``hallucinated_cards``) instead of event kind (``completion_blocked_hallucination``). ``highest_severity`` is a new required field. The dashboard was the only consumer and has been updated in the same commit; external API consumers of the ``warnings`` field will need to update their kind-match logic. * feat(kanban/diagnostics): lead titles with the actual error text The generic 'Worker crashed N runs in a row' / 'Worker failed to spawn N times' titles buried the actual cause in the data section. Operators had to open logs or expand the diagnostic to see WHY the worker is stuck — rate-limit vs insufficient quota vs bad auth vs context overflow vs network blip all looked identical at a glance. New titles: Agent crashed 3x: openai: 429 Too Many Requests - rate limit reached Agent crashed 3x: anthropic: 402 insufficient_quota - credit balance Agent crashed 3x: provider auth error: 401 Unauthorized Agent spawn failed 4x: insufficient_quota: You exceeded your current Detail keeps the full error snippet (capped at 500 chars + ellipsis for tracebacks). Title takes the first line capped at 160 chars. Fallback title if no error recorded stays honest ('no error recorded'). Tests: 4 new cases covering 429/billing/spawn/truncation. 383 total pass (+4). Live-verified on dashboard with 6 seeded scenarios (rate-limit, billing, auth, context, network, spawn-billing) — each card title leads with the actionable error text. 
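The title and detail capping described in the entry above (title from the first line of the error, capped at 160 chars; detail capped at 500 chars plus an ellipsis; an honest fallback when no error was recorded) might look something like this. The function name, its arguments, the exact fallback wording, and the use of an ASCII "..." ellipsis are assumptions made for illustration.

```python
# Illustrative sketch only. The caps come from the commit message;
# build_title_and_detail() and its exact output format are hypothetical.
TITLE_CAP = 160
DETAIL_CAP = 500


def build_title_and_detail(prefix, count, error):
    """Lead the title with the first line of the error; cap the detail snippet."""
    if not error or not error.strip():
        # Nothing was recorded, so say so rather than guessing a cause.
        return f"{prefix} {count}x: no error recorded", ""
    text = error.strip()
    first_line = text.splitlines()[0][:TITLE_CAP]
    detail = text if len(text) <= DETAIL_CAP else text[:DETAIL_CAP] + "..."
    return f"{prefix} {count}x: {first_line}", detail
```

With a multi-line traceback, only the first line reaches the title, while the capped snippet stays available in the detail field.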
* docs(agent): remove stale BuiltinMemoryProvider references from memory module docstrings The BuiltinMemoryProvider class was removed from the codebase but its name lingered in the module-level docstrings of memory_manager.py and memory_provider.py, creating false expectations: - memory_manager.py docstring showed example code doing add_provider(BuiltinMemoryProvider(...)), which raises ImportError at runtime - memory_provider.py docstring listed BuiltinMemoryProvider as 'always present, not removable' — misleading for new contributors The regression test (test_memory_user_id.py) already passes without any reference to BuiltinMemoryProvider; it uses RecordingProvider instances directly. The stale references were docs-only drift. Update both docstrings to reflect the actual current architecture: MemoryManager accepts external plugin providers only (one at a time). Closes #14402 * docs(plugins): document ctx.dispatch_tool() in plugin capabilities table * docs(guide): add Dispatch tools from slash commands section * docs(cron): add context_from chaining section Resolved merge against current main (new No-agent mode section added in parallel). Co-authored-by: Tony Simons <tony@tonysimons.dev> * chore: AUTHOR_MAP entry for asimons81 * feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path Introduces providers/ package — single source of truth for every inference provider. Adding a simple api-key provider now requires one providers/<name>.py file with zero edits anywhere else. 
What this PR ships: - providers/ package (ProviderProfile ABC + 33 profiles across 4 api_modes) - ProviderProfile declarative fields: name, api_mode, aliases, display_name, env_vars, base_url, models_url, auth_type, fallback_models, hostname, default_headers, fixed_temperature, default_max_tokens, default_aux_model - 4 overridable hooks: prepare_messages, build_extra_body, build_api_kwargs_extras, fetch_models - chat_completions.build_kwargs: profile path via _build_kwargs_from_profile, legacy flag path retained for lmstudio/tencent-tokenhub (which have session-aware reasoning probing that doesn't map cleanly to hooks yet) - run_agent.py: profile path for all registered providers; legacy path variable scoping fixed (all flags defined before branching) - Auto-wires: auth.PROVIDER_REGISTRY, models.CANONICAL_PROVIDERS, doctor health checks, config.OPTIONAL_ENV_VARS, model_metadata._URL_TO_PROVIDER - GeminiProfile: thinking_config translation (native + openai-compat nested) - New tests/providers/ (79 tests covering profile declarations, transport parity, hook overrides, e2e kwargs assembly) Deltas vs original PR (salvaged onto current main): - Added profiles: alibaba-coding-plan, azure-foundry, minimax-oauth (were added to main since original PR) - Skipped profiles: lmstudio, tencent-tokenhub stay on legacy path (their reasoning_effort probing has no clean hook equivalent yet) - Removed lmstudio alias from custom profile (it's a separate provider now) - Skipped openrouter/custom from PROVIDER_REGISTRY auto-extension (resolve_provider special-cases them; adding breaks runtime resolution) - runtime_provider: profile.api_mode only as fallback when URL detection finds nothing (was breaking minimax /v1 override) - Preserved main's legacy-path improvements: deepseek reasoning_content preserve, gemini Gemma skip, OpenRouter response caching, Anthropic 1M beta recovery, etc. 
- Kept agent/copilot_acp_client.py in place (rejected PR's relocation — main has 7 fixes landed since; relocation would revert them) - _API_KEY_PROVIDER_AUX_MODELS alias kept for backward compat with existing test imports Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Closes #14418 * feat(providers): make all 33 providers pluggable under plugins/model-providers/ Every provider profile is now a self-contained plugin under plugins/model-providers/<name>/, mirroring the plugins/platforms/ pattern established for IRC and Teams. The ProviderProfile ABC stays in providers/; the per-provider profile data moves out. - plugins/model-providers/<name>/__init__.py calls register_provider() - plugins/model-providers/<name>/plugin.yaml declares kind: model-provider - providers/__init__.py._discover_providers() lazily scans bundled plugins then $HERMES_HOME/plugins/model-providers/<name>/ (user override path) - User plugins with the same name override bundled ones (last-writer-wins in register_provider) - Legacy providers/<name>.py layout still supported for back-compat with out-of-tree editable installs - Hermes PluginManager: new kind=model-provider; skipped like memory plugins (providers/ discovery owns them); standalone plugins with register_provider+ProviderProfile in their __init__.py auto-coerce to this kind (same heuristic as memory providers) - skip_names extended to include 'model-providers' so the general PluginManager doesn't double-scan the category - 4 new tests in tests/providers/test_plugin_discovery.py covering bundled discovery, user override, and general-loader isolation - Docs updated: website/docs/developer-guide/adding-providers.md, provider-runtime.md, providers/README.md, plugins/model-providers/README.md No API break: auth.py / config.py / doctor.py / models.py / runtime_provider.py / model_metadata.py / auxiliary_client.py / chat_completions.py / run_agent.py all still consume providers via get_provider_profile() / 
list_providers() — they just now see plugin-discovered entries instead of pkgutil-iterated ones. Third parties can now drop a single directory into ~/.hermes/plugins/model-providers/<name>/ to add or override an inference provider without touching the repo. * docs(cli): expand hermes import reference — add description, warning, and examples * docs(bedrock): fix IAM permissions, add quickstart entry, add fallback provider, fix deployment section * docs: fix Camofox Docker setup instructions * docs(providers): Together/Groq/Perplexity cookbook via custom_providers Three worked recipes for OpenAI-compatible cloud providers, plus the Copilot HTTP 401 auto-recovery info block and the GMI Cloud row in the compatible providers table. All three additions were on the original docs/custom-providers-cookbook branch but its merge base predated 1186 main commits, making the rebase impractical (84k+ line conflict). Replays just the providers.md additions onto current main. * fix(tui): close slash parity gaps with CLI (#20339) * fix(tui): close slash parity gaps with CLI Route unsupported /skills subcommands through slash.exec, support /new <name> titles, and handle /redraw natively so TUI behavior matches classic CLI. Also filter gateway-only commands out of the TUI catalog while keeping /status discoverable. * fix(tui): run remaining CLI parity paths natively Forward chat launch flags into the TUI runtime and handle live-session status and skill reloads in the gateway process so TUI state no longer depends on the slash worker's stale CLI instance. * fix(tui): block stale snapshot restores Prevent snapshot restore from running through the isolated slash worker because it mutates disk state without refreshing the live TUI agent. * chore: uptick * fix(tui): guard async session title updates Handle failures from the fire-and-forget session.title RPC so title-setting errors do not surface as unhandled promise rejections while preserving session-scoped messaging. 
* docs(gemini): add Google Gemini guide * chore: AUTHOR_MAP entry for jethac * docs: align terminal-backend count and naming across docs and code README:24 claimed "Six terminal backends" while tools/environments/ exposes seven top-level backend choices through TERMINAL_ENV: local, docker, ssh, singularity, modal, daytona, vercel_sandbox. Modal additionally has direct and Nous-managed modes selected via terminal.modal_mode (the ManagedModalEnvironment class is a Modal sub-mode, not a separate top-level backend). The same drift appeared in five other doc and code-comment sites with inconsistent counts (six, seven, or implicit) and varying lists. Updated all sites to a consistent seven-backend list in canonical order. The configuration guide also clarifies how Modal's two modes are selected so operators do not search for a non-existent backend: managed_modal value. CONTRIBUTING.md:160 lists six backend filenames in a code tree but does not carry the "Six terminal" prose; left out of scope per cohesion sweep guidance to bundle only identical wording. Files updated: - README.md (line 24, marketing copy) - website/docs/index.md (line 49, landing page) - website/docs/user-guide/configuration.md (line 86, config guide) - tools/environments/__init__.py (lines 3-6, package docstring) - tools/file_operations.py (line 6, module docstring) - environments/README.md (line 43, RL training docs — TERMINAL_ENV list) * chore: AUTHOR_MAP entry for deep-name * docs: refresh stale platform/LOC/test counts; clarify gateway vs plugin platforms AGENTS.md is the AI-assistant entry doc, so its counts get used as ground truth. Several values had drifted, and the same drift had spread to a few user-facing surfaces. Fixing all of them in one commit so the count claims agree and clearly distinguish gateway-core from plugin-shipped platforms. 
AGENTS.md: - run_agent.py "~12k LOC" → "~14k LOC as of 2026-05-03" (actual 14,097) - cli.py "~11k LOC" → "~12k LOC as of 2026-05-03" (actual 12,043) - tools/environments/ list now lists all 7 user-selectable terminal backends in canonical order, matching tools/terminal_tool.py:2214-2215 - gateway/platforms/ list adds yuanbao and wecom_callback; the 19 names match the user-facing list at website/docs/integrations/index.md - plugins/ tree now mentions plugins/platforms/ (irc, teams) - tests/ snapshot "~15k tests across ~700 files as of Apr 2026" → "~19k tests across ~890 files as of 2026-05-03" User-facing count claims: - hermes_cli/tips.py:195 — "19 platforms" → "21 messaging platforms" with IRC and Microsoft Teams added to the named list - website/docs/index.md:49 — "6 terminal backends" → "7 terminal backends: ..., Vercel Sandbox" (also corrected by PR #19044; same edit content) - website/docs/index.md:50 — "15+ platforms from one gateway" → "21+ messaging platforms (19 in the gateway, plus IRC and Microsoft Teams via plugins)" - website/docs/integrations/index.md:83-85 — "15+ messaging platforms" → "19+", added yuanbao to the linked list. The surrounding text scopes it to "configured through the same gateway subsystem", so plugin platforms (IRC, Teams) are intentionally not in this list - website/scripts/generate-llms-txt.py:205 — "15+ platforms" → "21+ messaging platforms — 19 native to the gateway plus IRC and Microsoft Teams via plugins" LOC and date stamps follow the existing AGENTS.md "as of <date>" convention (line 56 already used this pattern). Source of truth for the gateway count is gateway/config.py:130-148 (PlatformID enum); plugin platforms live in plugins/platforms/. 
Out of scope: - RELEASE_v0.9.0.md historical "16 platforms" claim (immutable history) - userStories.json verbatim user quotes - Programmatic count generation from gateway/config.py + plugin manifests is a worthwhile build-system change but separate from these content fixes * docs(skills): explain restoring bundled skills * docs(docker): add section on connecting to local inference servers (vLLM, Ollama) Adds a comprehensive guide for connecting Dockerized Hermes to local inference servers like vLLM and Ollama, covering: - Docker Compose networking (recommended) - Standalone Docker run with host.docker.internal / --network host - Connectivity verification steps - Ollama-specific example Closes #12308 * docs(docker): document API_SERVER_* env vars for exposing the OpenAI-compatible endpoint Salvage of #11758. The PR's original diff was stale (the Docker Compose section on main has been heavily refactored — dashboard is now an embedded side-process, not a separate service), so the useful bit (API server env var requirements) is applied as a note on the basic `docker run` example. Co-authored-by: xiangyong <xiangyong@zspace.cn> * chore: AUTHOR_MAP entry for CES4751 * docs(discord): fix Server Members Intent + SSRC-mapping drift; add /voice join slash Choice Salvage of #11350. Kept: - Code: add an explicit /voice join Choice in the slash UI (runner accepts both 'join' and 'channel' but only 'channel' was in autocomplete). - Docs: Server Members Intent is conditional (only needed if DISCORD_ALLOWED_USERS contains usernames); SSRC → user_id mapping uses the voice websocket SPEAKING opcode, not the Members intent. Dropped from the original PR: - HERMES_DISCORD_VOICE_PACKET_DUMP — this env var doesn't exist on main (it was in a different PR that isn't merged). - DISCORD_PROXY docs — already documented on current main. - DISCORD_ALLOW_MENTION_* docs — already on main. 
- "barge-in mode" rewrite — current main actually does pause the listener during TTS (VoiceReceiver.pause() at discord.py:192); there is no barge_in_guard/barge_in_rms on main. Co-authored-by: Michel Belleau <michel.belleau@malaiwah.com> * docs(skills): modernize Obsidian file workflows * chore: AUTHOR_MAP entry for counterposition * docs(kanban): document handoff evidence metadata * chore: AUTHOR_MAP entry for Fearvox * docs: clarify Telegram group chat troubleshooting * docs(codex): clarify OAuth auth prerequisite * docs(voice): add Doubao speech integration examples (TTS + STT) * chore: AUTHOR_MAP entry for Hypnus-Yuan * docs(faq): use messaging extra for gateway deps * chore: AUTHOR_MAP entry for xsfX20 * fix(kanban): unify failure counter across spawn/timeout/crash outcomes (#20410) The dispatcher's circuit breaker only protected against spawn-side failures (profile missing, workspace mount error, exec failure). Workers that successfully spawned but then timed out or crashed re-queued to ``ready`` with no counter increment, so the next tick re-spawned them — loops forever until someone noticed. Reported externally on Twitter (Forbidden Seeds) and confirmed by walking the kernel: ``enforce_max_runtime`` flipped the task back to ready, emitted a ``timed_out`` event, and never touched ``spawn_failures``; same for ``detect_crashed_workers``. Fix: unify the counter across all non-success outcomes. Schema ------ * ``tasks.spawn_failures`` → ``tasks.consecutive_failures`` * ``tasks.last_spawn_error`` → ``tasks.last_failure_error`` * Migration renames the columns in-place on existing DBs (``ALTER TABLE RENAME COLUMN`` — SQLite >= 3.25) so historical counter values are preserved. Row mappers fall through to the legacy names if both column renames and a migration somehow got out of sync. 
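The in-place column rename and the legacy-name fallback in the row mapper, as described in the Schema notes above, could be sketched along these lines. The table and column names are the ones given in the commit message; the helper names and the idempotency check via `PRAGMA table_info` are assumptions, not the project's actual migration code.

```python
import sqlite3

# Old -> new column names, as listed in the Schema section above.
RENAMES = {
    "spawn_failures": "consecutive_failures",
    "last_spawn_error": "last_failure_error",
}


def migrate_failure_columns(conn):
    """Rename legacy columns in place (needs SQLite >= 3.25); safe to re-run."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(tasks)")}
    for old, new in RENAMES.items():
        if old in existing and new not in existing:
            conn.execute(f"ALTER TABLE tasks RENAME COLUMN {old} TO {new}")
    conn.commit()


def read_failure_count(row):
    """Hypothetical row mapper: prefer the new column, fall back to the legacy one."""
    keys = row.keys()
    if "consecutive_failures" in keys:
        return row["consecutive_failures"]
    return row["spawn_failures"] if "spawn_failures" in keys else 0
```

Because the rename keeps the column data, a task that had already accumulated failures before the upgrade keeps its count afterwards.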
Counter lifecycle
-----------------
New helper ``_record_task_failure(conn, task_id, error, *, outcome, release_claim, end_run, event_payload_extra)`` is the single point every non-success outcome funnels through:

* ``spawn_failed`` → ``_record_spawn_failure`` (kept as alias) calls it with ``release_claim=True, end_run=True``: transitions running→ready, clears claim, closes run.
* ``timed_out`` → ``enforce_max_runtime`` already does the status transition + run close + event emission, then calls ``_record_task_failure`` with ``release_claim=False, end_run=False`` just to bump the counter (and trip the breaker if needed).
* ``crashed`` → ``detect_crashed_workers`` same pattern, but the counter increment runs after the main write_txn closes (SQLite doesn't nest write transactions).

If the counter hits the breaker threshold (``DEFAULT_FAILURE_LIMIT=5``, same as before), the task transitions to ``blocked`` with a ``gave_up`` event on top of whatever outcome-specific event was already emitted.

Reset semantics changed: the counter now clears only on successful ``complete_task`` (and operator ``reclaim_task``, an explicit "I've looked at this, try again with a fresh budget"). Previously ``_clear_spawn_failures`` ran on every successful spawn, which would have wiped the counter before a timeout could accumulate past threshold, which is exactly the loop this fix prevents.

Diagnostics
-----------
* ``_rule_repeated_spawn_failures`` → ``_rule_repeated_failures``. Now fires regardless of which outcome is at fault. Classifies the most recent failure (spawn_failed / timed_out / crashed) from the run history so the title ("Agent timeout x3", "Agent crash x4", "Agent spawn x5") and suggested action (``doctor`` for spawn, ``log`` for timeout/crash) stay outcome-specific without N duplicate rules.
* ``_rule_repeated_crashes`` kept as a narrower early-warning at threshold 2 (vs 3 for the unified rule), but now suppresses itself when the unified rule would also fire, which avoids double-flagging.
* Diagnostic ``data`` payload now carries ``{consecutive_failures, most_recent_outcome, last_error}`` instead of spawn-specific keys.

CLI
---
* ``Task.consecutive_failures`` / ``Task.last_failure_error`` are the public fields now. Existing callers that referenced the old names get migrated (tests updated in this commit).
* Backward-compat: ``DEFAULT_SPAWN_FAILURE_LIMIT``, ``_clear_spawn_failures``, ``_record_spawn_failure`` stay as aliases.

Tests
-----
* 6 new kernel tests: timeout increments counter, 3 consecutive timeouts trip the breaker (was the reported gap), crash increments counter, reclaim clears counter, completion clears counter, spawn success does NOT clear counter.
* Diagnostic tests: updated ``repeated_spawn_failures`` cases to use the new kind name and add a timeout-loop test.
* Dashboard API test: spawn_failures column update → consecutive_failures.

389/389 kanban-suite tests pass.

Live verification
-----------------
Seeded 4 tasks in an isolated HERMES_HOME: 3 timeouts, 4 crashes, 2-spawn-failed + 2-timed-out, and a task that had prior failures but completed successfully. Board correctly shows "!! 3 tasks need attention" (the successful one has no badge because the counter reset). Drawer for the timeout-loop task renders "Agent timeout x3" with most_recent_outcome=timed_out and the "Check logs" suggested action (not the spawn-flavoured "Verify profile"). The successful task has zero diagnostics.

Closes the Forbidden-Seeds-reported gap.

* docs(guides): add guide for running Hermes locally with Ollama

Step-by-step guide covering Ollama installation, model selection, Hermes configuration, speed optimization, and optional gateway bot setup, all running on local hardware with zero API cost. Includes hardware requirements, model comparison table with tool-call support status, context window tuning, GPU offloading tips, fallback provider setup, troubleshooting, and cost comparison.
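The counter lifecycle described in the fix(kanban) commit above (#20410) can be sketched roughly as follows. This is a minimal in-memory illustration, not the repository's code: the `Task` dataclass and the simplified signatures are assumptions, and the real `_record_task_failure` takes a database connection plus `release_claim` / `end_run` flags that are omitted here. Only the field names, the `gave_up` event, and the limit of 5 come from the commit message.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEFAULT_FAILURE_LIMIT = 5  # value quoted in the commit message


@dataclass
class Task:
    """Hypothetical in-memory stand-in for a kanban task row."""
    status: str = "ready"
    consecutive_failures: int = 0
    last_failure_error: Optional[str] = None
    events: List[str] = field(default_factory=list)


def record_task_failure(task: Task, error: str, *, outcome: str,
                        limit: int = DEFAULT_FAILURE_LIMIT) -> bool:
    """Count any non-success outcome; return True if the breaker tripped."""
    task.consecutive_failures += 1
    task.last_failure_error = error
    task.events.append(outcome)  # spawn_failed / timed_out / crashed
    if task.consecutive_failures >= limit:
        task.status = "blocked"
        task.events.append("gave_up")
        return True
    task.status = "ready"  # re-queue for another attempt
    return False


def complete_task(task: Task) -> None:
    """Successful completion is one of the two places the counter resets."""
    task.status = "done"
    task.consecutive_failures = 0
    task.last_failure_error = None


def reclaim_task(task: Task) -> None:
    """Operator reclaim grants a fresh failure budget."""
    task.status = "ready"
    task.consecutive_failures = 0
    task.last_failure_error = None
```

Note that a successful spawn deliberately does not appear here as a reset point, which is the behaviour change the commit message calls out.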
* chore: AUTHOR_MAP entry for binhnt92 * docs: add Open WebUI bootstrap script * chore: AUTHOR_MAP entry for acesjohnny * docs(browser): document WSL-to-Windows Chrome MCP bridge * chore: AUTHOR_MAP entry for liu-collab * docs(i18n): add zh-Hans Tool Gateway, image gen, and Windows WSL guide Made-with: Cursor * docs: add Chinese (zh-CN) README translation Closes #12954 - Add README.zh-CN.md with complete Simplified Chinese translation - Add language switcher badge in README.md linking to Chinese version - Add language switcher badge in README.zh-CN.md linking to English version * chore: AUTHOR_MAP entry for zhanggttry * docs: update VS Code setup instructions for ACP Client integration * chore: AUTHOR_MAP entry for formulahendry * test(kanban): cover metadata handoff round-trip * feat(gateway): respect kanban.max_spawn config to limit concurrent tasks The dispatch_once function already accepts a max_spawn parameter but the gateway was calling it without passing any value, effectively ignoring the configuration. This change reads kanban.max_spawn from config.yaml and passes it through, allowing users to limit concurrent kanban tasks. This prevents resource exhaustion scenarios where kanban dispatcher spawns too many parallel workers on constrained hardware. * guard kanban worker lifecycle by run id * chore(release): AUTHOR_MAP entries for momowind and misery-hl * feat(hindsight): probe API for update_mode='append' support, dedupe across processes Mirrors the pattern already shipping in hindsight-integrations/openclaw: probe `<api_url>/version` once per process, gate on Hindsight ≥ 0.5.0. When supported, retains use a stable session-scoped `document_id` (`session_id`) plus `update_mode='append'` so cross-process retains for the same session merge into one document instead of producing N-different-process-stamped duplicates. 
When unsupported (or probe fails), fall back to the existing per-process unique `f"{session_id}-{start_ts}"` document_id with no `update_mode` — the resume-overwrite fix (#6654) keeps working unchanged on legacy servers. Closes the dedup half of #20115. The proposed `document_id_strategy` config knob isn't needed: auto-detection via the same /version probe the OpenClaw plugin already uses gives the same outcome with no extra config burden, and the choice is purely a function of what the server can do. Plumbing -------- - Module-level helpers (`_meets_minimum_version`, `_fetch_hindsight_api_version`, `_check_api_supports_update_mode_append`) cache the result per api_url so every provider in the process gets one /version round-trip. - One-time WARN logged when the API is older than 0.5.0, telling the user to upgrade for cross-session deduplication. - New instance helper `_resolve_retain_target(fallback_doc_id)` returns `(document_id, update_mode)` based on cached capability. Wired into `sync_turn` and the `on_session_switch` flush path. - For local_embedded mode, the probe URL is taken from the running client (`client.url`) so we hit the actual daemon port rather than the configured default. - `update_mode` is set on the per-item dict; `aretain_batch` already threads `item['update_mode']` into the API call. Tests ----- - `TestUpdateModeAppendCapability` (5 cases): legacy fallback, modern stable+append, per-url cache, one-time warn, flush-on-switch resolves against the OLD session. - Existing `_make_hindsight_provider` factory in the manager-side test file extended to seed `_mode`/`_api_url`/`_api_key`/`_client` and stub `_resolve_retain_target` so the bypass-init pattern keeps working. E2E verified against installed `~/.hermes/hermes-agent`: - Legacy probe (unreachable host) → `legacy-session-<ts>` doc_id, no `update_mode`. - Modern probe (live local_embedded 0.5.6 daemon) → stable `modern-session` doc_id + `update_mode='append'`. 
- `test_hermes_embedded_smoke.py` passes (90s).

* fix(api_server): SSE token batching + error handling for Open WebUI performance

Reduces SSE event rate ~500/turn → ~20/turn via 50ms text-delta batching in _dispatch(), which eliminates markdown re-render storms on Open WebUI.

Also:
- Trim tool_call.arguments in the response.completed event to 100KB (prevents silent hangs on 848KB+ single-line SSE events).
- Catch-all exception handlers in _write_sse_responses() + _write_sse_chat_completion() emit a proper error chunk instead of TransferEncodingError from incomplete chunked encoding when the agent crashes mid-stream.
- MAX_REQUEST_BYTES 1MB → 10MB; pass client_max_size to aiohttp Application to avoid silent 400s on truncated request bodies for long conversations.

Salvage of #17552 (api_server portion only). The contrib/openwebui-filter/ payload from that PR (Open WebUI Filter Function + benchmark writeup) is a client-side user-installable add-on and doesn't need to live in the repo; dropped here.

Closes #17537.

Co-authored-by: bogerman1 <93757150+bogerman1@users.noreply.github.com>

* chore: AUTHOR_MAP entry for bogerman1

* feat(i18n): add French (fr) locale support

- Add fr.yaml with French translations for approval prompts and gateway messages
- Register 'fr' in SUPPORTED_LANGUAGES
- Add French aliases: french, français, fr-fr, fr-be, fr-ca, fr-ch
- Update locale sync comment in en.yaml

* feat(i18n): add Ukrainian locale

* chore: AUTHOR_MAP entry for olisikh

* arcee temperature + compression

* test(arcee): cover Trinity Large Thinking temperature + compression overrides

Salvage follow-up for PR #20344:
- AUTHOR_MAP entry for rob-maron (required by CI)
- 17 parametrized tests covering _is_arcee_trinity_thinking, _fixed_temperature_for_model Trinity override, and _compression_threshold_for_model, including sibling-model negatives (trinity-large-preview, trinity-mini) and the OpenRouter slug form.
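Since this page does not show the diff itself, here is a hedged sketch of the shape such per-model overrides might take. The function names echo the helpers listed in the test commit, but the signatures, the matching rule, the `arcee-ai/` prefix used for the OpenRouter slug form, and both numeric values are assumptions for illustration only, not the values the PR ships.

```python
from typing import Optional

# All three constants are placeholders; the real slug and values are in the
# PR diff, which is not reproduced on this page.
TRINITY_THINKING_SLUG = "trinity-large-thinking"
PLACEHOLDER_TEMPERATURE = 0.5
PLACEHOLDER_COMPRESSION_THRESHOLD = 0.5


def is_arcee_trinity_thinking(model: Optional[str]) -> bool:
    """True only for the Trinity Large Thinking model, not its siblings."""
    if not model:
        return False
    # Drop a provider prefix such as "arcee-ai/" (assumed OpenRouter form).
    name = model.strip().lower().rsplit("/", 1)[-1]
    return name.startswith(TRINITY_THINKING_SLUG)


def fixed_temperature_for_model(model: Optional[str]) -> Optional[float]:
    """Return a pinned temperature for the model, or None to leave it alone."""
    if is_arcee_trinity_thinking(model):
        return PLACEHOLDER_TEMPERATURE
    return None


def compression_threshold_for_model(model: Optional[str],
                                    default: float) -> float:
    """Return a model-specific compression threshold, else the default."""
    if is_arcee_trinity_thinking(model):
        return PLACEHOLDER_COMPRESSION_THRESHOLD
    return default
```

Matching on the last path segment is one way to make the bare name and a provider-prefixed slug behave identically while still rejecting siblings such as trinity-large-preview and trinity-mini.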
* fix(doctor): report Kanban worker tools as runtime-gated * fix(kanban): accept created_cards linked as child of completing task Widens _verify_created_cards to also accept ids that are children of the completing task in task_links. Previously we only accepted cards where created_by matched the completing task's assignee, which was too strict for legitimate orchestrator flows: a specifier creates a card (so created_by=specifier, not worker), then a worker picks it up and passes parents=[current_task] to kanban_create. The explicit link proves the relationship and should be trusted. Salvaged from #20022 @LeonSGP43 (full PR superseded by #20232 + this patch; the linked-children relaxation was the portable improvement). * fix(kanban): measure max runtime from current run * test(kanban): backdate task_runs.started_at alongside tasks.started_at After #19473 landed (enforce_max_runtime reads from task_runs.started_at rather than tasks.started_at), a regression test added earlier still only backdated the tasks column. Backdate both so the test is robust regardless of which column the enforcer reads from. * fix(kanban): prevent child task dispatch when parent is not done Add parent dependency guard to _set_status_direct so dragging a task to the ready column is rejected (409) when its parents are not all done. Previously the guard only existed in recompute_ready, allowing direct status writes via the dashboard API to bypass the dependency engine. Root cause: after reclaiming stale workers, both T3 and T4 were set to ready via dashboard status writes in quick succession, causing the writer to be spawned while the analyst was blocked — upstream work wasn't done yet. * feat(kanban): surface task_runs.summary on dashboard cards + ``kanban show`` The kanban-worker skill (built into the gateway dispatcher's spawn prompt) instructs every worker to hand off via ``kanban_complete(summary=..., metadata=...)``. 
That writes the summary onto the closing ``task_runs`` row, NOT onto ``tasks.result`` — the latter is left NULL unless the caller passes ``result=`` explicitly. Result: a glance at the dashboard or ``hermes kanban show <id>`` shows a blank "Result:" section even when the worker did real work, which on 2026-05-05 caused a Mac false-alarm ("Hermes did nothing") on a task that had a 10-line completion summary on its run. This patch surfaces the latest non-null run summary as ``latest_summary`` so the worker's actual handoff lands in front of operators. * New helpers ``kanban_db.latest_summary(conn, task_id)`` and ``kanban_db.latest_summaries(conn, task_ids)``. The batch variant uses a single window-function SELECT so the dashboard board endpoint doesn't pay an N+1 cost on multi-hundred-task boards. * CLI ``hermes kanban show <id>`` prints a "Latest summary:" block when ``tasks.result`` is empty but a run has produced a summary (the existing "Result:" section still wins when populated, so the back-compat path for hand-edited results is untouched). JSON output gains a top-level ``latest_summary`` field. * Dashboard ``/board`` and ``/tasks/{id}`` now include a ``latest_summary`` field on every task. Cards on /board carry a 200-character preview (cheap to render, plenty for "what did this worker do?" at a glance); the drawer/detail endpoint returns the full summary. * Five new tests cover: empty-runs case, post-complete surface, newest-of-multiple selection, empty-string skip, batch with missing tasks + empty input. Smoke-tested locally against the live profile DB on the three acceptance-criterion targets (t_f08fef91 cron-hygiene-audit, t_007b7f1c EMA-analysis, t_05746fa4 self-assessment) — all three now return their populated summaries via both ``latest_summary`` and ``latest_summaries``. Test plan: 255/255 kanban tests pass + 91/91 dashboard plugin tests pass. 
No regression on tasks where ``tasks.result`` is explicitly populated (the existing "Result:" branch is preserved). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(kanban): wire dependency selects * chore(release): AUTHOR_MAP entries for suncokret12 and mioimotoai-lgtm * feat(i18n): add Turkish (tr) locale - Add locales/tr.yaml with Turkish translations for all approval.* and gateway.* keys - Register 'tr' in SUPPORTED_LANGUAGES - Add Turkish aliases: turkish, türkçe, tr-tr * fix: add Turkish locale references in config, tests, and docs - hermes_cli/config.py: add tr to supported languages comment - locales/en.yaml: add tr to locale file list comment - tests/agent/test_i18n.py: add Turkish alias tests + explicit lang test - website/docs/user-guide/configuration.md: add tr to supported values * docs: document custom model aliases for /model command (#20475) User-defined model aliases (config.yaml model_aliases: and model.aliases.*) have worked since early versions but were entirely undocumented. Add a dedicated 'Custom model aliases' section to slash-commands.md covering both YAML config formats and the 'hermes config set' shell form, mirror a shorter version into the configuring-models 'Alternative methods' section, and cross-link from the two /model table rows. Flagged by @weehowe on Twitter — he wasn't aware the feature existed. * feat(models): add deepseek/deepseek-v4-pro to OpenRouter + Nous Portal curated lists (#20495) Endpoint re-tested over 6 conversational turns (9 API calls, 3 tool calls) and an 8-request burst — no rate limits, no errors, ~2-3s latency. The historical rate-limit issues that caused its removal are gone. 
- hermes_cli/models.py: add to OPENROUTER_MODELS and _PROVIDER_MODELS['nous'] - website/static/api/model-catalog.json: regenerated via build_model_catalog.py * feat(models): add x-ai/grok-4.3 to OpenRouter + Nous Portal curated lists (#20497) Endpoint validated over 6 conversational turns with tool calls (9 API calls, 3 tool calls, 0 failures) and an 8-request burst (8/8 ok, 0 rate limits). Latency ~5-10s/call — slower than grok-4.20 but expected for a reasoning model. - hermes_cli/models.py: add to OPENROUTER_MODELS and _PROVIDER_MODELS['nous'] - website/static/api/model-catalog.json: regenerated * fix: salvage batch — compaction guidance, memory authority, cache eviction after compression - Fix /compact → /compress in context-overflow tips (closes #20020) - Evict cached agent after session hygiene and /compress so system prompt refreshes with current SOUL.md, memory, and skills - Restore memory authority across compaction: change 'informational background data' to 'authoritative reference data' in memory block and SUMMARY_PREFIX, with backward-compatible regex Based on: - PR #20027 by @LeonSGP43 - PR #18767 by @MacroAnarchy - PR #17380 by @vominh1919 PR #17121 boundary marker fix already merged to main (2eef395e1). PR #9262 user-message anchoring already on main via _ensure_last_user_message_in_tail(). * feat(browser): add Lightpanda engine support with automatic Chrome fallback Add Lightpanda as an optional browser engine for local mode. Lightpanda is a headless browser built from scratch in Zig -- faster navigation than Chrome with significantly less memory. 
One config line to enable: browser: engine: lightpanda New functions in browser_tool.py: - _get_browser_engine() -- config/env reader with validation + caching - _should_inject_engine() -- only inject in local non-cloud mode - _needs_lightpanda_fallback() -- detect empty/failed LP results - _chrome_fallback_screenshot() -- temporary Chrome session for screenshots - Engine injection in _run_browser_command (--engine flag) - browser_vision pre-routes screenshots to Chrome when engine=lightpanda Config: - browser.engine in DEFAULT_CONFIG (auto/lightpanda/chrome) - AGENT_BROWSER_ENGINE in OPTIONAL_ENV_VARS - /browser status shows engine info in local mode Rebased from PR #7144 onto current main. All existing code preserved -- pure additions only (+520/-2). 25 new tests + 81 total browser tests pass (0 failures). * fix(browser): surface Lightpanda Chrome fallback warnings * feat(tui): collapsible sections in startup banner (skills, system prompt, MCP) The TUI SessionPanel banner now uses collapsible \u25b8/\u25be toggle sections matching the existing Chevron convention used for runtime agent details. Skills, system prompt, and MCP server lists are collapsed by default; tools remain expanded as the most actionable info. - tui_gateway/server.py: _session_info() now passes agent._cached_system_prompt through to the TUI frontend - ui-tui/src/types.ts: added system_prompt?: string to SessionInfo - ui-tui/src/components/branding.tsx: rewrote SessionPanel with CollapseToggle helper + per-section useState toggles Default states: tools=open, skills=collapsed, system=collapsed, mcp=collapsed. Clicking any \u25b8/\u25be header toggles that section. * fix(tui): collapse long system messages in transcript with expand toggle System messages over 400 chars (system prompt, AGENTS.md, etc.) now render as a collapsed \u25b8/\u25be toggle line in the transcript, matching the Chevron convention used for runtime details. 
The summary shows the first line + char count; clicking expands to full content. * fix(browser): tighten Lightpanda fallback edge cases * fix(gateway): preserve model picker current context * fix(update): drop pip --quiet so slow installs don't look hung (#20679) On Termux/Android aarch64 (and other platforms without prebuilt wheels for some optional extras), 'pip install -e .[all]' compiles C/Rust extensions from source. This can run for several minutes with zero network activity and — with --quiet — zero stdout. Users report 'hermes update hangs at Updating Python dependencies', Ctrl+C it, then re-run and see 'up to date' (because git pull already succeeded and the pip step was still working when they interrupted). Pip's default output is proportional to actual work (one line per Collecting / Building wheel for X / Installing), so removing --quiet costs nothing on fast hardware and prevents the false-hang interrupt loop on slow hardware. Reported via Discord on Termux/Android. Supersedes #20466 which misdiagnosed the hang as PYTHONPATH shadowing (install.sh doesn't run during 'hermes update', and terminal() doesn't inherit PYTHONPATH). * fix(cli): guard logger.debug in signal handler (#13710 regression) (#20673) CPython's logging module is not reentrant-safe. `Logger.isEnabledFor` caches level results in `Logger._cache`; under shutdown races the cache can be cleared (`Logger._clear_cache`, triggered by logging config changes from another thread) or mid-mutation when a signal fires, raising `KeyError: <level_int>` (e.g. `KeyError: 10` for DEBUG) inside the signal handler. When that happens, the KeyError escapes before the `raise KeyboardInterrupt()` on the next line can fire, which bypasses prompt_toolkit's normal interrupt unwind and surfaces as the EIO cascade originally reported in #13710. Issue #13710 shipped two defenses (asyncio exception handler + outer `except (KeyError, OSError)` with EIO suppression) that cover the EIO unwind path. 
This patch closes the remaining escape hatch: the `logger.debug` call at the top of `_signal_handler` itself. Wrap it in a bare `try/except Exception: pass` so logging can never raise through a signal handler.

Observed in the wild: debug report on 0.12.0 (commit 8163d371) shows the exact stack: KeyError: 10 at logging/__init__.py:1742 inside the signal handler's `logger.debug`, followed by the EIO cascade from prompt_toolkit's emergency flush.

Tests: adds `TestSignalHandlerLoggingRace` to `tests/hermes_cli/test_suppress_eio_on_interrupt.py` with 6 new cases:
- normal path still raises KeyboardInterrupt
- KeyError(10) from logger.debug does not escape
- any Exception from logger.debug is swallowed
- agent.interrupt still fires when logger.debug raises
- agent.interrupt raising also does not escape
- BaseException (SystemExit) is NOT swallowed; the guard uses `except Exception` deliberately so real shutdown signals still propagate

Closes #13710 regression.

* fix: harden install.sh against inherited Python env leakage

* chore: AUTHOR_MAP entry for adybag14-cyber

* fix(ui): reduce status-line jitter while scrolling

* fix(tui): stabilize FaceTicker elapsed width to prevent composer drift

* fix(tui): restore gap before duration when verb segment is hidden

The verb-padding change dropped the leading space in durationSegment on the assumption that the verb's trailing pad always supplies the gap. But the unicode spinner style sets showVerb=false, making verbSegment an empty string; in that mode the output would become `{frame}· {duration}` with no separator. Add the space back; harmless when the verb segment is shown (its trailing pad still provides the gap).
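The signal-handler guard from the fix(cli) commit above (#20673) might look something like the sketch below. The factory function, the logger name, and the way `agent.interrupt()` is guarded are assumptions; the commit message confirms only that `logger.debug` is wrapped in `try/except Exception: pass`, that the handler then raises `KeyboardInterrupt`, and that the tests expect a failing `agent.interrupt` not to escape.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("sketch.signal")  # hypothetical logger name


def make_signal_handler(agent: Optional[Any] = None) -> Callable[[int, Any], None]:
    """Build a handler whose logging can never raise past the interrupt."""

    def _signal_handler(signum: int, frame: Any) -> None:
        try:
            logger.debug("received signal %s", signum)
        except Exception:
            # logging is not reentrant-safe; never let it mask the interrupt.
            # BaseException (e.g. SystemExit) is intentionally not caught.
            pass
        if agent is not None:
            try:
                agent.interrupt()
            except Exception:
                pass
        raise KeyboardInterrupt()

    return _signal_handler
```

In real use the returned function would be registered with `signal.signal(signal.SIGINT, handler)`; the sketch leaves that out so it can be called directly.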
* chore(release): map liuguangyong@hellobike -> liuguangyong93 * fix(kanban): reset code element background inside board The Nous DS globals.css applies a global rule: code { background: var(--midground); color: var(--background); } This paints an opaque cream/yellow fill on every <code> element, which hides text in the kanban drawer's event-payload, run-meta, and worker-log panes (all rendered as <code>). Fix: scope a reset inside .hermes-kanban so <code> elements inherit their parent's color and stay transparent. * fix(cli): recover classic CLI output after resize * feat(skills): add shop-app personal shopping assistant (optional) (#20702) Port Shop.app's upstream SKILL.md (https://shop.app/SKILL.md) into optional-skills/productivity/shop-app/ with Hermes-native adaptations: - Proper Hermes frontmatter (name, description<=60 chars, version, author, license, prerequisites, metadata.hermes tags + related_skills + homepage + upstream) - Swap Shop.app's bespoke 'message()' tool references for Hermes conventions: gateway adapters handle platform formatting, so the skill just writes markdown (no Telegram/WhatsApp/iMessage sections referencing a tool Hermes doesn't ship) - Name Hermes tools where relevant: curl via 'terminal', HTML policy pages via 'web_extract', try-on via 'image_generate' - Reframe session state as 'hold in your reasoning context for this conversation only' and forbid writing tokens to .env / disk — matches Hermes ephemeral-memory discipline - Drop NO_REPLY convention (Shop-app-runtime specific) - Trigger-first description so the skill loader picks it up when the user wants to search products, track orders, returns, or reorder * feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709) Replaces the per-directory shadow-repo design with a single shared shadow git store at ~/.hermes/checkpoints/store/. 
Object DB is now deduplicated across every working directory the agent has ever touched; a dozen worktrees of the same project cost near-zero in additional disk. Why --- Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/ grow to multi-GB on active machines: 1. Each working directory got its own full shadow git repo — no object dedup across projects or across worktrees of the same project. 2. _prune() was a documented no-op: max_snapshots only limited the /rollback listing. Loose objects accumulated forever. 3. Defaults: enabled=True, auto_prune=False — users paid the disk cost without ever asking for /rollback. Field report on a single workstation: 847 MB across 47 shadow repos, mostly redundant clones of the hermes-agent source tree. Changes ------- - tools/checkpoint_manager.py: full rewrite. Single bare store, per-project refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>), per-project metadata (store/projects/<hash>.json with workdir + created_at + last_touch). On first v2 init, any pre-v2 per-directory shadow repos are auto-migrated into legacy-<timestamp>/ so the new store starts clean. _prune() now actually rewrites the per-project ref to the last max_snapshots commits and runs git gc --prune=now. New _enforce_size_cap() drops oldest commits round-robin across projects when the store exceeds max_total_size_mb. _drop_oversize_from_index() filters any single file larger than max_file_size_mb out of the snapshot. - hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI (status / list / prune / clear / clear-legacy) for managing the store outside a session. - hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20, auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10. Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll, *.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.). 
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat coverage for the pre-v2 prune path (per-directory shadow repos under base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the shared store, new defaults, migration, and the new CLI; reference/cli-commands.md documents 'hermes checkpoints'.

E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to 7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune --retention-days 0' removes its ref, index, and metadata; GC reclaims the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the CLI without an agent running.

Breaking / migration
--------------------
No in-place data migration: legacy per-directory shadow repos are moved into legacy-<timestamp>/ on first run. Old /rollback history is still accessible by inspecting the archive with git; run 'hermes checkpoints clear-legacy' to reclaim the space when ready. Users relying on /rollback must now set checkpoints.enabled=true (or pass --checkpoints) explicitly.

* fix(cli): catch OSError in _resolve_attachment_path to prevent ENAMETOOLONG dropping long slash commands

When the user pastes a long slash command like `/goal <long prose>` into `hermes chat`, the input flows into `_detect_file_drop()`, whose `starts_like_path` prefilter accepts anything starting with `/` and forwards it to `_resolve_attachment_path()`.
That helper calls `Path.exists()` which invokes `os.stat()`, which raises `OSError(errno=ENAMETOOLONG)` (63 on macOS, 36 on Linux) when the candidate exceeds NAME_MAX (typically 255 bytes). The OSError propagates up to the broad `except Exception` in `process_loop` (cli.py:11798), gets logged at WARNING level, and the user's input is silently dropped. From the user's POV the chat prompt hangs; the only signal is in agent.log:

WARNING cli: process_loop unhandled error (msg may be lost): [Errno 63] File name too long: "/goal Drive the space board..."

This affects any slash command with prose-length arguments: `/goal` in particular but also `/skill`, `/cron`, custom user commands.

Fix: wrap the `exists()`/`is_file()` calls in try/except OSError so structurally-invalid path candidates cleanly return None. The slash-command dispatch path downstream (cli.py:11718) then handles the input correctly.

Tests: two new regression cases in test_cli_file_drop.py cover the original `/goal` reproducer and a synthetic long path. All 35 file-drop tests pass.

Reproducer (without the fix): python -c "from cli import _detect_file_drop; _detect_file_drop('/goal ' + 'a'*300)" → OSError: [Errno 63] File name too long

* chore(release): map cleo@edaphic.xyz → curiouscleo

Follow-up to the salvaged fix for /goal ENAMETOOLONG drop: adds AUTHOR_MAP entry so the release script resolves the commit author to the correct GitHub user.
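The ENAMETOOLONG fix described in the fix(cli) commit above amounts to treating any `OSError` from the filesystem probe as "not a file". A minimal sketch, assuming a simplified helper (the real `_resolve_attachment_path` likely does more, such as quote stripping or home-directory expansion, none of which is shown on this page):

```python
from pathlib import Path
from typing import Optional


def resolve_attachment_path(candidate: str) -> Optional[Path]:
    """Return the path if it names an existing file, else None.

    Hypothetical simplification of the helper named in the commit message.
    """
    try:
        path = Path(candidate)
        if path.exists() and path.is_file():
            return path
    except OSError:
        # Structurally invalid candidates (for example a component longer
        # than NAME_MAX, which yields ENAMETOOLONG) are simply not files.
        return None
    return None
```

With this guard, a pasted `/goal` command followed by several hundred characters of prose falls through to slash-command dispatch instead of raising out of the file-drop detector.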
* docs(wsl2): expand Windows (WSL2) guide — filesystem, networking, services, pitfalls (#20748) Replaces the 22-line stub with a ~320-line guide covering the parts of the Windows/WSL2 split that specifically affect Hermes users: - Why WSL2 (and not native Windows) - Install: distro choice, WSL1→2, systemd via /etc/wsl.conf - Filesystem boundary: /mnt/c vs \\wsl$, perf/perms/watchers/case, wslpath/wslview, CRLF + git core.autocrlf, clone-where guidance - Networking in both directions: - WSL → Windows services: links to the canonical WSL2 Networking section in integrations/providers.md (mirrored mode, NAT + host IP, bind addr, firewall) instead of duplicating - Windows/LAN → Hermes in WSL: mirrored vs NAT, netsh portproxy one-liner, firewall rule, webhook tunneling pointer - Long-running services: systemd gateway + Task Scheduler wsl.exe --exec 'sleep infinity' to keep the VM alive at login - GPU passthrough: NVIDIA works, AMD/Intel out of matrix - Common pitfalls: connection refused, /mnt/c slowness, CRLF ^M, UNC warnings, post-sleep clock drift, mirrored-mode DNS with VPN, PATH, Defender scanning, VHDX disk reclaim All internal links use site-absolute /docs/... form (matches the rest of user-guide/); all seven link targets verified to exist. 
* docs: pluggable surfaces coverage — model-provider guide, full plugin map, opt-in fix (#20749) * docs(providers): add model-provider-plugin authoring guide + fix stale refs New docs: - website/docs/developer-guide/model-provider-plugin.md — full authoring guide (directory layout, minimal example, ProviderProfile fields, overridable hooks, user overrides, api_mode selection, auth types, testing, pip distribution) - Wired into website/sidebars.ts under 'Extending' - Cross-references added in: - guides/build-a-hermes-plugin.md (tip block) - developer-guide/adding-providers.md - developer-guide/provider-runtime.md User guide: - user-guide/features/plugins.md: Plugin types table grows from 3 to 4 with 'Model providers' row Stale comment cleanup (providers/*.py → plugins/model-providers/<name>/): - hermes_cli/main.py:_is_profile_api_key_provider docstring - hermes_cli/doctor.py:_build_apikey_providers_list docstring - hermes_cli/auth.py: PROVIDER_REGISTRY + alias auto-extension comments - hermes_cli/models.py: CANONICAL_PROVIDERS auto-extension comment AGENTS.md: - Project-structure tree: added plugins/model-providers/ row - New section: 'Model-provider plugins' explaining discovery, override semantics, PluginManager integration, kind auto-coerce heuristic Verified: docusaurus build succeeds, new page renders, all 3 cross-links resolve. 347/347 targeted tests pass (tests/providers/, tests/hermes_cli/test_plugins.py, tests/hermes_cli/test_runtime_provider_resolution.py, tests/run_agent/test_provider_parity.py). * docs(plugins): add 'pluggable interfaces at a glance' maps to plugins.md + build-a-hermes-plugin Devs landing on either the user-guide plugin page or the build-a-plugin guide now get an upfront table of every distinct pluggable surface with a link to the right authoring doc. Previously they'd have to read the full general-plugin guide to discover that model providers / platforms / memory / context engines are separate systems. 
user-guide/features/plugins.md: - New 'Pluggable interfaces — where to go for each' section below the existing…


arcee temperature + compression

31df148

alt-glitch added labels on May 5, 2026: type/feature (New feature or request), comp/agent (Core agent loop, run_agent.py, prompt builder), provider/arcee (Arcee AI), P3 (Low: cosmetic, nice to have)

rob-maron requested a review from teknium1 May 5, 2026 18:20

teknium1 mentioned this pull request May 6, 2026

feat(arcee): Trinity Large Thinking temperature + compression overrides #20473

Merged

teknium1 closed this in #20473 May 6, 2026


Arcee temperature + compression #20344

rob-maron wants to merge 1 commit into NousResearch:main from rob-maron:arcee-temp-compress


3 participants
