feat(kanban): durable multi-profile collaboration board + dashboard GUI + dispatcher daemon#16100
Closed
teknium1 wants to merge 22 commits into
Closed
feat(kanban): durable multi-profile collaboration board + dashboard GUI + dispatcher daemon#16100teknium1 wants to merge 22 commits into
teknium1 wants to merge 22 commits into
Conversation
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills for worker and orchestrator profiles. SQLite-backed task board (~/.hermes/kanban.db) shared across all profiles on the host. Zero changes to run_agent.py, no new core tools, no tool-schema bloat. Motivation: delegate_task is a function call — sync fork/join, anonymous subagent, no resumability, no human-in-the-loop. Kanban is the durable shape needed for research triage, scheduled ops, digital twins, engineering pipelines, and fleet work. They coexist (workers may call delegate_task internally). What this adds - hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution, dispatcher, workspace resolution, worker-context builder. - hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash() entry point used by both CLI and gateway. - skills/devops/kanban-worker — how a profile should work a claimed task. - skills/devops/kanban-orchestrator — "you are a dispatcher, not a worker" template with anti-temptation rules. - /kanban slash command wired into cli.py and gateway/run.py. Bypasses the running-agent guard (board writes don't touch agent state), so /kanban unblock can free a stuck worker mid-conversation. - Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns; 4 user stories; implementation plan; concurrency correctness. - Docs: website/docs/user-guide/features/kanban.md, CLI reference updated, sidebar entry added. Architecture highlights - Three planes: control (user + gateway), state (board + dispatcher), execution (pool of profile processes). - Every worker is a full OS process, spawned as `hermes -p <profile>`. No in-process subagent swarms — solves NanoClaw's SDK-lifecycle failure class. - Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale claims reclaimed 15 min after their TTL expires. - Tenant namespacing via one nullable column — one specialist fleet can serve many businesses with data isolation by workspace path. Tests: 60 targeted tests (schema, CAS atomicity, dependency resolution, dispatcher, workspace kinds, tenancy, CLI + slash surface). All pass hermetic via scripts/run_tests.sh.
The /kanban CLI + slash command are enough to run the board headlessly, but triage and cross-profile supervision want a visual board. Document the design as a dashboard plugin that: - reads live state from kanban.db over a WebSocket on task_events (no polling) - writes through run_slash() so CLI/gateway/GUI cannot drift - mounts under /api/plugins/kanban/ following the existing 'Extending the Dashboard' plugin shape The plugin is strictly a thin layer over kanban_db — no new business logic, nothing to merge into the kernel.
Ships plugins/kanban/dashboard/ as a bundled dashboard plugin. No core changes — uses the standard dashboard plugin contract (manifest.json + dist/index.js + plugin_api.py) documented in 'Extending the Dashboard'. What the tab gives you: - One column per kanban status (todo / ready / running / blocked / done; archived behind a toggle), column counts, coloured status dots. - Cards with id, title, priority badge, tenant tag, assignee, comment/link counts, 'created N ago'. - HTML5 drag-drop between columns — status change routes through the same kanban_db code the CLI /kanban verbs use, so the three surfaces (CLI, gateway, dashboard) can never drift. - Inline create per-column (title, assignee, priority). - Side drawer on card click: description, status action row (→ ready / → running / block / unblock / complete / archive), dependency links, comment thread with Enter-to-submit, last 20 events. - Toolbar: search, tenant filter, assignee filter, show-archived, nudge-dispatcher (skip the 60s wait), refresh. - Live updates via WebSocket tailing task_events — the board reflects CLI or gateway actions in real time. REST surface under /api/plugins/kanban/: GET /board, GET /tasks/:id, POST /tasks, PATCH /tasks/:id, POST /tasks/:id/comments, POST /links, DELETE /links, POST /dispatch, WS /events. Every handler is a thin wrapper around kanban_db — no new business logic. Visually theme-aware: the plugin CSS reads only --color-*, --radius, --font-mono etc. so it reskins with whichever dashboard theme is active. Tests (tests/plugins/test_kanban_dashboard_plugin.py, 16 tests): - empty board shape - create + appears in ready column with tenant/assignee rollups - tenant filter - detail includes parents/children/events - 404 on unknown task - PATCH status: complete / block / unblock / ready drag-drop / running - PATCH reassign, priority, edit, invalid-status rejection - POST comment (plus empty-body rejection) - POST link + DELETE link + cycle rejection - POST dispatch (dry run) All 76 kanban tests pass under scripts/run_tests.sh. Docs: website/docs/user-guide/features/kanban.md gains a full 'Dashboard (GUI)' section covering install, architecture, REST surface, live-updates mechanism, extending, and scope boundary.
Follows up on the initial dashboard plugin with the items called out
during self-review — ships the GUI-reality claims the PR body made,
closes the WebSocket auth gap, and lands the 'Triage' status the design
spec's Fusion-style screenshot leads with.
Kernel changes
- kanban_db.VALID_STATUSES gains 'triage'. status is TEXT without a
CHECK constraint so no schema migration is needed.
- create_task(triage=True) forces the initial status to 'triage'
regardless of parents, and parent ids are still validated so the
eventual link rows don't dangle. recompute_ready() only promotes
'todo' -> 'ready', so triage tasks are naturally isolated from the
dispatcher pipeline.
- hermes kanban create gains --triage.
Patterns table (docs) gains P9 'Triage specifier'.
Plugin backend (plugins/kanban/dashboard/plugin_api.py)
- GET /board now auto-init's kanban.db on first read (idempotent).
A fresh install shows an empty board instead of 'failed to load'.
- GET /board returns a new 'progress' field per task — {done, total}
of child-task completion, or None if the task has no children.
- BOARD_COLUMNS prepends 'triage'.
- POST /tasks accepts {triage: bool}; PATCH /tasks/:id accepts
{status: 'triage'}.
- WebSocket /events now requires ?token=<session_token> as a query
param — browsers can't set Authorization on a WS upgrade, so this
matches the pattern the in-browser PTY bridge uses. Constant-time
compare against hermes_cli.web_server._SESSION_TOKEN. In bare-test
contexts (no dashboard module) the check no-ops so the tail loop
stays testable. Security boundary documented in the module header
and in website/docs/user-guide/features/kanban.md.
Plugin UI (plugins/kanban/dashboard/dist/index.js + style.css)
- Adds the Triage column (lilac dot) with helper text
'Raw ideas — a specifier will flesh out the spec'. Inline-create
from the Triage column parks new tasks in triage.
- Status action row in the drawer gains '→ triage'.
- Progress pill (N/M) on cards that have children. Full-complete
state tints the pill green.
- 'Lanes by profile' toolbar toggle — sub-groups the Running column
by assignee so you see at a glance which specialist is busy on
what.
- Destructive status moves (done / archived / blocked) via drag-drop
OR via the drawer action row now prompt for confirmation.
- Escape closes the drawer.
- Live-update reloads are debounced (250ms) so a burst of
task_events triggers one refetch, not N.
- WebSocket includes ?token= built from window.__HERMES_SESSION_TOKEN__.
- WebSocket reconnect uses exponential backoff capped at 30s, not
a fixed 1.5s spin loop, and surfaces a user-visible error on
code-1008 (auth rejected) instead of reconnecting forever.
- ErrorBoundary wraps the page — a bad card render shows a
'rendering error, reload view' card instead of crashing the tab.
Tests (tests/plugins/test_kanban_dashboard_plugin.py, +5 tests = 21)
- empty-board shape now asserts all 6 columns including 'triage'
- create_triage_lands_in_triage_column
- triage_task_not_promoted_to_ready (dispatcher bypasses triage)
- patch_status_triage_works (both into triage and out of it)
- board_progress_rollup (0/2 -> 1/2 -> childless cards = None)
- board_auto_initializes_missing_db
- ws_events_rejects_when_token_required (three sub-assertions:
missing → 1008, wrong → 1008, correct → handshake accepted)
All 82 kanban tests pass under scripts/run_tests.sh.
Docs
- kanban.md 'What the plugin gives you' fully rewritten to match
shipped reality (triage, progress pill, assignee lanes,
destructive-confirm, Escape-close, debounce).
- New 'Security model' subsection documents the explicit-plugin-
route-bypass, the WS token requirement, and the --host 0.0.0.0
warning; also notes that kanban.db is profile-agnostic on purpose
(the coordination primitive) so cross-profile visibility is
expected.
- CLI command reference shows --triage.
- Collaboration patterns table adds P9 'Triage specifier'.
The dashboard plugin gets the last layer of features that turn it from a
'usable read surface with drag-drop' into a 'full kanban UI' — no more
'drop to CLI to do X' moments from inside the tab.
Plugin backend
- POST /tasks/bulk — apply the same patch (status / archive / assignee
/ priority) to every id in the request body. Each id runs
independently: one bad id reports {ok: false, error: ...} without
aborting siblings. Status transitions that aren't legal for the
current state are surfaced per-id ('transition to done refused').
Used by the multi-select bulk action bar.
- GET /config — returns the dashboard.kanban section of config.yaml
(default_tenant, lane_by_profile, include_archived_by_default,
render_markdown) with sensible defaults when the section is absent.
Loaded once by the SPA to preselect filters and toggle markdown
rendering.
- _conn() helper — every handler now goes through it, calling
kanban_db.init_db() (idempotent) before every connection. Fresh
installs work whether the first hit is GET /board, POST /tasks, or
any other endpoint — no more 'no such table: tasks' when the CLI
or a script hits the plugin before the dashboard has ever loaded.
Plugin UI (plugin bundle, +~12 KB)
- Multi-select: per-card checkbox; shift/ctrl-click also toggles
without opening the drawer. A BulkActionBar appears above the
columns with batch → ready / complete / archive / reassign
(profile dropdown + unassign option). Destructive batches confirm
first. Partial failures from the backend are surfaced inline.
- Drawer inline editing:
- Click the title → TitleEditor swaps in an input, Enter saves,
Escape cancels.
- Click the Assignee meta row → AssigneeEditor input (empty string
unassigns).
- Click the Priority meta row → PriorityEditor numeric input.
- New 'edit' button on Description → full-width textarea; Save /
Cancel switch back to rendered view.
- Dependency editor: chip list of parents + children with per-chip
× button (calls DELETE /links). Add-parent / add-child dropdowns
filter out self + already-linked tasks so you cannot re-add a
duplicate edge or a self-loop. Cycle rejections from the server
surface cleanly via the existing error banner.
- Parent selection in InlineCreate: new dropdown listing every task
on the board ('{id} — {title}') — picking one sends parents=[id]
with the create payload, so the task lands in todo (or triage if
created from the Triage column) with the dependency wired up.
- Safe markdown rendering for description, comment bodies, and
result. A small in-bundle renderer handles headings, bold, italic,
inline code, fenced code, bullet lists, and http(s)/mailto links.
Every substitution runs on HTML-escaped input (no raw HTML), links
get target=_blank + rel=noopener,noreferrer. Disabled by config
key dashboard.kanban.render_markdown=false (falls back to <pre>).
- Touch drag-drop: attachTouchDrag() installs a pointerdown handler
that spawns a drag proxy, tracks elementFromPoint under the finger,
and dispatches a hermes-kanban:drop CustomEvent on the column when
released. Desktop continues to use native HTML5 DnD. Columns
listen for both.
- ErrorBoundary already present from the prior commit catches any
renderer throw; markdown escape + touch-proxy cleanup both have
their own try/finally.
Tests (tests/plugins/test_kanban_dashboard_plugin.py — 90/90 pass)
- bulk_status_ready: 3 tasks blocked, batch → ready, all move
- bulk_archive hides all ids from default board
- bulk_reassign changes every assignee
- bulk_unassign_via_empty_string sets assignee back to None
- bulk_partial_failure_doesnt_abort_siblings: bogus id in middle,
good siblings still get priority=7
- bulk_empty_ids_400
- config_returns_defaults_when_section_missing
- config_reads_dashboard_kanban_section (writes config.yaml, verifies
every key round-trips)
Live smoke (real FastAPI app + isolated HERMES_HOME):
- /config without section returns defaults
- /config with dashboard.kanban section returns the configured values
- POST /tasks as the first-ever request (no prior /board) succeeds —
auto-init handles it
- Link add + remove via POST /links + DELETE /links round-trip
- Bulk priority bump on 2 ids, both get priority=5
- Bulk archive hides ids from default board
- PATCH {title, body} updates the task, markdown source survives
the round trip
- POST /tasks {triage: true, parents: [id]} lands in triage, not todo
- Bulk partial: 2 good + 1 bogus returns per-id outcome
Docs (website/docs/user-guide/features/kanban.md)
- 'What the plugin gives you' rewritten to reflect bulk, drawer
edit, dep editor, parent-on-create, markdown, touch drag-drop.
- New 'Dashboard config' subsection with a YAML example for
dashboard.kanban.*.
- REST table gains /tasks/bulk and /config rows.
4 tasks
… logs, notify, bulk, stats
Eliminates every 'known broken on day one' item in the core functionality
audit. The board is now self-driving (daemon, not cron), self-healing
(crash detection, spawn-failure circuit breaker), and self-reporting
(logs, stats, gateway notifications).
Dispatcher
- New `hermes kanban daemon` long-lived loop with --interval, --max,
--failure-limit, --pidfile, --verbose, signal-clean shutdown
(SIGINT/SIGTERM via threading.Event). A kb.run_daemon() entry point
lets tests drive it inline without subprocess.
- `hermes kanban init` now prints the dispatcher setup hint so users
don't leave the board off-by-default. Ships a systemd user unit at
plugins/kanban/systemd/hermes-kanban-dispatcher.service.
- Removed the old 'add this to cron' doc path. Cron runs agent
prompts (LLM cost per tick) — unacceptable for a per-minute
coordination loop.
Worker aliveness / safety
- Spawn returns the child's PID; dispatcher stores it on the task row
and calls detect_crashed_workers() every tick. If the PID is gone
but the claim TTL hasn't expired, the task drops back to ready with
a 'crashed' event. Host-local only — cross-host PIDs are ignored
per the single-host design.
- Spawn-failure circuit breaker: after N consecutive spawn_failed
events on the same task (default 5), the dispatcher auto-blocks
with the last error as the reason. Success resets the counter.
Workspace-resolution failures count against the same budget.
- Log rotation: _rotate_worker_log trims at 2 MiB, keeps one
generation (.log.1), bounds per-task disk usage at ~4 MiB.
Idempotency / dedup
- create_task(idempotency_key=...) returns the existing non-archived
task id for retried webhooks. --idempotency-key on the CLI, json
body field on the dashboard plugin. Archived tasks don't block a
fresh create with the same key.
CLI surface
- Bulk verbs: complete, unblock, archive accept multiple ids;
block accepts --ids for sibling blocks with the same reason.
- New verbs: daemon, watch (live event tail filtered by
assignee/tenant/kinds), stats, log, notify-subscribe,
notify-list, notify-unsubscribe.
- dispatch gains --failure-limit + crashed/auto_blocked columns in
JSON output and human-readable output.
- gc accepts --event-retention-days / --log-retention-days; prunes
task_events for terminal tasks and old log files.
Gateway integration
- New GatewayRunner._kanban_notifier_watcher: polls
kanban_notify_subs every 5s, pushes ✔/⏸/✖ messages to subscribed
chats for completed/blocked/spawn_auto_blocked/crashed events.
Cursor-advanced per-sub; auto-removed when the task reaches
done/archived. Runs alongside the session expiry and platform
reconnect watchers — SQLite work in asyncio.to_thread so the
event loop never blocks.
- /kanban create in the gateway auto-subscribes the originating
chat (platform + chat_id + thread_id). Users see
'(subscribed — you'll be notified when t_abcd completes or
blocks)' appended to the response.
Dashboard plugin
- GET /stats returns board_stats (by_status, by_assignee,
oldest_ready_age_seconds).
- GET /tasks/:id/log returns the worker log with optional ?tail=N
cap. 404 on unknown task, exists=false when the task has never
spawned.
- POST /tasks accepts idempotency_key; both Pydantic body and the
create_task kwarg now round-trip.
- /board attaches task.age (created/started/time_to_complete in
seconds) so the UI can colour stale cards without recomputing.
- Card CSS: amber border after N minutes, red border when clearly
stuck (tier per status: running 10m/60m, ready 1h/24h, todo
7d/30d, blocked 1h/24h).
- Drawer: new Worker log section, auto-loads on mount, last 100 KB
cap with on-disk path surfaced when truncated.
Kernel
- Schema additions: tasks.idempotency_key, tasks.spawn_failures,
tasks.worker_pid, tasks.last_spawn_error; new
kanban_notify_subs table. All gated by _migrate_add_optional_columns
so legacy DBs upgrade cleanly.
- release_stale_claims / complete_task / block_task now all clear
worker_pid so crash detection doesn't false-positive on reclaimed
tasks.
- read_worker_log fixed: tail-skip no longer eats one-giant-line
logs (common with child processes that don't flush newlines
before dying).
Tests (tests/hermes_cli/test_kanban_core_functionality.py, 28 new)
- Idempotency: same key returns existing, archived doesn't block,
no key never collides
- Circuit breaker: auto-blocks after limit, success resets counter,
workspace-resolution failure counts against budget
- Aliveness: _pid_alive helper, detect_crashed_workers reclaims
exited child
- Daemon: runs and stops cleanly via stop_event, survives a tick
exception
- Stats + task_age helpers
- Notify subs: CRUD, cursor advances, distinct-thread is a separate row
- GC: events-only-for-terminal-tasks, old worker logs deleted
- Log: rotation keeps one generation, read_worker_log tail
- CLI: bulk complete/archive/unblock/block, create with
--idempotency-key, stats --json, notify-subscribe+list, log
missing task, gc reports counts
- run_slash parity: smoke-tests every registered verb (23
invocations); none may raise or return empty string
Full kanban test suite: 234/234 pass under scripts/run_tests.sh
(60 original + 30 dashboard plugin + 28 new core + 116 command
registry). Live smoke covers /stats, idempotency, age, log endpoint
with and without content, log?tail= truncation signal, 404 on unknown
task.
Docs (website/docs/user-guide/features/kanban.md)
- 'Core concepts' rewritten: new statuses (triage), idempotency key,
dispatcher-as-daemon-not-cron with circuit breaker behaviour
documented.
- Quick start swapped to daemon. New systemd section covers user
service install.
- New sections: idempotent create, bulk verbs, gateway
notifications, out-of-scope single-host note (kanban.db is local;
don't expect multi-host).
- CLI reference updated for every new verb, every new flag.
| doesn't exist. If ``tail_bytes`` is set, only the last N bytes are | ||
| returned (useful for the dashboard drawer which shouldn't page megabytes).""" | ||
| path = worker_log_path(task_id) | ||
| if not path.exists(): |
| return None | ||
| try: | ||
| if tail_bytes is None: | ||
| return path.read_text(encoding="utf-8", errors="replace") |
| try: | ||
| if tail_bytes is None: | ||
| return path.read_text(encoding="utf-8", errors="replace") | ||
| size = path.stat().st_size |
| if tail_bytes is None: | ||
| return path.read_text(encoding="utf-8", errors="replace") | ||
| size = path.stat().st_size | ||
| with open(path, "rb") as f: |
| raise HTTPException(status_code=404, detail=f"task {task_id} not found") | ||
| content = kanban_db.read_worker_log(task_id, tail_bytes=tail) | ||
| log_path = kanban_db.worker_log_path(task_id) | ||
| size = log_path.stat().st_size if log_path.exists() else 0 |
| raise HTTPException(status_code=404, detail=f"task {task_id} not found") | ||
| content = kanban_db.read_worker_log(task_id, tail_bytes=tail) | ||
| log_path = kanban_db.worker_log_path(task_id) | ||
| size = log_path.stat().st_size if log_path.exists() else 0 |
…er, event vocab cleanup Ports four items from the Multica audit (https://github.com/multica-ai/multica). Dropped their cross-host server/daemon architecture and their Postgres+pgvector skill search — both the wrong shape for our single-host SQLite kernel. 1. Per-task max-runtime (`max_runtime_seconds` column) - New kernel function `enforce_max_runtime(conn)` runs in every dispatch tick. When a running task's elapsed time exceeds the cap, we SIGTERM the worker, wait a 5 s grace (polling _pid_alive), then SIGKILL. The task goes back to 'ready' with a `timed_out` event and re-queues on the next tick (unless the spawn-failure circuit breaker has already parked it). - Host-local only: lock prefix must match this host's claimer_id so we never signal a PID on another machine. - CLI: `hermes kanban create --max-runtime 30m | 2h | 1d | <seconds>`. New `_parse_duration` helper accepts s/m/h/d suffixes or bare integers. - Dashboard POST body + the card's `max_runtime_seconds` field. 2. Worker heartbeat (`last_heartbeat_at` column, `heartbeat` event) - `heartbeat_worker(conn, task_id, note=None)` emits the event and touches last_heartbeat_at. Refused when the task isn't running. - CLI: `hermes kanban heartbeat <id> [--note "..."]`. - kanban-worker skill instructs workers to heartbeat during long loops (training runs, encodes, crawls, batch uploads). - Separate signal from PID crash detection: a worker's Python can still be alive while the actual work process is stuck. Heartbeat absence is diagnostic; future work can auto-block on stale heartbeats but v1 just surfaces the signal. 3. Assignee enumeration (`known_assignees`, `list_profiles_on_disk`) - Scans ~/.hermes/profiles/ for dirs containing config.yaml + unions with current assignees on the board. Each entry returns {name, on_disk, counts: {status: n}}. - CLI: `hermes kanban assignees [--json]`. Also hooked into `hermes kanban init` which now prints discovered profiles so new installs see 'these are the assignees you can target' immediately. - Dashboard: GET /api/plugins/kanban/assignees for the picker. 4. Event vocab cleanup (three renames + three new kinds) - `ready` → `promoted` (fires when deps clear; clearer semantic). - `priority` → `reprioritized` (past-tense verb, matches others). - `spawn_auto_blocked` → `gave_up` (short, memorable; the circuit breaker gave up on this task). - New: `spawned` (emitted with {pid} on successful spawn), `heartbeat` ({note?}), `timed_out` ({pid, elapsed_seconds, limit_seconds, sigkill}). - One-shot migration in `_migrate_add_optional_columns` renames legacy rows in-place on init_db(), so existing DBs upgrade cleanly. - Gateway notifier's TERMINAL_KINDS set updated; timed_out gets its own ⏱ message template, gave_up renamed from 'auto-blocked'. - Plugin_api.py's two 'priority' emit sites renamed to 'reprioritized'. - Documented in a new 'Event reference' section in kanban.md, grouped into three clusters (lifecycle / edits / worker telemetry) with payload shapes. Tests (+18 in tests/hermes_cli/test_kanban_core_functionality.py, 136/136 pass): - max_runtime_terminates_overrun_worker: real SIGTERM flow with _pid_alive stub, verifies event payload + state reset. - max_runtime_none_means_no_cap: unbounded tasks aren't timed out. - create_task_persists_max_runtime. - enforce_max_runtime_integrates_with_dispatch: kernel-level + dispatch_once chaining. - heartbeat_on_running_task + heartbeat_refused_when_not_running. - cli_heartbeat_verb with --note round-trip. - recompute_ready_emits_promoted_not_ready. - spawn_failure_circuit_breaker_emits_gave_up. - spawned_event_emitted_with_pid. - migration_renames_legacy_event_kinds (injects old rows, re-runs init_db, asserts rename). - list_profiles_on_disk (tmp_path + config.yaml filter). - known_assignees_merges_disk_and_board (profiles on disk + board assignees + per-status counts). - cli_assignees_json. - parse_duration_accepts_formats (s/m/h/d/float). - parse_duration_rejects_garbage. - cli_create_max_runtime_via_duration (2h → 7200). - cli_create_max_runtime_bad_format_exits_nonzero. Live smoke: POST /tasks with max_runtime_seconds round-trips; /assignees returns the union of on-disk + board-assigned names; PATCH priority produces 'reprioritized' events (not 'priority'); board cards expose max_runtime_seconds + last_heartbeat_at. Docs (website/docs/user-guide/features/kanban.md): - New 'Event reference' section with three-cluster table (lifecycle / edits / worker telemetry) + payload shapes. - CLI reference updated for --max-runtime, heartbeat, assignees. - Gateway notifications section updated for the new TERMINAL_KINDS. Not ported from Multica (deliberate, documented in the out-of-scope section already): Postgres+pgvector skill search (heavy deps conflict with SQLite kernel), server+daemon cross-host model (we're single-host on purpose), first-class agent identity with threaded comments (we keep the board profile-agnostic).
…compat for v2 workflows Addresses vulcan-artivus's RFC review on issue #16102. Picks up the structural changes that are expensive to retrofit later and zero-cost to land now; defers workflow-template routing + per-stage lanes to v2 (kept forward-compat hooks in the schema). Kernel - New `task_runs` table. Each claim opens a run (pid, claim_lock, heartbeat, max_runtime, started_at), each terminal transition closes it with an outcome (completed / blocked / crashed / timed_out / spawn_failed / gave_up / reclaimed). Multiple rows per task when retries happen, preserving full attempt history. - `tasks.current_run_id` points at the active run (NULL when idle); denormalised for cheap reads. - `task_events.run_id` carries the run a given event belongs to so UIs group events by attempt. claim/spawned/complete/block/crash/ timeout/spawn_fail/gave_up/heartbeat events are all run-scoped; created/promoted/assigned/edited stay task-scoped (run_id=NULL). - Legacy DBs: migration adds the columns + indexes + synthesizes a run row for any task that's 'running' before the runs table existed, so subsequent complete/heartbeat/reclaim calls have a target. Idempotent. Structured handoff - `complete_task(summary=, metadata=)` persists both on the closing run. `summary` falls back to `result` when omitted so single-run callers don't duplicate. `metadata` is a free-form dict ({changed_files, tests_run, findings, ...}). - `build_worker_context` rewrites: now reads "Prior attempts on this task" (closed runs: outcome, summary, error, metadata) and "Parent task results" pulls run.summary + run.metadata of the most-recent completed run per parent, falling back to task.result for legacy rows without runs. Retrying workers see why earlier attempts failed; downstream workers see parent handoffs structurally, not as loose `result` strings. CLI - `hermes kanban complete <id> --summary "..." --metadata '{"files":1}'`. JSON is parsed and rejected with exit-2 if malformed. - New `hermes kanban runs <id> [--json]` verb. Shows per-run rows: outcome, profile, elapsed, summary, error. JSON mode serializes the full run dataclass for scripting. Dashboard plugin - GET /tasks/:id now carries a runs[] array alongside task / events / comments / links. Each run serialised with outcome, summary, metadata, worker_pid, elapsed fields. - New Run History section in the drawer. Outcome-coloured left border (green=active, blue=completed, amber=reclaimed, red=crashed/timed_out/gave_up/blocked). Collapsed when >3 runs with a '+N earlier' toggle. Shows summary + error + metadata inline. Forward-compat for v2 (vulcan's workflow templates + stages) - `tasks.workflow_template_id` and `tasks.current_step_key` added as nullable columns. v1 kernel ignores them for routing; v2 will add workflow_templates + workflow_steps tables and wire the dispatcher to consult them. task_runs has a matching `step_key` column. Lets a v2 release land additively without another schema migration. Tests (+22 in test_kanban_core_functionality.py, +2 in dashboard) - run_created_on_claim / run_closed_on_complete_with_summary - run_summary_falls_back_to_result - multiple_attempts_preserved_as_runs (3 attempts: reclaimed → crashed → completed, all visible in list_runs) - run_on_block_with_reason / run_on_spawn_failure_records_failed_runs (5 spawn_failed runs + 1 gave_up run) - event_rows_carry_run_id (task-scoped vs run-scoped split) - build_worker_context_includes_prior_attempts - build_worker_context_uses_parent_run_summary (metadata JSON in context) - migration_backfills_inflight_run_for_legacy_db (simulates a pre-migration running task, re-runs init_db, asserts backfill) - forward_compat_columns_writable - cli_runs_verb + cli_runs_json - cli_complete_with_summary_and_metadata (JSON round-trip through shlex + argparse) - cli_complete_bad_metadata_exits_nonzero - task_detail_includes_runs / task_detail_runs_empty_before_claim 269/269 kanban suite pass under scripts/run_tests.sh. Live-smoke covered: single-attempt complete → run closed + summary persisted; retry scenario → two runs visible (blocked + completed); parent run summary + metadata surfaced to child via build_worker_context; forward-compat columns writable via UPDATE; GET /tasks/:id returns runs[]. Docs - New 'Runs — one row per attempt' section in kanban.md: the why (full attempt history, structured metadata), the two-table model (task is logical, run is execution), the structured handoff shape (--summary / --metadata), example CLI + dashboard output, forward-compat note for v2. - Event reference updated to mention task_events.run_id. - CLI reference gains 'hermes kanban runs <id>'. Not in v1 (deferred to v2): - Workflow templates (workflow_templates + workflow_steps tables, stage-based routing, success/failure step links). - 'stage' as a distinct axis from status in the UI. - Shared-by-default workspace binding across stages of the same workflow run. - Pipeline replacement for the kanban-orchestrator skill (the orchestrator's 'decompose, don't execute' guidance is still correct; it becomes partly redundant once workflows land).
…direct-status / drag-drop Integration audit of the runs-as-first-class work (0146cb2) found five bugs where structured runs got orphaned or dashboard parity was missing. All behavioral fixes; no schema change needed. Kernel - archive_task: when called on a running task, now closes the in-flight run with outcome='reclaimed' and clears current_run_id. Previously, dashboard bulk-archive or CLI `kanban archive <running>` would leave the task_runs row open with ended_at=NULL forever and strand the pointer. Adds the claim_lock / claim_expires / worker_pid clearing to the UPDATE so the task row is clean too. - complete_task: embeds the first-line handoff summary in the `completed` event payload (capped at 400 chars). Notifier can now render `✔ task done — <title>\n<summary>` without a second SQL hit, and the full summary still lives on the run row. Dashboard plugin - _set_status_direct: drag-drop OFF 'running' (to 'ready', 'todo', 'triage', 'done' — anywhere except back to 'running') now closes the active run with outcome='reclaimed'. Clears worker_pid too. Snapshots previous status + current_run_id before the UPDATE so the decision has the right before-state. status event rows now carry run_id when closing a run, NULL otherwise. - UpdateTaskBody: adds `summary` and `metadata` fields. PATCH /tasks/:id with status='done' now forwards them to complete_task, giving the dashboard parity with `hermes kanban complete --summary ... --metadata ...`. Previously these fields only existed on the CLI. CLI - `hermes kanban complete a b c --summary X` or `--metadata Y`: refused with a clear stderr message instead of silently applying the same handoff to every task. Bulk-close without handoff flags still works. (Note: hermes_cli.main discards subcommand exit codes via `args.func(args)` without propagating; tracked separately. Side-effect check is the real guard.) Gateway notifier - Completion message prefers run.summary (carried in event payload) over task.result. task.result remains the fallback for legacy rows written before runs shipped. - Docstring: renamed stale `spawn_auto_blocked` reference to `gave_up` / `timed_out` — matches the actual TERMINAL_KINDS tuple, which was already correct in code. Tests (+8 in core functionality, +3 in dashboard plugin) - archive_of_running_task_closes_run - archive_of_ready_task_does_not_create_spurious_run - dashboard_direct_status_change_off_running_closes_run - dashboard_direct_status_change_within_same_state_is_noop_for_runs - cli_bulk_complete_with_summary_rejects (side-effect assertion) - cli_bulk_complete_without_summary_still_works - completed_event_payload_carries_summary - completed_event_payload_summary_none_when_missing - patch_status_done_with_summary_and_metadata - patch_status_done_without_summary_still_works (legacy path) - patch_status_archive_closes_running_run (E2E through FastAPI TestClient) 164/164 kanban suite pass under scripts/run_tests.sh. Live smoke (execute_code with isolated HERMES_HOME) covered all five fixed paths plus a re-claim-after-drag-drop to confirm the fresh run is tracked correctly after the orphan close.
…aimed-on-status-change, completed event carries summary - Runs section: dashboard PATCH parity (summary/metadata forward), `completed` event embeds first-line summary for notifiers, bulk --summary/--metadata refused, archive/drag-drop reclaim semantics. - Event reference: added Payload column to Lifecycle and Edits tables; called out the invariant that `status` carries run_id when closing a reclaimed run.
…, invariant recovery, live drawer refresh
Second integration audit covering surfaces the first pass didn't hit.
Found eight issues spanning kernel, dashboard frontend, notifier, and CLI.
All behavioral / UX fixes; no schema change.
Kernel
- complete_task on a never-claimed task (ready/blocked → done with no
run in flight) was silently dropping the summary/metadata/result
onto a non-existent run. Now synthesizes a zero-duration run
(started_at == ended_at) so attempt history is complete. Only
fires when there's actually handoff data to persist — bare
complete_task(tid) remains a no-op for run creation.
- block_task on a never-claimed task had the same bug for --reason.
Same fix: synthesize a zero-duration run when a reason is passed.
- Event dataclass gained a `run_id: Optional[int] = None` field.
list_events, unseen_events_for_sub, and the dashboard _event_dict
were all SELECTing the column but dropping it on the way out,
so downstream consumers couldn't group events by attempt. Every
read path now surfaces run_id.
- claim_task got a defensive invariant-recovery step: if somehow
`current_run_id` is non-NULL on a task in 'ready' status (invariant
violation from an unknown code path), close the leaked run as
'reclaimed' inside the same txn as the new claim. No-op in the
common case; belt-and-suspenders in case a future code path forgets
to clear the pointer.
Dashboard
- GET /tasks/:id events array now carries run_id per event (via
_event_dict).
- WebSocket /events SELECT now includes run_id in the pushed event
payload.
- TaskDrawer reloads itself on live events for its own task id. New
`taskEventTick[taskId]` state in the Board, incremented on every
WS event, passed down as `eventTick` prop; drawer's useEffect
depends on it. Previously, background workers completing a task
the user was viewing left the drawer showing stale data until
manual close/reopen.
- CSS: added `.hermes-kanban-run--ended` rule for the fallback class
the JS emits when outcome is unset. Harmless before; just
inconsistent.
CLI
- `hermes kanban watch --kinds` help text listed the legacy event
name `spawn_auto_blocked`. The kernel migration renames it to
`gave_up`, so users typing the documented name got zero matches.
Now shows the current lexicon (`completed,blocked,gave_up,
crashed,timed_out`).
Tests (+6 in core functionality, +1 in dashboard plugin)
- complete_never_claimed_task_synthesizes_run
- block_never_claimed_task_synthesizes_run
- complete_never_claimed_without_handoff_skips_synthesis
- event_dataclass_carries_run_id (created.run_id None, completed.run_id matches)
- unseen_events_for_sub_includes_run_id (notifier path)
- claim_task_recovers_from_invariant_leak (engineer the leak, verify recovery)
- event_dict_includes_run_id (dashboard API shape)
171/171 kanban suite pass under scripts/run_tests.sh. Live-smoke (isolated
HERMES_HOME via execute_code) exercised all six fixed paths plus the
claim-after-leak recovery sequence.
Docs
- Runs section: new 'Synthetic runs for never-claimed completions'
and 'Live drawer refresh' paragraphs explaining the invariants.
- Event reference: `created` / `promoted` / `unblocked` entries now
explicitly note `run_id` is `NULL`; `completed` / `blocked`
describe synthetic-run fallback.
… runs[]
Found during full-stack live-test of the kanban system: two bugs where
the kernel and CLI didn't match the documented contract.
Kernel
- `connect()` now auto-runs schema creation + migrations on the
first connection to a given DB path, matching its docstring
('Open (and initialize if needed)' — which was aspirational until
now). Module-level _INITIALIZED_PATHS cache keeps subsequent
connects cheap. Previously the docstring lied: every path that
went through connect() on a fresh HERMES_HOME raised 'no such
table: tasks' — only `hermes kanban init` and `daemon` triggered
schema creation.
- `init_db()` always re-runs the migration pass (clears the cache
entry first). Callers that know the on-disk schema may have
drifted — tests writing legacy event kinds, external tools
upgrading an old DB file — can force re-migration.
CLI
- `kanban_command()` entry point auto-inits the DB before
dispatching any subcommand. Idempotent; the underlying
connect-based init pattern makes this a one-line SELECT against
sqlite_master after the first call.
- `hermes kanban show --json` now includes:
- `runs`: full attempt history (id, profile, step_key, status,
outcome, summary, error, metadata, worker_pid, started_at,
ended_at)
- `run_id` on every event object
Dashboard API already had both; CLI was behind. Now scripts that
inspect a task can use `show --json` alone.
- `hermes kanban show` (human-readable) prints a Runs section
matching the `runs` subcommand's format, and each Event line
prefixes its run_id when present. Makes attempt attribution
visible at a glance without a second command.
Tests (+3)
- cli_create_on_fresh_home_auto_inits (subprocess, no init_db call,
must succeed — covers the most common first-user path)
- connect_auto_inits_fresh_db (direct kernel use without init_db)
- cli_show_json_carries_runs (runs[] present; events carry run_id)
174/174 kanban suite pass under scripts/run_tests.sh.
Live-tested end-to-end
- Phases 1-6: CLI subprocess (create/claim/complete/bulk-guard/
synthetic-run/reclaim-via-TTL/multi-attempt history)
- Phases 7-10: Dashboard FastAPI TestClient (POST/PATCH with
summary+metadata, drag-drop running->ready, archive-while-
running, mark-done-with-handoff)
- Phases 11-13: Dispatcher with stub spawn_fn (3 tasks success,
3 failures -> gave_up circuit breaker, event.run_id attribution
across retries)
- Phases 14-15: WebSocket /events (run_id payload, auth rejects
wrong tokens with code 1008)
- Phase 16: Gateway notifier (unseen_events_for_sub returns
run_id on events, message renders 'done — title' + handoff
summary from event payload, crashed event path, sub cleanup)
Every surface — kernel, dispatcher, CLI, dashboard REST, dashboard
WebSocket, gateway notifier — exercised end-to-end against a live
FastAPI app with a real SQLite DB in an isolated HERMES_HOME. All
passed.
New website/docs/user-guide/features/kanban-tutorial.md walks four
user stories end-to-end, each backed by a real screenshot of the
dashboard running against seeded data.
Stories
1. Solo dev shipping a feature (parent->child dependencies,
structured handoff, run history rendering).
2. Fleet farming (parallel independent tasks across 3 assignees,
lanes-by-profile grouping, dispatcher daemon).
3. Role pipeline with retry (PM spec -> eng implements -> review
blocks -> eng retries -> review approves; two-run history
visible in the drawer; downstream workers pull parent
summary+metadata).
4. Circuit breaker + crash recovery (2 spawn_failed + 1 gave_up
for a deploy with missing creds; 1 crashed + 1 completed for
an OOM-killed migration that recovered on retry).
Each story shows both CLI commands and the dashboard drawer
equivalent. Screenshots captured via playwright + chromium at 2x
device scale, then repalettized with PIL (22MB -> 6.1MB for the
10-image set, no visible quality loss verified against vision).
Side updates
- website/sidebars.ts: added kanban-tutorial under features.
- website/docs/user-guide/features/kanban.md: prefix banner
linking new readers to the tutorial before the reference.
All image references validate: `/img/kanban-tutorial/*` maps to
website/static/img/kanban-tutorial/ (10 files). Docusaurus build
not run locally (no node_modules in worktree); CI build on merge
will confirm.
Six concrete bugs + two cheap v2 extensions from the review at #16102 (comment) Larger items (structured comments as session substrate, taxonomy reorg) deferred to v2 with reply posted on the issue. Pre-merge bug fixes - unblock_task: close any stale current_run_id pointer with a reclaimed run inside the unblock txn. Defensive; the invariant holds under current data paths (block_task already closes the run) but a future or external write that leaves the pointer dangling would otherwise persist across the ready->blocked-> ready cycle. Mirrors the same pattern in claim_task + archive_task. - Migration backfill: wrap the in-flight backfill loop in write_txn and add a CAS guard (`current_run_id IS NULL`) on the pointer UPDATE, with a cleanup path that marks any orphan run row reclaimed if the CAS fails. Prevents races against a concurrent dispatcher between SELECT and INSERT. - Notifier sub leak on non-done terminals: unsub on the last delivered event's kind being terminal (completed / blocked / gave_up / crashed / timed_out), not just on task.status in (done, archived). blocked / gave_up / crashed / timed_out used to fire one ping then strand the subscription row forever. - Notifier thrashes dead chats: per-subscription send-failure counter keyed on (task_id, platform, chat_id, thread_id). After 3 consecutive adapter.send exceptions, drop the sub automatically. Counter resets on any successful send. Daemon ops visibility - run_daemon on_tick now tracks consecutive ticks where the ready queue is non-empty but 0 spawns succeeded. After 6 such ticks (default ~30s at interval=5), emits a WARN line to stderr pointing at profile health (venv, PATH, credentials) and `hermes kanban list --status blocked`. Rate-limited to one message per 5 minutes so a persistent outage doesn't spam logs. v2 extensions shipped in scope (pure upside) - build_worker_context: new "Recent work by @assignee" section surfacing the 5 most-recent completed runs for the current task's assignee (excluding this task). Bounded, cached by the natural LIMIT, no new dependencies. Skipped when the task has no assignee. - Gateway notifier message prefix: terminal pings now lead with `@<assignee>` so fleets (one chat subscribing to many tasks with different workers) stay legible at a glance. One-line template change. Deferred to v2 (noted in reply to erosika) - recompute_ready full-scan starvation at 10k+ tasks: dirty-set approach is a real refactor; fine as follow-up. - Skill ↔ assignee validation for routing: depends on skill introspection surface that isn't nailed down. - Structured comments (in_reply_to / addressed_to / kind) as multi-peer session substrate: schema-affecting, exactly the v2-scope design vulcan flagged shouldn't cram into this PR. - Pattern vs mechanism taxonomy split in docs: pure docs reorg, low urgency. Tests (+6 in core functionality) - unblock_invariant_recovery (engineered leak, defensive close) - unblock_normal_path_no_spurious_run (no run created on happy block->unblock; erosika's main concern) - migration_backfill_idempotent_under_re_run (3x init_db on a legacy-shape DB yields exactly 1 run row, not 3) - build_worker_context_includes_role_history (role continuity) - build_worker_context_role_history_skipped_when_no_assignee - build_worker_context_role_history_bounded_to_5 180/180 kanban suite pass under scripts/run_tests.sh. Live-smoke exercised all three kernel fixes end-to-end with isolated HERMES_HOME.
Added tests/stress/ — an opt-in suite that stresses the kernel with
real concurrency, real subprocesses, random property fuzzing, and scale
benchmarks. Exposed two real bugs the 4 audit passes missed.
Bugs found and fixed
- _pid_alive returned True for zombie processes. os.kill(pid, 0)
succeeds against zombies (process table entry exists until parent
reaps), so a worker that exited normally but wasn't yet reaped
would look alive to detect_crashed_workers indefinitely. Fixed by
peeking at /proc/<pid>/status on Linux and treating State: Z as
dead. No-op on other POSIX; Windows unchanged.
- Task ID generator used 2 hex bytes (65k space). By birthday
paradox, ~5% collision probability at 1k tasks, ~50% at 10k.
Would raise UNIQUE constraint errors on large boards without a
retry path. Bumped to 4 hex bytes (4.3B space). Surfaced by
benchmarks trying to seed 10k tasks.
Stress suite (tests/stress/, opt-in via --run-stress)
- test_concurrency.py — 5 processes race for 100 tasks. 1
lost-claim race observed, 0 double-claims, 0 orphan runs.
- test_concurrency_mixed.py — 10 workers + 1 reclaimer, 500
tasks, random ops (claim/complete/block/unblock/archive).
1518 events, 43 lost-claim races, zero invariant violations.
- test_concurrency_reclaim_race.py — TTL < work duration so the
reclaimer intentionally yanks tasks mid-work. 82 claims, 44
reclaimed, 50 completed, 32 complete_refused (CAS correctly
blocked late completes on reclaimed tasks).
- test_subprocess_e2e.py — dispatcher spawns real python
subprocess workers that heartbeat + complete via the CLI.
Also exercises crash detection against a real dead PID via
double-fork.
- test_property_fuzzing.py — 500 randomized sequences, ~40k
operations, 9 invariant checks after each step. Zero
violations.
- test_benchmarks.py — latency at 100/1k/10k tasks. Key numbers:
dispatch_once @ 10k = 4ms, recompute_ready @ 10k = 47ms,
list_tasks @ 10k = 50ms, build_worker_context w/ 50 parents
= 1ms. Erosika's 10k starvation flag is pessimistic by roughly
an order of magnitude.
Regression tests added to the main suite (+2)
- test_pid_alive_detects_zombie: /proc check against a real
zombified process (Linux-only, skipped elsewhere).
- test_task_ids_dont_collide_at_scale: 500 creates, verify
uniqueness + format.
New harness
- tests/stress/conftest.py skips the suite by default; opt in with
pytest --run-stress. Scripts are also __main__-runnable directly
since they were developed as standalone stress tests.
- tests/stress/_fake_worker.py: minimal Python worker that
exercises the real subprocess contract (reads HERMES_KANBAN_TASK,
heartbeats, completes via CLI).
- tests/stress/README.md explains opt-in usage.
Test count: 182/182 kanban suite still pass under scripts/run_tests.sh;
stress suite is additive and out-of-band.
…mp fix
Added tests/stress/test_atypical_scenarios.py — 28 scenarios covering
atypical user inputs and environments that the normal tests assume
away. Surfaced one real UI bug in the process.
Bug found and fixed
- CLI display of negative elapsed time when NTP jumps backward
between claim_task and complete_task. The kernel faithfully stores
started_at > ended_at; both CLI display sites (_cmd_show's Runs
section at line 685, _cmd_runs's table at line 1153) now clamp
elapsed to max(0, end - start), matching the dashboard JS which
already had the same Math.max(0, ...) clamp. Regression test
test_cli_show_clamps_negative_elapsed forces a future started_at
via raw UPDATE and verifies neither show nor runs prints a
`-<digits>s` token.
Atypical scenarios covered (all passing, 28 total)
Data:
- unicode_and_emoji: CJK, RTL (Hebrew/Arabic), ZWJ emoji sequences,
control chars, null bytes in titles + metadata
- huge_strings: 1 MB body + 1 MB summary + 50-level nested metadata
- sql_injection_attempts: 6 classic payloads, parameterized queries
hold across every string field
- newlines_in_summary: full preserved on run, first line in event
payload for notifier brevity
- malformed_metadata_via_cli: 4 bad JSON values each cleanly
rejected with stderr error, no partial task mutation
- empty_string_fields: empty title rejected, whitespace-only title
rejected, empty body accepted
- tenant_with_newlines: multiline tenant strings survive
board_stats
Dependency graphs:
- dependency_cycle: A→B→A refused
- self_parent: cannot depend on itself
- diamond_dependency: child promotes only when both parents done
- wide_fan_out: 500 children promoted in 4ms via complete_task's
internal recompute_ready
- wide_fan_in: 500 parents → 1 child, promotion gated correctly
- parent_in_different_status_states: child stays todo for parent
in ready/running/blocked/triage/archived; only 'done' unblocks
Workspace:
- workspace_path_traversal: dir: workspaces are intentionally
arbitrary paths (documented threat model)
- workspace_nonexistent_path: spawn_failure counter increments,
task returns to ready
Clock:
- clock_skew_start_greater_than_end: kernel stores faithfully,
CLI now clamps at display (see Bug above)
Filesystem:
- hermes_home_with_spaces: works
- hermes_home_with_unicode: works
- hermes_home_via_symlink: two symlinks to same dir share DB
(Path.resolve() in _INITIALIZED_PATHS)
Scale extremes:
- huge_run_count_on_one_task: 1000 runs → list_runs in <1ms,
build_worker_context 3ms/63KB (flagged: unbounded on
retry-heavy tasks; v2 cap candidate)
- hundred_tenants: 5000 tasks / 100 tenants → stats 1ms, list 26ms
- comment_storm: 1000 comments → 50KB worker context (same
unbounded-context flag)
Lifecycle:
- completed_task_reclaim_attempt: done tasks can't be
re-claimed/re-completed/blocked
- archived_task_resurrection_attempt: archived tasks invisible to
default list + all ops refused
- unassigned_task_never_claims: dispatcher skips, task untouched
Assignees:
- assignee_with_special_chars: @-signs, dots, CJK, emoji, 200-char
names, empty strings all handled
Concurrency:
- idempotency_key_race: two concurrent processes calling
create_task with same key both get back the SAME task id,
exactly one row in DB
Dashboard:
- dashboard_rest_with_weird_inputs: empty/huge/unicode titles,
unknown fields, type mismatches handled correctly (200 for
valid, 400/422 for invalid)
Observations flagged for v2 (not fixed, not blocking)
- build_worker_context is unbounded on retry-heavy tasks (1000 runs
→ 63KB) and comment-heavy tasks (1000 comments → 50KB). A
`--max-prior-attempts` / `--max-comments` cap would be appropriate.
- `dir:` workspace paths are intentionally arbitrary; docs should
note the threat model (trusted local user; path is stored
verbatim, not sandboxed).
183/183 main kanban suite pass. Stress suite still opted out by
default; enable with `pytest --run-stress` or run scripts directly.
Both items the atypical-scenarios pass flagged as "v2 follow-up"
actually belong in v1. Fixed now.
Fix 1: workspace path traversal
resolve_workspace now rejects non-absolute paths for all three
workspace_kinds (scratch-with-explicit-path, dir:, worktree). A
relative path like '../../../tmp/attacker' was being silently
resolved against the dispatcher's CWD — a confused-deputy escape.
Error message points users at the absolute-path requirement.
Storage remains verbatim (kernel doesn't rewrite user input);
the refusal happens at resolution time, so the dispatcher's
existing spawn-failure circuit breaker correctly categorizes it.
Threat model documented in website/docs/user-guide/features/kanban.md:
single-host, trusted-local-user. The absolute-path rule prevents
ambiguity-driven escape, not malicious access — kanban runs as you,
with your uid, on your filesystem.
Fix 2: build_worker_context unbounded
Added per-section caps so worker prompts stay bounded on pathological
boards:
_CTX_MAX_PRIOR_ATTEMPTS = 10 most-recent N runs shown; older
collapsed into "N earlier attempts
omitted" marker. Attempt numbering
preserved (shows "Attempt 16" not
renumbered).
_CTX_MAX_COMMENTS = 30 same pattern for comments.
_CTX_MAX_FIELD_BYTES = 4 KB per summary / error / metadata / result.
_CTX_MAX_BODY_BYTES = 8 KB per task.body (opening post).
_CTX_MAX_COMMENT_BYTES = 2 KB per comment.
Truncation uses a visible ellipsis + char-count so the worker knows
it's been truncated.
Effect on atypical-scenario runs:
huge_run_count_on_one_task (1000 runs): 63 KB → 820 chars
comment_storm (1000 comments): 50 KB → 1,671 chars
Tests (+6 in main suite)
test_resolve_workspace_rejects_relative_dir_path — relative dir:
path stored verbatim but refused at resolve.
test_resolve_workspace_accepts_absolute_dir_path — legitimate
absolute paths are created and returned.
test_resolve_workspace_rejects_relative_worktree_path — same guard
for worktree kind.
test_build_worker_context_caps_prior_attempts — 25 runs → exactly
_CTX_MAX_PRIOR_ATTEMPTS shown, omitted marker present,
attempt numbering preserves original index.
test_build_worker_context_caps_comments — 100 comments → 30 shown,
70 in the omitted marker.
test_build_worker_context_caps_huge_summary — 1 MB summary on a
prior run → context under 10 KB total, truncation marker visible.
189/189 kanban suite pass. Atypical-scenarios stress script still
passes all 28 scenarios with the new caps in effect.
Seven new tools in `tools/kanban_tools.py` that give kanban workers a
backend-portable, schema-filtered way to interact with the board from
inside their own Python process — no shelling out to `hermes kanban`.
Motivation
The CLI path (`hermes kanban complete \$TASK --summary ...`) breaks
on any remote terminal backend (Docker, Modal, Singularity, SSH).
The terminal tool runs `hermes kanban` inside the container, where
`hermes` isn't installed and `~/.hermes/kanban.db` isn't mounted.
Tools run in the agent's own Python process, so they always reach
the board regardless of backend. Also skips shell-quoting fragility
on --metadata JSON and gives structured error returns the model can
reason about.
The seven tools
kanban_show read current task (defaults to HERMES_KANBAN_TASK)
kanban_complete structured handoff: summary + metadata
kanban_block ask for human input
kanban_heartbeat signal liveness during long operations
kanban_comment append to task thread
kanban_create fan out into child tasks (orchestrator path)
kanban_link add parent→child dependency after the fact
Gating
Each tool's check_fn returns True iff HERMES_KANBAN_TASK is set in
the process env. The dispatcher sets it when spawning a worker;
normal `hermes chat` sessions never have it. Empirically verified:
a baseline hermes-cli schema is 27 tools; with HERMES_KANBAN_TASK
set it grows to exactly 34 (+7). Zero leak into normal sessions.
Also set HERMES_PROFILE in the spawn env so the kanban_comment tool's
author default works cleanly (it's what the tool reads to attribute
comments).
Skill updates
- `skills/devops/kanban-worker/SKILL.md`: lifecycle rewritten to use
kanban_show / kanban_heartbeat / kanban_block / kanban_complete /
kanban_comment / kanban_create directly. CLI fallback section
added for human operators / scripts.
- `skills/devops/kanban-orchestrator/SKILL.md`: all examples ported
from CLI to tool form; top-banner note explaining tools are the
primary surface. kanban_create / kanban_link throughout.
Docs
`website/docs/user-guide/features/kanban.md`:
new "How workers interact with the board" section explaining the
tool surface, gating mechanism, and why tools vs CLI. The worker
skill / orchestrator skill subsections are now nested under it.
Tests (+25 in tests/tools/test_kanban_tools.py)
- Schema gating: kanban_tools_hidden_without_env_var,
kanban_tools_visible_with_env_var.
- Happy paths: show (default + explicit task_id), complete (with
summary+metadata, with result only), block, heartbeat (with and
without note), comment (default + custom author), create (with
list parents, with string parent), link.
- Error paths: complete rejects no-handoff and non-dict metadata,
block rejects empty reason, comment rejects empty body, create
rejects no title / no assignee / non-list parents, link rejects
self-reference / missing args / cycles.
- End-to-end: full worker lifecycle driven entirely through the
tools, verified against DB state.
214/214 kanban suite pass under scripts/run_tests.sh.
…nce-only
Moves the kanban worker lifecycle out of the skills and into the system
prompt as a conditionally-injected guidance block, matching the
existing MEMORY_GUIDANCE / SESSION_SEARCH_GUIDANCE / SKILLS_GUIDANCE
pattern in `agent/prompt_builder.py`. Workers now know the full
lifecycle even if no skill was loaded; skills become the deeper
playbook for edge cases.
Why
Previously the 6-step lifecycle (orient → work → heartbeat → block/
complete → fan-out) only reached the model if the kanban-worker
skill was loaded. That's fragile: users configure skills-per-
platform, profiles can forget to include them, skill loading order
matters. The lifecycle is load-bearing for correct board behavior
— it belongs in the prompt path.
What
- New `KANBAN_GUIDANCE` constant in agent/prompt_builder.py (~3 KB):
identity framing ("You are a Kanban worker"), 6-step lifecycle
with exact tool-call shapes (`kanban_show()`, `kanban_complete(
summary=..., metadata=...)`, `kanban_block(reason=...)`, etc.),
orchestrator mode note, and explicit DO NOTs including "don't
shell out to `hermes kanban <verb>`".
- Wired into `AIAgent._build_system_prompt` gated on `kanban_show
in self.valid_tool_names`. Because `kanban_show`'s `check_fn`
gates on `HERMES_KANBAN_TASK` env var, this indirection
guarantees guidance appears iff a worker was dispatched. Zero
cost to normal sessions.
Gating verified live
Normal `hermes chat` session: 2329-char prompt, zero kanban content.
Worker session (HERMES_KANBAN_TASK set): 5272-char prompt, +2943
chars of KANBAN_GUIDANCE present. Regression test asserts the
header phrase and each tool name.
Skills reshape (both from ~6 KB lifecycles to ~7 KB references)
- `skills/devops/kanban-worker/SKILL.md`: drops the lifecycle
(now in the guidance block), keeps good-summary patterns,
block-reason examples, retry diagnostics, pitfalls, CLI
fallback cheatsheet. Adds explicit note that you don't need to
load the skill to work a task anymore.
- `skills/devops/kanban-orchestrator/SKILL.md`: drops the
"don't execute" rule (now in guidance), keeps the full
decomposition playbook, specialist roster with typical
workspaces, a concrete Postgres-migration example with
`kanban_create` + `parents=[...]` linking, and pitfalls
(reassignment vs new task, link argument order, tenant
inheritance).
Tests (+3)
- `test_kanban_guidance_not_in_normal_prompt`: verifies a regular
AIAgent with no env var produces a prompt that contains none of
the kanban lifecycle phrases.
- `test_kanban_guidance_in_worker_prompt`: spawns an AIAgent with
HERMES_KANBAN_TASK set, verifies the header + all 4 tool-call
examples + anti-shell guidance appear.
- `test_kanban_guidance_prompt_size_bounded`: sanity check that
the guidance is 1.5-4 KB so it doesn't balloon the cached prompt.
217/217 kanban + tools suite green; 303/303 run_agent suite green.
The guidance block sits alongside the other tool-gated blocks, so
prompt caching remains intact — the system prompt is stable across
every turn of a kanban worker's run.
`_default_spawn` now passes `--skills kanban-worker` in the child's argv, so every dispatched worker gets the skill loaded automatically regardless of the profile's default skill config. Why The system prompt carries the MANDATORY lifecycle (via KANBAN_GUIDANCE). The skill carries the deeper reference material: good summary/metadata patterns, retry diagnostics, block-reason examples, workspace handling, CLI fallback. Both are useful; requiring users to wire the skill into skills config per-profile is a footgun — they'd hit tasks where workers use the lifecycle correctly but not the patterns, producing weaker handoffs. Auto-loading makes kanban-worker the baseline. Profiles can still add more skills via the normal config; --skills is additive. Test New test_default_spawn_auto_loads_kanban_worker_skill intercepts subprocess.Popen to capture the argv without actually spawning a hermes subprocess. Asserts '--skills kanban-worker' appears in the cmd and the env still carries HERMES_KANBAN_TASK + HERMES_PROFILE. Skill note updated Worker skill's opening note changed from "you don't need to load this" to "you're seeing this because the dispatcher loaded it for you" — reflects the new always-loaded reality. 218/218 kanban + tools suite green.
Tasks can now pin extra skills to load into their worker alongside the built-in kanban-worker. Use cases: translation tasks that need a translation skill, review tasks that need github-code-review, security audits that need security-pr-audit — without editing the assignee's profile config. Changes: - tasks.skills column (JSON array), idempotent migration, Task.skills dataclass field. - create_task(skills=[...]) normalises (strip/dedupe), rejects commas in a single name. - _default_spawn emits one `--skills X` pair per task skill, in addition to the built-in `--skills kanban-worker`. Deduped against the built-in. - CLI: `hermes kanban create --skill <name>` (repeatable). - Tool: `kanban_create(skills=[...])` (accepts list or single string). - Dashboard: POST /tasks accepts `skills`; inline create form has a comma-separated skills input; drawer shows a Skills row. - `hermes kanban show` prints a skills row when present. - Docs: kanban.md has a new 'Pinning extra skills to a specific task' section; CLI reference shows `--skill`. - Tests: 16 new (kernel round-trip, dedupe, comma-rejection, dispatcher argv with multiple skills, built-in dedupe, CLI flag repeatable, CLI no-flag stays None, show renders skills, idempotent migration on legacy DB, tool list/string/non-list, dashboard REST round-trip, dashboard default empty list).
…alone daemon The kanban dispatcher is now embedded in the gateway process — 'gateway is the single dispatcher host' — removing the need for a separate `hermes kanban daemon` or systemd unit. Typical user path becomes: `hermes gateway start` + `hermes kanban create ...` and ready tasks run on the next tick (60s default). The cost is ~300µs per interval on an idle board; negligible. Changes: - config: new `kanban` section in DEFAULT_CONFIG with `dispatch_in_gateway: true` (default on) and `dispatch_interval_seconds: 60`. Additive — no \_config_version bump. - gateway/run.py: new `_kanban_dispatcher_watcher()` background task, symmetric with `_kanban_notifier_watcher`. Reads config at boot; exits cleanly if the flag is off. Runs each tick via `asyncio.to_thread` so the SQLite WAL lock never blocks the loop. Sleeps in 1s slices so shutdown is snappy. Health telemetry mirrored from `_cmd_daemon` — warns when the ready queue is non-empty for 6 consecutive ticks with 0 spawns. Env override `HERMES_KANBAN_DISPATCH_IN_GATEWAY=0` disables without editing config.yaml. - hermes_cli/kanban.py: `_cmd_daemon` becomes a deprecation stub — no `--force`, exits 2 with migration guidance pointing at `hermes gateway start`. With `--force` (undocumented, help=SUPPRESS) the old standalone loop still runs, for headless hosts that can't run the gateway. Help text marks the subcommand DEPRECATED. - hermes_cli/kanban.py: new `_check_dispatcher_presence()` helper — returns (running, human_message) by probing `gateway.status.get_running_pid` AND reading `kanban.dispatch_in_gateway`. Defensive: import/probe failures return (True, "") so we never cry wolf on partial installs. - `hermes kanban create` prints a stderr warning when the task lands in 'ready' with an assignee but no dispatcher will pick it up. Skipped in `--json` mode so machine-parseable output stays clean. Skipped for triage/unassigned tasks (they can't dispatch regardless). - `hermes kanban init` prints `hermes gateway start` as the next step (was: `hermes kanban daemon`). - plugin_api.py: POST /tasks response includes a `warning` field when the same probe returns not-running, so the dashboard UI can surface a banner. Probe errors are swallowed silently — must never break create. - dashboard dist/index.js: `createTask` threads the `warning` response field into the existing error-banner channel with 'Task created, but: ...'. - toolsets.py: "spawned by `hermes kanban daemon`" description updated to "spawned by the kanban dispatcher (gateway-embedded by default)". Matching change to the tools/kanban_tools.py docstring. - systemd unit: marked DEPRECATED in its Description + header comment, invokes the standalone daemon via the explicit `--force` flag so users who haven't migrated don't accidentally spawn duplicate dispatchers. - docs: kanban.md 'Running the dispatcher as a service' section rewritten to describe gateway-embedded dispatch as the default; the standalone systemd instructions are gone. kanban-tutorial.md 'Start the daemon' block replaced with `hermes gateway start`. CLI command reference table now marks `hermes kanban daemon` DEPRECATED. Tests (17 new): - DEFAULT_CONFIG has kanban.dispatch_in_gateway=True and a sane interval. - `_check_dispatcher_presence` returns running when gateway pid is found and flag is on; warns with `hermes gateway start` guidance when no gateway; warns with `dispatch_in_gateway` guidance when flag is off; silent on probe error. - `hermes kanban create` (non-JSON) warns on stderr when no gateway and task is ready+assigned; stays silent when gateway is up; never warns on triage or unassigned tasks. - `hermes kanban daemon` without --force prints DEPRECATED + exits 2. - Argparse help for `kanban daemon` contains the word DEPRECATED. - Gateway `_kanban_dispatcher_watcher` respects config flag=False (exits fast), respects HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 env override, and treats truthy env as 'defer to config' (not force-on). - Dashboard POST /tasks response carries `warning` when probe says no dispatcher; omits it when probe says running; skips probe on triage tasks; survives probe errors without breaking create. E2E verified in an isolated HERMES_HOME: watcher exits in 0.000s when flag is off, 0.000s on env override, gracefully stops within 3s of `_running=False` with the flag on. Integration issues fixed in the same pass: - Stale 'hermes kanban daemon --assignee' refs in kanban-tutorial.md (that flag never existed; just bad author copy). - Stale toolset description pointing at the retired daemon. - Stale docstring in tools/kanban_tools.py for `_check_kanban_mode`. - Systemd unit still invoking `hermes kanban daemon` without `--force`; now invokes with `--force` and is marked DEPRECATED.
This was referenced Apr 30, 2026
teknium1
added a commit
that referenced
this pull request
Apr 30, 2026
Salvage of PR #16100 onto current main (after emozilla's #17514 fix that unblocks plugin Pydantic body validation). History preserved on the standing `feat/kanban-standing` branch; this squashes the 22 iterative commits into one clean landing. What this lands: - SQLite kernel (hermes_cli/kanban_db.py) — durable task board with tasks, task_links, task_runs, task_comments, task_events, kanban_notify_subs tables. WAL mode, atomic claim via CAS, tenant-namespaced, skills JSON array per task, max-runtime timeouts, worker heartbeats, idempotency keys, circuit breaker on repeated spawn failures, crash detection via /proc/<pid>/status, run history preserved across attempts. - Dispatcher — runs inside the gateway by default (`kanban.dispatch_in_gateway: true`). Ticks every 60s, reclaims stale claims, promotes ready tasks, spawns `hermes -p <assignee> chat -q "work kanban task <id>"` with HERMES_KANBAN_TASK + HERMES_KANBAN_WORKSPACE env. Auto-loads `--skills kanban-worker` plus any per-task skills. Health telemetry warns on stuck ready queue. - Structured tool surface (tools/kanban_tools.py) — 7 tools (kanban_show, kanban_complete, kanban_block, kanban_heartbeat, kanban_comment, kanban_create, kanban_link). Gated on HERMES_KANBAN_TASK via check_fn so zero schema footprint in normal sessions. - System-prompt guidance (agent/prompt_builder.py KANBAN_GUIDANCE) injected only when kanban tools are active. - Dashboard plugin (plugins/kanban/dashboard/) — Linear-style board UI: triage/todo/ready/running/blocked/done columns, drag-drop, inline create, task drawer with markdown, comments, run history, dependency editor, bulk ops, lanes-by-profile grouping, WS-driven live refresh. Matches active dashboard theme via CSS variables. - CLI — `hermes kanban init|create|list|show|assign|link|unlink| claim|comment|complete|block|unblock|archive|tail|dispatch|context| init|gc|watch|stats|notify|log|heartbeat|runs|assignees` + `/kanban` slash in-session. - Worker + orchestrator skills (skills/devops/kanban-worker + kanban-orchestrator) — pattern library for good summary/metadata shapes, retry diagnostics, block-reason examples, fan-out patterns. - Per-task force-loaded skills — `--skill <name>` (repeatable), stored as JSON, threaded through to dispatcher argv as one `--skills X` pair per skill alongside the built-in kanban-worker. Dashboard + CLI + tool parity. - Deprecation of standalone `hermes kanban daemon` — stub exits 2 with migration guidance; `--force` escape hatch for headless hosts. - Docs (website/docs/user-guide/features/kanban.md + kanban-tutorial.md) with 11 dashboard screenshots walking through four user stories (Solo Dev, Fleet Farming, Role Pipeline, Circuit Breaker). - Tests (251 passing): kernel schema + migration + CAS atomicity, dispatcher logic, circuit breaker, crash detection, max-runtime timeouts, claim lifecycle, tenant isolation, idempotency keys, per- task skills round-trip + validation + dispatcher argv, tool surface (7 tools × round-trip + error paths), dashboard REST (CRUD + bulk + links + warnings), gateway-embedded dispatcher (config gate, env override, graceful shutdown), CLI deprecation stub, migration from legacy schemas. Gateway integration: - GatewayRunner._kanban_dispatcher_watcher — new asyncio background task, symmetric with _kanban_notifier_watcher. Runs dispatch_once via asyncio.to_thread so SQLite WAL never blocks the loop. Sleeps in 1s slices for snappy shutdown. Respects HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 env override for debugging. - Config: new `kanban` section in DEFAULT_CONFIG with `dispatch_in_gateway: true` (default) + `dispatch_interval_seconds: 60`. Additive — no \_config_version bump needed. Forward-compat: - workflow_template_id / current_step_key columns on tasks (v1 writes NULL; v2 will use them for routing). - task_runs holds claim machinery (claim_lock, claim_expires, worker_pid, last_heartbeat_at) so multi-attempt history is first- class from day one. Closes #16102. Co-authored-by: emozilla <emozilla@nousresearch.com>
Contributor
Author
nickdlkk
pushed a commit
to nickdlkk/hermes-agent
that referenced
this pull request
May 11, 2026
…#17805) Salvage of PR NousResearch#16100 onto current main (after emozilla's NousResearch#17514 fix that unblocks plugin Pydantic body validation). History preserved on the standing `feat/kanban-standing` branch; this squashes the 22 iterative commits into one clean landing. What this lands: - SQLite kernel (hermes_cli/kanban_db.py) — durable task board with tasks, task_links, task_runs, task_comments, task_events, kanban_notify_subs tables. WAL mode, atomic claim via CAS, tenant-namespaced, skills JSON array per task, max-runtime timeouts, worker heartbeats, idempotency keys, circuit breaker on repeated spawn failures, crash detection via /proc/<pid>/status, run history preserved across attempts. - Dispatcher — runs inside the gateway by default (`kanban.dispatch_in_gateway: true`). Ticks every 60s, reclaims stale claims, promotes ready tasks, spawns `hermes -p <assignee> chat -q "work kanban task <id>"` with HERMES_KANBAN_TASK + HERMES_KANBAN_WORKSPACE env. Auto-loads `--skills kanban-worker` plus any per-task skills. Health telemetry warns on stuck ready queue. - Structured tool surface (tools/kanban_tools.py) — 7 tools (kanban_show, kanban_complete, kanban_block, kanban_heartbeat, kanban_comment, kanban_create, kanban_link). Gated on HERMES_KANBAN_TASK via check_fn so zero schema footprint in normal sessions. - System-prompt guidance (agent/prompt_builder.py KANBAN_GUIDANCE) injected only when kanban tools are active. - Dashboard plugin (plugins/kanban/dashboard/) — Linear-style board UI: triage/todo/ready/running/blocked/done columns, drag-drop, inline create, task drawer with markdown, comments, run history, dependency editor, bulk ops, lanes-by-profile grouping, WS-driven live refresh. Matches active dashboard theme via CSS variables. - CLI — `hermes kanban init|create|list|show|assign|link|unlink| claim|comment|complete|block|unblock|archive|tail|dispatch|context| init|gc|watch|stats|notify|log|heartbeat|runs|assignees` + `/kanban` slash in-session. - Worker + orchestrator skills (skills/devops/kanban-worker + kanban-orchestrator) — pattern library for good summary/metadata shapes, retry diagnostics, block-reason examples, fan-out patterns. - Per-task force-loaded skills — `--skill <name>` (repeatable), stored as JSON, threaded through to dispatcher argv as one `--skills X` pair per skill alongside the built-in kanban-worker. Dashboard + CLI + tool parity. - Deprecation of standalone `hermes kanban daemon` — stub exits 2 with migration guidance; `--force` escape hatch for headless hosts. - Docs (website/docs/user-guide/features/kanban.md + kanban-tutorial.md) with 11 dashboard screenshots walking through four user stories (Solo Dev, Fleet Farming, Role Pipeline, Circuit Breaker). - Tests (251 passing): kernel schema + migration + CAS atomicity, dispatcher logic, circuit breaker, crash detection, max-runtime timeouts, claim lifecycle, tenant isolation, idempotency keys, per- task skills round-trip + validation + dispatcher argv, tool surface (7 tools × round-trip + error paths), dashboard REST (CRUD + bulk + links + warnings), gateway-embedded dispatcher (config gate, env override, graceful shutdown), CLI deprecation stub, migration from legacy schemas. Gateway integration: - GatewayRunner._kanban_dispatcher_watcher — new asyncio background task, symmetric with _kanban_notifier_watcher. Runs dispatch_once via asyncio.to_thread so SQLite WAL never blocks the loop. Sleeps in 1s slices for snappy shutdown. Respects HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 env override for debugging. - Config: new `kanban` section in DEFAULT_CONFIG with `dispatch_in_gateway: true` (default) + `dispatch_interval_seconds: 60`. Additive — no \_config_version bump needed. Forward-compat: - workflow_template_id / current_step_key columns on tasks (v1 writes NULL; v2 will use them for routing). - task_runs holds claim machinery (claim_lock, claim_expires, worker_pid, last_heartbeat_at) so multi-attempt history is first- class from day one. Closes NousResearch#16102. Co-authored-by: emozilla <emozilla@nousresearch.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hermes Kanban — durable multi-profile collaboration board
Summary
Adds a first-class multi-agent collaboration surface to Hermes: a durable SQLite-backed task board (
~/.hermes/kanban.db) shared across all profiles on the host, a long-lived dispatcher daemon that atomically claims ready tasks and spawns the assigned profile in an isolated workspace (with crash detection, a spawn-failure circuit breaker, max-runtime enforcement, and tick-level health telemetry), a/kanbanslash command wired into both the interactive CLI and the gateway (with auto-notify-back), and a bundled dashboard plugin that gives the board a Linear/Fusion-style drag-drop UI with multi-select, inline editing, dependency editing, markdown rendering, touch support, live WebSocket updates, per-run Run History, and a Worker Log drawer panel.Zero changes to
run_agent.py. No new core tools. No tool-schema bloat. Every worker is a full OS process — no in-process subagent swarms, no SDK-lifecycle fragility.Status: standing, ready for review
feat/kanban-standing, tip123f8d0fescripts/run_tests.shtests/stress/— ~40k randomized operations, real multi-process concurrency, real subprocess E2E, scale benchmarks — zero invariant violationsMotivation
delegate_taskis a synchronous function call: fork → join, anonymous subagent, no resumability, no human-in-the-loop. It's correct for short reasoning subtasks; it is the wrong shape for the workloads users asked for in the Nous Discord design thread:inbox-triage/ops-reviewprofiles with persistent memory over weeks.Kanban is the shape that covers all four user-story types. They coexist with
delegate_task: a kanban worker may calldelegate_taskinternally for reasoning within its own run. The single test: does this handoff need to outlive a single API loop and be visible to others?Design rationale, full comparison with Cline Kanban / Paperclip / NanoClaw / Google Gemini Enterprise, concurrency correctness argument, and implementation plan live in
docs/hermes-kanban-v1-spec.pdf.Architecture
Data model
tasks— the logical unit of work: title, body, assignee, status (triage/todo/ready/running/blocked/done/archived), tenant, priority, workspace, parents viatask_links, optionalidempotency_key, optionalmax_runtime_seconds,current_run_idpointer.task_runs— one row per attempt. Claim opens a run; terminal transitions close it with anoutcome(completed / blocked / crashed / timed_out / spawn_failed / gave_up / reclaimed). Carries the worker'ssummary,metadata,error, profile, step_key, started/ended timestamps, worker_pid. Multi-attempt history preserved forever.task_events— append-only event log carryingrun_idfor per-attempt grouping. Kinds cluster into Lifecycle (created, promoted, completed, blocked, unblocked, archived), Edits (assigned, edited, reprioritized, status), and Worker telemetry (claimed, spawned, heartbeat, reclaimed, crashed, timed_out, spawn_failed, gave_up).task_comments— freeform human-readable thread.task_links— parent → child dependency edges.kanban_notify_subs— gateway chat subscriptions for terminal-event notifications.Invariant (enforced and tested):
tasks.current_run_id IS NULL⇔ the run row for that task is in a terminal state. Holds across CLI, dashboard, dispatcher (crash/timeout/reclaim), archive, and the migration backfill. Verified across ~40k randomized operations in the property fuzzer.Forward-compat for v2: Nullable
tasks.workflow_template_id+tasks.current_step_key+task_runs.step_key. v1 kernel ignores them; v2 workflow routing can land additively without a schema migration.Collaboration patterns covered
Nine, all expressible with the base primitives (tasks + links + comments + assignee + workspace):
@mention— inline routing from prose./kanban herepins dir to the gateway thread.triage→ specifier expands body →todo.What shipped
Kernel (
hermes_cli/kanban_db.py)BEGIN IMMEDIATE.task_runsfirst-class (one row per attempt) with structured handoff (summary,metadata,error)./proccheck, spawn-failure circuit breaker (auto-blocks after N consecutive failures), max-runtime enforcement (SIGTERM → 5s grace → SIGKILL).scratch/dir:<path>/worktree); per-task log capture at~/.hermes/kanban/logs/<task>.logwith 2 MiB rotation.board_stats,task_age).add / list / remove / unseen_events_for_sub / advance_notify_cursor.build_worker_contextgives every worker: this task's prior attempts (retry continuity), parent tasks' completed run summaries + metadata (downstream handoff), and the same assignee's 5 most-recent completions on other tasks (cross-task role continuity).completeorblockis called on a never-claimed task with handoff data, sosummary/metadata/reasonis never silently dropped.claim_task/unblock_task/archive_task/ dashboard drag-drop.CLI (
hermes_cli/kanban.py)All verbs auto-init the DB on first use.
create / list / show / show --json (with runs[]) / claim / complete (with --summary --metadata) / block / unblock / archive / comment / link / unlink / heartbeat / runs / log / stats / assignees / dispatch / daemon / watch / tail / gc / init / notify-subscribe / notify-list / notify-unsubscribe. Bulk support oncomplete / unblock / archive / block --ids. Bulkcompletewith--summaryor--metadatais refused — copy-pasting the same handoff to N tasks is a footgun. Daemon emits health WARNs when the ready queue is non-empty AND no spawns succeeded for 6+ ticks.Gateway (
gateway/run.py)_kanban_notifier_watcherruns alongside_session_expiry_watcher. Delivers terminal events (completed / blocked / gave_up / crashed / timed_out) to subscribed chats, prefixed with@<assignee>for identity. Unsubscribes on the last delivered event's kind being terminal (not just on task status). Per-sub consecutive-send-failure counter drops dead chats after 3 failures./kanban createauto-subscribes the originating chat.Dashboard plugin (
plugins/kanban/)hermes dashboardgrows a Kanban tab.triage / todo / ready / running / blocked / done(+ archived toggle), lanes-by-profile in Running, staleness colouring, progress pills (N/M children done), inline create per column, drag-drop with touch fallback./eventswithrun_idper event; live refresh of the open drawer via per-task event ticks.GET /board,GET /tasks/:id(includesruns[]),POST /tasks(+ idempotency_key),PATCH /tasks/:id(withsummary+metadata),POST /tasks/bulk,POST /tasks/:id/comments,POST/DELETE /links,POST /dispatch,GET /config,GET /stats,GET /tasks/:id/log,WS /events?since=&token=.Skills
skills/devops/kanban-worker/SKILL.md— claim → work → heartbeat → complete/block workflow.skills/devops/kanban-orchestrator/SKILL.md— decompose-don't-execute guide.Docs
website/docs/user-guide/features/kanban.md— full reference: concepts, quickstart, systemd, architecture, data model, CLI, REST, event reference, invariants.website/docs/user-guide/features/kanban-tutorial.md— four user stories with 10 dashboard screenshots: solo dev, fleet farming, role-pipeline with retry, circuit-breaker + crash recovery.Engineering history — audits, reviews, fixes
Layered chronologically, most recent first. Every audit bug was real; every fix lands with regression tests.
Battle-test suite + 2 real bugs it found (commit 123f8d0)
Added
tests/stress/— opt-in suite that stresses the kernel with real concurrency, real subprocesses, randomized property fuzzing, and scale benchmarks. Exposed two bugs that four audit passes missed._pid_alivereturned True for zombie processes.os.kill(pid, 0)succeeds against zombies (process table entry exists until parent reaps), sodetect_crashed_workerswould treat a dead-but-unreaped worker as alive./proc/<pid>/statuson Linux, treatState: Zas dead.Stress suite results (all passing):
dispatch_once4ms,recompute_ready47ms,list_tasks50ms. Vulcan's + erosika's starvation concerns are an order of magnitude off.Suite is opt-in via
pytest --run-stress.Atypical-scenario stress + clock-skew fix (commit 63b7b6d)
Added
tests/stress/test_atypical_scenarios.py— 28 scenarios covering inputs and environments the normal tests assume away. Categories:HERMES_HOMEwith spaces / unicode / symlinks (two links to same dir share DB).Bug caught and fixed: CLI
showandrunsprinted negative elapsed time when NTP jumps backward between claim and complete. Both sites now clamp tomax(0, end - start), matching the dashboard JS which already had the same clamp. Regression test added.Follow-up in v1 (commit be184aa): Both items initially flagged as v2 — fixed immediately instead of deferred.
resolve_workspaceaccepted relativedir:paths, which resolved against the dispatcher's CWD (confused-deputy escape vector).build_worker_contextunbounded on retry-heavy (63 KB at 1000 runs) and comment-heavy (50 KB at 1000 comments) tasks.189/189 kanban suite pass (+6 regression tests for the caps and path guards).
189/189 main kanban suite pass.
Tool surface for worker + orchestrator agents (commit 832ecde)
Added a 7-tool structured surface (
kanban_show,kanban_complete,kanban_block,kanban_heartbeat,kanban_comment,kanban_create,kanban_link) so kanban workers can interact with the board from inside their Python process instead of shelling out tohermes kanban. Fixes the Docker/Modal/SSH backend portability gap the atypical-scenarios pass surfaced — CLI calls from inside a remote terminal backend fail becausehermesisn't in the container andkanban.dbisn't mounted. Tools run in the agent's process and always reach the DB.Zero schema footprint on normal sessions. Each tool's
check_fnreturns True only whenHERMES_KANBAN_TASKis set in the process env — which only happens when the dispatcher spawned this worker. Verified empirically:hermes chatsessions have 27 tools in schema, worker sessions have exactly 34 (+7). No leak either direction.The dispatcher sets
HERMES_KANBAN_TASK+HERMES_KANBAN_WORKSPACE+HERMES_PROFILEwhen spawning. The CLI, dashboard, and/kanbanslash command are unchanged — those are humans reaching the kernel directly, not the agent path.Skill updates:
kanban-workerandkanban-orchestratorrewritten to call the tools instead of CLI subprocess; CLI remains documented as fallback for human operators and scripts.25 new regression tests in
tests/tools/test_kanban_tools.pycovering gating, all 7 handler happy paths, error branches (missing args, non-dict metadata, empty strings, cycles, self-link), and a full E2E worker lifecycle driven purely through tools.214/214 kanban suite green.
System-prompt guidance + skill reshape (commit 1970bcf)
Moved the kanban worker lifecycle from the skills into a conditionally-injected system-prompt guidance block, matching the existing
MEMORY_GUIDANCE/SESSION_SEARCH_GUIDANCE/SKILLS_GUIDANCEpattern. Workers now know the full 6-step lifecycle even if no skill loaded — the lifecycle is load-bearing for correct board behavior and belongs in the prompt path, not the skill path.New
KANBAN_GUIDANCE(agent/prompt_builder.py) — ~3 KB covering identity framing, 6-step lifecycle with exact tool-call shapes, orchestrator-mode note, and explicit DO NOTs. Injected into_build_system_promptgated onkanban_show in valid_tool_names. Sincekanban_show'scheck_fngates onHERMES_KANBAN_TASK, a normalhermes chatsession sees none of it.Verified live:
Delta is exactly the
KANBAN_GUIDANCEblock. Zero cost to normal users.Skills reshaped from lifecycle-plus-references to references-only. Both now carry what doesn't fit in a system prompt: good
summary/metadatashape patterns, block-reason examples, retry diagnostics, the orchestrator's decomposition playbook with a concrete Postgres-migration walkthrough, pitfalls, CLI fallback cheatsheet. Loading the skill is no longer required to work a task correctly.+3 regression tests. 217/217 kanban+tools suite green; 303/303 run_agent suite green.
Auto-load kanban-worker skill on dispatch (commit c65c1dd)
_default_spawnnow includes--skills kanban-workerin the child's argv. Every dispatched worker gets the skill's pattern library (good summary/metadata shapes, retry diagnostics, block-reason examples) loaded automatically — no per-profile skill configuration required.--skillsis additive to the profile's defaults, so users can still add more skills on top.What lives where, finalized:
KANBAN_GUIDANCE— mandatory 6-step lifecycle, tool-call shapes, DO NOTskanban_*handlers with self-explanatory descriptionskanban-worker— deeper patterns, retry diagnostics, CLI fallbackZero of these show up in a normal
hermes chatsession. All three activate together when the dispatcher spawns a worker.Regression test intercepts
Popenin the spawn path and asserts--skills kanban-workerappears in the argv. 218/218 kanban + tools suite green.External review: @erosika pre-merge audit (commit a24c6e1)
Full reply on the issue: #16102 (comment)
Six bugs, all real, all fixed:
unblock_taskdidn't defensively close a stalecurrent_run_idpointer.claim_task/archive_task.write_txn; concurrent dispatcher could orphan runs.write_txn+ CAS guard on pointer UPDATE; orphan rows marked reclaimed if CAS fails.task.status in (done, archived), leaking subs onblocked/gave_up/crashed/timed_out.recompute_readyfull-scan would starve claim CAS at 10k todo tasks.Two cheap v2 extensions shipped in-scope:
build_worker_context— 5 most-recent completed runs for the current task's assignee, giving workers implicit continuity.@<assignee>.Deferred to v2: skill-aware routing, structured comments as multi-peer session substrate (asked erosika to open a separate RFC), pattern/mechanism taxonomy docs split.
Deep-scan pass 2 (commit e27c819)
Second integration audit covering frontend rendering, WebSocket plumbing, never-claimed-task handling, dataclass shape. Eight issues.
complete_taskon never-claimed task silently droppedsummary/metadata.block_taskon never-claimed task silently dropped--reason.Eventdataclass missingrun_idfield.GET /tasks/:idevents + WS/eventspayload omittedrun_id.TaskDrawerdidn't live-refresh on WS events for its own task.useEffectdepends on it.claim_taskdidn't guard against strandedcurrent_run_id.kanban watch --kindshelp listed legacyspawn_auto_blocked..hermes-kanban-run--endedfallback rule.First audit pass (commits 8ef2ae6 + 1c78f66)
Five bugs where runs got orphaned or the dashboard was a second-class citizen.
archive_taskon a running task orphaned the run.reclaimed, clears claim columns.runningleft the run open forever._set_status_directcloses via_end_run(outcome='reclaimed').PATCH /tasks/:idwithstatus=doneacceptedresultbut notsummary/metadata.UpdateTaskBodycarries them; forwarded tocomplete_task.hermes kanban complete a b c --summary Xsilently broadcast the same handoff.task.resultinstead ofrun.summary.complete_taskembeds first-line summary in event payload; notifier reads it. Zero extra SQL.Invariant
tasks.current_run_id IS NULL⇔ run row terminal now enforced across CLI, dashboard, dispatcher, archive.Runs as first-class + structured handoffs (commit 0146cb2, addresses @vulcan-artivus's review)
External review flagged that the one-assignee / one-workspace / one-attempt-per-row shape breaks on role-pipelines (PM → Eng → Review → loop → PR). Landed the high-value, schema-affecting pieces in v1 so the model doesn't ossify.
task_runstable (one row per attempt).--summary+--metadata;build_worker_contextreads them for downstream + retry workers.task_events.run_id.hermes kanban runs <id>verb.runningtask.Explicitly deferred to v2: workflow templates with success/failure step routing,
stageas a distinct axis, shared-by-default workspace across stages.Multica-inspired additions
After reading Multica's codebase (21.8k stars, Go/Next.js/Postgres task-tracker), ported four ideas that fit the single-host SQLite model; dropped their cross-host server + pgvector skill search.
~/.hermes/profiles/+ assigned names).ready → promoted,priority → reprioritized,spawn_auto_blocked → gave_up; newspawned,heartbeat,timed_outkinds.Core hardening
hermes kanban initprints daemon setup hint.hermes kanban watch.Tutorial
website/docs/user-guide/features/kanban-tutorial.mdwalks four user stories end-to-end with 10 dashboard screenshots:gave_upterminal block for missing creds; crashed-then-completed migration with chunked-strategy retry).Tests
All pass under
scripts/run_tests.sh(hermetic, 4 xdist workers, TZ=UTC, LANG=C.UTF-8).Stress suite (
tests/stress/, opt-in):test_concurrency.py— 5 workers, 100 tasks.test_concurrency_mixed.py— 10 workers + reclaimer, 500 tasks, random ops.test_concurrency_reclaim_race.py— adversarial TTL.test_subprocess_e2e.py— real Python subprocess workers.test_property_fuzzing.py— 500 sequences × ~80 ops = ~40k total operations, 9 invariants per step.test_benchmarks.py— latency at 100 / 1k / 10k tasks; results saved to JSON.Out of scope (deliberately)
routerprofile.tools/approval.py.kanban.dbis local SQLite; crash detection assumes host-local PIDs. Run independent boards per host and bridge viadelegate_task/ a message queue if needed.in_reply_to/addressed_to/kind) as multi-peer session substrate → v2 (erosika invited to open a separate RFC).Credits
User stories and design input from the Nous Discord design thread: Teknium, waxhy, A Real Icehole, Keimpe, LLM.STORE, caco, hunter_cat, djm, ionmanden, psbd, Aiz, Rikllo, sudo_relax, neo2k8.
P6borrowed from Google's Gemini CLI subagent@namesyntax;P7from psbd;P8from neo2k8.Pre-merge review by @vulcan-artivus (runs as first-class) and @erosika (six pre-merge bugs, two v2 seeds, structured-comments RFC seed).
Multica (https://github.com/multica-ai/multica) for the max-runtime / heartbeat / assignee-enumeration ideas.
Follow-ups shipped on this PR
Per-task force-loaded skills (commit dd83173)
Tasks now carry their own extra-skill list. The dispatcher emits one
--skills <name>flag per task skill alongside the built-in--skills kanban-worker, so a task can pin specialist context (e.g.translation,github-code-review,security-pr-audit) without theuser having to edit the assignee profile's skill config.
Surface parity on all 4 entry points:
hermes kanban create "..." --skill translation --skill github-code-review(repeatable)kanban_create(title=..., assignee=..., skills=["translation"])POST /api/plugins/kanban/tasks {"skills": ["translation"]}Schema changes are additive + idempotent:
tasks.skills TEXT(JSONarray), migrated via
_migrate_add_optional_columns. Existing rowsstay
skills=NULL(reads back asTask.skills is None). 16 newtests cover kernel round-trip, normalization (dedupe, strip,
comma-rejection), dispatcher argv ordering and built-in dedupe, CLI
flag repeatability, show output, migration idempotency, tool
list/string/non-list inputs, and dashboard REST round-trip.
Dispatcher runs in the gateway (commit 60e7567)
The dispatcher is now embedded in the gateway process.
hermes gateway startis the single entry point — no separatehermes kanban daemonor systemd unit required. Typical path:
hermes kanban create ...writes the row, the gateway's embedded dispatcher picks it up on the
next tick (60s default), worker spawns. The standalone daemon is
retired (deprecation stub + undocumented
--forceescape hatch forheadless hosts that can't run the gateway).
Config (additive; no version bump):
HERMES_KANBAN_DISPATCH_IN_GATEWAY=0disables at runtime withoutediting YAML.
Warnings surface through three channels when a task would sit idle
because no dispatcher is present:
hermes kanban createprints a⚠line to stderr (non-JSON mode only) withhermes gateway startguidancePOST /tasksresponse gains awarningfield that the UI threads into the existing error bannerTriage and unassigned tasks never warn (they wouldn't dispatch
anyway). Probe failures are silent so partial installs don't cry wolf.
Integration fixes in the same commit: stale "spawned by
hermes kanban daemon" phrasing intoolsets.py+ tool docstrings;tutorial's
hermes kanban daemon --assignee …block (a flag thatnever existed); systemd unit now carries
--force+ DEPRECATEDheader; docs
Running the dispatcher as a servicesection rewrittento describe the gateway-embedded path;
hermes kanban initnext-stephint updated.
17 new tests cover:
dispatch_in_gateway=True+ sane interval_check_dispatcher_presencereturns correct (running, message) forgateway-up, gateway-down, flag-off, and probe-error paths
createwarns on stderr for ready+assigned when no gateway;silent when up; never warns on triage/unassigned
hermes kanban daemonwithout--forceprints DEPRECATED + exits 2daemonDEPRECATED_kanban_dispatcher_watcherrespects config flag off(exits fast), respects env override, defers to config when env
truthy-but-not-false
warningcorrectly, omits when running,skips on triage, survives probe errors
E2E verified in isolated HERMES_HOME: watcher starts, ticks, and
exits within 3s of
_running=False.