kanban dispatcher: macOS zombie detection is a no-op — _pid_alive returns True for defunct workers #20015

@BowmanStephen

Description

Summary

_pid_alive() in hermes_cli/kanban_db.py implements zombie detection only on Linux, by parsing /proc/<pid>/status for State: Z. On macOS, os.kill(pid, 0) succeeds (raises no error) against defunct/zombie processes, so a worker that crashes immediately still looks "alive" to the dispatcher until claim_expires times out (~15 min by default).

Where

hermes_cli/kanban_db.py:2158-2173

The docstring at lines 2136-2144 acknowledges this:

On Linux we additionally peek at /proc/<pid>/status and treat State: Z
as dead. On other POSIX or on Windows the zombie check is a no-op.

Reproduction

  1. Run the kanban dispatcher on macOS with a ~5 min cadence.
  2. Assign a worker a task that causes an immediate crash (e.g., require a skill it doesn't have, or a missing credential that triggers an unhandled exception at startup).
  3. os.kill(pid, 0) succeeds against the defunct process because the process table entry still exists.
  4. The dispatcher sees the worker as alive and does NOT re-queue the task until claim_expires (~15 min later).
  5. This creates a zombie-respawn loop: the dispatcher retries every N minutes, hits the same crash, and the task stays stuck until someone intervenes manually in SQL.
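Step 3 can be reproduced without the dispatcher. The sketch below is not Hermes code, and `signal_probe` / `crashed_child_looks_alive` are hypothetical names. It starts a child that dies with an unhandled exception, waits for it to exit without reaping it, and then runs the same os.kill(pid, 0) probe. It uses os.waitid(..., WNOWAIT) where available to leave the child as a zombie, and otherwise falls back to a short sleep.

```python
import os
import subprocess
import sys
import time


def signal_probe(pid: int) -> bool:
    """The os.kill(pid, 0) liveness probe: True while a process table entry exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the pid exists but belongs to another user
    return True


def crashed_child_looks_alive() -> bool:
    # Simulate a worker that crashes at startup with an unhandled exception.
    child = subprocess.Popen(
        [sys.executable, "-c", "raise RuntimeError('missing credential')"],
        stderr=subprocess.DEVNULL,
    )
    if hasattr(os, "waitid") and hasattr(os, "WNOWAIT"):
        # Block until the child has exited but leave it unreaped (a zombie).
        os.waitid(os.P_PID, child.pid, os.WEXITED | os.WNOWAIT)
    else:
        time.sleep(2)  # crude fallback: give the child time to exit
    looks_alive = signal_probe(child.pid)
    child.wait()  # reap the zombie so the demo does not leak it
    return looks_alive


if __name__ == "__main__":
    print(crashed_child_looks_alive())  # prints True
```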

Impact

Tasks stay stuck in running for up to 15 minutes on macOS, and breaking the loop requires manual sqlite3 surgery. With a 5-minute dispatcher cadence and the default claim_expires of 15 minutes, users see 3+ wasted spawn attempts per stuck task.

Suggested Fix

On Darwin, call proc_pidinfo() with PROC_PIDTBSDINFO and treat a pbi_status of SZOMB (5) in the returned struct proc_bsdinfo as dead, or use kqueue with EVFILT_PROC/NOTE_EXIT to be notified when the worker exits. A simpler fallback: check whether the process group leader is still alive.
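A minimal sketch of a macOS-aware liveness check follows; treat it as a starting point, not a verified fix. It swaps in one technique the issue does not mention: when the dispatcher is the worker's parent, os.waitpid(pid, os.WNOHANG) reaps a dead worker portably on Linux and macOS. For other pids it falls back to the proc_pidinfo(PROC_PIDTBSDINFO) check, assuming (verify on the macOS versions you support) that this flavor still reports pbi_status for zombie entries. The ProcBsdInfo layout mirrors struct proc_bsdinfo from <sys/proc_info.h>; `pid_alive`, `_child_exited` and `_darwin_is_zombie` are hypothetical names, not the existing _pid_alive().

```python
import ctypes
import ctypes.util
import os
import subprocess
import sys
import time
from typing import Optional

# Constants from macOS <sys/proc_info.h> and <sys/proc.h>.
PROC_PIDTBSDINFO = 3
SZOMB = 5
MAXCOMLEN = 16


class ProcBsdInfo(ctypes.Structure):
    """Mirror of struct proc_bsdinfo (136 bytes); only pbi_status is read."""
    _fields_ = [
        ("pbi_flags", ctypes.c_uint32),
        ("pbi_status", ctypes.c_uint32),
        ("pbi_xstatus", ctypes.c_uint32),
        ("pbi_pid", ctypes.c_uint32),
        ("pbi_ppid", ctypes.c_uint32),
        ("pbi_uid", ctypes.c_uint32),
        ("pbi_gid", ctypes.c_uint32),
        ("pbi_ruid", ctypes.c_uint32),
        ("pbi_rgid", ctypes.c_uint32),
        ("pbi_svuid", ctypes.c_uint32),
        ("pbi_svgid", ctypes.c_uint32),
        ("rfu_1", ctypes.c_uint32),
        ("pbi_comm", ctypes.c_char * MAXCOMLEN),
        ("pbi_name", ctypes.c_char * (2 * MAXCOMLEN)),
        ("pbi_nfiles", ctypes.c_uint32),
        ("pbi_pgid", ctypes.c_uint32),
        ("pbi_pjobc", ctypes.c_uint32),
        ("e_tdev", ctypes.c_uint32),
        ("e_tpgid", ctypes.c_uint32),
        ("pbi_nice", ctypes.c_int32),
        ("pbi_start_tvsec", ctypes.c_uint64),
        ("pbi_start_tvusec", ctypes.c_uint64),
    ]


def _child_exited(pid: int) -> Optional[bool]:
    """Non-blocking reap. True: our child had exited (now reaped).
    False: our child is still running. None: pid is not our child."""
    try:
        reaped_pid, _status = os.waitpid(pid, os.WNOHANG)
    except ChildProcessError:
        return None
    return reaped_pid == pid


def _darwin_is_zombie(pid: int) -> bool:
    """True if proc_pidinfo(PROC_PIDTBSDINFO) reports pbi_status == SZOMB.
    If the call fails, returns False (i.e. the current behaviour)."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    proc_pidinfo = libc.proc_pidinfo
    proc_pidinfo.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_uint64,
                             ctypes.POINTER(ProcBsdInfo), ctypes.c_int]
    proc_pidinfo.restype = ctypes.c_int
    info = ProcBsdInfo()
    size = ctypes.sizeof(info)
    if proc_pidinfo(pid, PROC_PIDTBSDINFO, 0, ctypes.byref(info), size) != size:
        return False
    return info.pbi_status == SZOMB


def pid_alive(pid: int) -> bool:
    exited = _child_exited(pid)
    if exited is not None:
        return not exited  # authoritative for our own children on any POSIX
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        pass  # the pid exists but belongs to another user
    if sys.platform == "darwin" and _darwin_is_zombie(pid):
        return False
    return True


if __name__ == "__main__":
    worker = subprocess.Popen([sys.executable, "-c", "raise SystemExit(1)"])
    while pid_alive(worker.pid):  # polls until the crashed child is reaped
        time.sleep(0.1)
    print("worker reported dead")
```

If the dispatcher keeps subprocess.Popen handles for its workers, Popen.poll() does the same reaping and also records the exit code; calling os.waitpid() directly behind Popen's back means Popen can no longer report the real exit status. The kqueue route avoids polling entirely but needs an event loop around it.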

Environment

  • macOS (any version)
  • Hermes Agent v0.11.0 (a7fb79e)

Metadata

Labels: P3 (Low: cosmetic, nice to have); comp/plugins (Plugin system and bundled plugins); type/bug (Something isn't working)
