gateway restart: race condition causes Weixin token conflict #17198

@Frankleee

Bug Description

hermes gateway restart fails when the old gateway process still holds the Weixin (WeChat) bot token while the new process tries to claim it.

Error Log

2026-04-29 08:26:25,601 INFO gateway.run: Received SIGTERM/SIGINT — initiating shutdown
2026-04-29 08:26:25,665 WARNING gateway.run: Shutdown diagnostic — other hermes processes running
2026-04-29 08:26:44,134 INFO gateway.run: Starting Hermes Gateway...
2026-04-29 08:26:44,359 ERROR gateway.platforms.base: [Weixin] Weixin bot token already in use (PID 30033). Stop the other gateway first.
2026-04-29 08:26:44,360 ERROR gateway.run: Gateway hit a non-retryable startup conflict: weixin: Weixin bot token already in use (PID 30033). Stop the other gateway first.
2026-04-29 08:26:44,361 ERROR gateway.run: Gateway exiting cleanly

Steps to Reproduce

  1. Start Hermes gateway with Weixin platform enabled
  2. Run hermes gateway restart
  3. Observe: new gateway fails to start because old PID (30033) hasn't released the Weixin token yet
  4. Manual workaround: hermes gateway stop && sleep 3 && hermes gateway start (succeeds)

Environment

  • OS: macOS (Apple Silicon)
  • Hermes version: latest (as of 2026-04-29)
  • Platform: Weixin (WeChat) via iLink Bot API
  • Architecture: gateway running as foreground process (not systemd service)

Root Cause

The restart command sends SIGTERM to the old process and then starts a new one without confirming that the old process has fully exited. In the log above, the new gateway starts about 18.5 seconds after the old one receives SIGTERM (08:26:25,601 to 08:26:44,134), and the old process (PID 30033) still holds the Weixin token at that point. That gap is the race window.
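For reference, the size of the window can be read straight off the log timestamps. A minimal sketch, assuming the timestamps use the default Python logging format shown in the Error Log; the helper name seconds_between is only illustrative:

```python
from datetime import datetime

# Timestamp layout used in the log lines above: "YYYY-MM-DD HH:MM:SS,mmm"
FMT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two log timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


sigterm = "2026-04-29 08:26:25,601"    # old gateway receives SIGTERM
new_start = "2026-04-29 08:26:44,134"  # new gateway starts
conflict = "2026-04-29 08:26:44,359"   # new gateway reports the token conflict

print(seconds_between(sigterm, new_start))  # 18.533
print(seconds_between(sigterm, conflict))   # 18.758
```

So the old process had still not released the token roughly 18.8 seconds after it was asked to shut down.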

Suggested Fix

Add a wait step between stopping the old gateway and starting the new one. The restart should:

  1. Send SIGTERM to the old process
  2. Wait for the old process to exit, with a timeout (waitpid() if the restart command is the parent of the old gateway, otherwise poll the PID)
  3. Only then start the new gateway process

This would close the race window whenever the old process exits within the timeout.
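The wait step could look roughly like the sketch below. It is not taken from the Hermes codebase: the function names pid_alive and stop_and_wait, the 30 second default timeout, and the polling interval are all assumptions made for illustration. One detail worth noting is that waitpid() only works on child processes, so for a gateway started from another shell the sketch falls back to probing the PID with signal 0. POSIX only.

```python
import os
import signal
import time


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID still exists (hypothetical helper)."""
    try:
        # If the target is our own child, reap it so it does not linger as a zombie.
        reaped, _status = os.waitpid(pid, os.WNOHANG)
        return reaped != pid  # (0, 0) means the child is still running
    except ChildProcessError:
        pass  # not our child: fall back to a signal-0 probe
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def stop_and_wait(pid: int, timeout: float = 30.0, poll_interval: float = 0.2) -> bool:
    """Send SIGTERM to pid and wait for it to exit.

    Returns True if the process is gone before the timeout, False otherwise.
    The default timeout is an assumption; it should exceed the shutdown time
    observed in the log.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # already gone
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not pid_alive(pid):
            return True
        time.sleep(poll_interval)
    return not pid_alive(pid)
```

A restart would then start the new gateway only when stop_and_wait(old_pid) returns True, and report a clear error (or escalate) when it returns False instead of running into the token conflict.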

Workaround

hermes gateway stop
sleep 3
hermes gateway start

Metadata

Labels

  • P2: Medium (degraded but workaround exists)
  • comp/gateway: Gateway runner, session dispatch, delivery
  • platform/wecom: WeCom / WeChat Work adapter
  • type/bug: Something isn't working
