Bug Description
hermes gateway restart fails when the old gateway process still holds the Weixin (WeChat) bot token while the new process tries to claim it.
Error Log
2026-04-29 08:26:25,601 INFO gateway.run: Received SIGTERM/SIGINT — initiating shutdown
2026-04-29 08:26:25,665 WARNING gateway.run: Shutdown diagnostic — other hermes processes running
2026-04-29 08:26:44,134 INFO gateway.run: Starting Hermes Gateway...
2026-04-29 08:26:44,359 ERROR gateway.platforms.base: [Weixin] Weixin bot token already in use (PID 30033). Stop the other gateway first.
2026-04-29 08:26:44,360 ERROR gateway.run: Gateway hit a non-retryable startup conflict: weixin: Weixin bot token already in use (PID 30033). Stop the other gateway first.
2026-04-29 08:26:44,361 ERROR gateway.run: Gateway exiting cleanly
Steps to Reproduce
- Start Hermes gateway with Weixin platform enabled
- Run: hermes gateway restart
- Observe: new gateway fails to start because old PID (30033) hasn't released the Weixin token yet
- Manual workaround: hermes gateway stop && sleep 3 && hermes gateway start (succeeds)
Environment
- OS: macOS (Apple Silicon)
- Hermes version: latest (as of 2026-04-29)
- Platform: Weixin (WeChat) via iLink Bot API
- Architecture: gateway running as foreground process (not systemd service)
Root Cause
The restart command kills the old process via SIGTERM and immediately starts a new one, without waiting for the old process to fully exit. The old process takes ~18 seconds to release the Weixin token after receiving SIGTERM, creating a race condition window.
Suggested Fix
Add a wait / waitpid step between killing the old gateway and starting the new one. The restart should:
- Send SIGTERM to old process
- waitpid() for old process to exit (with timeout)
- Only then start the new gateway process
This would eliminate the race condition entirely.
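The three steps above could look roughly like the sketch below. It is not taken from the Hermes codebase: the names pid_alive and stop_and_wait, the 30-second timeout, and the polling interval are all hypothetical. One caveat: waitpid() can only wait on children of the calling process, and the restart command is presumably not the parent of the old gateway, so this sketch polls with signal 0 instead.

```python
import os
import signal
import time


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence/permissions, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def stop_and_wait(pid: int, timeout: float = 30.0, poll_interval: float = 0.2) -> bool:
    """Send SIGTERM to pid, then block until it exits or the timeout elapses.

    Returns True if the process is gone, False if it is still running.
    (Hypothetical helper; timeout default is an assumption, chosen to
    exceed the ~18 s shutdown reported above.)
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # already gone, nothing to wait for

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not pid_alive(pid):
            return True
        time.sleep(poll_interval)
    return not pid_alive(pid)
```

A restart path would call stop_and_wait(old_pid) and start the new gateway only when it returns True; on False it could report that the old process is still shutting down rather than fail on the token conflict.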
Workaround
hermes gateway stop
sleep 3
hermes gateway start