Description
When the gateway restarts (or the Discord adapter reconnects), DiscordAdapter.connect() creates a new commands.Bot client but never closes the old one. Discord doesn't immediately terminate the old websocket, leaving two live connections for a window of time. Both connections receive every incoming message, resulting in two separate agent turns being spawned — each generating a different response.
Symptoms
- Every Discord message triggers two responses with different wording (not a duplicate of the same response)
- When auto_thread is enabled: one response appears in the auto-created thread (correct), a second response appears directly in the parent channel (incorrect)
- Gateway log shows the same message arriving twice ~400ms apart:
    inbound message: platform=discord user=X chat=Y msg='hello'
    inbound message: platform=discord user=X chat=Y msg='hello' ← ~400ms later
- Only one gateway process is running (ps aux confirms)
- MessageDeduplicator exists and is correctly placed, but fails due to the race condition between two concurrent websocket deliveries
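To check a captured log for this symptom, something like the following sketch may help. It flags any inbound-message line whose text matches the previous inbound-message line. The function name and the `MARKER` constant are hypothetical, and the log format is assumed to match the two sample lines above; a user who really sends the same text twice in a row would also be flagged.

```python
MARKER = "inbound message:"  # assumed prefix, taken from the sample log lines


def find_repeated_inbound(lines):
    """Return (line_number, text) for each inbound line repeating the previous inbound line."""
    repeats = []
    previous = None
    for number, line in enumerate(lines, start=1):
        start = line.find(MARKER)
        if start == -1:
            continue  # not an inbound-message line, ignore it
        # Compare only the part from the marker onward, so any leading
        # timestamp or logger prefix does not hide a repeat.
        text = line[start:].strip()
        if text == previous:
            repeats.append((number, text))
        previous = text
    return repeats


sample = [
    "inbound message: platform=discord user=X chat=Y msg='hello'",
    "inbound message: platform=discord user=X chat=Y msg='hello'",
    "inbound message: platform=discord user=X chat=Y msg='bye'",
]
repeats = find_repeated_inbound(sample)
print(repeats)
```

On the sample, only the second line is reported, since it repeats the first.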
Root Cause
In gateway/platforms/discord.py, connect() unconditionally creates a new commands.Bot instance:
    self._client = commands.Bot(
        command_prefix="!",
        intents=intents,
        ...
    )
When connect() is called a second time (e.g. during reconnect in run.py line ~2848), the old self._client is orphaned — still connected to Discord's gateway — while the new client also connects. Both are alive simultaneously and both fire on_message for every event.
The MessageDeduplicator (per-adapter instance) cannot prevent duplicates because both websockets deliver the event independently, and the two on_message coroutines may check is_duplicate before either has marked the ID as seen (race condition).
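The race described above can be reproduced in isolation. The sketch below is not the gateway's code: `Deduplicator`, `mark_seen`, and the handler are hypothetical stand-ins, and it assumes that the real handler yields to the event loop (any `await`) between the duplicate check and the point where the ID is recorded. Under that assumption, two handlers for the same message ID both pass the check.

```python
import asyncio


class Deduplicator:
    """Hypothetical stand-in for MessageDeduplicator; the real API may differ."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, message_id):
        return message_id in self._seen

    def mark_seen(self, message_id):
        self._seen.add(message_id)


async def on_message(dedup, message_id, handled, source):
    if dedup.is_duplicate(message_id):
        return
    # Assumed: some await sits between the check and the mark.
    # It lets the other client's handler run its own check first.
    await asyncio.sleep(0)
    dedup.mark_seen(message_id)
    handled.append(source)


async def simulate():
    dedup = Deduplicator()
    handled = []
    # Both clients deliver the same Discord message ID.
    await asyncio.gather(
        on_message(dedup, 123, handled, "old-client"),
        on_message(dedup, 123, handled, "new-client"),
    )
    return handled


handled = asyncio.run(simulate())
print(handled)  # both handlers processed message 123
```

If the check and the mark ran back to back with no `await` in between, the second handler would see the ID and return early, so whether this race applies depends on how the real deduplicator is called.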
Fix
Before creating the new Bot instance in connect(), close and await the old client if one exists:
    # Add before: self._client = commands.Bot(...)
    if self._client and not self._client.is_closed():
        await self._client.close()
    self._client = None
    self._ready_event.clear()
This ensures only one Discord websocket connection is ever active for the adapter at any time.
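The effect of the guard can be illustrated without Discord. In the sketch below, `FakeClient` and `AdapterSketch` are hypothetical stand-ins that only model `is_closed()`, `close()`, and the two attributes named in the fix; the real `connect()` does considerably more. It shows that a second `connect()` leaves the first client closed rather than orphaned.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for commands.Bot; models only is_closed() and close()."""

    def __init__(self):
        self._closed = False

    def is_closed(self):
        return self._closed

    async def close(self):
        self._closed = True


class AdapterSketch:
    """Hypothetical adapter holding only the state the fix touches."""

    def __init__(self):
        self._client = None
        self._ready_event = asyncio.Event()
        self.created = []  # every client ever created, kept for inspection

    async def connect(self):
        # Close the previous client before replacing it.
        if self._client and not self._client.is_closed():
            await self._client.close()
        self._client = None
        self._ready_event.clear()

        self._client = FakeClient()
        self.created.append(self._client)
        self._ready_event.set()  # assumed: the real adapter sets this once ready


async def demo():
    adapter = AdapterSketch()
    await adapter.connect()
    await adapter.connect()  # simulated reconnect
    return [client.is_closed() for client in adapter.created]


closed_flags = asyncio.run(demo())
print(closed_flags)  # [True, False]
```

Without the guard, the first entry would stay `False`, which corresponds to the orphaned client that keeps receiving events.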
Environment
- Hermes gateway running in Docker container
- Discord platform adapter
- auto_thread: true in config (makes the symptom very visible: thread response + channel response)
- Triggered by any gateway restart or reconnect cycle
Workaround
Avoid gateway restarts where possible. The zombie connection eventually times out on its own (on the order of minutes), after which responses return to normal until the next restart or reconnect.