fix(slack): track assistant typing per-thread to unstick concurrent chats #24139

Open
briandevans wants to merge 2 commits into NousResearch:main from briandevans:fix/slack-typing-per-thread-24117

@briandevans
Contributor

Summary

  • A single Slack channel/DM can host multiple Assistant threads at once. The typing-indicator tracker was keyed only by chat_id, so a second concurrent send_typing overwrote the first thread's thread_ts and the earlier thread was left stuck in "is thinking…" forever.
  • This PR extends the per-chat tracker to a per-thread set, routes stop_typing by metadata["thread_ts"] when present, and also clears the indicator when send() fails.

The bug

gateway/platforms/slack.py tracked the active Assistant status thread per chat:

self._active_status_threads: Dict[str, str] = {}

send_typing(chat_id, metadata) wrote self._active_status_threads[chat_id] = thread_ts, so two overlapping Assistant requests in the same Slack channel/DM collided. stop_typing(chat_id, ...) then popped the dict and cleared whichever thread_ts was last written — usually the wrong (newer) one. The result: the older Slack Assistant thread stayed in is thinking… indefinitely even though Hermes had already replied to it. Reported in #24117.

A related failure path: the send() exception branch did not clear the typing indicator at all, so a Slack post error left the user staring at is thinking… until the gateway restarted.
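
The collision can be reproduced in isolation. The following is a minimal sketch, not the adapter code: the module-level dict and the two helper functions are hypothetical stand-ins that only mirror the old `Dict[str, str]` shape described above.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the old per-chat tracker: one slot per chat_id.
active_status_threads: Dict[str, str] = {}


def send_typing(chat_id: str, thread_ts: str) -> None:
    # Old shape: a later call for the same chat overwrites the earlier thread_ts.
    active_status_threads[chat_id] = thread_ts


def stop_typing(chat_id: str) -> Optional[str]:
    # Pops whatever was written last, regardless of which thread finished.
    return active_status_threads.pop(chat_id, None)


send_typing("C123", "thread_a")  # older request
send_typing("C123", "thread_b")  # newer request overwrites thread_a

# thread_a's reply finishes first, but the tracker only knows about thread_b.
cleared = stop_typing("C123")
print(cleared)  # thread_b
```

After this sequence nothing references `thread_a` any more, so no later call can clear its status.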

The fix

  • _active_status_threads: Dict[str, set[str]] — track each active thread.
  • send_typing adds thread_ts to the chat's set (bounded at 128 active threads per chat as a defensive guard against missed clears).
  • stop_typing(chat_id, metadata={"thread_ts": ts}) clears exactly that thread. Called without metadata, it clears every tracked thread for the chat (preserves the legacy base.py wiring that calls stop_typing(chat_id) without thread context).
  • send() finalize now passes its own thread_ts to stop_typing so it only clears its own thread among concurrent ones.
  • send() exception path now also clears the thread's status, so a failed Slack post doesn't leave the user stuck on is thinking….
  • edit_message(finalize=True) is unchanged — it has no thread context to target, and clearing every tracked thread for the chat matches the existing behavior.
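
As a rough illustration of the tracking shape described in the list above, here is a self-contained sketch. `FakeSlackClient` and `TypingTracker` are hypothetical names introduced for this example, the status text is a placeholder, and the way the 128-thread bound is enforced (new threads are simply not tracked once the set is full) is an assumption; the actual adapter may handle the bound differently.

```python
import asyncio
from typing import Dict, List, Optional, Set, Tuple

MAX_TRACKED_THREADS_PER_CHAT = 128  # bound quoted in the PR description


class FakeSlackClient:
    """Hypothetical stand-in that records status calls instead of calling Slack."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, str]] = []

    async def assistant_threads_setStatus(
        self, *, channel_id: str, thread_ts: str, status: str
    ) -> None:
        self.calls.append((channel_id, thread_ts, status))


class TypingTracker:
    """Sketch of per-thread tracking: chat_id -> set of active thread_ts values."""

    def __init__(self, client: FakeSlackClient) -> None:
        self._client = client
        self._active_status_threads: Dict[str, Set[str]] = {}

    async def send_typing(self, chat_id: str, thread_ts: str) -> None:
        tracked = self._active_status_threads.setdefault(chat_id, set())
        # Assumed policy: once the bound is reached, new threads are not tracked.
        if thread_ts in tracked or len(tracked) < MAX_TRACKED_THREADS_PER_CHAT:
            tracked.add(thread_ts)
        await self._client.assistant_threads_setStatus(
            channel_id=chat_id, thread_ts=thread_ts, status="is thinking..."
        )

    async def stop_typing(self, chat_id: str, metadata: Optional[dict] = None) -> None:
        tracked = self._active_status_threads.get(chat_id, set())
        ts = (metadata or {}).get("thread_ts")
        # With thread context clear one thread; without it clear every tracked one.
        targets = [ts] if ts else list(tracked)
        for target in targets:
            await self._client.assistant_threads_setStatus(
                channel_id=chat_id, thread_ts=target, status=""
            )
            tracked.discard(target)
        if not tracked:
            self._active_status_threads.pop(chat_id, None)


async def demo() -> Dict[str, Set[str]]:
    tracker = TypingTracker(FakeSlackClient())
    await tracker.send_typing("C123", "thread_a")
    await tracker.send_typing("C123", "thread_b")
    await tracker.stop_typing("C123", {"thread_ts": "thread_a"})
    return tracker._active_status_threads


remaining = asyncio.run(demo())
print(remaining)  # {'C123': {'thread_b'}}
```

Keeping a set per chat means a targeted clear and the legacy clear-all call share one code path: the only difference is which thread_ts values end up in `targets`.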

Test plan

  • Focused regression tests: tests/gateway/test_slack.py::TestSendTyping — 16/16 passing including five new tests covering concurrent threads, per-thread stop_typing routing, the legacy clear-all path, send-failure clearing, and the per-chat bound.
  • Adjacent suite: full tests/gateway/test_slack.py (186 passed), plus test_slack_approval_buttons.py / test_slack_channel_skills.py / test_slack_mention.py (89 passed).
  • Ruff: ruff check gateway/platforms/slack.py tests/gateway/test_slack.py clean.

Regression guard (these two new tests fail against current main):

  • test_send_typing_tracks_multiple_concurrent_threads — asserts both thread_a and thread_b are tracked after two send_typing calls in the same chat. With the old Dict[str, str] shape, the second overwrites the first.
  • test_send_clears_only_targeted_thread_among_concurrent — asserts that send(metadata={"thread_id": "thread_a"}) clears only thread_a and leaves thread_b tracked. The old code popped the dict and cleared whichever single thread happened to be stored.

Related

Copilot AI review requested due to automatic review settings May 12, 2026 03:16
Contributor

Copilot AI left a comment

Pull request overview

Fixes a Slack adapter lifecycle bug where concurrent Assistant threads in the same channel/DM could overwrite each other’s tracked typing/status thread, leaving older threads stuck showing “is thinking…”. The change updates tracking to be per-(chat, thread) and ensures status cleanup happens for the correct thread, including on send failures.

Changes:

  • Change _active_status_threads from a per-chat single thread_ts to a per-chat set[thread_ts] to support concurrent Assistant threads.
  • Route stop_typing() to clear a specific thread when metadata["thread_id"/"thread_ts"] is provided; otherwise clear all tracked threads for the chat (legacy behavior).
  • Update send() to clear typing for its own resolved thread_ts, and attempt cleanup on exceptions; add regression tests covering concurrency, targeted clearing, failure cleanup, and per-chat bounding.
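
The `send()` cleanup described in the last bullet could look roughly like the sketch below. Everything here is a hypothetical stand-in written for illustration (`active`, `post_message`, the boolean return value); it only shows the control flow of clearing the reply's own thread on both the success and the failure path.

```python
import asyncio
from typing import Dict, Optional, Set

# Hypothetical minimal state: chat_id -> thread_ts values with an active status.
active: Dict[str, Set[str]] = {"C123": {"thread_a", "thread_b"}}


async def stop_typing(chat_id: str, metadata: Optional[dict] = None) -> None:
    """Stand-in clear: drops one thread when thread_ts is given, else all of them."""
    ts = (metadata or {}).get("thread_ts")
    if ts:
        active.get(chat_id, set()).discard(ts)
    else:
        active.pop(chat_id, None)


async def post_message(chat_id: str, text: str, thread_ts: str) -> None:
    """Hypothetical Slack post that fails for thread_b to exercise the error path."""
    if thread_ts == "thread_b":
        raise RuntimeError("simulated Slack post error")


async def send(chat_id: str, text: str, metadata: dict) -> bool:
    thread_ts = metadata["thread_ts"]
    try:
        await post_message(chat_id, text, thread_ts)
    except Exception:
        # Failure path: still clear this thread's indicator, best effort.
        await stop_typing(chat_id, {"thread_ts": thread_ts})
        return False
    # Success path: clear only the thread this reply belongs to.
    await stop_typing(chat_id, {"thread_ts": thread_ts})
    return True


ok_a = asyncio.run(send("C123", "hi", {"thread_ts": "thread_a"}))
print(ok_a, active)  # True {'C123': {'thread_b'}}
```

A later failing `send()` for `thread_b` returns `False` in this sketch and still removes `thread_b` from the tracked set.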

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| gateway/platforms/slack.py | Implements per-thread status tracking/clearing and adds best-effort cleanup on send failure. |
| tests/gateway/test_slack.py | Updates existing tests for the new tracking shape and adds focused regression coverage for concurrent-thread scenarios and bounds. |

Comment thread gateway/platforms/slack.py Outdated
Comment on lines +970 to +976
    tracked.discard(ts)
    try:
        await client.assistant_threads_setStatus(
            channel_id=chat_id,
            thread_ts=ts,
            status="",
        )
@alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/gateway (Gateway runner, session dispatch, delivery), and platform/slack (Slack app adapter) labels May 12, 2026
briandevans added a commit to briandevans/hermes-agent that referenced this pull request May 12, 2026
Address Copilot review on NousResearch#24139: stop_typing previously discarded
the thread from _active_status_threads before calling
assistant_threads_setStatus, so a transient API failure removed the
tracking entry while Slack's UI still showed "is thinking…", with no
way for a later stop_typing call to retry the clear.

Reorder so the discard only happens after a successful clear; on
failure the thread stays tracked and a subsequent stop_typing (e.g.
the next finalize, or a bulk clear without metadata) gets another
chance to clear it. The exception is still swallowed, so a permanent
failure (missing assistant:write scope) doesn't surface to callers.

Updated test_stop_typing_handles_api_error_gracefully to assert the
new contract (thread stays tracked on failure). Added
test_stop_typing_partial_failure_keeps_failed_thread_tracked covering
the mixed success/failure case where one thread clears and another
fails — succeeding threads are discarded, failing ones stay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@briandevans
Contributor Author

@copilot Addressed in c604e8c: stop_typing now discards the tracked thread only after the assistant.threads.setStatus clear succeeds. On a transient API failure the thread stays in _active_status_threads so a later stop_typing (next finalize, or a no-metadata bulk clear) can retry the clear; on permanent failures (e.g. missing assistant:write scope) the exception is still swallowed.

Updated tests:

  • test_stop_typing_handles_api_error_gracefully now asserts the thread stays tracked on failure (was asserting it was popped).
  • Added test_stop_typing_partial_failure_keeps_failed_thread_tracked for the mixed success/failure case: succeeding threads are discarded, failing ones remain tracked.
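
The discard-after-success ordering can be sketched as follows. `FlakyClient` and the free-standing `stop_typing` function are hypothetical, written only to show why a failed clear stays retryable; they are not the adapter's code.

```python
import asyncio
from typing import Dict, Set


class FlakyClient:
    """Hypothetical client whose status call fails for thread_ts values in `failing`."""

    def __init__(self, failing: Set[str]) -> None:
        self.failing = failing

    async def assistant_threads_setStatus(
        self, *, channel_id: str, thread_ts: str, status: str
    ) -> None:
        if thread_ts in self.failing:
            raise RuntimeError("simulated transient Slack API error")


async def stop_typing(
    client: FlakyClient, tracked_by_chat: Dict[str, Set[str]], chat_id: str
) -> None:
    tracked = tracked_by_chat.get(chat_id, set())
    for ts in list(tracked):  # iterate over a copy: the set is mutated in the loop
        try:
            await client.assistant_threads_setStatus(
                channel_id=chat_id, thread_ts=ts, status=""
            )
        except Exception:
            # Swallow the error and keep ts tracked so a later call can retry.
            continue
        tracked.discard(ts)  # only reached after a successful clear


state = {"C123": {"thread_a", "thread_b"}}
client = FlakyClient(failing={"thread_b"})

asyncio.run(stop_typing(client, state, "C123"))
after_first = {chat: set(threads) for chat, threads in state.items()}
print(after_first)  # {'C123': {'thread_b'}}

client.failing.clear()  # the transient error goes away
asyncio.run(stop_typing(client, state, "C123"))
print(state)  # {'C123': set()}
```

With the previous ordering (discard first, then call the API), the first pass would have emptied the set and the second pass would have had nothing left to retry.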

@briandevans
Contributor Author

CI audit — all 3 failing checks on this PR are pre-existing baselines on clean origin/main (32abe742f). Zero failures are in touched code (gateway/platforms/slack.py, tests/gateway/test_slack.py).

| Job | Symptom | Root cause on main |
| --- | --- | --- |
| test / e2e (Tests workflow) | uv pip install -e .[all,dev] fails: Because there are no versions of mistralai and hermes-agent[all]==0.13.0 depends on mistralai>=2.3.0,<3 | pyproject.toml on main pins mistralai>=2.3.0,<3 but no satisfying version is currently resolvable on PyPI. Same failure on main's own scheduled run 25713449972. |
| Windows footguns (blocking) (Lint workflow) | tools/process_registry.py:588: [bare os.killpg] / [bare signal.SIGKILL] | Pre-existing on main: os.killpg(os.getpgid(proc.pid), signal.SIGKILL) at tools/process_registry.py:588. Same failure on main's own scheduled run 25713449992. |

Happy to address the Copilot finding (already fixed in c604e8c3c) or any review comments — just flagging that the red CI badge here isn't from this PR.

briandevans and others added 2 commits May 11, 2026 23:07
…hats

A single Slack channel / DM can host multiple Assistant threads at the
same time (overlapping user requests, parallel sessions, etc.). The
typing-indicator tracker was keyed by chat_id alone:

    self._active_status_threads: Dict[str, str] = {}

so the second concurrent `send_typing` for the same chat overwrote the
first thread's `thread_ts`. When the earlier thread's send finished,
`stop_typing` popped the wrong (newer) `thread_ts` — and the older
thread stayed stuck in "is thinking…" forever, as reported in NousResearch#24117.

The existing fix, landed via NousResearch#18553 (April 22), covered the single-thread
lifecycle (set status on send_typing, clear on send / edit-finalize).
This PR extends that lifecycle to concurrent threads:

- Track per-thread instead of per-chat: `Dict[str, set[str]]`.
- `send_typing` adds the thread_ts to the chat's set (bounded at 128
  active threads per chat as a defensive guard against missed clears).
- `stop_typing(chat_id, metadata={"thread_ts": ts})` clears that
  specific thread. Without metadata, every tracked thread in the chat
  is cleared (preserves the legacy base.py wiring that has no thread
  context).
- `send()` finalize now passes its own `thread_ts` to `stop_typing` so
  it clears only its own thread among concurrent ones.
- `send()` exception path now also clears the thread's status, so a
  failed Slack post doesn't leave the user staring at "is thinking…".
- `edit_message(finalize=True)` keeps its existing behavior (clears
  whatever's tracked for the chat — it has no thread context to
  target a specific thread).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@briandevans force-pushed the fix/slack-typing-per-thread-24117 branch from c604e8c to 283f4a3 May 12, 2026 06:08
Labels

  • comp/gateway (Gateway runner, session dispatch, delivery)
  • P2 (Medium: degraded but workaround exists)
  • platform/slack (Slack app adapter)
  • type/bug (Something isn't working)

Development

Successfully merging this pull request may close these issues.

Slack Assistant thread can stay stuck in 'is thinking...' after response is sent

3 participants