Skip to content

fix(tools): serialize concurrent hermes_tools RPC calls from execute_code#17872

Closed
vominh1919 wants to merge 1 commit into
NousResearch:mainfrom
vominh1919:fix/hermes-tools-rpc-concurrency
Closed

fix(tools): serialize concurrent hermes_tools RPC calls from execute_code#17872
vominh1919 wants to merge 1 commit into
NousResearch:mainfrom
vominh1919:fix/hermes-tools-rpc-concurrency

Conversation

@vominh1919
Copy link
Copy Markdown
Contributor

Summary

When a script run via execute_code calls hermes_tools functions (terminal, read_file, etc.) from multiple threads (e.g. ThreadPoolExecutor), each thread frequently receives another thread's response.

Root Cause

UDS transport (local backend): All threads share a single socket (_sock). Concurrent send+recv interleaves.

File transport (remote backends): The global _seq counter has a race condition — two threads can get the same seq number.

Fix

Transport Problem Fix
UDS Shared socket, no serialization threading.Lock() around send+recv
File _seq race condition threading.Lock() around _seq increment

Fixes #17770

…code

When execute_code scripts call hermes_tools functions from multiple
threads (ThreadPoolExecutor), the RPC responses get mismatched — each
thread receives another thread's response.

UDS transport: All threads share a single socket (_sock). Concurrent
send+recv interleaves, so thread A reads thread B's response.
Fix: wrap send+recv in a threading.Lock().

File transport: The global _seq counter has a race condition — two
threads can increment simultaneously and get the same seq number,
causing response file collisions.
Fix: wrap _seq increment and seq_str assignment in a threading.Lock().

Fixes NousResearch#17770
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/tools Tool registry, model_tools, toolsets tool/code-exec execute_code sandbox labels Apr 30, 2026
@alt-glitch
Copy link
Copy Markdown
Collaborator

Likely duplicate of #17771 — identical fix (threading.Lock on UDS send+recv and file transport seq) for the same #17770 race condition in code_execution_tool.py.

@alt-glitch
Copy link
Copy Markdown
Collaborator

Likely duplicate of #17771

2 similar comments
@alt-glitch
Copy link
Copy Markdown
Collaborator

Likely duplicate of #17771

@alt-glitch
Copy link
Copy Markdown
Collaborator

Likely duplicate of #17771

@teknium1
Copy link
Copy Markdown
Contributor

Thanks for tackling this! Closing in favor of #17902 which went with @Heltman's #17771 (he filed the original issue #17770 a few hours earlier). Two things your version would have hit in production that the merged one avoids:

  1. The file-transport header uses with _seq_lock: but never declares _seq_lock and doesn't add threading to the import line — any remote backend (Docker/Modal/SSH) would have raised NameError on the first RPC call.
  2. The payload still does "seq": _seq (module global) outside the lock, so the filename (seq_str) and the payload's seq field can disagree under contention.

The merged fix snapshots seq into a local under the lock and uses that local everywhere, plus adds a full E2E regression test through execute_code. Appreciate you diagnosing it independently — right instinct, same root cause analysis.

@teknium1 teknium1 closed this Apr 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/tools Tool registry, model_tools, toolsets P1 High — major feature broken, no workaround tool/code-exec execute_code sandbox type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: hermes_tools RPC client mismatches responses under concurrent tool calls from execute_code

3 participants