fix(tools): serialize concurrent hermes_tools RPC calls from execute_code (#17770) #17894

Merged
teknium1 merged 1 commit into main from fix/17770-hermes-tools-rpc-thread-safety on Apr 30, 2026

@teknium1
Contributor

Salvages #17771 by @Heltman onto current main. Closes #17770. Also supersedes @vominh1919's #17872 (same fix, submitted 4h later — both contributors credited).

Problem

Inside execute_code, concurrent tool calls from multiple threads (ThreadPoolExecutor, asyncio.to_thread, etc.) silently receive each other's responses. Responses are individually intact; they just get delivered to the wrong caller.

Root cause in tools/code_execution_tool.py:

  • UDS transport (local backend) — _sock is a shared module-level connection, the newline-framed protocol has no request-id, the server handles requests serially in FIFO order, and _call() has no lock around sendall + recv. Concurrent callers race on recv() and get cross-matched.
  • File transport (remote backends) — _seq += 1 is a non-atomic read-modify-write, so two threads can allocate the same seq and clobber each other's request/response files.
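The UDS cross-match can be reproduced deterministically without a real socket. The sketch below is illustrative only: two `queue.Queue` objects stand in for the shared `_sock`, the server answers in FIFO order with no request-id, and `threading.Event` objects force the unlucky interleaving (caller A is paused between its send and its recv). All names here are hypothetical and are not taken from `tools/code_execution_tool.py`.

```python
import queue
import threading

requests = queue.Queue()   # stands in for the send side of the shared socket
responses = queue.Queue()  # stands in for the recv side of the shared socket


def server():
    # Serial FIFO server: replies in arrival order, with no request id.
    for _ in range(2):
        tag = requests.get()
        responses.put(f"reply-to-{tag}")


a_sent = threading.Event()
b_done = threading.Event()
results = {}


def caller_a():
    requests.put("A")                 # send
    a_sent.set()
    b_done.wait()                     # paused between send and recv
    results["A"] = responses.get()    # recv: picks up whatever is left


def caller_b():
    a_sent.wait()                     # B sends strictly after A
    requests.put("B")                 # send
    results["B"] = responses.get()    # recv: first reply in the FIFO is A's
    b_done.set()


threads = [threading.Thread(target=f) for f in (server, caller_a, caller_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Both replies arrive intact, but each lands with the other caller, which is the symptom described in the Problem section.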

Fix (author: @Heltman, 2 files, +103/-17)

Smallest correct fix: wrap send+recv round-trip (UDS) and seq allocation (file) in a threading.Lock. No protocol change, no server change.
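A minimal sketch of the UDS half of this approach, assuming a newline-framed JSON echo server. It uses `socket.socketpair()` in place of a real Unix domain socket, and reads lines through `makefile()` rather than a raw `recv()` loop; both are simplifications. Apart from `_call_lock`, which the regression tests name, every identifier is hypothetical.

```python
import json
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

client_sock, server_sock = socket.socketpair()


def serve(n):
    # Serial server: one request line in, one response line out, FIFO order.
    with server_sock.makefile("rb") as rfile:
        for _ in range(n):
            request = json.loads(rfile.readline())
            reply = json.dumps({"echo": request["tag"]}) + "\n"
            server_sock.sendall(reply.encode())


_call_lock = threading.Lock()
_rfile = client_sock.makefile("rb")


def call(tag):
    # Hold the lock for the whole round trip so that the response read here
    # is the one produced for the request sent here.
    with _call_lock:
        client_sock.sendall((json.dumps({"tag": tag}) + "\n").encode())
        return json.loads(_rfile.readline())["echo"]


N = 10
server = threading.Thread(target=serve, args=(N,))
server.start()
with ThreadPoolExecutor(max_workers=N) as pool:
    replies = list(pool.map(call, range(N)))
server.join()

_rfile.close()
client_sock.close()
server_sock.close()
```

The lock serializes callers, so concurrent tool calls gain no parallelism on the wire. In exchange, the protocol and the server stay untouched.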

Validation

scripts/run_tests.sh tests/tools/test_code_execution.py tests/tools/test_code_execution_modes.py
103 passed in 33.25s

New regression tests:

  • test_uds_transport_serializes_concurrent_calls — asserts _call_lock is present in generated UDS source
  • test_file_transport_serializes_seq_allocation — asserts _seq_lock is present in generated file source
  • test_concurrent_tool_calls_match_responses — end-to-end: runs a sandboxed ThreadPoolExecutor of 10 terminal() calls with a slow mock dispatcher and asserts every caller sees its own tag (fails 10/10 without the fix).
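The end-to-end test can be pictured roughly as follows. This is a sketch of the test's shape, not the code in `tests/tools/test_code_execution.py`: `FakeTransport`, its `terminal()` method, and the 10 ms delay are hypothetical stand-ins for the sandboxed client and the slow mock dispatcher.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeTransport:
    """Hypothetical FIFO transport with no request id, guarded by a lock."""

    def __init__(self):
        self._requests = queue.Queue()
        self._responses = queue.Queue()
        self._call_lock = threading.Lock()
        threading.Thread(target=self._dispatch, daemon=True).start()

    def _dispatch(self):
        while True:
            command = self._requests.get()
            time.sleep(0.01)  # slow dispatcher widens the race window
            self._responses.put(f"output of {command}")

    def terminal(self, command):
        # Send and receive under one lock, mirroring the fix.
        with self._call_lock:
            self._requests.put(command)
            return self._responses.get()


def test_concurrent_calls_match_responses():
    transport = FakeTransport()
    commands = [f"echo tag-{i}" for i in range(10)]
    with ThreadPoolExecutor(max_workers=10) as pool:
        outputs = list(pool.map(transport.terminal, commands))
    mismatched = [c for c, o in zip(commands, outputs) if o != f"output of {c}"]
    assert mismatched == [], mismatched
    return outputs


outputs = test_concurrent_calls_match_responses()
```

Tagging each request with a unique value is what makes a mismatch observable: an intact response delivered to the wrong caller still fails the per-caller comparison.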

Backward compatibility

None broken. Single-threaded use is unchanged. The lock only affects concurrent callers inside one execute_code run — which were getting wrong answers without it. Server side is untouched.

Authorship preserved for @Heltman via plain cherry-pick. Thanks also to @vominh1919 who independently identified and fixed the same issue in #17872.

…code

The sandbox-side `_call()` in both the UDS and file-based transports was
not thread-safe, so scripts that call tools from multiple threads (e.g.
`ThreadPoolExecutor` over `terminal()`) inside a single `execute_code`
run could silently receive each other's responses.

Root cause:

* UDS transport — a single module-level `_sock` was shared across all
  threads; the newline-framed protocol has no request-id; and the
  server-side RPC loop handles one connection serially. With concurrent
  callers, each thread would `sendall()` then race to `recv()` the next
  newline-terminated response from the shared buffer, so responses got
  delivered to the wrong caller.

* File transport — `_seq += 1` is a non-atomic read-modify-write, so
  two threads could allocate the same sequence number and clobber each
  other's request/response files.

Fix: guard `_call()` with a `threading.Lock` in the UDS case (covering
send+recv), and guard `_seq` allocation with a lock in the file case.
No protocol change.
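
For the file transport, the guarded allocation might look like the sketch below. Only `_seq` and `_seq_lock` are named in this PR; `next_seq()` and the thread-pool driver are hypothetical and exist only to exercise the counter.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_seq = 0
_seq_lock = threading.Lock()


def next_seq():
    # `_seq += 1` is a load, an add, and a store. The lock makes the three
    # steps, plus the read of the new value, one atomic unit.
    global _seq
    with _seq_lock:
        _seq += 1
        return _seq


with ThreadPoolExecutor(max_workers=8) as pool:
    seqs = list(pool.map(lambda _: next_seq(), range(1000)))
```

Returning the value from inside the locked region matters: reading `_seq` after releasing the lock could observe another thread's increment.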

Regression tests cover both the generated-source level (lock is present
and used) and an end-to-end concurrency test: running a sandboxed
ThreadPoolExecutor of 10 `terminal()` calls against a slow mock
dispatcher, asserting every caller sees its own tagged response. The
test fails without the fix (10/10 mismatched, matching real-world
repro) and passes with it.
@alt-glitch added the following labels on Apr 30, 2026: type/bug (Something isn't working), P1 (High: major feature broken, no workaround), comp/tools (Tool registry, model_tools, toolsets), tool/code-exec (execute_code sandbox), duplicate (This issue or pull request already exists)
@alt-glitch
Collaborator

Duplicate of #17902 — same fix for hermes_tools RPC concurrent response mismatch (#17770). #17902 supersedes both #17771 and #17872.

@alt-glitch
Collaborator

Duplicate of #17902 — same fix for hermes_tools RPC concurrent response mismatch (#17770).

@alt-glitch
Collaborator

Duplicate of #17902.

@teknium1 merged commit b50bc13 into main on Apr 30, 2026
12 checks passed
@teknium1 deleted the fix/17770-hermes-tools-rpc-thread-safety branch on April 30, 2026 11:32

Development

Successfully merging this pull request may close these issues.

[Bug]: hermes_tools RPC client mismatches responses under concurrent tool calls from execute_code
