fix(tools): serialize concurrent hermes_tools RPC calls from execute_code (#17770)#17894
Merged
Merged
Conversation
…code The sandbox-side `_call()` in both the UDS and file-based transports was not thread-safe, so scripts that call tools from multiple threads (e.g. `ThreadPoolExecutor` over `terminal()`) inside a single `execute_code` run could silently receive each other's responses. Root cause: * UDS transport — a single module-level `_sock` was shared across all threads; the newline-framed protocol has no request-id; and the server-side RPC loop handles one connection serially. With concurrent callers, each thread would `sendall()` then race to `recv()` the next newline-terminated response from the shared buffer, so responses got delivered to the wrong caller. * File transport — `_seq += 1` is a non-atomic read-modify-write, so two threads could allocate the same sequence number and clobber each other's request/response files. Fix: guard `_call()` with a `threading.Lock` in the UDS case (covering send+recv), and guard `_seq` allocation with a lock in the file case. No protocol change. Regression tests cover both the generated-source level (lock is present and used) and an end-to-end concurrency test: running a sandboxed ThreadPoolExecutor of 10 `terminal()` calls against a slow mock dispatcher, asserting every caller sees its own tagged response. The test fails without the fix (10/10 mismatched, matching real-world repro) and passes with it.
Collaborator
Collaborator
Collaborator
|
Duplicate of #17902. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Salvages #17771 by @Heltman onto current
main. Closes #17770. Also supersedes @vominh1919's #17872 (same fix, submitted 4h later — both contributors credited).Problem
Inside
execute_code, concurrent tool calls from multiple threads (ThreadPoolExecutor, asyncio.to_thread, etc.) silently receive each other's responses. Responses are individually intact; they just get delivered to the wrong caller.Root cause in
tools/code_execution_tool.py:_sockis a shared module-level connection, the newline-framed protocol has no request-id, the server handles requests serially in FIFO order, and_call()has no lock aroundsendall + recv. Concurrent callers race onrecv()and get cross-matched._seq += 1is a non-atomic read-modify-write, so two threads can allocate the same seq and clobber each other's request/response files.Fix (author: @Heltman, 2 files, +103/-17)
Smallest correct fix: wrap send+recv round-trip (UDS) and seq allocation (file) in a
threading.Lock. No protocol change, no server change.Validation
New regression tests:
test_uds_transport_serializes_concurrent_calls— asserts_call_lockis present in generated UDS sourcetest_file_transport_serializes_seq_allocation— asserts_seq_lockis present in generated file sourcetest_concurrent_tool_calls_match_responses— end-to-end: runs a sandboxed ThreadPoolExecutor of 10terminal()calls with a slow mock dispatcher and asserts every caller sees its own tag (fails 10/10 without the fix).Backward compatibility
None broken. Single-threaded use is unchanged. The lock only affects concurrent callers inside one
execute_coderun — which were getting wrong answers without it. Server side is untouched.Authorship preserved for @Heltman via plain cherry-pick. Thanks also to @vominh1919 who independently identified and fixed the same issue in #17872.