Skip to content

fix(agent): propagate ContextVars to concurrent tool worker threads (salvage #16660)#18123

Merged
teknium1 merged 2 commits into
mainfrom
salvage/contextvar-tool-executor-16660
Apr 30, 2026
Merged

fix(agent): propagate ContextVars to concurrent tool worker threads (salvage #16660)#18123
teknium1 merged 2 commits into
mainfrom
salvage/contextvar-tool-executor-16660

Conversation

@teknium1
Copy link
Copy Markdown
Contributor

Salvages the core fix from #16660 by @banditburai (commit authored by @firefly). The original PR had scope contamination — a tests/eval_018/ directory containing eval-oracle test files for a different project ("talaria") that fail 5/5 on hermes main (check for a non-existent cli.CliError class, a rename from ContextCompressorContextCompactor that never happened here, talaria.tools._anchor_state imports, a wrong generate_title timeout default, etc.). Those were dropped from this salvage.

The real fix (4 LOC in run_agent.py)

_execute_tool_calls_concurrent submits tools via executor.submit(_run_tool, ...) without copy_context().run, so worker threads run with a fresh context — tools.approval._approval_session_key (set by gateway adapters before agent.run) is invisible. Workers fall through get_current_session_key()'s resolution order to the os.environ fallback (which every agent step overwrites), silently collapsing per-session dispatch to whichever session stepped most recently.

Fix: snapshot the caller's context and submit ctx.run(_run_tool, …). Mirrors asyncio.to_thread semantics. The existing threading.local callback propagation at run_agent.py:~8796 (from commit 0046d17 / GHSA-qg5c-hvr5-hjgr) is preserved — that one handles a different propagation surface (approval/sudo callbacks) that ContextVars cannot carry across thread boundaries.

Real-world repro

Via @syahidfrd on #16660 comment: two concurrent Slack sessions (channels A and B), session A's agent fired a dangerous-command approval for a recursive delete → approval card was delivered to channel B — the user there saw an approval prompt for a command they had no context for, while session A's thread blocked waiting for a response that would never come. Any user in B could click "Allow Once" without understanding what they were authorizing.

Regression suite

tests/run_agent/test_tool_executor_contextvar_propagation.py — 5 guards, following the contextvar-run-in-executor-bridge skill's two-test pattern plus a source-level guard for the real call site:

  1. Contract documentationexecutor.submit(fn) without copy_context does NOT propagate ContextVars. If this ever flips, the fix becomes redundant.

  2. Contract validationcopy_context().run(fn) does propagate. Positive baseline.

  3. End-to-end — set the real _approval_session_key in a caller, verify the worker thread observes it via get_current_session_key().

  4. Source-level guard — AST-parses run_agent.py and asserts the executor.submit call site for _run_tool is invoked with ctx.run as its first arg. This is the primary regression guard. Behavioral tests 1-3 + 5 exercise the pattern but not the real call site — they keep passing even if someone reverts the wrapper in run_agent.py. Test 4 fails with a concrete diagnostic:

    AssertionError: run_agent.py contains `executor.submit(_run_tool, ...)`
    without a `ctx.run` wrapper. This is the pre-#16660 shape: worker
    threads will read a fresh ContextVar and approval-session routing
    collapses to the os.environ fallback.
    
  5. Concurrent-caller isolation — two callers each set a different session key; each worker must see its own caller's key.

Regression guard validation

Planted the pre-fix shape: reverted run_agent.py to origin/main → guard #4 fails with the diagnostic above ✓. Restored the fix → 5/5 pass ✓.

Test plan

  • scripts/run_tests.sh tests/run_agent/test_tool_executor_contextvar_propagation.py — 5/5 pass
  • Broader sweep tests/run_agent/ tests/tools/test_approval.py tests/acp/ — 1481 pass, 17 skipped, 1 pre-existing failure on origin/main (test_interactive_env_var_routes_to_callback, unrelated to this PR)
  • Regression guard validated via plant-and-revert
  • Empirical proof of bug mechanism: demonstrated in a standalone REPL that executor.submit(read_ctxvar) sees DEFAULT while executor.submit(ctx.run, read_ctxvar) sees the parent-set value

Closes #16660.

Co-authored-by: firefly promptsiren@gmail.com
Co-authored-by: banditburai banditburai@users.noreply.github.com

banditburai and others added 2 commits April 30, 2026 16:19
`_execute_tool_calls_concurrent` submits tools via `executor.submit(_run_tool, ...)`
without `copy_context().run`, so worker threads do not inherit the parent's
ContextVar values — including `_approval_session_key` set by the gateway before
`agent.run`. Worker tools fall through `tools/approval.py:get_current_session_key`'s
resolution order to the `os.environ` fallback ("default" session key), silently
collapsing per-session dispatch for any tool that runs on a worker thread.

Fix: wrap the submitted callable in `contextvars.copy_context().run`, mirroring
`asyncio.to_thread`'s implementation. The existing threading.local callback
propagation (0046d17 / GHSA-qg5c-hvr5-hjgr) is preserved unchanged — it
handles a different propagation surface that ContextVars cannot carry.
… executor

Regression suite for the PR #16660 fix. Five layers of guards:

1. test_executor_submit_without_copy_context_does_not_propagate — documents
   the Python contract the fix relies on. If this ever flips, the fix
   becomes redundant (and the comment explains why).

2. test_executor_submit_with_copy_context_run_propagates — positive contract
   test for the copy_context().run(...) pattern itself.

3. test_run_tool_worker_sees_parent_approval_session_key — exercises the
   real tools.approval._approval_session_key ContextVar through the
   executor pattern end-to-end.

4. test_run_agent_concurrent_executor_wraps_submit_with_copy_context —
   AST-level guard: parses run_agent.py and asserts the executor.submit
   call site for _run_tool is invoked with ctx.run as the first arg, not
   _run_tool directly. Reverting the fix fails this test with a concrete
   diagnostic message. This is the PRIMARY regression guard; behavioral
   tests above exercise the pattern but not the actual call site.

5. test_two_concurrent_tool_batches_keep_session_keys_isolated — two
   concurrent callers each set a different session key; each worker
   must observe its own caller. Guards against future refactors that
   share a single context snapshot across callers.

Validated by plant-and-revert: reverting run_agent.py to origin/main makes
guard #4 fail with the expected diagnostic. Restoring the fix = 5/5 pass.
@teknium1 teknium1 merged commit e5dad4a into main Apr 30, 2026
9 of 11 checks passed
@teknium1 teknium1 deleted the salvage/contextvar-tool-executor-16660 branch April 30, 2026 23:26
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 30, 2026
donald131 pushed a commit to donald131/hermes-agent that referenced this pull request May 2, 2026
…ousResearch#18123)

Propagates ContextVars (notably `tools.approval._approval_session_key`) into concurrent tool worker threads via `copy_context().run` — mirrors `asyncio.to_thread` semantics.

Fixes approval-card cross-session misrouting in concurrent gateway traffic. Repro'd on Slack: session A's dangerous-command approval was delivered to channel B (@syahidfrd).

Salvages NousResearch#16660 — core 4-LOC fix preserved, unrelated `tests/eval_018/` scope contamination dropped. Adds 5 regression guards including an AST-level source check on the real call site.

Closes NousResearch#16660.

Co-authored-by: firefly <promptsiren@gmail.com>
Co-authored-by: banditburai <banditburai@users.noreply.github.com>
nickdlkk pushed a commit to nickdlkk/hermes-agent that referenced this pull request May 11, 2026
…ousResearch#18123)

Propagates ContextVars (notably `tools.approval._approval_session_key`) into concurrent tool worker threads via `copy_context().run` — mirrors `asyncio.to_thread` semantics.

Fixes approval-card cross-session misrouting in concurrent gateway traffic. Repro'd on Slack: session A's dangerous-command approval was delivered to channel B (@syahidfrd).

Salvages NousResearch#16660 — core 4-LOC fix preserved, unrelated `tests/eval_018/` scope contamination dropped. Adds 5 regression guards including an AST-level source check on the real call site.

Closes NousResearch#16660.

Co-authored-by: firefly <promptsiren@gmail.com>
Co-authored-by: banditburai <banditburai@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants