
fix(mcp): add periodic keepalive to _wait_for_lifecycle_event (salvage #17016) #20209

Merged: teknium1 merged 1 commit into main from salvage/pr-17016 on May 5, 2026


@teknium1 (Contributor) commented May 5, 2026

Salvages the keepalive portion of @vominh1919's PR #17016. Fixes #17003.

What it does

During long idle periods, the MCP session's TCP connection can go stale when a NAT or load balancer silently drops it after its idle timeout (commonly 300–600 s). This change sends a lightweight list_tools probe every 3 minutes; if the probe fails, it triggers a clean reconnect instead of leaving the agent stuck on a dead socket.
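The choice of a 3-minute interval can be checked with simple arithmetic: the silent gap between probes must be shorter than the shortest idle timeout quoted above, and the interval plus the 30 s probe deadline used by this change bounds how long a dead socket can go unnoticed. The sketch below assumes each probe puts traffic on the same connection the NAT or load balancer is tracking; the helper `keepalive_budget` is hypothetical and is not part of tools/mcp_tool.py.

```python
from __future__ import annotations


def keepalive_budget(interval_s: float, probe_deadline_s: float,
                     idle_timeout_s: float) -> tuple[bool, float]:
    """Hypothetical helper: sanity-check keepalive timing.

    Returns (keeps_alive, worst_case_detection_s).
    """
    # Between a probe's reply and the next probe the connection is silent for
    # about `interval_s`; that gap must be shorter than the middlebox's idle timer.
    keeps_alive = interval_s < idle_timeout_s
    # If the socket dies right after a successful probe, the next probe starts
    # `interval_s` later and gives up after at most `probe_deadline_s`.
    worst_case_detection_s = interval_s + probe_deadline_s
    return keeps_alive, worst_case_detection_s


# Figures from this PR: 180 s interval, 30 s deadline, 300 s shortest common timeout.
print(keepalive_budget(180, 30, 300))  # (True, 210)
```

Even against the shortest 300 s timeout there is a 120 s margin, and a dead connection is noticed within about 210 s instead of hanging indefinitely.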

Changes

  • tools/mcp_tool.py: _wait_for_lifecycle_event now wraps asyncio.wait in a loop with a 180 s timeout. On each timeout it probes session.list_tools() with a 30 s deadline and, if the probe fails, fires _reconnect_event.
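The loop described in the bullet above can be sketched roughly as follows. This is a minimal illustration, not the code in tools/mcp_tool.py. The function name `wait_for_lifecycle_event`, its parameters, and the assumption that lifecycle signals are `asyncio.Event` objects (with `reconnect_event` among them) are hypothetical. Only the 180 s interval, the 30 s probe deadline, and the `session.list_tools()` probe come from this PR.

```python
import asyncio
from typing import Sequence

# Timings from this PR; the function below is a hypothetical sketch.
KEEPALIVE_INTERVAL_S = 180.0  # idle time before each probe
PROBE_TIMEOUT_S = 30.0        # deadline for a single list_tools() probe


async def wait_for_lifecycle_event(
    session,
    events: Sequence[asyncio.Event],
    reconnect_event: asyncio.Event,
    interval_s: float = KEEPALIVE_INTERVAL_S,
    probe_timeout_s: float = PROBE_TIMEOUT_S,
) -> None:
    """Block until any of `events` is set, probing `session` while idle.

    `session` is anything with an awaitable list_tools() method;
    `reconnect_event` is assumed to be one of `events`.
    """
    # One waiter task per event; they stay alive across keepalive rounds.
    waiters = [asyncio.ensure_future(event.wait()) for event in events]
    try:
        while True:
            done, _pending = await asyncio.wait(
                waiters, timeout=interval_s, return_when=asyncio.FIRST_COMPLETED
            )
            if done:
                return  # a lifecycle event (shutdown, reconnect, ...) fired
            # Idle for a full interval: send a cheap request over the session.
            try:
                await asyncio.wait_for(session.list_tools(), timeout=probe_timeout_s)
            except Exception:
                # Timeout or transport error: treat the connection as dead and
                # let the reconnect waiter wake this loop on the next round.
                reconnect_event.set()
    finally:
        # Cancel waiters that are still pending (a no-op for finished ones).
        for waiter in waiters:
            waiter.cancel()
```

Using `asyncio.wait` with a timeout, rather than `asyncio.wait_for`, matters here. On timeout, `asyncio.wait` leaves the event waiters running, so the same waiters carry over across keepalive rounds instead of being cancelled and recreated. A failed probe also does not need its own exit path: setting `reconnect_event` wakes the existing waiter, and the function returns through the normal "event fired" branch.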

Dropped vs original

The original PR #17016 also added a half-open state to the circuit breaker, but that feature had already landed on main via @benbarclay's commit 8cc3ceb ("fix(mcp): add half-open state to circuit breaker", Apr 21). Only the keepalive is salvaged here.

Validation

tests/tools/test_mcp_tool.py: 182 passed locally.

Closes #17016 via salvage.

@alt-glitch added the labels type/bug (Something isn't working), tool/mcp (MCP client and OAuth), and P2 (Medium, degraded but workaround exists) on May 5, 2026
@alt-glitch (Collaborator)

Supersedes the keepalive portion of #17016 (salvaged onto current main). The half-open circuit breaker was already merged separately.

Sends a lightweight list_tools() probe every 3 minutes during idle
periods to prevent TCP connections from going stale behind LB / NAT
idle timeouts (commonly 300-600s). When the keepalive fails, the
reconnect event fires so the transport rebuilds the session cleanly.

Salvages the keepalive portion of @vominh1919's PR #17016. The
circuit-breaker half-open recovery from the same PR had independently
landed on main via @benbarclay's commit 8cc3ceb ("fix(mcp): add
half-open state to circuit breaker", Apr 21); only the keepalive is
salvaged here.

Fixes #17003.


Linked issue: MCP HTTP connections go stale after extended idle periods
