Skip to content

fix(mcp): fix lifecycle-task leak in stateful clients and refactor shared logic…#4152

Merged
xieyxclack merged 1 commit into
agentscope-ai:mainfrom
qbc2016:bc/mcp
May 9, 2026
Merged

fix(mcp): fix lifecycle-task leak in stateful clients and refactor shared logic…#4152
xieyxclack merged 1 commit into
agentscope-ai:mainfrom
qbc2016:bc/mcp

Conversation

@qbc2016
Copy link
Copy Markdown
Member

@qbc2016 qbc2016 commented May 9, 2026

Problem

MCP subprocesses were accumulating as children of the qwenpaw app daemon across agent hot-reloads (#4105). Root cause: StdIOStatefulClient.close() and HttpStatefulClient.close() returned early when is_connected=False, even if the background lifecycle task was still running in a reconnect loop. Every hot-reload called mcp_manager.close_all(), but the reconnecting clients were silently skipped — their tasks eventually spawned fresh subprocesses that were never cleaned up.

A secondary issue (#4100): HttpStatefulClient did not recover from transport-layer failures (anyio.ClosedResourceError, HTTP timeouts, etc.) because the error never propagated back to the lifecycle loop.

Changes

Bug fixes

  • close() lifecycle-task leak — always stop the lifecycle task when it is running, regardless of is_connected state (is_connected=False + running task = reconnect loop, not a clean idle state).
  • close() stale _lifecycle_task reference on cancellation — moved self._lifecycle_task = None to a finally block so it is cleared even when the calling coroutine is cancelled (CancelledError is BaseException, not Exception).
  • connect() spurious ready on reconnect — cleared _ready_event before creating the lifecycle task so a second connect() after close() waits for the new connection rather than returning immediately.
  • reload() returning before reconnect completes — cleared _ready_event before waiting so the caller blocks until the new session is actually established.
  • connect() double lifecycle-task on transport error — added has_task guard to prevent creating a second task when one is already running in a reconnect loop.
  • Transport-error auto-recovery (#4100) — introduced _handle_transport_error() and _TRANSPORT_ERRORS to detect broken-pipe, EOF, and anyio / httpx transport errors in call_tool / list_tools and trigger a reconnect via _reload_event, fixing the case where streamable_http_client's internal post_writer task silently closes the
    write stream.
  • _force_cleanup_client silent no-op (manager.py) — replaced the dead client.stack attribute access (which was always None for the new clients) with a direct client.close(ignore_errors=True) call.
  • replace_client() holding lock during close() (manager.py) — moved old_client.close() outside the lock to match remove_client()'s pattern and avoid blocking get_clients() / close_all() for the duration of the lifecycle-task shutdown.

Refactoring

  • Extracted _MCPClientMixin containing all shared logic (_run_lifecycle, connect, reload, close, list_tools, call_tool, _handle_transport_error, _validate_connection). Each concrete class now only implements __init__ and _setup_transport.
  • reload() was previously only on StdIOStatefulClient; it is now available on HttpStatefulClient as well via the mixin.
  • Consolidated all transport-error exception types into a single _TRANSPORT_ERRORS tuple for clarity.

@github-project-automation github-project-automation Bot moved this to Todo in QwenPaw May 9, 2026
@qbc2016 qbc2016 temporarily deployed to maintainer-approved May 9, 2026 07:41 — with GitHub Actions Inactive
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

Welcome to QwenPaw! 🐾

Hi @qbc2016, this is your 60th Pull Request.

🎁 Milestone Celebration

🎖️🏅 Congratulations! This is your 60th contribution. Thank you for your continued support! You're an important member of the QwenPaw community! 🐾

📋 About PR Template

To help maintainers review your PR faster, please make sure to include:

  • Description - What this PR does and why
  • Type of Change - Bug fix / Feature / Breaking change / Documentation / Refactoring
  • Component(s) Affected - Core / Console / Channels / Skills / CLI / Documentation / Tests / CI/CD / Scripts
  • Checklist:
    • Run and pass pre-commit run --all-files
    • Run and pass relevant tests (pytest or as applicable)
    • Update documentation if needed
  • Testing - How to test these changes
  • Local Verification Evidence:
    pre-commit run --all-files
    # paste summary result
    
    pytest
    # paste summary result

Complete PR information helps speed up the review process. You can edit the PR description to add these details.

🙌 Join Developer Community

Thanks so much for your contribution! We'd love to invite you to join the official QwenPaw developer group! You can find the Discord and DingTalk group links under the "Developer Community" section on our docs page:
https://qwenpaw.agentscope.io/docs/community

We truly appreciate your enthusiasm—and look forward to your future contributions! 😊

We'll review your PR soon.

@xieyxclack xieyxclack merged commit 0dc50af into agentscope-ai:main May 9, 2026
19 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in QwenPaw May 9, 2026
cofly-io pushed a commit to cofly-io/xClaw that referenced this pull request May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

2 participants