
[Bug]: MCP HTTP/SSE timeout policy still has default-budget and SseTransport gaps #6404

@Audacity88

Description

Affected component

tooling/ci

Severity

S2 - degraded behavior

Current behavior

After #5945 and #6397, configured tool_timeout_secs values above 120 seconds are no longer undercut by HttpTransport::new()'s reqwest client-wide timeout. Two MCP HTTP/SSE timeout gaps remain on current master:

  1. For streamable HTTP tools/call requests with tool_timeout_secs unset, McpServer::call_tool gives the call an outer default budget of 180 seconds, but HttpTransport::send_and_recv still wraps the first SSE JSON-RPC read with self.tool_timeout_secs.unwrap_or(RECV_TIMEOUT_SECS), where RECV_TIMEOUT_SECS is 30 seconds. That means a slow default-budget HTTP/SSE tool can still fail at 30 seconds before the outer 180-second tool-call budget owns the call.
  2. The legacy SseTransport POST path still applies .timeout(Duration::from_secs(120)) and does not store or apply config.tool_timeout_secs, so configured MCP tool budgets above 120 seconds do not reach that request path.
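The two gaps above come down to nested deadlines: whichever timeout is shorter fires first. The following is a minimal sketch of that arithmetic, not the actual ZeroClaw code. The constant values 180, 30, and 120 are taken from this issue; the helper names and `SSE_POST_TIMEOUT_SECS` are hypothetical and exist only for illustration.

```rust
use std::time::Duration;

// Values quoted in this issue.
const DEFAULT_TOOL_TIMEOUT_SECS: u64 = 180; // outer McpServer::call_tool default
const RECV_TIMEOUT_SECS: u64 = 30; // HttpTransport SSE read fallback
// Hypothetical name for the hardcoded 120-second SseTransport POST timeout.
const SSE_POST_TIMEOUT_SECS: u64 = 120;

/// Hypothetical helper: with nested timeouts, the shorter one fires first.
fn effective_budget(outer: Duration, inner: Duration) -> Duration {
    outer.min(inner)
}

/// Models the streamable HTTP path: the outer and inner timeouts fall back
/// to different defaults when `tool_timeout_secs` is unset.
fn http_sse_effective(tool_timeout_secs: Option<u64>) -> Duration {
    let outer = Duration::from_secs(tool_timeout_secs.unwrap_or(DEFAULT_TOOL_TIMEOUT_SECS));
    let inner = Duration::from_secs(tool_timeout_secs.unwrap_or(RECV_TIMEOUT_SECS));
    effective_budget(outer, inner)
}

/// Models the legacy SSE POST path: the inner timeout ignores the config.
fn legacy_sse_effective(tool_timeout_secs: Option<u64>) -> Duration {
    let outer = Duration::from_secs(tool_timeout_secs.unwrap_or(DEFAULT_TOOL_TIMEOUT_SECS));
    let inner = Duration::from_secs(SSE_POST_TIMEOUT_SECS);
    effective_budget(outer, inner)
}
```

Under these assumptions, `http_sse_effective(None)` is 30 seconds rather than 180, and `legacy_sse_effective(Some(300))` is 120 seconds rather than 300.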

#5945 intentionally preserved None -> 30s for the HTTP SSE read path and left SseTransport untouched. #6397 fixed only the configured-timeout client-wide cap in HttpTransport::new(). This issue tracks the remaining timeout-policy decision rather than reopening #6383.

Expected behavior

MCP HTTP/SSE tool calls should have one effective tool-call budget:

  • For tools/call with configured tool_timeout_secs, the HTTP/SSE request and SSE read paths should not impose a shorter transport timeout.
  • For tools/call with no configured tool_timeout_secs, the transport should either leave the budget to the outer McpServer::call_tool default or explicitly document that streamable HTTP tools are intentionally capped at 30 seconds despite the outer 180-second default.
  • Non-tool MCP requests such as initialize and tools/list should keep short legacy bounds.
  • SseTransport should either honor tool_timeout_secs for its POST request path or document why the legacy SSE transport intentionally remains capped at 120 seconds.
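One possible shape for the policy described above is sketched here. It picks the "leave the budget to the outer default" option for unset `tools/call` timeouts, which is only one of the two alternatives this issue leaves open. The function name `transport_timeout` and the constant `LEGACY_REQUEST_TIMEOUT_SECS` are hypothetical, and the 30-second value for non-tool requests is an assumption based on the existing read fallback.

```rust
use std::time::Duration;

// Assumption: non-tool requests keep a short bound; 30 seconds mirrors the
// existing RECV_TIMEOUT_SECS fallback mentioned in this issue.
const LEGACY_REQUEST_TIMEOUT_SECS: u64 = 30;

/// Hypothetical policy function: returns the timeout the transport itself
/// should apply, or `None` when the outer tool-call budget should be the
/// only deadline.
fn transport_timeout(method: &str, tool_timeout_secs: Option<u64>) -> Option<Duration> {
    match (method, tool_timeout_secs) {
        // Configured budget: never impose anything shorter than the config.
        ("tools/call", Some(secs)) => Some(Duration::from_secs(secs)),
        // Unset budget: defer to the outer McpServer::call_tool default.
        ("tools/call", None) => None,
        // initialize, tools/list, and other non-tool requests stay short.
        _ => Some(Duration::from_secs(LEGACY_REQUEST_TIMEOUT_SECS)),
    }
}
```

Keeping the decision in one function would let both transports share it, so the streamable HTTP and legacy SSE paths cannot drift apart again. Whether the real code should be structured this way is a design decision for the fix, not something this issue prescribes.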

Steps to reproduce

1. On current master at 70b84d302, inspect `crates/zeroclaw-tools/src/mcp_client.rs`.
2. Confirm `McpServer::call_tool` wraps tool calls with `tool_timeout_secs.unwrap_or(DEFAULT_TOOL_TIMEOUT_SECS)`, and `DEFAULT_TOOL_TIMEOUT_SECS` is 180.
3. Inspect `crates/zeroclaw-tools/src/mcp_transport.rs`.
4. Confirm streamable HTTP SSE responses still use `self.tool_timeout_secs.unwrap_or(RECV_TIMEOUT_SECS)` around `read_first_jsonrpc_from_sse_response(resp)`, and `RECV_TIMEOUT_SECS` is 30.
5. Configure a streamable HTTP MCP server without `tool_timeout_secs`, then invoke a `tools/call` whose first SSE JSON-RPC result arrives after 30 seconds but before 180 seconds.
6. Observe that the transport read times out at 30 seconds, before the outer 180-second default tool-call timeout can take effect.
7. For legacy SSE, inspect `SseTransport::send_and_recv` and confirm its POST request still applies `.timeout(Duration::from_secs(120))` without reading `config.tool_timeout_secs`.
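Steps 5 and 6 can be approximated without a live MCP server. The sketch below is a scaled-down simulation using only the standard library, assuming 1 second of real time maps to 5 milliseconds. It does not use the ZeroClaw transports; `scaled` and `read_first_result` are hypothetical names, and the channel stands in for the first SSE JSON-RPC result.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Scale factor for the simulation: 1 "second" becomes 5 milliseconds.
fn scaled(secs: u64) -> Duration {
    Duration::from_millis(secs * 5)
}

/// Simulates a tool whose first result arrives after `reply_after`, read
/// through an inner transport timeout of `inner_timeout`.
fn read_first_result(
    reply_after: Duration,
    inner_timeout: Duration,
) -> Result<&'static str, &'static str> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(reply_after);
        // The receiver may already have given up; ignore the send error.
        let _ = tx.send("result");
    });
    // Any receive failure is reported with the error text quoted in this issue.
    rx.recv_timeout(inner_timeout)
        .map_err(|_| "timeout waiting for MCP response from streamable HTTP SSE stream")
}
```

With a simulated 60-second tool, `read_first_result(scaled(60), scaled(30))` returns the timeout error, while `read_first_result(scaled(60), scaled(180))` returns the result. That matches the expectation that only the 180-second budget should bound a default-budget call.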

Impact

Affected users: users of slow MCP tools over streamable HTTP with no explicit tool_timeout_secs, and users of legacy SSE transport with configured tool budgets above 120 seconds.

Frequency: deterministic when the inner transport timeout is shorter than the intended tool-call budget.

Consequence: ZeroClaw can report a transport timeout before the configured or default tool-call budget is reached, leaving MCP HTTP/SSE behavior inconsistent across configured and default timeout paths.

Logs / stack traces

No fresh live slow-server repro was run for this follow-up. This is verified by source inspection on current master:

  • crates/zeroclaw-tools/src/mcp_client.rs gives tools/call an outer default timeout of 180 seconds.
  • crates/zeroclaw-tools/src/mcp_transport.rs::HttpTransport::send_and_recv() still uses a 30-second default SSE read timeout when tool_timeout_secs is unset.
  • crates/zeroclaw-tools/src/mcp_transport.rs::SseTransport::send_and_recv() still applies a hardcoded 120-second POST timeout.

Related user-visible error shape:

timeout waiting for MCP response from streamable HTTP SSE stream

ZeroClaw version

Current master at 70b84d3 after #6397.

Rust version

Not applicable; source-inspection issue.

Operating system

All platforms using MCP HTTP/SSE transports.

Regression?

Unknown

Related

#6244 / #5945 fixed the earlier configured-timeout 30-second HTTP SSE read cap and deliberately preserved the unset-timeout fallback behavior.

#6383 / #6397 fixed the configured-timeout 120-second reqwest client-wide cap in HttpTransport::new().

This issue tracks the remaining policy gaps exposed by comparing that landed fix with the local follow-up patch work.

Pre-flight checks

  • I reproduced this on the latest master branch or latest release.
  • I redacted secrets, tokens, and personal data from all submitted content.

Metadata

Assignees

No one assigned

    Labels

      • bug: Something isn't working
      • priority:p2: Medium priority
      • risk: high: Auto risk: security/runtime/gateway/tools/workflows.
      • status:in-progress: An open PR is actively targeting this issue.
      • status:no-stale: Exempt from the 60-day stale auto-close policy.
      • tool: Auto scope: src/tools/** changed.
      • tool:mcp

    Type

    No type

    Projects

    Status

    In Progress

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
