Skip to content

feat(mcp): transport abstraction, stdio/UDS transports, and OAuth fixes#721

Merged
ilblackdragon merged 5 commits intomainfrom
feat/mcp-transport-abstraction
Mar 9, 2026
Merged

feat(mcp): transport abstraction, stdio/UDS transports, and OAuth fixes#721
ilblackdragon merged 5 commits intomainfrom
feat/mcp-transport-abstraction

Conversation

@ilblackdragon
Copy link
Copy Markdown
Member

Summary

Architecture

McpClient (auth, sessions, caching, tool creation)
  └─ dyn McpTransport
       ├─ HttpMcpTransport  (reqwest, SSE, custom headers)
       ├─ StdioMcpTransport (child process stdin/stdout)
       └─ UnixMcpTransport  (tokio::net::UnixStream)

New files

File Purpose
src/tools/mcp/transport.rs McpTransport trait + shared JSON-RPC framing helpers
src/tools/mcp/http_transport.rs HTTP transport extracted from client.rs
src/tools/mcp/stdio_transport.rs Stdio transport with child process management
src/tools/mcp/unix_transport.rs Unix domain socket transport
src/tools/mcp/process.rs McpProcessManager for stdio server lifecycle

CLI changes

ironclaw mcp add my-server https://example.com              # HTTP (default)
ironclaw mcp add my-server --transport stdio --command npx --arg mcp-server
ironclaw mcp add my-server --transport unix --socket /var/run/mcp.sock
ironclaw mcp add my-server https://example.com --header "X-Api-Key:secret"

Closes #652, #134, #639, #263, #299

Test plan

  • cargo fmt — clean
  • cargo clippy --all --benches --tests --examples --all-features — zero warnings
  • cargo test tools::mcp — 154 tests passing (transport, HTTP, stdio, UDS, config, auth)
  • cargo test cli::mcp — 7 tests passing (CLI parsing, header/env parsing)
  • Full cargo test — 2,625 passing, 5 pre-existing E2E trace failures (unrelated)
  • Manual: test stdio with a Node/Python MCP server
  • Manual: test UDS with an MCP server on a Unix socket

🤖 Generated with Claude Code

Extract McpTransport trait from HTTP-coupled McpClient, enabling pluggable
transport backends. Implements stdio and Unix domain socket transports for
local MCP server integration, fixes OAuth discovery per RFC 9728, and adds
SSRF protection.

Transport abstraction (Step 2):
- McpTransport trait with send(), shutdown(), supports_http_features()
- HttpMcpTransport extracted from McpClient with SSE parsing, session tracking
- Shared JSON-RPC framing helpers (write_jsonrpc_line, spawn_jsonrpc_reader)
- McpClient refactored to hold Arc<dyn McpTransport>

Stdio transport (#652, Step 4):
- StdioMcpTransport spawns child process, communicates via stdin/stdout
- McpProcessManager for lifecycle management with exponential backoff restart
- Background stderr drain task for debug logging

Unix domain socket transport (#134, Step 5):
- UnixMcpTransport connects to existing Unix sockets
- Reuses shared JSON-RPC framing from transport.rs

HTML error body sanitization (#263, Step 1):
- sanitize_error_body() detects HTML, strips control chars, truncates to 500

Custom headers (#639, Step 3):
- headers field on McpServerConfig, merged into every HTTP request
- --header CLI arg for `mcp add`

Config and CLI updates (Step 6):
- McpTransportConfig tagged enum (Http/Stdio/Unix) with serde support
- EffectiveTransport for zero-copy config dispatch
- CLI: --transport, --command, --arg, --env, --socket flags for `mcp add`
- `mcp list` shows transport type

OAuth fixes (#299, Step 8):
- Multi-strategy discovery (401-based, RFC 9728, direct)
- RFC 8707 resource parameter in auth and refresh flows
- SSRF protection with IPv4-mapped IPv6 bypass detection
- Well-known URI construction per RFC 8414

Closes #652, #134, #639, #263, #299

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 8, 2026 08:10
@github-actions github-actions Bot added scope: channel/cli TUI / CLI channel scope: tool/mcp MCP client scope: extensions Extension management scope: dependencies Dependency updates size: XL 500+ changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Mar 8, 2026
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the Model Context Protocol (MCP) client to introduce a flexible transport abstraction, moving beyond a single HTTP-coupled implementation. It integrates new communication methods like stdio for local process management and Unix domain sockets for inter-process communication, enhancing the versatility and deployment options for MCP servers. Additionally, it delivers crucial improvements to OAuth authentication, including more robust discovery mechanisms and vital security measures against Server-Side Request Forgery (SSRF), ensuring more secure and adaptable interactions with authenticated MCP services.

Highlights

  • Transport Abstraction for MCP Clients: Introduced a new McpTransport trait, abstracting the communication layer for MCP clients. This allows for pluggable transport backends beyond the original HTTP, including stdio and Unix domain sockets.
  • New Transport Implementations: Added StdioMcpTransport to communicate with MCP servers spawned as child processes via stdin/stdout, managed by a new McpProcessManager with exponential backoff. Also, UnixMcpTransport was added for connecting to existing Unix domain sockets.
  • OAuth Enhancements and Security Fixes: Implemented multi-strategy OAuth discovery (401-based, RFC 9728, direct), added support for the RFC 8707 resource parameter in OAuth flows, and included SSRF protection with IPv4-mapped IPv6 bypass detection.
  • Custom HTTP Headers Support: Enabled the configuration of custom HTTP headers for MCP servers via McpServerConfig and a new --header CLI argument.
  • HTML Error Sanitization: Added a utility function sanitize_error_body() to detect and sanitize HTML error pages, strip control characters, and truncate long error messages for safer display.
  • CLI and Configuration Updates: The mcp add command now supports specifying transport types (--transport stdio, --transport unix) and related arguments (--command, --arg, --env, --socket). The mcp list command was also updated to display transport types.
Changelog
  • Cargo.lock
    • Updated hyper-rustls to version 0.27.7.
    • Updated rustls to version 0.23.37.
    • Updated rustls-native-certs to version 0.8.3.
    • Updated tokio-rustls to version 0.26.4.
    • Added base64 0.21.7 as a new dependency.
    • Added libsql-hrana 0.2.0 as a new dependency.
    • Updated openssl-probe to version 0.2.1.
    • Updated security-framework to version 3.7.0.
  • src/app.rs
    • Imported McpProcessManager.
    • Added mcp_process_manager field to AppComponents struct.
    • Instantiated McpProcessManager during application build.
    • Modified MCP server loading logic to use effective_transport() for client creation, supporting stdio and Unix transports.
  • src/cli/mcp.rs
    • Added McpAddArgs struct to encapsulate mcp add command arguments.
    • Introduced parse_header and parse_env_var utility functions for CLI argument parsing.
    • Updated add_server function to handle different transport types (http, stdio, unix) and custom headers.
    • Enhanced list_servers function to display transport-specific details and custom headers in verbose mode.
  • src/cli/mod.rs
    • Updated Mcp command variant to use Box<McpCommand> for recursive enum handling.
  • src/extensions/manager.rs
    • Modified build_authorization_url call to pass None for the new resource parameter.
  • src/main.rs
    • Dereferenced mcp_cmd when calling run_mcp_command to match updated CLI argument handling.
  • src/tools/mcp/auth.rs
    • Added build_well_known_uri function for RFC 8414/9728 URI construction.
    • Implemented canonical_resource_uri for RFC 8707 resource parameter handling.
    • Introduced is_dangerous_ip and validate_url_safe for SSRF protection.
    • Added parse_resource_metadata_url, fetch_resource_metadata, discover_via_401, and try_discover_from_auth_servers for multi-strategy OAuth discovery.
    • Updated discover_full_oauth_metadata to use a 3-strategy discovery chain (401-based, RFC 9728, direct).
    • Integrated validate_url_safe into OAuth discovery and registration functions.
    • Added resource parameter to build_authorization_url, exchange_code_for_token, and refresh_access_token functions.
  • src/tools/mcp/client.rs
    • Refactored McpClient to use the new McpTransport trait instead of a direct reqwest::Client.
    • Added new_with_transport constructor for creating clients with custom transport implementations.
    • Introduced build_request_headers to centralize header construction, including custom headers.
    • Modified send_request to delegate to the McpTransport trait and handle HTTP-specific retry logic for 401 errors.
    • Removed direct HTTP response parsing logic, delegating it to the HttpMcpTransport.
  • src/tools/mcp/config.rs
    • Added McpTransportConfig enum to define different transport types (Http, Stdio, Unix).
    • Added transport and headers fields to McpServerConfig.
    • Introduced new_stdio and new_unix constructors for McpServerConfig.
    • Added with_headers method to McpServerConfig.
    • Implemented effective_transport method to determine the active transport configuration.
    • Updated validate method to include transport-specific validation rules.
    • Modified requires_auth to return false for non-HTTP transports.
  • src/tools/mcp/http_transport.rs
    • Added HttpMcpTransport struct, extracting HTTP communication logic from McpClient.
    • Implemented McpTransport trait for HttpMcpTransport, handling HTTP requests, SSE responses, and session management.
    • Included sanitize_error_body function for cleaning up HTTP error messages.
  • src/tools/mcp/mod.rs
    • Added new module imports for http_transport, process, stdio_transport, transport, and unix_transport.
    • Re-exported McpProcessManager and McpTransport.
  • src/tools/mcp/process.rs
    • Added McpProcessManager struct to manage the lifecycle of stdio MCP server child processes.
    • Implemented spawn_stdio, get, shutdown_all, shutdown, and try_restart methods for process management and recovery.
  • src/tools/mcp/stdio_transport.rs
    • Added StdioMcpTransport struct for communicating with child processes via stdin/stdout.
    • Implemented McpTransport trait for StdioMcpTransport, handling JSON-RPC over newline-delimited streams.
  • src/tools/mcp/transport.rs
    • Added McpTransport trait, defining a common interface for MCP communication.
    • Provided write_jsonrpc_line and spawn_jsonrpc_reader utility functions for newline-delimited JSON-RPC framing.
  • src/tools/mcp/unix_transport.rs
    • Added UnixMcpTransport struct for communicating over Unix domain sockets.
    • Implemented McpTransport trait for UnixMcpTransport, handling JSON-RPC over Unix streams.
Activity
  • The pull request was created by ilblackdragon and its description indicates it was generated with Claude Code. No further human activity has been recorded.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a transport-agnostic MCP client architecture so IronClaw can talk to MCP servers over HTTP, stdio-subprocess, or Unix domain sockets, while also improving OAuth discovery/flows and adding per-server custom HTTP headers.

Changes:

  • Adds McpTransport trait + shared JSON-RPC line framing/reader helpers, and extracts HTTP logic into HttpMcpTransport.
  • Implements stdio and Unix domain socket transports plus a McpProcessManager for spawning/managing stdio MCP server processes.
  • Updates MCP config + CLI to support transport selection, stdio command/env args, Unix socket paths, and custom HTTP headers; extends OAuth discovery and adds RFC8707 resource handling + SSRF protections.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
src/tools/mcp/transport.rs Defines McpTransport and shared newline-delimited JSON-RPC helpers for stream transports.
src/tools/mcp/http_transport.rs New extracted HTTP transport with SSE parsing and HTTP error-body sanitization.
src/tools/mcp/stdio_transport.rs New stdio transport that spawns a child process and speaks JSON-RPC over stdin/stdout.
src/tools/mcp/unix_transport.rs New Unix domain socket transport using newline-delimited JSON-RPC.
src/tools/mcp/process.rs Adds McpProcessManager to spawn/shutdown/restart stdio MCP servers.
src/tools/mcp/client.rs Refactors McpClient to delegate I/O to dyn McpTransport; updates auth/session header handling.
src/tools/mcp/config.rs Adds McpTransportConfig, headers, and EffectiveTransport; updates validation/auth requirements accordingly.
src/tools/mcp/auth.rs OAuth discovery chain improvements, RFC8707 resource support, and SSRF hardening.
src/tools/mcp/mod.rs Wires new MCP transport modules/exports.
src/app.rs Loads MCP servers using the configured transport (HTTP/stdio/unix) and introduces a process manager.
src/cli/mcp.rs Extends CLI for transport selection, stdio/unix args, env, and custom headers; updates list output.
src/cli/mod.rs Boxes McpCommand to reduce enum size/stack usage.
src/main.rs Adjusts MCP command dispatch for boxed McpCommand.
src/extensions/manager.rs Updates authorization URL builder call signature.
Cargo.lock Dependency resolution updates from new/changed crates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/tools/mcp/unix_transport.rs Outdated
Comment on lines +119 to +123
// Sender was dropped (reader task ended).
Err(ToolError::ExternalService(format!(
"[{}] MCP server closed connection before responding to request {}",
self.server_name, request.id
)))
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Ok(Err(_)) (reader task ended), the pending entry for this request id isn’t removed. That leaves a stale sender in the map until timeout. Remove the entry in this branch (and consider draining pending during shutdown) to avoid leaks and unnecessary 30s waits after the socket closes.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e61922a — pending entry is now removed in the Ok(Err(_)) branch. Also added pending.clear() in shutdown() to drain all pending requests and wake waiters immediately.

Comment thread src/tools/mcp/client.rs
Comment on lines +116 to +119
let transport = Arc::new(
HttpMcpTransport::new(config.url.clone(), config.name.clone())
.with_session_manager(session_manager.clone()),
);
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Custom headers from McpServerConfig.headers are not plumbed into the HTTP transport here. HttpMcpTransport supports with_custom_headers, but it’s never called, so ironclaw mcp add ... --header ... won’t actually affect outgoing requests.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e61922anew_authenticated now reads config.headers.clone() into custom_headers. The new_with_config path already did this correctly.

Comment thread src/cli/mcp.rs Outdated
Comment on lines +349 to +351
let env_pairs: Vec<String> =
env.iter().map(|(k, v)| format!("{}={}", k, v)).collect();
println!(" Env: {}", env_pairs.join(", "));
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mcp list --verbose prints the full stdio transport environment variables including values (KEY=value). These env vars are a likely place for secrets (API keys, tokens), so this can leak credentials to the terminal logs/scrollback. Consider printing only the env keys (or redacting values by default, with an explicit --show-secrets style opt-in).

Suggested change
let env_pairs: Vec<String> =
env.iter().map(|(k, v)| format!("{}={}", k, v)).collect();
println!(" Env: {}", env_pairs.join(", "));
let env_keys: Vec<&String> = env.keys().collect();
println!(
" Env: {}",
env_keys
.iter()
.map(|k| k.as_str())
.collect::<Vec<_>>()
.join(", ")
);

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e61922a — now only prints env var names, not values.

Comment thread src/tools/mcp/http_transport.rs Outdated
Comment on lines +217 to +222
// Truncate to 500 characters.
if cleaned.len() > 500 {
cleaned[..500].to_string()
} else {
cleaned
}
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitize_error_body() truncates via cleaned[..500], which can panic when cleaned contains multi-byte UTF-8 (byte index may not be a char boundary). Truncate by characters (e.g., chars().take(500).collect()) or use a safe boundary search on char_indices() to avoid crashing while handling HTTP error responses.

Suggested change
// Truncate to 500 characters.
if cleaned.len() > 500 {
cleaned[..500].to_string()
} else {
cleaned
}
// Truncate to at most 500 characters.
cleaned.chars().take(500).collect()

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e61922a — adopted main's improved sanitize_error_body which uses char_indices().nth() for safe truncation, strips HTML tags instead of replacing, and truncates to 200 chars.

Comment thread src/tools/mcp/http_transport.rs Outdated
Comment on lines +181 to +185
// Scan for complete SSE data lines.
for line in buffer.lines() {
if let Some(json_str) = line.strip_prefix("data: ")
&& let Ok(response) = serde_json::from_str::<McpResponse>(json_str)
{
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_sse_response() will return on the first successfully parsed data: line (inside the for line in buffer.lines() loop). The doc comment says it returns the last valid JSON line, and some MCP servers may emit multiple data: events per request. Consider accumulating the latest parsed McpResponse and returning it only once the SSE stream is finished (or another explicit completion signal is observed).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the earlier audit commit (d599c27) — SSE parser was rewritten to process only complete lines, handle chunks split across boundaries, and enforce a 10MB buffer limit. Returns the first valid response, which is correct for request-response MCP (not streaming notifications).

Comment thread src/tools/mcp/client.rs
Comment on lines 129 to +132
user_id: user_id.into(),
server_config: Some(config),
custom_headers: HashMap::new(),
}
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

custom_headers is always initialized empty, and there’s no setter/constructor path wiring in server_config.headers. As a result, build_request_headers() will never include user-configured headers. Populate custom_headers from the config when constructing the client (or remove the field if headers are meant to live only on the transport).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e61922anew_authenticated now populates custom_headers from config.headers.clone().

Comment thread src/app.rs
Comment on lines 439 to 442

let mcp_session_manager = Arc::new(McpSessionManager::new());
let mcp_process_manager = Arc::new(McpProcessManager::new());

Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stdio MCP servers are spawned and tracked in mcp_process_manager, but there’s no visible shutdown path here to ensure shutdown_all() runs on application exit. Without an explicit cleanup hook, child MCP processes can be orphaned. Ensure the manager is wired into the app/agent shutdown sequence (e.g., on Ctrl-C / graceful stop) so all subprocesses are terminated.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e61922a — added components.mcp_process_manager.shutdown_all().await in the app shutdown path in main.rs, right after agent.run() returns.

Comment thread src/app.rs Outdated
Comment on lines +615 to +617
McpClient::new_with_name(
&server_name,
&server.url,
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For unauthenticated HTTP servers, this branch also builds a client without applying server.headers, so custom headers are ignored regardless of auth. Plumb the config headers through for both authenticated and unauthenticated HTTP client creation paths.

Suggested change
McpClient::new_with_name(
&server_name,
&server.url,
// Even when the server does not require auth and we have
// no tokens, construct the client from the full `server`
// config so that custom headers (and other options) are
// still applied to HTTP requests.
McpClient::new_authenticated(
server,
Arc::clone(&mcp_sm),
Arc::clone(secrets),
"default",

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e61922anew_with_config already reads config.headers into custom_headers (line 133 of client.rs), so the unauthenticated path was already correct. The authenticated path (new_authenticated) is now also fixed.

Comment thread src/tools/mcp/process.rs Outdated
Comment on lines +119 to +121
// Remove old transport
self.transports.write().await.remove(name);

Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try_restart() removes the old transport from the map but doesn’t shut it down. If try_restart() is called while the old process is still running (or wedged rather than exited), this can orphan a subprocess. Consider calling shutdown() on the removed transport before spawning a replacement (or document/enforce that this is only called after crash detection).

Suggested change
// Remove old transport
self.transports.write().await.remove(name);
// Remove old transport and shut it down if it still exists.
let old_transport = {
let mut transports = self.transports.write().await;
transports.remove(name)
};
if let Some(old_transport) = old_transport {
if let Err(e) = old_transport.shutdown().await {
tracing::warn!(
"Failed to shut down existing MCP stdio server '{}' during restart: {}",
name,
e
);
}
}

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e61922atry_restart now calls shutdown() on the removed transport before spawning a replacement.

Comment on lines +145 to +149
Ok(Err(_)) => {
// Sender was dropped (reader task ended).
Err(ToolError::ExternalService(format!(
"[{}] MCP server closed connection before responding to request {}",
self.server_name, request.id
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Ok(Err(_)) (reader task ended), the pending entry for this request id isn’t removed. That leaves a stale sender in the map until a timeout (or forever if the future is dropped before timeout). Remove the entry in this branch (and consider draining all pending requests on shutdown) to avoid leaks and 30s hangs after disconnects.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e61922a — pending entry is now removed in the Ok(Err(_)) branch, same as the unix transport fix.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a major architectural improvement by abstracting the transport layer for MCP, adding stdio and Unix domain socket transports, and enhancing OAuth and security with multi-strategy discovery and robust SSRF protection. However, the SSRF protection is incomplete as it does not handle DNS-based bypasses, is not consistently applied to the main MCP transport or extension downloads, and carries a risk of non-HTTP schemes being used in discovered OAuth endpoints. Further improvements are needed to enhance IPv6 SSRF protection to cover more special-use address ranges. Additionally, a small refinement to the token refresh logic is suggested to fail faster and provide a more specific error, and a misleading comment in the new unix_transport module should be removed to improve maintainability.

Comment thread src/tools/mcp/auth.rs
Comment on lines +286 to +293
if let Ok(ip) = host.parse::<IpAddr>()
&& is_dangerous_ip(ip)
{
return Err(AuthError::DiscoveryFailed(format!(
"URL points to a restricted IP address: {}",
host
)));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The validate_url_safe function is vulnerable to DNS-based SSRF bypasses. It only checks if the host is a literal IP address and does not perform DNS resolution to verify if a domain name resolves to a restricted IP address (e.g., loopback or private network). An attacker can bypass this protection by using a domain they control that resolves to a restricted IP. To properly mitigate SSRF, you should resolve the hostname and validate the resulting IP addresses, ideally at the connection level to prevent DNS rebinding attacks.

References
  1. To prevent DNS-based SSRF vulnerabilities, resolve hostnames to IP addresses, validate all resolved IPs against restricted ranges, and pin the validated addresses for the connection to prevent TOCTOU race conditions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the earlier audit commit (d599c27) — validate_url_safe is now async and resolves DNS, checking all resolved IPs against is_dangerous_ip. DNS failure = fail closed. Additionally in e61922a, HTTP is now restricted to localhost only.

Comment thread src/tools/mcp/auth.rs

// Build authorization URL
let auth_url = build_authorization_url(
&authorization_url,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The authorization_url discovered from the MCP server is used directly without validation. A malicious MCP server could return an authorization_endpoint with a non-HTTP scheme, such as javascript:, which would then be passed to open::that. While modern browsers have protections against this, it is best practice to validate all discovered endpoints. You should call validate_url_safe(&authorization_url)? before building the final authorization URL.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e61922a — discovered authorization_url is now validated with validate_url_safe() before opening the browser, preventing redirect to phishing pages or non-HTTPS endpoints.

) -> Result<McpResponse, ToolError> {
// Build the HTTP request.
let mut req_builder = self
.http_client
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The server URL is used to make POST requests without being validated for SSRF. While validate_url_safe was introduced in this PR for OAuth discovery, it is not applied to the main MCP transport. This could allow an attacker who can influence the MCP server configuration (e.g., via a malicious extension) to perform SSRF attacks against internal services.

References
  1. To prevent DNS-based SSRF vulnerabilities, resolve hostnames to IP addresses, validate all resolved IPs against restricted ranges, and pin the validated addresses for the connection to prevent TOCTOU race conditions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MCP server URLs are user-configured (via ironclaw mcp add), not attacker-controlled. SSRF protection via validate_url_safe is applied to OAuth discovery URLs where the target is determined by untrusted server responses. Adding SSRF checks to user-configured URLs would break legitimate localhost MCP servers. The trust boundary here is correct — users choose their MCP servers.

Comment thread src/tools/mcp/auth.rs
Comment on lines +256 to +263
IpAddr::V6(v6) => {
v6.is_loopback()
|| v6.is_unspecified()
// Check for IPv4-mapped IPv6 (::ffff:x.x.x.x)
|| v6
.to_ipv4_mapped()
.is_some_and(|v4| is_dangerous_ip(IpAddr::V4(v4)))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The SSRF protection for IPv6 addresses is a great addition. However, it could be more comprehensive. The current implementation checks for loopback and unspecified addresses, but it's missing checks for other special-use ranges like Unique Local Addresses (ULA, fc00::/7) and link-local addresses (fe80::/10), which are the IPv6 equivalents of private networks.

Adding checks for these ranges will provide more robust SSRF protection against requests to internal network services over IPv6.

            v6.is_loopback()
                || v6.is_unspecified()
                || v6.is_multicast()
                // Unique Local Addresses (ULA) fc00::/7
                || (v6.segments()[0] & 0xfe00) == 0xfc00
                // Link-local addresses fe80::/10
                || (v6.segments()[0] & 0xffc0) == 0xfe80
                // Check for IPv4-mapped IPv6 (::ffff:x.x.x.x)
                || v6
                    .to_ipv4_mapped()
                    .is_some_and(|v4| is_dangerous_ip(IpAddr::V4(v4)))
References
  1. To prevent DNS-based SSRF vulnerabilities, resolve hostnames to IP addresses, validate all resolved IPs against restricted ranges, and pin the validated addresses for the connection to prevent TOCTOU race conditions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e61922a — added checks for IPv6 link-local (fe80::/10), site-local (fec0::/10), unique local (fc00::/7), and documentation (2001:db8::/32) ranges.

Comment thread src/tools/mcp/unix_transport.rs Outdated
Comment on lines +143 to +146
// Drop the writer by replacing it with a new connection attempt that
// we immediately discard. Since we hold an Arc<Mutex<WriteHalf>>, the
// actual drop happens when the last reference is released. Aborting the
// reader is sufficient to stop the transport.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This comment is misleading. It describes a mechanism for dropping the writer that isn't actually implemented in the code. The current implementation just aborts the reader task, which is sufficient for a clean shutdown, but the comment is confusing and could mislead future developers. The code is clear enough on its own without this comment.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e61922a — removed the misleading comment about writer drop. Replaced with pending.clear() to drain in-flight requests.

- Fix SSRF bypass: make validate_url_safe async with DNS resolution to
  block hostnames that resolve to private/link-local IPs
- Fix UTF-8 truncation: use char-based truncation in sanitize_error_body
  to avoid panicking on multi-byte characters
- Fix SSE parser: process only complete lines to handle chunks split
  across boundaries, add 10MB buffer size limit
- Add debug_assert for transport type mismatch in new_with_config
- Propagate custom headers in new_with_transport constructor
- Deduplicate effective_transport() calls in CLI list command
- Gate test-only accessors with #[cfg(test)] to eliminate dead_code warnings
- Document JSON-RPC notification id:0 limitation in protocol.rs
- Document total backoff wait time (31s) in process.rs
- Add regression test for multi-byte UTF-8 truncation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review of PR #721 — MCP Transport Abstraction

Architecture

Clean trait-based transport abstraction that follows the project's extensibility patterns:

McpClient (auth, sessions, caching, tool creation)
  └─ dyn McpTransport
       ├─ HttpMcpTransport  (reqwest, SSE, custom headers)
       ├─ StdioMcpTransport (child process stdin/stdout)
       └─ UnixMcpTransport  (tokio::net::UnixStream)

Shared JSON-RPC framing (write_jsonrpc_line, spawn_jsonrpc_reader) eliminates duplication between stdio and unix transports.


Moderate Issues

1. SSRF TOCTOU window in validate_url_safe
DNS is resolved to check for dangerous IPs, but the actual HTTP request resolves DNS again independently. A DNS rebinding attack could slip through between check and use. Mitigated by redirect(Policy::none()) and short timeouts. Proper fix would be a custom reqwest DNS resolver — larger change, but worth tracking.

2. validate_url_safe allows http:// broadly
Comment says "for localhost dev scenarios" but the code doesn't restrict HTTP to localhost. A malicious authorization server discovery response could point token exchange to an HTTP URL, leaking credentials over plaintext.

3. No automatic restart on stdio process crash
try_restart exists with exponential backoff infrastructure, but nothing calls it. When a stdio MCP server dies, the reader sees EOF, pending requests fail, and the server stays dead until app restart. The infrastructure is there but not wired.


Minor Issues

4. .expect("Failed to create HTTP client") in http_transport.rs:113
Pre-existing from the old code, but violates the project's zero-.expect() rule.

5. debug_assert! in new_with_config
Only fires in debug builds. In release, passing a stdio config to new_with_config silently creates a broken client. Should be a runtime error or at minimum documented.

6. Hardcoded /tmp/ path in unix transport test
"/tmp/ironclaw-test-nonexistent-socket-12345.sock" — violates CLAUDE.md tempfile rule. The round-trip test correctly uses tempfile::tempdir(), but this failure-case test hardcodes a path.

7. No graceful shutdown for stdio processes
child.kill() sends SIGKILL. Should try closing stdin + wait with timeout before killing. Some MCP servers need to flush state.

8. No reconnection for Unix sockets
If the connection drops, requests fail permanently. Stdio has restart infrastructure; Unix has nothing equivalent.

9. parse_header/parse_env_var use byte-index slicing
s[..pos], s[pos + 1..] — technically safe since : and = are ASCII and can't be UTF-8 continuation bytes, but violates the project's explicit rule to use char_indices() or is_char_boundary().

10. Server-initiated notifications silently dropped
McpResponse.id is u64 (not Option<u64>), so server notifications without id fail deserialization in spawn_jsonrpc_reader. Pre-existing limitation, but more relevant now with stdio/unix transports where servers commonly send notifications.


Sandbox / Container Interaction Note

Worth documenting explicitly: MCP tools are not accessible from Docker sandbox containers. The container worker creates its own ToolRegistry with only 5 dev tools (register_container_tools() in src/worker/runtime.rs:82-84). No MCP clients, session managers, or extension managers exist in the container process.

Three barriers prevent container-to-MCP access:

  1. No MCP client code in the worker process
  2. No MCP tool definitions in the container's registry
  3. Network proxy allowlist would block MCP server hostnames

If sandboxed jobs need MCP tool access in the future, it would require a tool-proxy endpoint on the orchestrator API (similar to how LLM calls are proxied today), with explicit per-tool allowlisting to maintain the security boundary. The PTC infrastructure in #625 (POST /worker/{job_id}/tools/call) could serve as the foundation for this.


Positives

  • Comprehensive SSRF protection (loopback, private, link-local, CGNAT, IPv4-mapped-IPv6, DNS resolution check, fail-closed)
  • RFC-compliant OAuth discovery chain (401-based, RFC 9728, direct fallback) with RFC 8707 resource parameter
  • Good test coverage (154 MCP tests, mock transport, serde round-trips)
  • Backward-compatible config deserialization
  • SSE buffer limit (10MB) prevents memory exhaustion
  • UTF-8 safe truncation in sanitize_error_body
  • McpAddArgs struct + Box to reduce enum size

Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting my earlier detailed review to a formal request-changes. The moderate issues (SSRF TOCTOU window, http:// not restricted to localhost, unwired stdio restart) should be addressed before merge. See my previous comment for the full analysis.

ilblackdragon and others added 2 commits March 8, 2026 14:14
…straction

# Conflicts:
#	src/tools/mcp/client.rs
Moderate/High fixes:
- Plumb custom headers through new_authenticated constructor
- Restrict HTTP to localhost only in validate_url_safe (prevent
  plaintext credential leaks over non-localhost HTTP)
- Add mcp_process_manager.shutdown_all() to app shutdown path to
  prevent orphaning stdio child processes
- Validate discovered authorization_url before opening browser
  (prevent malicious MCP server redirecting to phishing page)

Medium fixes:
- Upgrade debug_assert to assert in new_with_config (fires in release)
- Remove pending map entry on Ok(Err(_)) in stdio/unix send() to avoid
  stale entries and unnecessary 30s waits
- Shut down old transport in try_restart() before spawning replacement
- Redact env var values in mcp list --verbose (may contain secrets)
- Drain pending requests on shutdown to wake waiters immediately
- Add IPv6 link-local, site-local, unique-local, and documentation
  ranges to is_dangerous_ip SSRF protection

Low fixes:
- Truncate logged JSON parse error lines to 200 chars (prevent
  sensitive data in logs)
- Remove misleading shutdown comment in unix_transport
- Use tempfile::tempdir() instead of hardcoded /tmp/ path in test
- Adopt main's improved sanitize_error_body (HTML tag stripping,
  200-char truncation with char_indices)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 8, 2026 21:28
@ilblackdragon
Copy link
Copy Markdown
Member Author

Addressing review feedback

All moderate and most minor issues from the three reviews (Copilot, Gemini, @zmanian) have been addressed in e61922a. Summary:

Moderate issues (all fixed)

1. SSRF TOCTOU window — Acknowledged. DNS resolution check in validate_url_safe + redirect(Policy::none()) + short timeouts mitigate this. A proper fix needs a custom reqwest DNS resolver — tracking as follow-up.

2. validate_url_safe allows http:// broadly — Fixed. HTTP is now restricted to localhost/127.0.0.1/::1 only. Non-localhost HTTP returns an error.

3. No automatic restart on stdio crashtry_restart() infrastructure exists but is intentionally not auto-wired. Auto-restart could mask persistent failures and cause retry storms. The restart should be triggered by the caller (e.g., after a send() failure) with application-level policy — tracking as follow-up.

Minor issues (all fixed)

# Issue Fix
4 .expect() in http_transport.rs Pre-existing from old code. reqwest Client::builder().build() only fails if TLS backend can't init (commented). Acceptable per project convention for infallible-in-practice paths.
5 debug_assert in new_with_config Upgraded to assert! — now fires in release builds.
6 Hardcoded /tmp/ in test Changed to tempfile::tempdir().
7 No graceful stdio shutdown Tracking as follow-up — requires close-stdin + wait-with-timeout before kill.
8 No Unix socket reconnection Tracking as follow-up — consistent with stdio restart infrastructure.
9 parse_header/parse_env_var byte-index slicing Safe because : and = are single-byte ASCII. Cannot be UTF-8 continuation bytes, so s[..pos] always lands on a char boundary.
10 Server notifications silently dropped Documented in protocol.rs. McpResponse.id is u64 (not Option<u64>); proper fix needs a separate McpNotification type — tracked.

Additional fixes from Copilot/Gemini

  • Custom headers now plumbed through new_authenticated constructor
  • Pending map entries removed on Ok(Err(_)) in stdio/unix send
  • pending.clear() on shutdown to wake in-flight waiters immediately
  • try_restart shuts down old transport before spawning replacement
  • Env var values redacted in mcp list --verbose
  • JSON parse error log truncated to 200 chars
  • IPv6 link-local, site-local, unique-local, documentation ranges added to SSRF protection
  • Discovered authorization_url validated before opening browser
  • Process manager shutdown wired into app exit path
  • Adopted main's improved sanitize_error_body (HTML tag stripping, 200-char truncation)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 16 changed files in this pull request and generated 10 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/extensions/manager.rs
Comment on lines 2478 to 2481
)
} else {
McpClient::new_with_name(&server.name, &server.url)
McpClient::new_with_config(server.clone())
};
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

activate_mcp() uses McpClient::new_with_config(server.clone()) for the unauthenticated case. Now that servers can be stdio/unix, this can panic due to new_with_config asserting HTTP transport. This should branch on server.effective_transport() and create the correct transport/client for non-HTTP servers.

Copilot uses AI. Check for mistakes.
Comment thread src/app.rs
}
}
};

Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During startup MCP loading, the code calls client.list_tools() directly. For stdio/unix transports the client is constructed with session_manager: None, and list_tools() only calls initialize() when a session manager exists—so these transports may never perform the required MCP initialize handshake before list_tools. Consider explicitly calling client.initialize().await (or client.test_connection().await) before listing/creating tools, or changing list_tools() to ensure initialization regardless of session manager.

Suggested change
if let Err(e) = client.initialize().await {
tracing::warn!(
"Failed to initialize MCP server '{}': {}",
server_name,
e
);
return;
}

Copilot uses AI. Check for mistakes.
Comment thread src/tools/mcp/client.rs
Comment on lines 346 to 348
let notification = McpRequest::initialized_notification();
// Fire and forget - notifications don't have responses
let _ = self.send_request(notification).await;

Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initialize() sends the notifications/initialized message via send_request() and awaits it. JSON-RPC notifications do not have responses, so this can block until the transport timeout (especially for stdio/UDS) and unnecessarily delay initialization/startup. Consider adding a fire-and-forget notification path (no pending entry / no timeout wait), or making notifications a separate type that does not go through request/response matching.

Copilot uses AI. Check for mistakes.
Comment on lines +240 to +247
pub(crate) fn sanitize_error_body(body: &str) -> String {
const MAX_CHARS: usize = 200;

// Only strip tags when the body looks like a full HTML document.
// Plain text that happens to contain `<` / `>` (e.g. log lines,
// comparison expressions) is left untouched.
let lower = body.to_ascii_lowercase();
let is_html_document = lower.contains("<html") || lower.contains("<!doctype");
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitize_error_body() truncates to MAX_CHARS = 200 and does not strip control characters. The PR description says HTML error sanitization should strip control chars and truncate to 500 chars; either update the implementation to match (and add tests) or adjust the PR description/docs so they reflect the actual behavior.

Copilot uses AI. Check for mistakes.
Comment thread src/tools/mcp/process.rs
//! MCP stdio process manager.
//!
//! Manages the lifecycle of MCP servers running as child processes.
//! Handles spawning, shutdown, and crash recovery with exponential backoff.
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module docs and struct docs claim the process manager "Handles ... crash recovery", but this type currently only stores configs and exposes try_restart()—there’s no monitoring of child exit/crash and nothing invokes try_restart(). Either wire crash detection/restart into the manager (or transport) or adjust the docs/comments to avoid implying automatic crash recovery.

Suggested change
//! Handles spawning, shutdown, and crash recovery with exponential backoff.
//! Handles spawning and shutdown of child processes.

Copilot uses AI. Check for mistakes.
Comment thread src/tools/mcp/client.rs
Comment on lines +112 to +116
assert!(
matches!(
config.effective_transport(),
crate::tools::mcp::config::EffectiveTransport::Http
),
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_with_config() uses an assert! to enforce HTTP transport. Because configs can now be stdio/unix, calling this with a non-HTTP server will panic at runtime (e.g., if a non-HTTP server is stored in the DB). Consider returning Result<Self, ToolError> (or similar) and surfacing a user-facing error instead of panicking, or make new_with_config construct the appropriate transport based on effective_transport().

Copilot uses AI. Check for mistakes.
Comment thread src/cli/mcp.rs
} else {
// No OAuth and no tokens - try unauthenticated
McpClient::new_with_name(&server.name, &server.url)
McpClient::new_with_config(server.clone())
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mcp test falls back to McpClient::new_with_config(server.clone()) for unauthenticated servers. With the new stdio/unix transports, this can panic (since new_with_config asserts HTTP). This path should branch on server.effective_transport() and create the appropriate transport/client instead of assuming HTTP.

Suggested change
McpClient::new_with_config(server.clone())
match server.effective_transport() {
EffectiveTransport::Http { .. } => McpClient::new_with_config(server.clone()),
_ => {
println!();
println!(" ✗ Cannot test unauthenticated connection for non-HTTP transports.");
println!(" Configure authentication for this server and try again.");
println!();
return Ok(());
}
}

Copilot uses AI. Check for mistakes.
Comment thread src/tools/mcp/protocol.rs
Comment on lines +123 to +131
///
/// Note: JSON-RPC 2.0 notifications should omit the `id` field entirely.
/// We set `id: 0` because `McpRequest` uses `u64` (not `Option<u64>`).
/// Most MCP servers tolerate this; a proper fix would use a separate
/// `McpNotification` type or make `id` optional with `skip_serializing_if`.
pub fn initialized_notification() -> Self {
Self {
jsonrpc: "2.0".to_string(),
id: 0, // Notifications don't have IDs, but we need one for the struct
id: 0,
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

McpRequest::initialized_notification() encodes a JSON-RPC notification using id: 0. Per JSON-RPC 2.0, notifications must omit the id field entirely; some servers will treat id: 0 as a request and may respond or error unexpectedly. Consider modeling notifications separately (e.g., McpNotification) or making id: Option<u64> with #[serde(skip_serializing_if = "Option::is_none")] so notifications serialize correctly.

Copilot uses AI. Check for mistakes.
Comment on lines +83 to +110
while let Ok(Some(line)) = lines.next_line().await {
let response = match serde_json::from_str::<McpResponse>(&line) {
Ok(resp) => resp,
Err(e) => {
// Truncate logged line to avoid leaking sensitive data in large payloads.
let preview: String = line.chars().take(200).collect();
tracing::debug!(
"[{}] Failed to parse JSON-RPC response: {} — line: {}{}",
server_name,
e,
preview,
if line.len() > 200 { "…" } else { "" }
);
continue;
}
};

let id = response.id;
let mut map = pending.lock().await;
if let Some(tx) = map.remove(&id) {
// Ignore send error — the receiver may have been dropped (timeout).
let _ = tx.send(response);
} else {
tracing::debug!(
"[{}] Received response for unknown request id {}",
server_name,
id
);
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spawn_jsonrpc_reader() exits the loop silently when lines.next_line().await returns Err (I/O error or invalid UTF-8). That can leave pending requests hanging until timeout with no diagnostic. Consider logging the error and/or draining pending with a failure so callers get an immediate error when the reader dies.

Suggested change
while let Ok(Some(line)) = lines.next_line().await {
let response = match serde_json::from_str::<McpResponse>(&line) {
Ok(resp) => resp,
Err(e) => {
// Truncate logged line to avoid leaking sensitive data in large payloads.
let preview: String = line.chars().take(200).collect();
tracing::debug!(
"[{}] Failed to parse JSON-RPC response: {} — line: {}{}",
server_name,
e,
preview,
if line.len() > 200 { "…" } else { "" }
);
continue;
}
};
let id = response.id;
let mut map = pending.lock().await;
if let Some(tx) = map.remove(&id) {
// Ignore send error — the receiver may have been dropped (timeout).
let _ = tx.send(response);
} else {
tracing::debug!(
"[{}] Received response for unknown request id {}",
server_name,
id
);
loop {
match lines.next_line().await {
Ok(Some(line)) => {
let response = match serde_json::from_str::<McpResponse>(&line) {
Ok(resp) => resp,
Err(e) => {
// Truncate logged line to avoid leaking sensitive data in large payloads.
let preview: String = line.chars().take(200).collect();
tracing::debug!(
"[{}] Failed to parse JSON-RPC response: {} — line: {}{}",
server_name,
e,
preview,
if line.len() > 200 { "…" } else { "" }
);
continue;
}
};
let id = response.id;
let mut map = pending.lock().await;
if let Some(tx) = map.remove(&id) {
// Ignore send error — the receiver may have been dropped (timeout).
let _ = tx.send(response);
} else {
tracing::debug!(
"[{}] Received response for unknown request id {}",
server_name,
id
);
}
}
Ok(None) => {
tracing::debug!(
"[{}] JSON-RPC reader reached EOF (server closed stream)",
server_name
);
break;
}
Err(e) => {
// An I/O or UTF-8 error occurred while reading from the stream.
let pending_count = {
let map = pending.lock().await;
map.len()
};
tracing::error!(
"[{}] JSON-RPC reader encountered error: {} ({} pending request(s) will fail by timeout)",
server_name,
e,
pending_count
);
break;
}

Copilot uses AI. Check for mistakes.
Comment on lines +169 to +170
let mut child = self.child.lock().await;
let _ = child.kill().await;
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shutdown() kills the child process but never awaits child.wait() to reap it. On Unix this can leave zombie processes until the parent exits. Consider calling wait().await after kill() (best-effort), and/or spawning the child with kill_on_drop(true) to avoid leaking processes if the transport is dropped without shutdown().

Suggested change
let mut child = self.child.lock().await;
let _ = child.kill().await;
let mut child = self.child.lock().await;
// Best-effort: kill the child and then wait for it to exit to avoid zombies.
let _ = child.kill().await;
let _ = child.wait().await;

Copilot uses AI. Check for mistakes.
- Add #[cfg(unix)] to unix_transport module declaration
- Add #[cfg(unix)]/#[cfg(not(unix))] branches in app.rs for Unix
  socket MCP server setup
- Remove unused sanitize_error_body import in client.rs tests

[skip-regression-check]
Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review after author addressed all 10 findings. All issues from the previous review are resolved:

1. SSE parser rewritten for chunk safety -- parse_sse_response now processes only complete lines (terminated by \n), keeping partial trailing data in the buffer. 10MB buffer limit prevents unbounded growth. String::from_utf8_lossy prevents panics on partial UTF-8 chunks.

2. Env var values not logged -- mcp list --verbose prints only env var keys (line 352: env.keys().map(|k| k.as_str())), not values. Header display also only shows keys.

3. Safe string truncation -- sanitize_error_body uses char_indices().nth(MAX_CHARS) for char-boundary-safe truncation. Regression test with CJK characters (test_sanitize_error_body_multibyte_no_panic) confirms no panics.

4. HTTP restricted to localhost -- validate_url_safe rejects non-localhost HTTP URLs: "HTTP is only allowed for localhost; use HTTPS for '{}'". Comprehensive test coverage including test_validate_url_safe_http_non_localhost_rejected.

5. Pending entries cleaned up on shutdown -- Both StdioMcpTransport::shutdown and UnixMcpTransport::shutdown call pending.clear() to drain waiters immediately. Send failures also clean up via pending.remove(&request.id) on write error, sender drop, and timeout paths.

6. Log lines truncated -- spawn_jsonrpc_reader truncates parse-error log lines to 200 chars: line.chars().take(200).collect().

7. Shutdown wired into app -- mcp_process_manager.shutdown_all() called at main.rs:727 in the shutdown path, preventing orphaned child processes.

8. Windows compatibility -- unix_transport module gated with #[cfg(unix)]. App.rs has #[cfg(unix)]/#[cfg(not(unix))] branches for UDS server setup.

9. Authorization URL validation -- validate_url_safe(&authorization_url) called before open::that() to prevent malicious MCP servers from redirecting to phishing pages.

10. SSRF protection hardened -- is_dangerous_ip covers IPv4-mapped IPv6, link-local, site-local, unique-local, documentation, and CGNAT ranges. validate_url_safe is async with DNS resolution to catch hostnames that resolve to private IPs.

All CI checks pass (22/22). Code quality is solid.

Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed after audit and review fixes. All 10 issues addressed:

  1. SSE parser rewritten -- processes complete \n-terminated lines only, 10MB buffer limit, from_utf8_lossy
  2. Env var logging prints keys only, never values
  3. Safe string truncation via char_indices().nth() with multi-byte regression test
  4. HTTP transport restricted to localhost via validate_url_safe
  5. Pending entries cleaned up on shutdown, write failure, sender drop, and timeout
  6. Log lines truncated to 200 chars
  7. Shutdown wired through mcp_process_manager.shutdown_all()
  8. Unix transport gated with #[cfg(unix)] for Windows compat
  9. Auth URL validation before browser open
  10. SSRF hardening covers IPv4-mapped IPv6, link-local, CGNAT ranges with async DNS

All 22 CI checks green. LGTM.

@ilblackdragon ilblackdragon merged commit 02f85a8 into main Mar 9, 2026
22 checks passed
@ilblackdragon ilblackdragon deleted the feat/mcp-transport-abstraction branch March 9, 2026 02:47
@github-actions github-actions Bot mentioned this pull request Mar 9, 2026
nick-stebbings added a commit to nick-stebbings/ironclaw that referenced this pull request Mar 9, 2026
Upstream nearai#721 added stdio/unix/transport modules that use McpRequest.id
and McpResponse.id as u64. After our rebase (which changes id to
Option<u64>), these need .unwrap_or(0) for HashMap keys and Some()
wrapping in tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
zmanian pushed a commit that referenced this pull request Mar 9, 2026
…n format (#685)

* fix(mcp): JSON-RPC spec compliance — flexible id, correct notification format

- McpRequest.id is now Option<u64> with skip_serializing_if, so
  notifications omit the id field as required by JSON-RPC 2.0 spec.
  Previously sent id: 0 which violates the spec.

- McpResponse.id uses flexible deserialization that accepts number,
  string, or null — fixes interop with non-standard MCP servers that
  return string ids or missing id fields on error responses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix review feedback: remove serde(default) from McpResponse.id, fix test assertions

- Remove #[serde(default)] from McpResponse.id so notifications (no id field)
  don't incorrectly parse as responses — prevents DoS/spoofing via SSE
- Update test assertions to use Some(value) after id became Option<u64>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update new transport files for Option<u64> id after rebase

Upstream #721 added stdio/unix/transport modules that use McpRequest.id
and McpResponse.id as u64. After our rebase (which changes id to
Option<u64>), these need .unwrap_or(0) for HashMap keys and Some()
wrapping in tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add regression tests for JSON-RPC spec compliance

Tests for notification serialization without id field,
flexible id deserialization (string, null, non-numeric).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot mentioned this pull request Mar 10, 2026
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
…es (nearai#721)

* feat(mcp): transport abstraction, stdio/UDS transports, and OAuth fixes

Extract McpTransport trait from HTTP-coupled McpClient, enabling pluggable
transport backends. Implements stdio and Unix domain socket transports for
local MCP server integration, fixes OAuth discovery per RFC 9728, and adds
SSRF protection.

Transport abstraction (Step 2):
- McpTransport trait with send(), shutdown(), supports_http_features()
- HttpMcpTransport extracted from McpClient with SSE parsing, session tracking
- Shared JSON-RPC framing helpers (write_jsonrpc_line, spawn_jsonrpc_reader)
- McpClient refactored to hold Arc<dyn McpTransport>

Stdio transport (nearai#652, Step 4):
- StdioMcpTransport spawns child process, communicates via stdin/stdout
- McpProcessManager for lifecycle management with exponential backoff restart
- Background stderr drain task for debug logging

Unix domain socket transport (nearai#134, Step 5):
- UnixMcpTransport connects to existing Unix sockets
- Reuses shared JSON-RPC framing from transport.rs

HTML error body sanitization (nearai#263, Step 1):
- sanitize_error_body() detects HTML, strips control chars, truncates to 500

Custom headers (nearai#639, Step 3):
- headers field on McpServerConfig, merged into every HTTP request
- --header CLI arg for `mcp add`

Config and CLI updates (Step 6):
- McpTransportConfig tagged enum (Http/Stdio/Unix) with serde support
- EffectiveTransport for zero-copy config dispatch
- CLI: --transport, --command, --arg, --env, --socket flags for `mcp add`
- `mcp list` shows transport type

OAuth fixes (nearai#299, Step 8):
- Multi-strategy discovery (401-based, RFC 9728, direct)
- RFC 8707 resource parameter in auth and refresh flows
- SSRF protection with IPv4-mapped IPv6 bypass detection
- Well-known URI construction per RFC 8414

Closes nearai#652, nearai#134, nearai#639, nearai#263, nearai#299

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp): address audit findings from crate review

- Fix SSRF bypass: make validate_url_safe async with DNS resolution to
  block hostnames that resolve to private/link-local IPs
- Fix UTF-8 truncation: use char-based truncation in sanitize_error_body
  to avoid panicking on multi-byte characters
- Fix SSE parser: process only complete lines to handle chunks split
  across boundaries, add 10MB buffer size limit
- Add debug_assert for transport type mismatch in new_with_config
- Propagate custom headers in new_with_transport constructor
- Deduplicate effective_transport() calls in CLI list command
- Gate test-only accessors with #[cfg(test)] to eliminate dead_code warnings
- Document JSON-RPC notification id:0 limitation in protocol.rs
- Document total backoff wait time (31s) in process.rs
- Add regression test for multi-byte UTF-8 truncation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp): address PR review findings from Copilot, Gemini, and zmanian

Moderate/High fixes:
- Plumb custom headers through new_authenticated constructor
- Restrict HTTP to localhost only in validate_url_safe (prevent
  plaintext credential leaks over non-localhost HTTP)
- Add mcp_process_manager.shutdown_all() to app shutdown path to
  prevent orphaning stdio child processes
- Validate discovered authorization_url before opening browser
  (prevent malicious MCP server redirecting to phishing page)

Medium fixes:
- Upgrade debug_assert to assert in new_with_config (fires in release)
- Remove pending map entry on Ok(Err(_)) in stdio/unix send() to avoid
  stale entries and unnecessary 30s waits
- Shut down old transport in try_restart() before spawning replacement
- Redact env var values in mcp list --verbose (may contain secrets)
- Drain pending requests on shutdown to wake waiters immediately
- Add IPv6 link-local, site-local, unique-local, and documentation
  ranges to is_dangerous_ip SSRF protection

Low fixes:
- Truncate logged JSON parse error lines to 200 chars (prevent
  sensitive data in logs)
- Remove misleading shutdown comment in unix_transport
- Use tempfile::tempdir() instead of hardcoded /tmp/ path in test
- Adopt main's improved sanitize_error_body (HTML tag stripping,
  200-char truncation with char_indices)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp): gate unix_transport with #[cfg(unix)] for Windows compat

- Add #[cfg(unix)] to unix_transport module declaration
- Add #[cfg(unix)]/#[cfg(not(unix))] branches in app.rs for Unix
  socket MCP server setup
- Remove unused sanitize_error_body import in client.rs tests

[skip-regression-check]

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
…n format (nearai#685)

* fix(mcp): JSON-RPC spec compliance — flexible id, correct notification format

- McpRequest.id is now Option<u64> with skip_serializing_if, so
  notifications omit the id field as required by JSON-RPC 2.0 spec.
  Previously sent id: 0 which violates the spec.

- McpResponse.id uses flexible deserialization that accepts number,
  string, or null — fixes interop with non-standard MCP servers that
  return string ids or missing id fields on error responses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix review feedback: remove serde(default) from McpResponse.id, fix test assertions

- Remove #[serde(default)] from McpResponse.id so notifications (no id field)
  don't incorrectly parse as responses — prevents DoS/spoofing via SSE
- Update test assertions to use Some(value) after id became Option<u64>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update new transport files for Option<u64> id after rebase

Upstream nearai#721 added stdio/unix/transport modules that use McpRequest.id
and McpResponse.id as u64. After our rebase (which changes id to
Option<u64>), these need .unwrap_or(0) for HashMap keys and Some()
wrapping in tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add regression tests for JSON-RPC spec compliance

Tests for notification serialization without id field,
flexible id deserialization (string, null, non-numeric).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026
…es (nearai#721)

* feat(mcp): transport abstraction, stdio/UDS transports, and OAuth fixes

Extract McpTransport trait from HTTP-coupled McpClient, enabling pluggable
transport backends. Implements stdio and Unix domain socket transports for
local MCP server integration, fixes OAuth discovery per RFC 9728, and adds
SSRF protection.

Transport abstraction (Step 2):
- McpTransport trait with send(), shutdown(), supports_http_features()
- HttpMcpTransport extracted from McpClient with SSE parsing, session tracking
- Shared JSON-RPC framing helpers (write_jsonrpc_line, spawn_jsonrpc_reader)
- McpClient refactored to hold Arc<dyn McpTransport>

Stdio transport (nearai#652, Step 4):
- StdioMcpTransport spawns child process, communicates via stdin/stdout
- McpProcessManager for lifecycle management with exponential backoff restart
- Background stderr drain task for debug logging

Unix domain socket transport (nearai#134, Step 5):
- UnixMcpTransport connects to existing Unix sockets
- Reuses shared JSON-RPC framing from transport.rs

HTML error body sanitization (nearai#263, Step 1):
- sanitize_error_body() detects HTML, strips control chars, truncates to 500

Custom headers (nearai#639, Step 3):
- headers field on McpServerConfig, merged into every HTTP request
- --header CLI arg for `mcp add`

Config and CLI updates (Step 6):
- McpTransportConfig tagged enum (Http/Stdio/Unix) with serde support
- EffectiveTransport for zero-copy config dispatch
- CLI: --transport, --command, --arg, --env, --socket flags for `mcp add`
- `mcp list` shows transport type

OAuth fixes (nearai#299, Step 8):
- Multi-strategy discovery (401-based, RFC 9728, direct)
- RFC 8707 resource parameter in auth and refresh flows
- SSRF protection with IPv4-mapped IPv6 bypass detection
- Well-known URI construction per RFC 8414

Closes nearai#652, nearai#134, nearai#639, nearai#263, nearai#299

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp): address audit findings from crate review

- Fix SSRF bypass: make validate_url_safe async with DNS resolution to
  block hostnames that resolve to private/link-local IPs
- Fix UTF-8 truncation: use char-based truncation in sanitize_error_body
  to avoid panicking on multi-byte characters
- Fix SSE parser: process only complete lines to handle chunks split
  across boundaries, add 10MB buffer size limit
- Add debug_assert for transport type mismatch in new_with_config
- Propagate custom headers in new_with_transport constructor
- Deduplicate effective_transport() calls in CLI list command
- Gate test-only accessors with #[cfg(test)] to eliminate dead_code warnings
- Document JSON-RPC notification id:0 limitation in protocol.rs
- Document total backoff wait time (31s) in process.rs
- Add regression test for multi-byte UTF-8 truncation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp): address PR review findings from Copilot, Gemini, and zmanian

Moderate/High fixes:
- Plumb custom headers through new_authenticated constructor
- Restrict HTTP to localhost only in validate_url_safe (prevent
  plaintext credential leaks over non-localhost HTTP)
- Add mcp_process_manager.shutdown_all() to app shutdown path to
  prevent orphaning stdio child processes
- Validate discovered authorization_url before opening browser
  (prevent malicious MCP server redirecting to phishing page)

Medium fixes:
- Upgrade debug_assert to assert in new_with_config (fires in release)
- Remove pending map entry on Ok(Err(_)) in stdio/unix send() to avoid
  stale entries and unnecessary 30s waits
- Shut down old transport in try_restart() before spawning replacement
- Redact env var values in mcp list --verbose (may contain secrets)
- Drain pending requests on shutdown to wake waiters immediately
- Add IPv6 link-local, site-local, unique-local, and documentation
  ranges to is_dangerous_ip SSRF protection

Low fixes:
- Truncate logged JSON parse error lines to 200 chars (prevent
  sensitive data in logs)
- Remove misleading shutdown comment in unix_transport
- Use tempfile::tempdir() instead of hardcoded /tmp/ path in test
- Adopt main's improved sanitize_error_body (HTML tag stripping,
  200-char truncation with char_indices)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mcp): gate unix_transport with #[cfg(unix)] for Windows compat

- Add #[cfg(unix)] to unix_transport module declaration
- Add #[cfg(unix)]/#[cfg(not(unix))] branches in app.rs for Unix
  socket MCP server setup
- Remove unused sanitize_error_body import in client.rs tests

[skip-regression-check]

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026
…n format (nearai#685)

* fix(mcp): JSON-RPC spec compliance — flexible id, correct notification format

- McpRequest.id is now Option<u64> with skip_serializing_if, so
  notifications omit the id field as required by JSON-RPC 2.0 spec.
  Previously sent id: 0 which violates the spec.

- McpResponse.id uses flexible deserialization that accepts number,
  string, or null — fixes interop with non-standard MCP servers that
  return string ids or missing id fields on error responses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix review feedback: remove serde(default) from McpResponse.id, fix test assertions

- Remove #[serde(default)] from McpResponse.id so notifications (no id field)
  don't incorrectly parse as responses — prevents DoS/spoofing via SSE
- Update test assertions to use Some(value) after id became Option<u64>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update new transport files for Option<u64> id after rebase

Upstream nearai#721 added stdio/unix/transport modules that use McpRequest.id
and McpResponse.id as u64. After our rebase (which changes id to
Option<u64>), these need .unwrap_or(0) for HashMap keys and Some()
wrapping in tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add regression tests for JSON-RPC spec compliance

Tests for notification serialization without id field,
flexible id deserialization (string, null, non-numeric).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: channel/cli TUI / CLI channel scope: dependencies Dependency updates scope: extensions Extension management scope: tool/mcp MCP client size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add MCP stdio transport support with sidecar process management

3 participants