🐞 Bug Summary
The networking validation suite identified two issues (NET-01 and NET-02) that limit the Gateway's ability to operate in high-security and high-availability environments. NET-01 prevents Mutual TLS (mTLS) federation with servers that require client certificates, and NET-02 blocks zero-downtime certificate rotation.
🧩 Affected Component
Select the area of the project impacted:
- mcpgateway - API
- mcpgateway - UI (admin panel)
- mcpgateway.wrapper - stdio wrapper
- Federation or Transports
- CLI, Makefiles, or shell scripts
- Container setup (Docker/Podman/Compose)
- Other (explain below)
🔁 Steps to Reproduce
NET-01: mTLS Integration
The Issue: The Gateway fails to attach the required client certificates during its automated internal health checks.
Impact: When a target MCP server is set to Strict mTLS mode (CERT_REQUIRED), it rejects any connection that lacks a valid client certificate.
Consequence: Even if the server is healthy, the Gateway's "ping" fails, causing the system to incorrectly flag the server as Offline. This prevents any tool execution, as the Gateway refuses to route traffic to a server it perceives as dead.
Step 1.1: Generate the mTLS Trust Chain
# Work in /tmp/certs, the directory that the later steps reference
mkdir -p /tmp/certs && cd /tmp/certs
# Create CA
openssl req -x509 -newkey rsa:4096 -keyout ca.key -out ca.crt -days 365 -nodes -subj "/CN=Test CA"
# Create Server Cert (CN must match the hostname in the registration URL)
openssl genrsa -out server.key 2048
openssl req -new -key server.key -out server.csr -subj "/CN=localhost"
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt -days 365
# Create Client Cert
openssl genrsa -out client.key 2048
openssl req -new -key client.key -out client.csr -subj "/CN=Client"
openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out client.crt -days 365
Step 1.2: Start the Target MCP Server in Strict Mode
Ensure the target server (port 9000) is configured with ssl_context.verify_mode = ssl.CERT_REQUIRED.
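For reference, a strict-mode server context can be sketched with the standard library `ssl` module. This is a minimal illustration, not the target server's actual code: the helper names `new_strict_context` and `load_server_identity` are hypothetical, and the file paths in the usage note assume the certificates from Step 1.1 live in /tmp/certs.

```python
import ssl


def new_strict_context() -> ssl.SSLContext:
    """Create a server-side context that rejects clients without a certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # CERT_REQUIRED makes the handshake fail unless the client presents
    # a certificate that chains to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def load_server_identity(ctx: ssl.SSLContext, certfile: str, keyfile: str, cafile: str) -> None:
    """Attach the server's own certificate and the CA used to verify clients."""
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    ctx.load_verify_locations(cafile=cafile)
```

With the files from Step 1.1, the second helper would be called as `load_server_identity(ctx, "/tmp/certs/server.crt", "/tmp/certs/server.key", "/tmp/certs/ca.crt")` before the context is handed to whatever server listens on port 9000.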
Step 1.3: Register the Gateway via the API
curl -v --cacert ca.crt -X POST "https://localhost:8443/gateways" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mtls-test-server",
    "url": "https://localhost:9000/sse",
    "tls_config": {
      "ca_cert": "/tmp/certs/ca.crt",
      "client_cert": "/tmp/certs/client.crt",
      "client_key": "/tmp/certs/client.key"
    }
  }'
Step 1.4: Attempt to list tools through the Gateway
curl -v --cacert ca.crt -X POST "https://localhost:8443/mcp/http" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}'
NET-02: Certificate Rotation
The Issue: The Gateway process has no signal handler for SIGHUP, so the signal terminates the process instead of triggering a configuration reload.
Impact: The hard termination immediately drops all active SSE and WebSocket sessions.
Consequence: Instead of refreshing the SSL context in memory, the process exits (Hangup: 1), causing service downtime until it is restarted manually.
Step 2.1: Start the Gateway and a monitor loop
# Terminal 1: Monitor loop (-f makes curl exit non-zero on HTTP errors, so they are reported as FAIL)
while true; do
  curl -sf -o /dev/null -w "%{http_code}\n" --cacert /tmp/certs/ca.crt https://localhost:8443/health \
    && echo "$(date +%H:%M:%S) - OK" || echo "$(date +%H:%M:%S) - FAIL"
  sleep 0.5
done
Step 2.2: Generate a new "Rotation" certificate
openssl genrsa -out /tmp/certs/server-new.key 2048
openssl req -new -key /tmp/certs/server-new.key -out /tmp/certs/server-new.csr -subj "/CN=localhost"
openssl x509 -req -in /tmp/certs/server-new.csr -CA /tmp/certs/ca.crt -CAkey /tmp/certs/ca.key \
-CAcreateserial -out /tmp/certs/server-new.crt -days 365
Step 2.3: Perform the hot swap on disk
cp /tmp/certs/server-new.crt /tmp/certs/server.crt
cp /tmp/certs/server-new.key /tmp/certs/server.key
Step 2.4: Send the reload signal
kill -HUP $(pgrep -f mcpgateway)
🤔 Expected Behavior
NET-01: The Gateway should successfully pass its internal health check by attaching the provided client_cert to the request, allowing the tool call to proceed.
NET-02: The monitor loop should continue to show OK without interruption. The server should catch the signal and re-initialize its SSLContext without dropping active connections or exiting.
📓 Logs / Error Output
NET-01: mTLS Initialization Failure
{"message":"Failed to initialize gateway at https://localhost:9000/sse: All connection attempts failed"}
NET-02: SIGHUP Termination
{"asctime": "2026-03-10T10:27:29", "levelname": "ERROR", "message": "WebSocket error: (<CloseCode.ABNORMAL_CLOSURE: 1006>, '')"}
make: *** [dev] Hangup: 1
🧠 Environment Info
| Key | Value |
|---|---|
| Version or commit | 1.0.0-RC2 |
| Runtime | Python 3.14, Uvicorn |
| Platform / OS | macOS |
| Container | none |
🧩 Additional Context
mcpgateway/utils/ssl_context_cache.py: The cert_hash is computed from the ca_certificate alone, which makes the cache "client-blind": even when a client certificate is provided, the cache returns a generic context without the client certificate and key.
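One possible direction, sketched below, is to derive the cache key from the client identity as well as the CA. This is only an illustration under assumptions: the function name `ssl_cache_key` and its parameters are hypothetical and do not reflect the actual code in ssl_context_cache.py, and the inputs may be PEM strings or file paths depending on how the Gateway stores them.

```python
import hashlib
from typing import Optional


def ssl_cache_key(
    ca_certificate: str,
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
) -> str:
    """Return a SHA-256 key covering the CA and the optional client identity."""
    digest = hashlib.sha256()
    for part in (ca_certificate, client_cert, client_key):
        data = (part or "").encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

With a key of this shape, a request that supplies a client certificate can no longer collide with a cached CA-only context.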
mcpgateway/plugins/framework/external/mcp/tls_utils.py: Hostname verification is enabled by default (check_hostname = True). We observed that using 127.0.0.1 against a CN=localhost certificate causes an immediate SSL abort.
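The defaults involved can be seen with the standard library alone. The sketch below is not the code in tls_utils.py: `build_client_context` is a hypothetical helper that builds a client-side context, optionally attaching the client certificate and key that the health check would need for mTLS.

```python
import ssl
from typing import Optional


def build_client_context(
    ca_cert: Optional[str] = None,
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a client context; hostname verification stays enabled."""
    # create_default_context() sets verify_mode=CERT_REQUIRED and
    # check_hostname=True for client-side use.
    ctx = ssl.create_default_context(cafile=ca_cert)
    if client_cert:
        # Present this identity when the server requests a client certificate.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx
```

With the files from this report, the call would be `build_client_context("/tmp/certs/ca.crt", "/tmp/certs/client.crt", "/tmp/certs/client.key")`. Because the hostname check remains on, the connection should then target `localhost`, the name in the certificate, rather than `127.0.0.1`.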
Signal Handling: There is currently no SIGHUP handler implemented in the primary lifecycle management of the Gateway, leading to the default OS behavior of process termination.
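A reload-on-SIGHUP handler could look roughly like the sketch below. It is a minimal illustration, not the Gateway's lifecycle code: `make_reloader` and `install_sighup_handler` are hypothetical names, and it assumes the process holds a reference to the `ssl.SSLContext` that serves new connections, which may not be true of the current Uvicorn setup. Calling `load_cert_chain` on a live context is expected to affect only handshakes that start afterwards, leaving established connections untouched.

```python
import logging
import signal
import ssl
from typing import Callable

logger = logging.getLogger("sighup-reload-sketch")


def make_reloader(ctx: ssl.SSLContext, certfile: str, keyfile: str) -> Callable[[], None]:
    """Return a callable that re-reads the certificate and key from disk."""
    def reload() -> None:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return reload


def install_sighup_handler(reload: Callable[[], None]) -> None:
    """Replace the default SIGHUP action (terminate) with a reload."""
    def _on_sighup(signum, frame) -> None:
        try:
            reload()
            logger.info("SIGHUP received: TLS material reloaded")
        except (OSError, ssl.SSLError) as exc:
            # Keep serving with the previous certificate if the reload fails.
            logger.error("SIGHUP reload failed: %s", exc)

    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGHUP, _on_sighup)
```

At startup this would be wired as `install_sighup_handler(make_reloader(ctx, "/tmp/certs/server.crt", "/tmp/certs/server.key"))`, after which the `kill -HUP` in Step 2.4 would trigger a reload instead of an exit. In an asyncio server, `loop.add_signal_handler` is the usual alternative to `signal.signal`.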