Description
I tried moving from a minimal Istio ingress-only setup to a full service mesh setup, but connections to the orderer node fail with TRANSIENT_FAILURE:
```text
failed to get block from channel defaultchannel: Orderer Client Status Code: (2) CONNECTION_FAILED.
Description: dialing connection on target [orderer0-ord.kalyp.network:443]: connection is in TRANSIENT_FAILURE
```
Note that this issue occurs only when Istio sidecar injection is enabled for the orderer node; otherwise, everything works as expected.
Problem Summary
The HLF operator fails to establish gRPC connections to orderer services when deployed in an Istio service mesh environment, causing channel operations to fail with TRANSIENT_FAILURE errors. This issue is specific to orderers and does not affect CAs or peers.
Environment
- HLF Operator Version: v1.11.1
- Istio Version: v1.20.3
- Kubernetes Version: v1.28.15-eks-b707fbb
- Fabric SDK Go Version: `github.com/kfsoftware/fabric-sdk-go@v0.0.0-20240114221414-98466038585d`
Detailed Test Results
Network Connectivity Validation
- DNS Resolution: ✅ SUCCESS - `orderer0-ord.kalyp.network` → `172.20.101.62`
- TCP Connection: ✅ SUCCESS - Can establish TCP to port 443
- TLS Handshake: ❌ FAILED - `write:errno=104` (connection reset by peer)
- TLS with InsecureSkipVerify: ❌ FAILED - Still `errno=104`
- Basic gRPC Connection: ❌ FAILED - Various TLS credential combinations all fail
Direct Service Testing
Testing a direct connection to `ord-node1.hlf.svc.cluster.local:7050`:
- TLS handshake with SNI: ❌ FAILED - `write:errno=104`, no peer certificate available
- Plain TCP: ✅ Connected, but no TLS negotiation possible
- Orderer logs: show TLS enabled with the expected certificates loaded; no connection errors are logged
OpenSSL TLS Tests
```shell
# Admin endpoint (7053) - Works
openssl s_client -connect orderer0-ord.kalyp.network:443 -servername admin-orderer0-ord.kalyp.network -cert admin.crt -key admin.key
# Result: Handshake OK, Verification: OK

# Orderer endpoint (7050) - Works at TLS level
openssl s_client -connect orderer0-ord.kalyp.network:443 -servername orderer0-ord.kalyp.network -CAfile ca.crt
# Result: TLS handshake EXIT:0 (server cert verified)
```
SDK Network Configuration
The generated network config includes the following options:
```yaml
grpcOptions:
  allow-insecure: false
  ssl-target-name-override: orderer0-ord.kalyp.network
  grpc.keepalive_time_ms: "30000"
  grpc.keepalive_timeout_ms: "5000"
```
- SDK successfully creates `resmgmt.Client`
- But `QueryConfigBlockFromOrderer()` still fails with `TRANSIENT_FAILURE`
Fabric SDK Go Limitations
- Confirmed NOT supported: `grpc.authority`, `grpc.http2.alpn` (keys are ignored by the SDK)
- Confirmed supported: `ssl-target-name-override`, `allow-insecure`, `fail-fast`, `keep-alive-*`
- Uses `grpc.DialContext()` with only TLS credentials; no authority header support
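Because unsupported keys are ignored silently, checking a generated `grpcOptions` block against the lists above can flag options that have no effect. A minimal sketch: `SUPPORTED_KEYS`, `SUPPORTED_PREFIXES` and `split_grpc_options` are hypothetical names, and the sets encode only what this report found, not the SDK's source. Read literally, that list would also leave the `grpc.keepalive_*` keys from the generated config unmatched (they do not start with `keep-alive-`), which may be worth confirming against the SDK.

```python
# Keys reported as honored in this issue (assumption: not taken from SDK source).
SUPPORTED_KEYS = {"ssl-target-name-override", "allow-insecure", "fail-fast"}
SUPPORTED_PREFIXES = ("keep-alive-",)


def split_grpc_options(options: dict) -> tuple[dict, dict]:
    """Partition grpcOptions into (recognized, unrecognized) per the lists above."""
    recognized, unrecognized = {}, {}
    for key, value in options.items():
        if key in SUPPORTED_KEYS or key.startswith(SUPPORTED_PREFIXES):
            recognized[key] = value
        else:
            unrecognized[key] = value
    return recognized, unrecognized


if __name__ == "__main__":
    # The grpcOptions block from the generated network config above.
    generated = {
        "allow-insecure": False,
        "ssl-target-name-override": "orderer0-ord.kalyp.network",
        "grpc.keepalive_time_ms": "30000",
        "grpc.keepalive_timeout_ms": "5000",
    }
    ok, ignored = split_grpc_options(generated)
    print("recognized:", sorted(ok))
    # → recognized: ['allow-insecure', 'ssl-target-name-override']
    print("unrecognized:", sorted(ignored))
    # → unrecognized: ['grpc.keepalive_time_ms', 'grpc.keepalive_timeout_ms']
```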
Confirmed Findings
- Istio sidecar interference: TLS handshake fails with `errno=104` (connection reset by peer) when Istio sidecars are present
- Component-specific issue: Only affects orderer connections; CAs and peers work normally with Istio sidecars
- Transport layer failure: Issue occurs at TLS handshake level before gRPC negotiation begins
- Consistent across configurations: Problem persists across different Istio mesh configurations
Comprehensive Testing Performed
The following solutions were tested and failed:
- DestinationRules with `trafficPolicy.tls.mode: DISABLE`
- DNS rewrites to bypass Istio ingress
- In-cluster proxy services
- Custom gRPC options in network configuration
- Direct orderer service connections
- Various TLS credential combinations
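For reproducibility, the DestinationRule variant listed above can be spelled out. This sketch builds it as a Python dict and prints JSON, which `kubectl apply -f -` also accepts. The resource name `ord-node1-tls-disable` is hypothetical, the host and `hlf` namespace are taken from the direct-service test above, and `networking.istio.io/v1beta1` is assumed as the API version for Istio 1.20.

```python
import json


def destination_rule(name: str, namespace: str, host: str) -> dict:
    """Istio DestinationRule that disables sidecar-originated TLS toward `host`."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "host": host,
            # DISABLE: the client-side sidecar does not originate TLS or
            # Istio mTLS toward this host.
            "trafficPolicy": {"tls": {"mode": "DISABLE"}},
        },
    }


if __name__ == "__main__":
    rule = destination_rule(
        "ord-node1-tls-disable",            # hypothetical name
        "hlf",
        "ord-node1.hlf.svc.cluster.local",
    )
    print(json.dumps(rule, indent=2))
```

Saved as a hypothetical `destination_rule.py`, it can be applied with `python destination_rule.py | kubectl apply -f -`.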
Component Comparison
| Component | Sidecar Status | Connection Type | Result |
|---|---|---|---|
| Orderer CA | ✅ Enabled | Internal (ord-ca.hlf.svc.cluster.local) | ✅ Works |
| Peer | ✅ Enabled | Internal (peer0.hlf.svc.cluster.local) | ✅ Works |
| Orderer | ❌ Must Disable | External (orderer0-ord.kalyp.network:443) | ❌ Fails with sidecar |
Current Workaround
Selectively disabling sidecar injection for the orderer services resolves the issue:
```yaml
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
```
Results after disabling orderer sidecar:
- ✅ FabricMainChannel status: RUNNING
- ✅ Channel creation successful
- ✅ `QueryConfigBlockFromOrderer()` works normally
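The same annotation can be applied to an existing workload as a JSON merge patch. A minimal sketch, assuming the orderer runs as a Deployment in the `hlf` namespace; the Deployment name `ord-node1` and the function `sidecar_inject_patch` are hypothetical. If the operator reconciles that Deployment, a manual patch may be reverted, so setting the annotation through the operator's own resources would likely be more durable.

```python
import json


def sidecar_inject_patch(enabled: bool) -> str:
    """JSON merge patch that sets sidecar.istio.io/inject on a pod template."""
    return json.dumps({
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # Istio's injection webhook reads this at pod creation,
                        # so only pods created after the patch are affected.
                        "sidecar.istio.io/inject": "true" if enabled else "false",
                    }
                }
            }
        }
    })


if __name__ == "__main__":
    patch = sidecar_inject_patch(enabled=False)
    # Hypothetical Deployment name; changing the pod template triggers a rollout.
    print(f"kubectl -n hlf patch deployment ord-node1 --type merge -p '{patch}'")
```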
Impact
- Severity: High - Blocks production Istio mesh deployments
- Scope: All HLF operator deployments with Istio service mesh
- Components Affected: Channel operations, orderer connectivity only
Expected Behavior
The operator should be able to connect to orderer services in Istio mesh environments without requiring sidecar injection to be disabled.
Additional Context
- CAs and peers work fine with Istio sidecars enabled
- Comprehensive troubleshooting performed with detailed network validation
- Issue is reproducible and consistently resolved by disabling orderer sidecars