Orderer fails with TRANSIENT_FAILURE when using Istio sidecar #275

@JuanAnsio

I tried moving from a minimal, ingress-only Istio setup to a full service mesh setup, but connections to the orderer node fail with TRANSIENT_FAILURE:

failed to get block from channel defaultchannel: Orderer Client Status Code: (2) CONNECTION_FAILED. 
Description: dialing connection on target [orderer0-ord.kalyp.network:443]: connection is in TRANSIENT_FAILURE

This issue only occurs when Istio sidecar injection is enabled for the orderer node. With injection disabled, it works as expected.

Problem Summary

The HLF operator fails to establish gRPC connections to orderer services when deployed in an Istio service mesh environment, causing channel operations to fail with TRANSIENT_FAILURE errors. This issue is specific to orderers and does not affect CAs or peers.

Environment

  • HLF Operator Version: v1.11.1
  • Istio Version: v1.20.3
  • Kubernetes Version: v1.28.15-eks-b707fbb
  • Fabric SDK Go Version: github.com/kfsoftware/fabric-sdk-go@v0.0.0-20240114221414-98466038585d

Detailed Test Results

Network Connectivity Validation

  • DNS Resolution: ✅ SUCCESS - orderer0-ord.kalyp.network → 172.20.101.62
  • TCP Connection: ✅ SUCCESS - Can establish TCP to port 443
  • TLS Handshake: ❌ FAILED - write:errno=104 (connection reset by peer)
  • TLS with InsecureSkipVerify: ❌ FAILED - Still errno=104
  • Basic gRPC Connection: ❌ FAILED - Various TLS credential combinations all fail

Direct Service Testing

Testing direct connection to ord-node1.hlf.svc.cluster.local:7050:

  • TLS handshake with SNI: ❌ FAILED - write:errno=104, no peer certificate available
  • Plain TCP: ✅ Connected but no TLS negotiation possible
  • Orderer logs: Show TLS enabled, proper certificates, no connection errors logged

OpenSSL TLS Tests

# Admin endpoint (7053) - Works
openssl s_client -connect orderer0-ord.kalyp.network:443 -servername admin-orderer0-ord.kalyp.network -cert admin.crt -key admin.key
# Result: Handshake OK, Verification: OK

# Orderer endpoint (7050) - Works at TLS level
openssl s_client -connect orderer0-ord.kalyp.network:443 -servername orderer0-ord.kalyp.network -CAfile ca.crt
# Result: TLS handshake EXIT:0 (server cert verified)

SDK Network Configuration

Generated network config includes proper options:

grpcOptions:
  allow-insecure: false
  ssl-target-name-override: orderer0-ord.kalyp.network
  grpc.keepalive_time_ms: "30000"
  grpc.keepalive_timeout_ms: "5000"

  • The SDK successfully creates resmgmt.Client
  • But QueryConfigBlockFromOrderer() still fails with TRANSIENT_FAILURE

Fabric SDK Go Limitations

  • Confirmed NOT supported: grpc.authority, grpc.http2.alpn (keys are ignored by SDK)
  • Confirmed supported: ssl-target-name-override, allow-insecure, fail-fast, keep-alive-*
  • Uses grpc.DialContext() with only TLS credentials, no authority header support
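
To make the distinction above concrete, the sketch below shows where a server name override and ALPN live in a Go TLS client configuration. It uses the standard library only and is not the SDK's code; it assumes that ssl-target-name-override ends up as `tls.Config.ServerName`, and the function name `clientTLSConfig` is hypothetical. The HTTP/2 `:authority` header is set at the gRPC layer, not in the TLS configuration, so it does not appear here.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
)

// clientTLSConfig builds a TLS client configuration in which serverName is
// sent as SNI and is also the name the server certificate is verified against.
// Assumption: this mirrors the role of ssl-target-name-override.
func clientTLSConfig(serverName string, caPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in PEM input")
	}
	return &tls.Config{
		ServerName: serverName,
		RootCAs:    pool,
		// gRPC runs over HTTP/2, which is negotiated through ALPN as "h2".
		NextProtos: []string{"h2"},
		MinVersion: tls.VersionTLS12,
	}, nil
}
```

Comparing the SNI value a client sends with the hosts that the Istio Gateway or sidecar listener matches on is one way to check whether the proxy resets the connection before it reaches the orderer.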

Confirmed Findings

  • Istio sidecar interference: TLS handshake fails with errno=104 (connection reset by peer) when Istio sidecars are present
  • Component-specific issue: Only affects orderer connections; CAs and peers work normally with Istio sidecars
  • Transport layer failure: Issue occurs at TLS handshake level before gRPC negotiation begins
  • Consistent across configurations: Problem persists across different Istio mesh configurations

Comprehensive Testing Performed

The following solutions were tested and failed:

  • DestinationRules with trafficPolicy.tls.mode: DISABLE
  • DNS rewrites to bypass Istio ingress
  • In-cluster proxy services
  • Custom gRPC options in network configuration
  • Direct orderer service connections
  • Various TLS credential combinations

Component Comparison

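| Component  | Sidecar Status  | Connection Type                           | Result                |
|------------|-----------------|-------------------------------------------|-----------------------|
| Orderer CA | ✅ Enabled      | Internal (ord-ca.hlf.svc.cluster.local)   | ✅ Works              |
| Peer       | ✅ Enabled      | Internal (peer0.hlf.svc.cluster.local)    | ✅ Works              |
| Orderer    | ❌ Must Disable | External (orderer0-ord.kalyp.network:443) | ❌ Fails with sidecar |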

Current Workaround

Selectively disabling sidecar injection for the orderer pods resolves the issue:

spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"

Results after disabling orderer sidecar:

  • ✅ FabricMainChannel status: RUNNING
  • ✅ Channel creation successful
  • ✅ QueryConfigBlockFromOrderer() works normally

Impact

  • Severity: High - Blocks production Istio mesh deployments
  • Scope: All HLF operator deployments with Istio service mesh
  • Components Affected: Channel operations, orderer connectivity only

Expected Behavior

The operator should be able to connect to orderer services in Istio mesh environments without requiring sidecar injection to be disabled.

Additional Context

  • CAs and peers work fine with Istio sidecars enabled
  • Comprehensive troubleshooting performed with detailed network validation
  • Issue is reproducible and consistently resolved by disabling orderer sidecars
