Summary
The current multi-endpoint client configuration provides client-side load balancing across ready subchannels, but it does not guarantee per-RPC failover semantics. When a request is routed to an endpoint that becomes unavailable or returns a transport error, the failed RPC is not retried on another endpoint automatically.
Current behavior
- Single endpoint uses a direct GrpcChannel.
- Multi-endpoint uses a static resolver plus random or round_robin load balancing.
- Endpoint selection happens across ready subchannels.
- There is no explicit retry policy or request-level failover.
Expected behavior
Multi-endpoint configuration should support failover for transient transport-level endpoint failures, so a failed RPC can be retried against another healthy endpoint when it is safe to do so.
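To make the intended contract concrete, here is a minimal, language-neutral sketch in Python. It is not the client's implementation. `TransientTransportError`, `call_with_failover`, and the `send` callback are hypothetical names introduced only for illustration, and "safe to do so" is assumed here to mean the caller has marked the call as idempotent.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class TransientTransportError(Exception):
    """Hypothetical marker for a transport-level failure, e.g. an unreachable endpoint."""


def call_with_failover(
    endpoints: Sequence[str],
    send: Callable[[str], T],
    idempotent: bool,
) -> T:
    """Try the call on each endpoint in order until one succeeds.

    Assumption: only idempotent calls may be re-sent, because a transport
    error does not prove the server never processed the request.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")

    # Non-idempotent calls get exactly one attempt on the first endpoint.
    candidates = endpoints if idempotent else endpoints[:1]

    last_error: Optional[TransientTransportError] = None
    for endpoint in candidates:
        try:
            return send(endpoint)
        except TransientTransportError as err:
            # Remember the failure and move on to the next endpoint.
            last_error = err

    # Every permitted attempt failed; surface the most recent error.
    assert last_error is not None
    raise last_error
```

Errors other than the transient transport marker propagate immediately, which keeps application-level failures out of the failover path. A real design would also need to bound total latency, for example with a per-call deadline.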
Scope
- Define the desired failover contract clearly.
- Evaluate whether this should rely on gRPC retry policy, client-side retry logic, or another mechanism.
- Clarify interaction with idempotency / write semantics.
- Add tests that cover endpoint failure scenarios in multi-endpoint mode.
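One of the options listed above, the gRPC retry policy, is expressed through the gRPC service config. The sketch below builds such a config as JSON. The service name `example.ReadService`, the helper `build_retry_service_config`, and all numeric values are illustrative assumptions, not settings taken from this client. Whether a retried attempt actually lands on a different endpoint depends on the load-balancing policy in use, so this would need to be verified as part of the evaluation.

```python
import json


def build_retry_service_config(service_name: str, max_attempts: int = 3) -> dict:
    """Return a gRPC service config that retries UNAVAILABLE for one service.

    Assumption: every method of the named service is safe to retry.
    Services with non-idempotent writes should be left out of the config.
    """
    return {
        "methodConfig": [
            {
                # Applies to every method of the named service.
                "name": [{"service": service_name}],
                "retryPolicy": {
                    "maxAttempts": max_attempts,
                    "initialBackoff": "0.1s",
                    "maxBackoff": "1s",
                    "backoffMultiplier": 2,
                    # Retry only on the status that signals a transport-level
                    # or availability failure.
                    "retryableStatusCodes": ["UNAVAILABLE"],
                },
            }
        ]
    }


# Hypothetical service name, used only for illustration.
config_json = json.dumps(build_retry_service_config("example.ReadService"), indent=2)
print(config_json)
```

Limiting `retryableStatusCodes` to `UNAVAILABLE` and scoping the policy per service is one way to keep retries away from writes whose idempotency has not been established.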
Notes
This is distinct from load balancing. The current implementation can route new calls to other ready endpoints, but it does not provide request-level automatic failover for a call that already failed.