ServicePartitionClient keeps reusing the TCommunicationClient created by the ICommunicationClientFactory until the factory's ReportOperationExceptionAsync method returns an OperationRetryControl with IsTransient=false. The current design trusts the ICommunicationClientFactory to determine whether the client is still valid, but implementing this logic correctly may not be trivial, and mistakes can lead to prolonged attempts to use a permanently broken TCommunicationClient.
We should change ServicePartitionClient so that it always uses the client returned by ICommunicationClientFactory.GetClientAsync. The term Get in this API indicates that the factory is responsible for caching the clients, not just creating them. The current implementation of CommunicationClientFactoryBase would ensure that the partition's endpoint information is checked against the current information in the FabricClient, so even if the user code did not detect that an endpoint change had broken the client, the next attempt to communicate with this partition would create a fresh client.
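The proposed flow can be illustrated with a small, language-neutral sketch in Python. It is not the Service Fabric API: Client, CachingFactory, PartitionClient, and the endpoints dictionary are hypothetical stand-ins, the dictionary lookup stands in for the endpoint information held by the FabricClient, and the calls are synchronous for brevity where the real ones are asynchronous.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Client:
    """Stand-in for a TCommunicationClient bound to one endpoint."""
    endpoint: str


class CachingFactory:
    """Hypothetical factory: get_client reuses a cached client only while
    the partition's resolved endpoint is unchanged."""

    def __init__(self, resolve: Callable[[str], str]):
        self._resolve = resolve            # partition id -> current endpoint
        self._cache: Dict[str, Client] = {}
        self.created = 0                   # number of clients created so far

    def get_client(self, partition: str) -> Client:
        endpoint = self._resolve(partition)
        cached = self._cache.get(partition)
        if cached is not None and cached.endpoint == endpoint:
            return cached                  # endpoint unchanged: reuse
        client = Client(endpoint)          # nothing cached, or endpoint moved
        self._cache[partition] = client
        self.created += 1
        return client


class PartitionClient:
    """Hypothetical caller that never stores a client between calls."""

    def __init__(self, factory: CachingFactory, partition: str):
        self._factory = factory
        self._partition = partition

    def invoke(self, operation: Callable[[Client], str]) -> str:
        # Ask the factory on every call instead of reusing a stored client.
        return operation(self._factory.get_client(self._partition))


endpoints = {"p0": "tcp://node1:8000"}
factory = CachingFactory(lambda partition: endpoints[partition])
caller = PartitionClient(factory, "p0")

first = caller.invoke(lambda c: c.endpoint)
second = caller.invoke(lambda c: c.endpoint)   # same endpoint: cached client reused
endpoints["p0"] = "tcp://node2:8000"           # the replica moves
third = caller.invoke(lambda c: c.endpoint)    # factory creates a fresh client
```

Because the caller holds no client of its own, a stale client is replaced as soon as the factory sees a different endpoint, without relying on user code to classify an exception correctly.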
We should also make CommunicationClientFactoryBase handle the FabricClient.ServiceManagementClient.ServiceNotificationFilterMatched event and re-create recently used cached clients in the background. The next GetClientAsync request is then likely to find a new client already in the cache, which avoids the delay of trying to use a broken client, potentially waiting until it times out, and only then creating a new client and opening the channel.
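A minimal sketch of the background refresh, again in Python with hypothetical names (PrewarmingCache, on_endpoint_changed, recent_seconds) rather than the real Service Fabric types. The on_endpoint_changed method plays the role of the notification handler, and the 60-second definition of "recently used" is an assumption made only for illustration.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional, Tuple


class PrewarmingCache:
    """Hypothetical client cache that rebuilds recently used clients in the
    background when an endpoint-change notification arrives."""

    def __init__(self, create: Callable[[str], str], recent_seconds: float = 60.0):
        self._create = create              # endpoint -> opened client
        self._recent = recent_seconds      # assumed meaning of "recently used"
        self._lock = threading.Lock()
        # partition -> (endpoint, client, time of last use)
        self._entries: Dict[str, Tuple[str, str, float]] = {}
        self._pool = ThreadPoolExecutor(max_workers=2)

    def get_client(self, partition: str, endpoint: str) -> str:
        with self._lock:
            entry = self._entries.get(partition)
            if entry is None or entry[0] != endpoint:
                entry = (endpoint, self._create(endpoint), 0.0)
            self._entries[partition] = (entry[0], entry[1], time.monotonic())
            return entry[1]

    def on_endpoint_changed(self, partition: str, endpoint: str) -> Optional[Future]:
        """Notification handler: returns the pending refresh, or None."""
        with self._lock:
            entry = self._entries.get(partition)
            if entry is None or time.monotonic() - entry[2] > self._recent:
                self._entries.pop(partition, None)   # idle or unknown: just evict
                return None
            last_used = entry[2]
        return self._pool.submit(self._replace, partition, endpoint, last_used)

    def _replace(self, partition: str, endpoint: str, last_used: float) -> None:
        client = self._create(endpoint)    # potentially slow: runs off the caller's path
        with self._lock:
            self._entries[partition] = (endpoint, client, last_used)


created = []

def open_client(endpoint: str) -> str:
    created.append(endpoint)
    return f"client->{endpoint}"

cache = PrewarmingCache(open_client)
cache.get_client("p0", "node1")                      # creates and caches a client
pending = cache.on_endpoint_changed("p0", "node2")   # notification arrives
pending.result()                                     # demo only: wait for the refresh
warm = cache.get_client("p0", "node2")               # served from the cache
ignored = cache.on_endpoint_changed("p1", "node3")   # never used: nothing to refresh
```

Limiting the refresh to recently used clients keeps a burst of notifications from opening channels that no caller is likely to need.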
https://dev.azure.com/msazure/One/_workitems/edit/37022576