linkerd proxy does not recover after policy controller issues? #14298

@matt-mercer

What is the issue?

This one is slightly tricky to describe. Changes were being applied to the cluster, including the Helm chart that deploys linkerd.
We are running linkerd with HA settings, with 3 replicas of each pod:

NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
linkerd-destination      3/3     3            3           18d
linkerd-identity         3/3     3            3           18d
linkerd-proxy-injector   3/3     3            3           18d

When applying the Helm chart that deploys linkerd, we regularly see some temporary connection issues reported by services (possibly a separate issue to discuss). In this case, though, the issue did not resolve after the linkerd deployment. From what I can see in the logs (I've attached logs from an affected pod and from the linkerd pods below), it looks like the apply caused some unavailability for the policy controller, including between linkerd containers.

Workload certs are using cert-manager self-signed issuer.

Note: although the Helm chart is being applied, the trust anchor and issuer CA certs are not changing.

How can it be reproduced?

Apply a new release of the linkerd Helm chart that contains no changes to linkerd itself but still triggers a new deployment of linkerd because the chart version has changed.

Also of possible note: cert-manager (self-signed issuer) is redeployed immediately before linkerd (no changes, but a new deployment is likely being triggered).
I ruled out the cert-manager self-signed issuer redeployment as a possible cause by disabling updates to it.

Logs, error output, etc

logs from linkerd pods (newest first):

1753814480041	2025-07-29T18:41:20.041Z	[     4.754479s]  WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.147.96:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8086: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814480008	2025-07-29T18:41:20.008Z	[     4.721224s]  WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.147.96:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814478618	2025-07-29T18:41:18.618Z	[     3.331484s]  WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.147.96:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8086: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814478605	2025-07-29T18:41:18.605Z	[     3.318339s]  WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.147.96:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814477401	2025-07-29T18:41:17.401Z	[     2.113652s]  WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.147.96:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8086: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814477399	2025-07-29T18:41:17.399Z	[     2.112443s]  WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.147.96:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814476296	2025-07-29T18:41:16.296Z	[     1.009704s]  WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.147.96:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814476296	2025-07-29T18:41:16.296Z	[     1.009648s]  WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.147.96:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8086: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814475858	2025-07-29T18:41:15.858Z	[     2.251873s]  WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814475357	2025-07-29T18:41:15.357Z	[     1.750665s]  WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814475357	2025-07-29T18:41:15.357Z	[     1.750624s]  WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814394335	2025-07-29T18:39:54.335Z	time="2025-07-29T18:39:54Z" level=error msg="failed to start webhook admin server: http: Server closed"
1753814394241	2025-07-29T18:39:54.241Z	[     1.256293s]  WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814394223	2025-07-29T18:39:54.223Z	[     1.238097s]  WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814393740	2025-07-29T18:39:53.740Z	[     0.755221s]  WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814393721	2025-07-29T18:39:53.721Z	[     0.736014s]  WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814393623	2025-07-29T18:39:53.623Z	[     2.317570s]  WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.150.175:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.150.175:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814393621	2025-07-29T18:39:53.621Z	[     2.315422s]  WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.150.175:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.150.175:8086: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814393313	2025-07-29T18:39:53.313Z	[     0.328399s]  WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814393299	2025-07-29T18:39:53.299Z	[     0.314138s]  WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814393097	2025-07-29T18:39:53.097Z	[     0.111706s]  WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814393097	2025-07-29T18:39:53.097Z	[     0.111679s]  WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814392990	2025-07-29T18:39:52.990Z	[     0.005272s]  WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814392990	2025-07-29T18:39:52.990Z	[     0.005254s]  WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814392421	2025-07-29T18:39:52.421Z	[     1.115249s]  WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.150.175:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.150.175:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814392418	2025-07-29T18:39:52.418Z	[     1.112114s]  WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.150.175:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.150.175:8086: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814391314	2025-07-29T18:39:51.314Z	[     0.007962s]  WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.150.175:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.150.175:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814391312	2025-07-29T18:39:51.312Z	[     0.006530s]  WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.150.175:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.150.175:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814391060	2025-07-29T18:39:51.060Z	[ 10359.256028s]  WARN ThreadId(01) linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
1753814390992	2025-07-29T18:39:50.992Z	[ 10359.188615s]  WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814390979	2025-07-29T18:39:50.979Z	[ 10359.175768s]  WARN ThreadId(01) linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
1753814390977	2025-07-29T18:39:50.977Z	[ 10359.173618s]  WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814390969	2025-07-29T18:39:50.969Z	[ 10359.165382s]  WARN ThreadId(01) linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
1753814390969	2025-07-29T18:39:50.969Z	[ 10359.1
1753814390562	2025-07-29T18:39:50.562Z	[ 10358.758286s]  WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814390562	2025-07-29T18:39:50.562Z	[ 10358.758283s]  WARN ThreadId(01) linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
1753814390557	2025-07-29T18:39:50.557Z	[ 10358.753843s]  WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814390557	2025-07-29T18:39:50.557Z	[ 10358.753831s]  WARN ThreadId(01) linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
1753814390342	2025-07-29T18:39:50.342Z	[ 10358.538636s]  WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814390342	2025-07-29T18:39:50.342Z	[ 10358.538388s]  WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814390230	2025-07-29T18:39:50.230Z	[ 10358.426793s]  WARN ThreadId(01) linkerd_reconnect: Service failed error=channel closed
1753814390228	2025-07-29T18:39:50.228Z	[ 10358.424249s]  WARN ThreadId(01) linkerd_reconnect: Service failed error=channel closed
1753814390228	2025-07-29T18:39:50.228Z	[ 10358.423944s]  WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Service failed error=endpoint 127.0.0.1:8090: channel closed error.sources=[channel closed]
1753814390224	2025-07-29T18:39:50.224Z	[ 10358.419889s]  WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Service failed error=endpoint 127.0.0.1:8086: channel closed error.sources=[channel closed]
1753814390220	2025-07-29T18:39:50.220Z	[ 10358.416766s]  WARN ThreadId(01) inbound:server{port=8086}:rescue{client.addr=172.31.150.164:47942}: linkerd_app_inbound::http::server: Unexpected error error=client 172.31.150.164:47942: server: 172.31.150.175:8086: server 172.31.150.175:8086: service linkerd-dst-headless.linkerd.svc.cluster.local:8086: operation was canceled error.sources=[server 172.31.150.175:8086: service linkerd-dst-headless.linkerd.svc.cluster.local:8086: operation was canceled, operation was canceled, connection closed]
1753814390220	2025-07-29T18:39:50.220Z	[ 10358.416732s]  WARN ThreadId(01) inbound:server{port=8086}:rescue{client.addr=172.31.150.166:44016}: linkerd_app_inbound::http::server: Unexpected error error=client 172.31.150.166:44016: server: 172.31.150.175:8086: server 172.31.150.175:8086: service linkerd-dst-headless.linkerd.svc.cluster.local:8086: operation was canceled error.sources=[server 172.31.150.175:8086: service linkerd-dst-headless.linkerd.svc.cluster.local:8086: operation was canceled, operation was canceled, connection closed]
1753814390220	2025-07-29T18:39:50.220Z	[ 10358.416662s]  WARN ThreadId(01) inbound:server{port=8086}:rescue{client.addr=172.31.150.166:44016}: linkerd_app_inbound::http::server: Unexpected error error=client 172.31.150.166:44016: server: 172.31.150.175:8086: server 172.31.150.175:8086: service linkerd-dst-headless.linkerd.svc.cluster.local:8086: operation was canceled error.sources=[server 172.31.150.175:8086: service linkerd-dst-headless.linkerd.svc.cluster.local:8086: operation was canceled, operation was canceled, connection closed]
1753814390215	2025-07-29T18:39:50.215Z	time="2025-07-29T18:39:50Z" level=error msg="failed to start webhook admin server: http: Server closed"
1753814389469	2025-07-29T18:39:49.469Z	[     3.805918s]  WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814389149	2025-07-29T18:39:49.149Z	time="2025-07-29T18:39:49Z" level=error msg="failed to start webhook admin server: http: Server closed"
127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814385692	2025-07-29T18:39:45.692Z	[     0.032568s]  WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814385692	2025-07-29T18:39:45.692Z	[     0.032541s]  WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814380638	2025-07-29T18:39:40.638Z	2025/07/29 18:39:40 http: TLS handshake error from 172.31.150.172:33968: remote error: tls: bad certificate
1753814380616	2025-07-29T18:39:40.616Z	2025/07/29 18:39:40 http: TLS handshake error from 172.31.150.172:33966: remote error: tls: bad certificate
1753814380582	2025-07-29T18:39:40.582Z	2025/07/29 18:39:40 http: TLS handshake error from 172.31.150.172:33952: remote error: tls: bad certificate
1753814380531	2025-07-29T18:39:40.531Z	2025/07/29 18:39:40 http: TLS handshake error from 172.31.98.149:35838: remote error: tls: bad certificate
1753814380530	2025-07-29T18:39:40.530Z	2025/07/29 18:39:40 http: TLS handshake error from 172.31.150.172:33938: remote error: tls: bad certificate
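
To make the reconnect warnings above easier to read, here is a minimal Python sketch that tallies them by endpoint address and failure reason. The regular expression is only fitted to the `linkerd_reconnect` lines pasted in this issue, and the names `RECONNECT` and `summarize` are hypothetical helpers, not part of any linkerd tooling.

```python
import re
from collections import Counter

# Fitted to the WARN lines in this issue only; other proxy versions may log differently.
RECONNECT = re.compile(
    r"endpoint\{addr=(?P<addr>[^}]+)\}: linkerd_reconnect: "
    r"(?P<event>Failed to connect|Service failed) "
    r"error=endpoint \S+: (?P<reason>.*?) error\.sources="
)


def summarize(lines):
    """Count reconnect warnings per (endpoint address, reason)."""
    counts = Counter()
    for line in lines:
        match = RECONNECT.search(line)
        if match:
            counts[(match["addr"], match["reason"])] += 1
    return counts


# Two lines copied from the logs above, with the timestamp columns trimmed.
sample = [
    "WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: "
    "linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: "
    "Connection refused (os error 111) error.sources=[Connection refused (os error 111)]",
    "WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:"
    "endpoint{addr=172.31.147.96:8090}: linkerd_reconnect: Failed to connect "
    "error=endpoint 172.31.147.96:8090: connect timed out after 1s "
    "error.sources=[connect timed out after 1s]",
]

for (addr, reason), n in summarize(sample).most_common():
    print(f"{n:4d}  {addr}  {reason}")
```

Run against the full log, this should separate the `Connection refused` failures on `127.0.0.1` from the `connect timed out after 1s` failures against pod IPs.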

logs from an affected pod:

1753814410873	2025-07-29T18:40:10.873Z	[ 32130.028449s]  INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Removing endpoint addr=172.31.96.31:8090
1753814410873	2025-07-29T18:40:10.873Z	[ 32130.028431s]  INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=172.31.116.194:8090
1753814410872	2025-07-29T18:40:10.872Z	[ 32130.028199s]  INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Removing endpoint addr=172.31.96.31:8086
1753814410872	2025-07-29T18:40:10.872Z	[ 32130.028166s]  INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=172.31.116.194:8086
1753814410285	2025-07-29T18:40:10.285Z	[ 31514.472947s]  INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Removing endpoint addr=172.31.96.31:8090
1753814410285	2025-07-29T18:40:10.285Z	[ 31514.472925s]  INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=172.31.116.194:8090
1753814410285	2025-07-29T18:40:10.285Z	[ 31514.472749s]  INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Removing endpoint addr=172.31.96.31:8086
1753814410285	2025-07-29T18:40:10.285Z	[ 31514.472699s]  INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=172.31.116.194:8086
1753814405871	2025-07-29T18:40:05.871Z	[ 32125.027138s]  INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Removing endpoint addr=172.31.127.99:8086
1753814405871	2025-07-29T18:40:05.871Z	[ 32125.027119s]  INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=172.31.150.161:8086
1753814405871	2025-07-29T18:40:05.871Z	[ 32125.027033s]  INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Removing endpoint addr=172.31.127.99:8090
1753814405871	2025-07-29T18:40:05.871Z	[ 32125.027013s]  INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=172.31.150.161:8090
1753814405782	2025-07-29T18:40:05.782Z	[ 32124.937344s]  WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.127.99:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.127.99:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814405283	2025-07-29T18:40:05.283Z	[ 31509.471161s]  INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Removing endpoint addr=172.31.127.99:8086
1753814405283	2025-07-29T18:40:05.283Z	[ 31509.471146s]  INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=172.31.150.161:8086
1753814405283	2025-07-29T18:40:05.283Z	[ 31509.470994s]  INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Removing endpoint addr=172.31.127.99:8090
1753814405283	2025-07-29T18:40:05.283Z	[ 31509.470945s]  INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=172.31.150.161:8090
1753814404675	2025-07-29T18:40:04.675Z	[ 32123.831157s]  WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.127.99:8090}: linkerd_reconnect: Service failed error=endpoint 172.31.127.99:8090: channel closed error.sources=[channel closed]
1753814404571	2025-07-29T18:40:04.571Z	[ 32123.726780s]  WARN ThreadId(01) watch{port=4191}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Deadline expired before operation could complete grpc.message="client 172.31.127.98:56700: server: 172.31.96.31:8090: server 172.31.96.31:8090: service linkerd-policy.linkerd.svc.cluster.local:8090: service in fail-fast"
1753814404571	2025-07-29T18:40:04.571Z	[ 32123.726763s]  WARN ThreadId(01) watch{port=11211}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Deadline expired before operation could complete grpc.message="client 172.31.127.98:56700: server: 172.31.96.31:8090: server 172.31.96.31:8090: service linkerd-policy.linkerd.svc.cluster.local:8090: service in fail-fast"
1753814404571	2025-07-29T18:40:04.571Z	[ 32123.726719s]  WARN ThreadId(01) watch{port=9150}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Deadline expired before operation could complete grpc.message="client 172.31.127.98:56700: server: 172.31.96.31:8090: server 172.31.96.31:8090: service linkerd-policy.linkerd.svc.cluster.local:8090: service in fail-fast"
1753814397991	2025-07-29T18:39:57.991Z	[ 32117.146960s]  WARN ThreadId(01) watch{port=9150}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Deadline expired before operation could complete grpc.message="client 172.31.127.98:46618: server: 172.31.127.99:8090: server 172.31.127.99:8090: service linkerd-policy.linkerd.svc.cluster.local:8090: service in fail-fast"
1753814397982	2025-07-29T18:39:57.982Z	[ 31502.170125s]  WARN ThreadId(01) watch{port=11211}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Deadline expired before operation could complete grpc.message=""
1753814397982	2025-07-29T18:39:57.982Z	[ 31502.170071s]  WARN ThreadId(01) watch{port=4191}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Deadline expired before operation could complete grpc.message=""
1753814395868	2025-07-29T18:39:55.868Z	[ 32115.024323s]  INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Removing endpoint addr=172.31.150.175:8086
1753814395868	2025-07-29T18:39:55.868Z	[ 32115.024308s]  INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=172.31.147.96:8086
1753814395868	2025-07-29T18:39:55.868Z	[ 32115.024178s]  INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Removing endpoint addr=172.31.150.175:8090
1753814395868	2025-07-29T18:39:55.868Z	[ 32115.024142s]  INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=172.31.147.96:8090
1753814395281	2025-07-29T18:39:55.281Z	[ 31499.468841s]  INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Removing endpoint addr=172.31.150.175:8086
1753814395281	2025-07-29T18:39:55.281Z	[ 31499.468827s]  INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=172.31.147.96:8086
1753814395281	2025-07-29T18:39:55.281Z	[ 31499.468500s]  INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Removing endpoint addr=172.31.150.175:8090
1753814395281	2025-07-29T18:39:55.281Z	[ 31499.468462s]  INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=172.31.147.96:8090
1753814391237	2025-07-29T18:39:51.237Z	[ 32110.392422s]  WARN ThreadId(01) watch{port=9150}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Deadline expired before operation could complete grpc.message="client 172.31.127.98:34624: server: 172.31.150.175:8090: server 172.31.150.175:8090: service linkerd-policy.linkerd.svc.cluster.local:8090: service in fail-fast"
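
The lines above seem to interleave more than one proxy: at the same wall-clock time the bracketed value is around 32130s on some lines and around 31514s on others. A rough Python sketch for separating such streams follows. It assumes the bracketed value is the proxy's uptime in seconds, so that wall-clock time minus uptime approximates when the proxy started. `LINE`, `group_by_proxy_start` and the one-second bucket size are hypothetical choices for illustration.

```python
import re
from collections import defaultdict

# Matches "<epoch ms> <ISO timestamp> [ <uptime>s]" as pasted in this issue.
LINE = re.compile(r"^(?P<epoch_ms>\d{13})\s+\S+\s+\[\s*(?P<uptime>\d+\.\d+)s\]")


def group_by_proxy_start(lines, bucket_s=1.0):
    """Group log lines by estimated proxy start time (epoch seconds, bucketed).

    Assumption: the bracketed value is seconds since the proxy started.
    Lines that do not match (for example the Go-style log lines) are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        started = int(match["epoch_ms"]) / 1000.0 - float(match["uptime"])
        # Coarse bucketing absorbs jitter between the two timestamps; a start
        # time that falls right on a bucket edge could still split in two.
        groups[round(started / bucket_s)].append(line)
    return groups


sample = [
    "1753814410873\t2025-07-29T18:40:10.873Z\t[ 32130.028449s]  INFO ThreadId(01) ...",
    "1753814410285\t2025-07-29T18:40:10.285Z\t[ 31514.472947s]  INFO ThreadId(01) ...",
    "1753814405782\t2025-07-29T18:40:05.782Z\t[ 32124.937344s]  WARN ThreadId(01) ...",
]

for start, members in sorted(group_by_proxy_start(sample).items()):
    print(start, len(members))
```

The same grouping applied to the linkerd pod logs could help show which warnings come from freshly started proxies (uptime of a few seconds) and which come from the proxy that had been up for roughly 10358s.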

output of linkerd check -o short

Status check results are √

Environment

  • Kubernetes Version: 1.33
  • Cluster Environment: EKS (Auto)
  • Host OS: Bottlerocket
  • Linkerd version: 25.6.1
  • Native Sidecar: true

Possible solution

No response

Additional context

helm command

helm upgrade --install --render-subchart-notes \
  --set-file linkerd-control-plane.identityTrustAnchorsPEM=./.certs/root-ca.crt \
  --set-file linkerd-control-plane.identity.issuer.tls.crtPEM=./.certs/$(cluster)-ca.crt \
  --set-file linkerd-control-plane.identity.issuer.tls.keyPEM=./.certs/$(cluster)-ca.key \
  ./

helm chart.yml:

apiVersion: v2
name: linkerd-control-plane
description: linkerd-control-plane
type: application
version: 1.1.20
appVersion: "25.6.1"
dependencies:
  - name: linkerd-control-plane
    version: 2025.6.1
    repository: https://helm.linkerd.io/edge

helm chart values (resources removed for brevity):

linkerd-control-plane:
  disableHeartBeat: true
  clusterNetworks: "10.0.0.0/8,100.64.0.0/10,192.168.0.0/16,fd00::/8,172.31.96.0/19,172.31.128.0/19,172.31.160.0/19"

  proxy:
    defaultInboundPolicy: all-authenticated
    nativeSidecar: true
    metrics:
      hostnameLabels: true

  enablePodAntiAffinity: true
  enablePodDisruptionBudget: true

  controller:
    podDisruptionBudget:
      maxUnavailable: 1
      minAvailable: 1

  deploymentStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 25%

  controllerReplicas: 3

  webhookFailurePolicy: Fail

  highAvailability: true

Would you like to work on fixing this bug?

None
