Description
What is the issue?
This one is slightly tricky to describe, but changes were being applied to the cluster, including applying the helm chart that deploys linkerd.
We are running HA settings for linkerd with 3 replicas of each pod:
NAME READY UP-TO-DATE AVAILABLE AGE
linkerd-destination 3/3 3 3 18d
linkerd-identity 3/3 3 3 18d
linkerd-proxy-injector 3/3 3 3 18d
When applying the helm chart we use to deploy linkerd, we regularly get some temporary connection issues reported by services (possibly a separate issue to discuss). In this case, though, the issue did not resolve after the linkerd deployment. From what I can see in the logs (I've attached logs from an affected pod and the linkerd pods below), it looks like the apply caused some unavailability for the policy controller, including between linkerd containers.
Workload certs are using cert-manager self-signed issuer.
Note: although the helm chart is being applied, the trust anchor and issuer CA certs are not changing.
How can it be reproduced?
Apply a new release of the linkerd helm chart that has no changes to linkerd but triggers a new deployment because the chart version has changed.
Also of possible note: cert-manager (self-signed issuer) is redeployed immediately before linkerd (no changes, but a new deployment is likely being triggered).
I ruled out the cert-manager self-signed issuer redeployment as a possible cause by disabling updates of it.
Logs, error output, etc
logs from linkerd pods (newest first):
1753814480041 2025-07-29T18:41:20.041Z [ 4.754479s] WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.147.96:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8086: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814480008 2025-07-29T18:41:20.008Z [ 4.721224s] WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.147.96:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814478618 2025-07-29T18:41:18.618Z [ 3.331484s] WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.147.96:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8086: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814478605 2025-07-29T18:41:18.605Z [ 3.318339s] WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.147.96:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814477401 2025-07-29T18:41:17.401Z [ 2.113652s] WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.147.96:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8086: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814477399 2025-07-29T18:41:17.399Z [ 2.112443s] WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.147.96:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814476296 2025-07-29T18:41:16.296Z [ 1.009704s] WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.147.96:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814476296 2025-07-29T18:41:16.296Z [ 1.009648s] WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.147.96:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8086: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814475858 2025-07-29T18:41:15.858Z [ 2.251873s] WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814475357 2025-07-29T18:41:15.357Z [ 1.750665s] WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814475357 2025-07-29T18:41:15.357Z [ 1.750624s] WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814394335 2025-07-29T18:39:54.335Z time="2025-07-29T18:39:54Z" level=error msg="failed to start webhook admin server: http: Server closed"
1753814394241 2025-07-29T18:39:54.241Z [ 1.256293s] WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814394223 2025-07-29T18:39:54.223Z [ 1.238097s] WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814393740 2025-07-29T18:39:53.740Z [ 0.755221s] WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814393721 2025-07-29T18:39:53.721Z [ 0.736014s] WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814393623 2025-07-29T18:39:53.623Z [ 2.317570s] WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.150.175:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.150.175:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814393621 2025-07-29T18:39:53.621Z [ 2.315422s] WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.150.175:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.150.175:8086: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814393313 2025-07-29T18:39:53.313Z [ 0.328399s] WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814393299 2025-07-29T18:39:53.299Z [ 0.314138s] WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814393097 2025-07-29T18:39:53.097Z [ 0.111706s] WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814393097 2025-07-29T18:39:53.097Z [ 0.111679s] WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814392990 2025-07-29T18:39:52.990Z [ 0.005272s] WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814392990 2025-07-29T18:39:52.990Z [ 0.005254s] WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814392421 2025-07-29T18:39:52.421Z [ 1.115249s] WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.150.175:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.150.175:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814392418 2025-07-29T18:39:52.418Z [ 1.112114s] WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.150.175:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.150.175:8086: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814391314 2025-07-29T18:39:51.314Z [ 0.007962s] WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.150.175:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.150.175:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814391312 2025-07-29T18:39:51.312Z [ 0.006530s] WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.150.175:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.150.175:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814391060 2025-07-29T18:39:51.060Z [ 10359.256028s] WARN ThreadId(01) linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
1753814390992 2025-07-29T18:39:50.992Z [ 10359.188615s] WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814390979 2025-07-29T18:39:50.979Z [ 10359.175768s] WARN ThreadId(01) linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
1753814390977 2025-07-29T18:39:50.977Z [ 10359.173618s] WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814390969 2025-07-29T18:39:50.969Z [ 10359.165382s] WARN ThreadId(01) linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
1753814390969 2025-07-29T18:39:50.969Z [ 10359.1
1753814390562 2025-07-29T18:39:50.562Z [ 10358.758286s] WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814390562 2025-07-29T18:39:50.562Z [ 10358.758283s] WARN ThreadId(01) linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
1753814390557 2025-07-29T18:39:50.557Z [ 10358.753843s] WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814390557 2025-07-29T18:39:50.557Z [ 10358.753831s] WARN ThreadId(01) linkerd_reconnect: Failed to connect error=Connection refused (os error 111)
1753814390342 2025-07-29T18:39:50.342Z [ 10358.538636s] WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814390342 2025-07-29T18:39:50.342Z [ 10358.538388s] WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814390230 2025-07-29T18:39:50.230Z [ 10358.426793s] WARN ThreadId(01) linkerd_reconnect: Service failed error=channel closed
1753814390228 2025-07-29T18:39:50.228Z [ 10358.424249s] WARN ThreadId(01) linkerd_reconnect: Service failed error=channel closed
1753814390228 2025-07-29T18:39:50.228Z [ 10358.423944s] WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Service failed error=endpoint 127.0.0.1:8090: channel closed error.sources=[channel closed]
1753814390224 2025-07-29T18:39:50.224Z [ 10358.419889s] WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Service failed error=endpoint 127.0.0.1:8086: channel closed error.sources=[channel closed]
1753814390220 2025-07-29T18:39:50.220Z [ 10358.416766s] WARN ThreadId(01) inbound:server{port=8086}:rescue{client.addr=172.31.150.164:47942}: linkerd_app_inbound::http::server: Unexpected error error=client 172.31.150.164:47942: server: 172.31.150.175:8086: server 172.31.150.175:8086: service linkerd-dst-headless.linkerd.svc.cluster.local:8086: operation was canceled error.sources=[server 172.31.150.175:8086: service linkerd-dst-headless.linkerd.svc.cluster.local:8086: operation was canceled, operation was canceled, connection closed]
1753814390220 2025-07-29T18:39:50.220Z [ 10358.416732s] WARN ThreadId(01) inbound:server{port=8086}:rescue{client.addr=172.31.150.166:44016}: linkerd_app_inbound::http::server: Unexpected error error=client 172.31.150.166:44016: server: 172.31.150.175:8086: server 172.31.150.175:8086: service linkerd-dst-headless.linkerd.svc.cluster.local:8086: operation was canceled error.sources=[server 172.31.150.175:8086: service linkerd-dst-headless.linkerd.svc.cluster.local:8086: operation was canceled, operation was canceled, connection closed]
1753814390220 2025-07-29T18:39:50.220Z [ 10358.416662s] WARN ThreadId(01) inbound:server{port=8086}:rescue{client.addr=172.31.150.166:44016}: linkerd_app_inbound::http::server: Unexpected error error=client 172.31.150.166:44016: server: 172.31.150.175:8086: server 172.31.150.175:8086: service linkerd-dst-headless.linkerd.svc.cluster.local:8086: operation was canceled error.sources=[server 172.31.150.175:8086: service linkerd-dst-headless.linkerd.svc.cluster.local:8086: operation was canceled, operation was canceled, connection closed]
1753814390215 2025-07-29T18:39:50.215Z time="2025-07-29T18:39:50Z" level=error msg="failed to start webhook admin server: http: Server closed"
1753814389469 2025-07-29T18:39:49.469Z [ 3.805918s] WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814389149 2025-07-29T18:39:49.149Z time="2025-07-29T18:39:49Z" level=error msg="failed to start webhook admin server: http: Server closed"
127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814385692 2025-07-29T18:39:45.692Z [ 0.032568s] WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814385692 2025-07-29T18:39:45.692Z [ 0.032541s] WARN ThreadId(01) dst:controller{addr=localhost:8086}:endpoint{addr=127.0.0.1:8086}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8086: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]
1753814380638 2025-07-29T18:39:40.638Z 2025/07/29 18:39:40 http: TLS handshake error from 172.31.150.172:33968: remote error: tls: bad certificate
1753814380616 2025-07-29T18:39:40.616Z 2025/07/29 18:39:40 http: TLS handshake error from 172.31.150.172:33966: remote error: tls: bad certificate
1753814380582 2025-07-29T18:39:40.582Z 2025/07/29 18:39:40 http: TLS handshake error from 172.31.150.172:33952: remote error: tls: bad certificate
1753814380531 2025-07-29T18:39:40.531Z 2025/07/29 18:39:40 http: TLS handshake error from 172.31.98.149:35838: remote error: tls: bad certificate
1753814380530 2025-07-29T18:39:40.530Z 2025/07/29 18:39:40 http: TLS handshake error from 172.31.150.172:33938: remote error: tls: bad certificate
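To make the failure pattern in the linkerd pod logs above easier to read, the reconnect warnings can be tallied per client, endpoint, and reason. This is a minimal Python sketch, not part of linkerd: the regular expression is inferred only from the log lines in this report, and the name `tally_reconnect_failures` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the proxy log lines in this report (an assumption,
# not an official linkerd log schema).
RECONNECT = re.compile(
    r"(?P<client>dst|policy):controller\{addr=[^}]+\}"
    r":endpoint\{addr=(?P<endpoint>[^}]+)\}: linkerd_reconnect: "
    r"(?:Failed to connect|Service failed) "
    r"error=endpoint \S+: (?P<reason>.+?) error\.sources="
)


def tally_reconnect_failures(lines):
    """Count reconnect warnings keyed by (client, endpoint, reason)."""
    counts = Counter()
    for line in lines:
        match = RECONNECT.search(line)
        if match:
            key = (match["client"], match["endpoint"], match["reason"])
            counts[key] += 1
    return counts


# Three lines copied from the logs above.
SAMPLE = [
    "1753814480041 2025-07-29T18:41:20.041Z [ 4.754479s] WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.147.96:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8086: connect timed out after 1s error.sources=[connect timed out after 1s]",
    "1753814478618 2025-07-29T18:41:18.618Z [ 3.331484s] WARN ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}:endpoint{addr=172.31.147.96:8086}: linkerd_reconnect: Failed to connect error=endpoint 172.31.147.96:8086: connect timed out after 1s error.sources=[connect timed out after 1s]",
    "1753814475858 2025-07-29T18:41:15.858Z [ 2.251873s] WARN ThreadId(01) policy:controller{addr=localhost:8090}:endpoint{addr=127.0.0.1:8090}: linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused (os error 111) error.sources=[Connection refused (os error 111)]",
]

if __name__ == "__main__":
    for (client, endpoint, reason), n in tally_reconnect_failures(SAMPLE).most_common():
        print(f"{n:3d}  {client:6s}  {endpoint:22s}  {reason}")
```

Grouping this way separates the short-lived "Connection refused" errors against localhost during container startup from the repeated 1s connect timeouts against a specific pod IP, which is the part that did not recover.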
logs from an affected pod (newest first):
1753814410873 2025-07-29T18:40:10.873Z [ 32130.028449s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Removing endpoint addr=172.31.96.31:8090
1753814410873 2025-07-29T18:40:10.873Z [ 32130.028431s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=172.31.116.194:8090
1753814410872 2025-07-29T18:40:10.872Z [ 32130.028199s] INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Removing endpoint addr=172.31.96.31:8086
1753814410872 2025-07-29T18:40:10.872Z [ 32130.028166s] INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=172.31.116.194:8086
1753814410285 2025-07-29T18:40:10.285Z [ 31514.472947s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Removing endpoint addr=172.31.96.31:8090
1753814410285 2025-07-29T18:40:10.285Z [ 31514.472925s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=172.31.116.194:8090
1753814410285 2025-07-29T18:40:10.285Z [ 31514.472749s] INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Removing endpoint addr=172.31.96.31:8086
1753814410285 2025-07-29T18:40:10.285Z [ 31514.472699s] INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=172.31.116.194:8086
1753814405871 2025-07-29T18:40:05.871Z [ 32125.027138s] INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Removing endpoint addr=172.31.127.99:8086
1753814405871 2025-07-29T18:40:05.871Z [ 32125.027119s] INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=172.31.150.161:8086
1753814405871 2025-07-29T18:40:05.871Z [ 32125.027033s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Removing endpoint addr=172.31.127.99:8090
1753814405871 2025-07-29T18:40:05.871Z [ 32125.027013s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=172.31.150.161:8090
1753814405782 2025-07-29T18:40:05.782Z [ 32124.937344s] WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.127.99:8090}: linkerd_reconnect: Failed to connect error=endpoint 172.31.127.99:8090: connect timed out after 1s error.sources=[connect timed out after 1s]
1753814405283 2025-07-29T18:40:05.283Z [ 31509.471161s] INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Removing endpoint addr=172.31.127.99:8086
1753814405283 2025-07-29T18:40:05.283Z [ 31509.471146s] INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=172.31.150.161:8086
1753814405283 2025-07-29T18:40:05.283Z [ 31509.470994s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Removing endpoint addr=172.31.127.99:8090
1753814405283 2025-07-29T18:40:05.283Z [ 31509.470945s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=172.31.150.161:8090
1753814404675 2025-07-29T18:40:04.675Z [ 32123.831157s] WARN ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}:endpoint{addr=172.31.127.99:8090}: linkerd_reconnect: Service failed error=endpoint 172.31.127.99:8090: channel closed error.sources=[channel closed]
1753814404571 2025-07-29T18:40:04.571Z [ 32123.726780s] WARN ThreadId(01) watch{port=4191}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Deadline expired before operation could complete grpc.message="client 172.31.127.98:56700: server: 172.31.96.31:8090: server 172.31.96.31:8090: service linkerd-policy.linkerd.svc.cluster.local:8090: service in fail-fast"
1753814404571 2025-07-29T18:40:04.571Z [ 32123.726763s] WARN ThreadId(01) watch{port=11211}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Deadline expired before operation could complete grpc.message="client 172.31.127.98:56700: server: 172.31.96.31:8090: server 172.31.96.31:8090: service linkerd-policy.linkerd.svc.cluster.local:8090: service in fail-fast"
1753814404571 2025-07-29T18:40:04.571Z [ 32123.726719s] WARN ThreadId(01) watch{port=9150}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Deadline expired before operation could complete grpc.message="client 172.31.127.98:56700: server: 172.31.96.31:8090: server 172.31.96.31:8090: service linkerd-policy.linkerd.svc.cluster.local:8090: service in fail-fast"
1753814397991 2025-07-29T18:39:57.991Z [ 32117.146960s] WARN ThreadId(01) watch{port=9150}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Deadline expired before operation could complete grpc.message="client 172.31.127.98:46618: server: 172.31.127.99:8090: server 172.31.127.99:8090: service linkerd-policy.linkerd.svc.cluster.local:8090: service in fail-fast"
1753814397982 2025-07-29T18:39:57.982Z [ 31502.170125s] WARN ThreadId(01) watch{port=11211}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Deadline expired before operation could complete grpc.message=""
1753814397982 2025-07-29T18:39:57.982Z [ 31502.170071s] WARN ThreadId(01) watch{port=4191}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Deadline expired before operation could complete grpc.message=""
1753814395868 2025-07-29T18:39:55.868Z [ 32115.024323s] INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Removing endpoint addr=172.31.150.175:8086
1753814395868 2025-07-29T18:39:55.868Z [ 32115.024308s] INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=172.31.147.96:8086
1753814395868 2025-07-29T18:39:55.868Z [ 32115.024178s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Removing endpoint addr=172.31.150.175:8090
1753814395868 2025-07-29T18:39:55.868Z [ 32115.024142s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=172.31.147.96:8090
1753814395281 2025-07-29T18:39:55.281Z [ 31499.468841s] INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Removing endpoint addr=172.31.150.175:8086
1753814395281 2025-07-29T18:39:55.281Z [ 31499.468827s] INFO ThreadId(01) dst:controller{addr=linkerd-dst-headless.linkerd.svc.cluster.local:8086}: linkerd_pool_p2c: Adding endpoint addr=172.31.147.96:8086
1753814395281 2025-07-29T18:39:55.281Z [ 31499.468500s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Removing endpoint addr=172.31.150.175:8090
1753814395281 2025-07-29T18:39:55.281Z [ 31499.468462s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=172.31.147.96:8090
1753814391237 2025-07-29T18:39:51.237Z [ 32110.392422s] WARN ThreadId(01) watch{port=9150}: linkerd_app_inbound::policy::api: Unexpected policy controller response; retrying with a backoff grpc.status=Deadline expired before operation could complete grpc.message="client 172.31.127.98:34624: server: 172.31.150.175:8090: server 172.31.150.175:8090: service linkerd-policy.linkerd.svc.cluster.local:8090: service in fail-fast"
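The `linkerd_pool_p2c` lines above can be replayed in chronological order to see which control-plane endpoints the affected proxy ended up with. The following Python sketch assumes the log format shown in this report; the function name `replay_pool_events` is hypothetical. The excerpt appears to interleave two proxies (there are two distinct uptime counters), so events repeat, and using a set keeps the replay idempotent.

```python
import re

# Pattern inferred from the "Adding endpoint" / "Removing endpoint" lines
# in this report (an assumption, not an official linkerd log schema).
POOL_EVENT = re.compile(
    r"(?P<client>dst|policy):controller\{addr=[^}]+\}: linkerd_pool_p2c: "
    r"(?P<action>Adding|Removing) endpoint addr=(?P<endpoint>\S+)"
)


def replay_pool_events(lines_newest_first):
    """Rebuild the endpoint set per controller client from pool events.

    The logs in this report are listed newest first, so they are reversed
    before replaying. Removals of endpoints that were never seen being
    added (they predate the excerpt) are ignored.
    """
    pools = {}
    for line in reversed(lines_newest_first):
        match = POOL_EVENT.search(line)
        if not match:
            continue
        pool = pools.setdefault(match["client"], set())
        if match["action"] == "Adding":
            pool.add(match["endpoint"])
        else:
            pool.discard(match["endpoint"])
    return pools


# Four policy-controller lines copied from the logs above, newest first.
SAMPLE = [
    "1753814410873 2025-07-29T18:40:10.873Z [ 32130.028449s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Removing endpoint addr=172.31.96.31:8090",
    "1753814410873 2025-07-29T18:40:10.873Z [ 32130.028431s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=172.31.116.194:8090",
    "1753814395868 2025-07-29T18:39:55.868Z [ 32115.024178s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Removing endpoint addr=172.31.150.175:8090",
    "1753814395868 2025-07-29T18:39:55.868Z [ 32115.024142s] INFO ThreadId(01) policy:controller{addr=linkerd-policy.linkerd.svc.cluster.local:8090}: linkerd_pool_p2c: Adding endpoint addr=172.31.147.96:8090",
]

if __name__ == "__main__":
    for client, endpoints in replay_pool_events(SAMPLE).items():
        print(client, sorted(endpoints))
```

One thing this makes visible: 172.31.147.96, which the proxy adds at 18:39:55, is the same address that the linkerd pod logs report as timing out between 18:41:16 and 18:41:20. That may point at the newly created pod rather than the old ones, though the logs alone do not confirm why it was unreachable.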
output of linkerd check -o short
Status check results are √
Environment
- Kubernetes Version: 1.33
- Cluster Environment: EKS (Auto)
- Host OS: Bottlerocket
- Linkerd version: 25.6.1
- Native Sidecar: true
Possible solution
No response
Additional context
helm command
helm upgrade --install --render-subchart-notes --set-file linkerd-control-plane.identityTrustAnchorsPEM=./.certs/root-ca.crt \
--set-file linkerd-control-plane.identity.issuer.tls.crtPEM=./.certs/$(cluster)-ca.crt \
--set-file linkerd-control-plane.identity.issuer.tls.keyPEM=./.certs/$(cluster)-ca.key \
./
helm chart.yml:
apiVersion: v2
name: linkerd-control-plane
description: linkerd-control-plane
type: application
version: 1.1.20
appVersion: "25.6.1"
dependencies:
  - name: linkerd-control-plane
    version: 2025.6.1
    repository: https://helm.linkerd.io/edge
helm chart values (resources removed for brevity):
linkerd-control-plane:
  disableHeartBeat: true
  clusterNetworks: "10.0.0.0/8,100.64.0.0/10,192.168.0.0/16,fd00::/8,172.31.96.0/19,172.31.128.0/19,172.31.160.0/19"
  proxy:
    defaultInboundPolicy: all-authenticated
    nativeSidecar: true
    metrics:
      hostnameLabels: true
  enablePodAntiAffinity: true
  enablePodDisruptionBudget: true
  controller:
    podDisruptionBudget:
      maxUnavailable: 1
      minAvailable: 1
  deploymentStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 25%
  controllerReplicas: 3
  webhookFailurePolicy: Fail
  highAvailability: true
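For reference, the rollout bounds implied by the deploymentStrategy values above can be worked out by hand. Kubernetes rounds a percentage maxSurge up and a percentage maxUnavailable down. The Python sketch below only reproduces that arithmetic; it assumes the chart passes these values unchanged to each control-plane Deployment, and the helper names are hypothetical.

```python
import math


def resolve(value, replicas, round_up):
    """Turn an absolute count or a percentage string into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas / 100
        return math.ceil(scaled) if round_up else math.floor(scaled)
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Return the pod-count limits a RollingUpdate must stay within."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


if __name__ == "__main__":
    # Values from this report: controllerReplicas 3, maxSurge 25%, maxUnavailable 0.
    print(rollout_bounds(3, "25%", 0))
```

With these settings each Deployment should surge by one pod at a time and never drop below three available pods, so on paper the rollout should not reduce control-plane capacity. The failures in the logs would then have to come from something other than the replica count dropping, for example connections pinned to pods that are terminating or new pods that are not yet reachable.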
Would you like to work on fixing this bug?
None