Description
What is the issue?
We were running a performance test in our staging environment with the Linkerd proxy enabled on our ingress layer (Traefik). The memory limit on the linkerd-destination deployment was set to 500Mi. As traffic increased during the test, the destination pods started restarting due to OOM. Once all destination pods were unavailable, the number of open connections on Traefik started to increase, which in turn increased Traefik's memory usage. (Traefik is just an example here; this can happen with any app.)
We ran another experiment in which we took down all the linkerd-destination pods. The same open-connection behavior appeared in that case too. We also observed that while the destination pods were down, the number of calls to the Kubernetes API server increased, which raised the API latency. This prevented us from patching any deployment, because calls to the Kubernetes API were timing out. In short, if the destination pods are down, the entire cluster can potentially go down.
Two questions:
- Do the linkerd-destination pods have to be up and running at all times for the data plane proxies to work without issues?
- Why did calls to the Kubernetes API server increase when the linkerd-destination pods were down?
How can it be reproduced?
Running a performance test while the linkerd-destination pods are down reproduces the issue.
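A minimal sketch of how the destination pods could be taken down for such a test. It assumes the default `linkerd` namespace and the `linkerd-destination` deployment name, and that restoring one replica is acceptable; none of these are stated in this report, and the function names are hypothetical. The load test itself is not shown.

```python
import subprocess

# Assumptions (not from the report): default control-plane namespace
# and deployment name of a standard Linkerd install.
NAMESPACE = "linkerd"
DEPLOYMENT = "linkerd-destination"


def scale_command(replicas, namespace=NAMESPACE, deployment=DEPLOYMENT):
    """Build the kubectl command that scales the destination deployment."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return [
        "kubectl", "scale", "deployment", deployment,
        "--namespace", namespace,
        f"--replicas={replicas}",
    ]


def run(cmd):
    """Echo and execute a command, raising if it exits non-zero."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


def reproduce(restore_replicas=1):
    """Scale destination to zero, wait for the operator, then restore.

    restore_replicas is an assumed value; use the cluster's real count.
    """
    run(scale_command(0))
    input("Destination scaled to 0. Run the load test, then press Enter...")
    run(scale_command(restore_replicas))
```

Calling `reproduce()` against a non-production cluster while the load test runs should show whether open connections and Kubernetes API calls rise as described.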
Logs, error output, etc
Sharing some screenshots. At 18:20 we ran a performance test with the linkerd-destination pods running, and everything worked as expected.
Around 18:50 we ran another test with a memory limit on the linkerd-destination deployment. The pods started restarting due to OOM, and at the same time the number of open connections on Traefik increased. The number of requests to the Kubernetes API and its latency also increased.
output of linkerd check -o short
linkerd-multicluster
--------------------
× all mirror services have endpoints
Some mirror services do not have endpoints:
api-web-dev-serve.perf mirrored from cluster [dev-serve]
test-nginx-svc-dev-serve.test-ns mirrored from cluster [dev-serve]
podinfo-dev-serve.test mirrored from cluster [dev-serve]
podinfo-dev-serve.test1 mirrored from cluster [dev-serve]
see https://linkerd.io/2.14/checks/#l5d-multicluster-services-endpoints for hints
linkerd-failover
----------------
‼ Linkerd extension command linkerd-failover exists
exec: "linkerd-failover": executable file not found in $PATH
see https://linkerd.io/2.14/checks/#extensions for hints
linkerd-smi
-----------
‼ Linkerd extension command linkerd-smi exists
exec: "linkerd-smi": executable file not found in $PATH
see https://linkerd.io/2.14/checks/#extensions for hints
Status check results are ×
Environment
Kubernetes version: 1.24
Cluster Environment: EKS
Host OS: Ubuntu
Linkerd version: 2.14.0
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
None