linkerd-destination pod restarts increase open connections on applications where the data plane proxy is enabled #11315

@nomis4u

Description

What is the issue?

We were running performance tests in our staging environment with the Linkerd proxy enabled on our ingress layer, Traefik. The linkerd-destination deployment's memory limit was set to 500Mi. When traffic increased during the test, the destination pods started restarting because they ran out of memory (OOM). Once all destination pods became unavailable, the number of open connections on Traefik started to increase, which in turn increased Traefik's memory usage. (Traefik is just an example here; this can happen to any application.)

We ran another experiment in which we took down all of the linkerd-destination pods. In that case we saw the same increase in open connections. We also observed that while the destination pods were down, the number of calls to the Kubernetes API server increased, which raised API server latency. This prevented us from patching any deployment, because calls to the Kubernetes API were timing out. In short, if the destination pods are down, the entire cluster can potentially go down.

Two questions:

  1. Does the linkerd-destination pod have to be up and running at all times for the data plane proxy to work without issues?
  2. Why did calls to the Kubernetes API server increase when the linkerd-destination pods were down?

How can it be reproduced?

Running a performance test while the linkerd-destination pods are taken down reproduces the issue.
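A minimal sketch of the reproduction step, assuming the official `kubernetes` Python client, a local kubeconfig, and Linkerd's default install (a Deployment named `linkerd-destination` in the `linkerd` namespace). The helper names are hypothetical. It reads the current replica count first so it can be restored afterwards, since, as described above, API calls may time out while the destination pods are down.

```python
# Hypothetical helpers for reproducing the issue: scale linkerd-destination
# to zero replicas, run the load test, then restore the original count.
# Assumes the official `kubernetes` client (pip install kubernetes) and the
# default Linkerd names below; adjust them for your install.
from typing import Any, Dict

DEFAULT_NAME = "linkerd-destination"  # assumed default Deployment name
DEFAULT_NAMESPACE = "linkerd"         # assumed default control-plane namespace


def scale_body(replicas: int) -> Dict[str, Any]:
    """Build the patch body for the Deployment's scale subresource."""
    if replicas < 0:
        raise ValueError("replicas must be >= 0")
    return {"spec": {"replicas": replicas}}


def _apps_api():
    # Imported lazily so scale_body() can be used without a cluster.
    from kubernetes import client, config
    config.load_kube_config()
    return client.AppsV1Api()


def get_destination_replicas(name: str = DEFAULT_NAME,
                             namespace: str = DEFAULT_NAMESPACE) -> int:
    """Read the current replica count so it can be restored later."""
    scale = _apps_api().read_namespaced_deployment_scale(name, namespace)
    return scale.spec.replicas


def set_destination_replicas(replicas: int,
                             name: str = DEFAULT_NAME,
                             namespace: str = DEFAULT_NAMESPACE) -> None:
    """Patch the scale subresource, e.g. to 0 to take all destination pods down."""
    _apps_api().patch_namespaced_deployment_scale(name, namespace, scale_body(replicas))
```

For example: call `original = get_destination_replicas()`, then `set_destination_replicas(0)`, run the load test against the meshed workload (Traefik in this report), and finally `set_destination_replicas(original)`.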

Logs, error output, etc

Sharing some screenshots. At 18:20 we ran a performance test with the linkerd-destination pods running, and everything worked as expected.
Around 18:50 we ran another test with a memory limit on the linkerd-destination deployment. The pods started restarting due to OOM, and at the same time open connections on Traefik increased. The number of requests to the Kubernetes API, and its latency, also increased.
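To confirm that the restarts in the second run were OOM kills, something like the following sketch could be used. It assumes the official `kubernetes` Python client, the default `linkerd` namespace, and the `linkerd.io/control-plane-component=destination` pod label; `oom_killed` and `report_destination_ooms` are hypothetical helper names. The filtering works on the plain dicts returned by each pod's `to_dict()`, so it can be checked without a cluster.

```python
# Hypothetical sketch: list destination containers whose last termination
# reason was OOMKilled. Assumes the official `kubernetes` Python client and
# Linkerd's default namespace and control-plane label.
from typing import Any, Dict, List, Tuple


def oom_killed(pods: List[Dict[str, Any]]) -> List[Tuple[str, str, int]]:
    """Return (pod, container, restart_count) for containers last terminated by OOMKilled."""
    hits = []
    for pod in pods:
        pod_name = (pod.get("metadata") or {}).get("name", "?")
        statuses = (pod.get("status") or {}).get("container_statuses") or []
        for cs in statuses:
            # last_state.terminated is None unless the container has terminated before.
            terminated = (cs.get("last_state") or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append((pod_name, cs.get("name"), cs.get("restart_count", 0)))
    return hits


def report_destination_ooms(namespace: str = "linkerd") -> None:
    # Imported lazily so oom_killed() can be used without a cluster.
    from kubernetes import client, config
    config.load_kube_config()
    pods = client.CoreV1Api().list_namespaced_pod(
        namespace, label_selector="linkerd.io/control-plane-component=destination"
    )
    for pod, container, restarts in oom_killed([p.to_dict() for p in pods.items]):
        print(f"{pod}/{container}: OOMKilled (restarts={restarts})")
```

Only the most recent termination is visible in `last_state`, so this would need to run during or shortly after the test to catch each restart.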

[Screenshot 2023-08-30 at 8:53:30 PM]

[Screenshot 2023-08-30 at 7:52:02 PM]

[Screenshot 2023-08-30 at 7:52:36 PM]

Output of linkerd check -o short:

linkerd-multicluster
--------------------
× all mirror services have endpoints
    Some mirror services do not have endpoints:
    api-web-dev-serve.perf mirrored from cluster [dev-serve]
    test-nginx-svc-dev-serve.test-ns mirrored from cluster [dev-serve]
    podinfo-dev-serve.test mirrored from cluster [dev-serve]
    podinfo-dev-serve.test1 mirrored from cluster [dev-serve]
    see https://linkerd.io/2.14/checks/#l5d-multicluster-services-endpoints for hints

linkerd-failover
----------------
‼ Linkerd extension command linkerd-failover exists
    exec: "linkerd-failover": executable file not found in $PATH
    see https://linkerd.io/2.14/checks/#extensions for hints

linkerd-smi
-----------
‼ Linkerd extension command linkerd-smi exists
    exec: "linkerd-smi": executable file not found in $PATH
    see https://linkerd.io/2.14/checks/#extensions for hints

Status check results are ×

Environment

Kubernetes version: 1.24
Cluster Environment: EKS
Host OS: Ubuntu
Linkerd version: 2.14.0

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None
