Linkerd Destination and Proxy Injector Intermittent OOMKilled #8270

@RajKuni

What is the issue?

I've noticed the linkerd destination and proxy injector control plane components restart every now and then due to an OOMKilled error.

I am running linkerd with the recommended production-level configs (e.g. 3 instances of each control plane component).

The destination and injector components have been assigned a 250Mi memory limit.

I notice that all three replicas of these components restart at about the same time (give or take a few minutes), each exiting with the same OOMKilled error (exit code 137).
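For reference, exit code 137 follows the usual convention of 128 plus the signal number, so it corresponds to signal 9 (SIGKILL), the signal the kernel OOM killer sends. Here is a minimal Python sketch of that arithmetic, together with the 250Mi limit expressed in bytes. The helper names are hypothetical and only for illustration, and the sketch assumes a Unix-like system where SIGKILL is defined:

```python
import signal


def signal_from_exit_code(exit_code: int) -> signal.Signals:
    """Decode an exit code above 128 into the signal that ended the process."""
    if exit_code <= 128:
        raise ValueError("exit code does not indicate termination by a signal")
    return signal.Signals(exit_code - 128)


def mebibytes_to_bytes(mi: int) -> int:
    """Convert a Kubernetes 'Mi' quantity to bytes (1 Mi = 1024 * 1024 bytes)."""
    return mi * 1024 * 1024


print(signal_from_exit_code(137).name)  # signal name for exit code 137
print(mebibytes_to_bytes(250))          # the 250Mi limit in bytes
```

This only decodes what the container status already reports. It does not explain why memory usage reached the limit.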

Here are some resource usage charts. The first one is linkerd destination's resource usage over the past month:

[Screenshot: linkerd destination resource usage chart]

And this one shows the proxy injector's resource usage over the past month:

[Screenshot: proxy injector resource usage chart]

Why do these spikes occur? Perhaps they are associated with the rollout of a lot of pods? But that doesn't explain all of them, because I know for sure we didn't do any major rollout around some of the spikes.

The linkerd identity component does not show the same behavior.

The cluster that linkerd is running on has several hundred pods running. Could linkerd be running into issues with handling that many pods? How many pods can linkerd handle with the production-level configuration?

Thank you for the help.

How can it be reproduced?

N/A

Logs, error output, etc

This is what the pod state shows for all of the linkerd destination and injector replicas (the times vary by a few minutes):

    State:          Running
      Started:      Fri, 15 Apr 2022 05:41:27 -0400
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 04 Apr 2022 19:17:12 -0400
      Finished:     Fri, 15 Apr 2022 05:41:26 -0400
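To put a number on how long this container ran before being killed, the two timestamps in the last state can be subtracted. This is a minimal Python sketch; the format string is my assumption about the timestamp layout shown above, and `uptime` is a hypothetical helper name:

```python
from datetime import datetime, timedelta

# Assumed layout of the timestamps above, e.g. "Mon, 04 Apr 2022 19:17:12 -0400"
FMT = "%a, %d %b %Y %H:%M:%S %z"


def uptime(started: str, finished: str) -> timedelta:
    """Return how long the container ran between its start and its termination."""
    return datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)


delta = uptime(
    "Mon, 04 Apr 2022 19:17:12 -0400",
    "Fri, 15 Apr 2022 05:41:26 -0400",
)
print(delta)  # 10 days, 10:24:14
```

So this replica ran for roughly ten and a half days before hitting the memory limit.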

output of linkerd check -o short

Linkerd core checks
===================

kubernetes-version
------------------
× is running the minimum kubectl version
    exec: "kubectl": executable file not found in $PATH
    see https://linkerd.io/2.11/checks/#kubectl-version for hints

linkerd-webhooks-and-apisvc-tls
-------------------------------
‼ proxy-injector cert is valid for at least 60 days
    certificate will expire on 2022-04-16T10:19:53Z
    see https://linkerd.io/2.11/checks/#l5d-proxy-injector-webhook-cert-not-expiring-soon for hints
‼ sp-validator cert is valid for at least 60 days
    certificate will expire on 2022-04-16T10:19:27Z
    see https://linkerd.io/2.11/checks/#l5d-sp-validator-webhook-cert-not-expiring-soon for hints

Status check results are ×

Linkerd extensions checks
=========================

linkerd-viz
-----------
‼ tap API server cert is valid for at least 60 days
    certificate will expire on 2022-06-08T17:09:25Z
    see https://linkerd.io/2.11/checks/#l5d-tap-cert-not-expiring-soon for hints
‼ linkerd-viz pods are injected
    could not find proxy container for prometheus-797c7d558b-hrfqc pod
    see https://linkerd.io/2.11/checks/#l5d-viz-pods-injection for hints
‼ viz extension proxies and cli versions match
    prometheus-797c7d558b-hrfqc running  but cli running stable-2.11.1
    see https://linkerd.io/2.11/checks/#l5d-viz-proxy-cli-version for hints

Status check results are √

Environment

  • Kubernetes Version: 1.20.15-gke.2500
  • Cluster Environment: GKE
  • Host OS: cos_containerd
  • Linkerd version: 2.11.1

Possible solution

N/A

Additional context

N/A

Would you like to work on fixing this bug?

No response
