[Bug]: Increased and erratic memory in the Nginx Pods leading to OOM Kills - appears to be introduced by v3.7.x #6860

Closed
@MarkTopping

Description

Version

3.7.0

What Kubernetes platforms are you running on?

AKS Azure

Steps to reproduce

I believe that changes in version 3.7.0 or 3.7.1 have introduced a memory consumption issue.

We had to roll back a version bump from v3.6.2 to v3.7.1 today after all of our Nginx IC Pods crashed due to OOM Kills. To make matters worse, due to Bug 4604 the Pods then failed to restart without manual intervention, so the impact continued until we stepped in.
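For anyone trying to confirm the same failure mode, here is a minimal sketch of how OOM-killed containers could be picked out of `kubectl get pods -o json` output. The function name `find_oom_killed` and the sample Pod data are hypothetical and made up for illustration; the fields it reads (`status.containerStatuses[].lastState.terminated.reason` and `restartCount`) are part of the standard Kubernetes Pod status.

```python
def find_oom_killed(pod_list: dict) -> list[tuple[str, str, int]]:
    """Return (pod, container, restart count) for every container whose
    most recent termination reason was OOMKilled."""
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "<unknown>")
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # lastState is empty ({}) when the container has never been restarted
            terminated = cs.get("lastState", {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                hits.append((pod_name, cs.get("name", "<unknown>"), cs.get("restartCount", 0)))
    return hits


# Hypothetical sample, shaped like the output of `kubectl get pods -o json`.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "nginx-ic-abc"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "nginx-ingress",
                        "restartCount": 1,
                        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
                    }
                ]
            },
        },
        {
            "metadata": {"name": "nginx-ic-def"},
            "status": {
                "containerStatuses": [
                    {"name": "nginx-ingress", "restartCount": 0, "lastState": {}}
                ]
            },
        },
    ]
}

if __name__ == "__main__":
    for pod, container, restarts in find_oom_killed(SAMPLE):
        print(f"{pod}/{container}: OOMKilled, restarts={restarts}")
```

In practice the dictionary would come from parsing the real `kubectl` JSON output with `json.loads` rather than from a hard-coded sample.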

Our investigation after the outage revealed that memory consumption on the Nginx Pods changed quite dramatically after the release, as shown by the following two charts.

1st Example
In our least-used environment we didn't incur any OOM Kills, but today's investigation revealed that memory usage has both increased and become more 'spiky' since we performed the upgrade:

Image

2nd Example
This screenshot shows the IC Pods' memory consumption after a release of v3.7.1 into a busier environment, followed by the rollback this morning.

Image

What this graph doesn't capture is that memory went above the 1500MiB line for every Pod in the deployment, and so they were all OOM Killed. This isn't shown because the metrics are exported only once a minute, so all we have is the last datapoint that happened to be collected before each OOM Kill.
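To illustrate why a one-minute export interval can hide the final spike, here is a small sketch with an entirely made-up memory trace (one value per second). The numbers are hypothetical and not taken from our Pods; only the 1500 MiB limit and the 60-second interval come from the description above.

```python
LIMIT_MIB = 1500          # memory limit per Pod, from the environment notes
SCRAPE_INTERVAL_S = 60    # metrics are exported once a minute


def scrape(samples_mib: list[int], interval_s: int) -> list[int]:
    """Keep only the values an exporter would see when it samples a
    one-value-per-second trace every `interval_s` seconds."""
    return samples_mib[::interval_s]


# Hypothetical trace: flat at 900 MiB for 100 s, then climbing 40 MiB/s
# until it crosses the limit (at which point the container would be killed).
trace = [900] * 100
while trace[-1] <= LIMIT_MIB:
    trace.append(trace[-1] + 40)

scraped = scrape(trace, SCRAPE_INTERVAL_S)

if __name__ == "__main__":
    print("true peak (MiB):   ", max(trace))
    print("charted peak (MiB):", max(scraped))
```

In this made-up trace the whole climb happens between two scrapes, so the charted series never gets near the limit even though the real usage crossed it.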

I guess it's worth noting that we also bumped our Helm Chart (not just the image version) with this release. The only notable change in that chart was the explicit creation of the Leader Election resource, which I think Nginx used to just create by itself after deployment.

Some environment notes:

  • Azure AKS - 1.30.5
  • Using feature: Mergeable Ingress Types
  • Ingress resource count: 516
  • IC Pod Count: 6
  • Memory Request & Limit: 1500MiB per pod
  • ReadOnlyRootFileSystem: true
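For reference, the memory figures above can be turned into bytes as follows. This is a simplified sketch: `quantity_to_bytes` is a hypothetical helper that only understands integer values with the binary suffixes `Ki`, `Mi` and `Gi`, and it assumes the limit is written in Kubernetes notation as `1500Mi` (Kubernetes spells mebibytes `Mi`, not `MiB`).

```python
# Binary suffixes as used in Kubernetes resource quantities (subset only).
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '1500Mi' to bytes.
    Simplified: integer values and Ki/Mi/Gi suffixes only."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain number of bytes


POD_COUNT = 6            # IC Pod count from the notes above
LIMIT = "1500Mi"         # assumed Kubernetes spelling of the 1500MiB limit

per_pod_bytes = quantity_to_bytes(LIMIT)
total_bytes = per_pod_bytes * POD_COUNT

if __name__ == "__main__":
    print("per pod:", per_pod_bytes, "bytes")
    print("all pods:", total_bytes, "bytes")
```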

    Labels

    bug (An issue reporting a potential bug), in review (Gathering information), needs triage (An issue that needs to be triaged)
