Add default requests and limits for ocs-operator pods #3339


Open
mrudraia1 wants to merge 1 commit into main
Conversation


@mrudraia1 mrudraia1 commented Jul 1, 2025

This PR adds default requests and limits for the ocs-operator pods.
The defaults are set for the following pods (see the sketch after this list):

  • ocs-operator
  • ocs-metrics-exporter
  • ux-backend
  • crash-collector
  • log-collector
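
For illustration, defaults like these are typically declared as a corev1.ResourceRequirements value and wired into each deployment's container spec. The sketch below uses a hypothetical variable name and placeholder quantities, not the exact values under review:

package defaults

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// ocsOperatorResources is a hypothetical name; the PR defines a similar
// block for each pod listed above. The quantities are placeholders.
var ocsOperatorResources = corev1.ResourceRequirements{
	Requests: corev1.ResourceList{
		corev1.ResourceMemory: resource.MustParse("512Mi"),
		corev1.ResourceCPU:    resource.MustParse("250m"),
	},
	Limits: corev1.ResourceList{
		corev1.ResourceMemory: resource.MustParse("1.5Gi"),
		corev1.ResourceCPU:    resource.MustParse("1"),
	},
}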

Contributor

openshift-ci bot commented Jul 1, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 1, 2025
Contributor

openshift-ci bot commented Jul 1, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mrudraia1
Once this PR has been reviewed and has the lgtm label, please assign nbalacha for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mrudraia1 mrudraia1 force-pushed the req-lim branch 3 times, most recently from b49ab89 to 1a61568 on July 1, 2025 09:23
@mrudraia1 mrudraia1 marked this pull request as ready for review July 1, 2025 09:23
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 1, 2025
},
Limits: corev1.ResourceList{
"memory": resource.MustParse("1.5Gi"),
"cpu": resource.MustParse("1m"),
Contributor

1m?

Member

It should be "1"
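
For reference, a corrected version of that limit block (a sketch reusing the corev1.ResourceList form from the diff above) would read:

Limits: corev1.ResourceList{
	"memory": resource.MustParse("1.5Gi"),
	"cpu":    resource.MustParse("1"), // one full core; "1m" is one millicore
},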

"cpu": resource.MustParse("250m"),
},
Limits: corev1.ResourceList{
"memory": resource.MustParse("1.5Gi"),
Contributor

Why does the exporter go up to 1.5Gi? That seems like a lot.

Member
@Nikhil-Ladha Nikhil-Ladha Jul 2, 2025

We have seen memory consumption reach up to 1.22Gi in the performance and scale tests run during testing, so we are setting a safe limit of 1.5Gi.

Contributor

Isn't the exporter stateless? In your tests, what made the memory increase? Maybe we have memory leaks?

Member

I think it is stateless 🤔
The tests comprised IO to fill up the cluster, PVC creation and deletion, PVC expansion, small-file IOs, and the like.
Though this was the only set of tests where we observed such a spike; in the other cases, the maximum consumption was ~700Mi.

Contributor

This needs a bit more investigation. I understand that even without this PR the exporter has no bounds, but we should take this moment to re-examine the exporter and come to a conclusion on bumping the values.


Member
@Nikhil-Ladha Nikhil-Ladha Jul 14, 2025

Considering we don't have any idea why we are seeing the spike, I would suggest taking this PR in and working in parallel to identify any memory leaks or other issues that could lead to a memory spike in the exporter.
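
As one possible starting point for that parallel investigation (a suggestion only, not something this PR or the exporter necessarily includes), Go's net/http/pprof package can be enabled in the exporter so heap profiles can be captured while the scale tests run:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve pprof on a local-only port; the address here is illustrative.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the exporter's normal metrics serving would continue here ...
	select {}
}

Heap snapshots could then be taken during a test run with go tool pprof http://localhost:6060/debug/pprof/heap and compared before and after the PVC churn workload to check whether memory is actually being retained.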
