Skip Pod metrics if any container skipped #1781

Open
dippynark wants to merge 1 commit into kubernetes-sigs:master from dippynark:skip-pod-if-any-container-skipped
@dippynark dippynark commented Mar 23, 2026

What this PR does / why we need it:

Currently, if a container's start time or cumulative CPU usage decreases between the previous and latest samples, we exclude that container's resource usage from the exposed metrics for its Pod:

    if last.StartTime.Before(prev.StartTime) {
        return corev1.ResourceList{}, api.TimeInfo{}, fmt.Errorf("unexpected decrease in startTime of node/container")
    }
    if last.CumulativeCPUUsed < prev.CumulativeCPUUsed {
        return corev1.ResourceList{}, api.TimeInfo{}, fmt.Errorf("unexpected decrease in cumulative CPU usage value")
    }

In particular, if a Pod has multiple containers and one of them hits this condition, the Pod's metrics are still reported, but with that container's contribution excluded. The Pod's utilisation can then appear significantly lower than it actually is, which is especially problematic when this information drives horizontal or vertical scaling.

This PR changes this behaviour to exclude the entire Pod, favouring a missing Pod metric over an unreliable or inconsistent one.
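To illustrate the difference, here is a minimal sketch, not the actual metrics-server code. The `sample` and `containerSamples` types, the `cpuRate`, `podCPU` and `exampleContainers` helpers, and the `skipPodOnError` flag are all hypothetical names invented for this example. Only the two decrease checks are taken from the snippet quoted above. It assumes cumulative CPU is counted in nanoseconds, so the computed rate is in nanocores.

```go
package main

import (
	"fmt"
	"time"
)

// sample is a hypothetical, simplified stand-in for one scraped data point.
type sample struct {
	StartTime         time.Time
	Timestamp         time.Time
	CumulativeCPUUsed uint64 // assumed unit: CPU nanoseconds since StartTime
}

// containerSamples pairs the previous and latest samples of one container.
type containerSamples struct {
	Name       string
	Prev, Last sample
}

// cpuRate returns the average CPU usage in nanocores between two samples.
// The first two checks mirror the ones quoted in the PR description.
func cpuRate(last, prev sample) (uint64, error) {
	if last.StartTime.Before(prev.StartTime) {
		return 0, fmt.Errorf("unexpected decrease in startTime of node/container")
	}
	if last.CumulativeCPUUsed < prev.CumulativeCPUUsed {
		return 0, fmt.Errorf("unexpected decrease in cumulative CPU usage value")
	}
	window := last.Timestamp.Sub(prev.Timestamp)
	if window <= 0 {
		return 0, fmt.Errorf("non-positive sampling window")
	}
	delta := last.CumulativeCPUUsed - prev.CumulativeCPUUsed
	return uint64(float64(delta) / window.Seconds()), nil
}

// podCPU sums the per-container rates of one Pod.
// skipPodOnError=false drops only the failing container (behaviour described
// as current); skipPodOnError=true drops the whole Pod (behaviour proposed).
func podCPU(containers []containerSamples, skipPodOnError bool) (uint64, bool) {
	var total uint64
	for _, c := range containers {
		rate, err := cpuRate(c.Last, c.Prev)
		if err != nil {
			if skipPodOnError {
				return 0, false // report nothing for this Pod
			}
			continue // silently under-report the Pod
		}
		total += rate
	}
	return total, true
}

// exampleContainers builds a Pod with a busy "app" container whose CPU
// counter went backwards and a light "sidecar" using 0.1 cores.
func exampleContainers() []containerSamples {
	start := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	prevTS, lastTS := start.Add(60*time.Second), start.Add(75*time.Second)
	return []containerSamples{
		{Name: "app",
			Prev: sample{start, prevTS, 90_000_000_000},
			Last: sample{start, lastTS, 10_000_000_000}},
		{Name: "sidecar",
			Prev: sample{start, prevTS, 1_000_000_000},
			Last: sample{start, lastTS, 2_500_000_000}},
	}
}

func main() {
	cs := exampleContainers()
	total, ok := podCPU(cs, false)
	fmt.Println(total, ok) // prints "100000000 true": only the sidecar is counted
	total, ok = podCPU(cs, true)
	fmt.Println(total, ok) // prints "0 false": the Pod is skipped
}
```

In this sketch the first call reports the Pod at 0.1 cores even though its main container is excluded, which is the misleadingly low figure described above. The second call reports no value for the Pod at all.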

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dippynark
Once this PR has been reviewed and has the lgtm label, please assign serathius for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot requested a review from serathius on March 23, 2026 18:04
k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) and needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.) labels on Mar 23, 2026
github-project-automation (bot) moved this to Needs Triage in SIG Instrumentation on Mar 23, 2026
k8s-ci-robot added the needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.) label on Mar 23, 2026
@k8s-ci-robot
Contributor

Hi @dippynark. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the size/XS (Denotes a PR that changes 0-9 lines, ignoring generated files.) label on Mar 23, 2026
@dippynark
Author

/assign @RainbowMango

@dgrisonnet
Member

/ok-to-test

k8s-ci-robot added the ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.) label and removed the needs-ok-to-test (Indicates a PR that requires an org member to verify it is safe to test.) label on Jun 4, 2026
@dgrisonnet
Member

@dippynark thank you for the contribution. Could you write a test for the scenario you described?

@dgrisonnet
Member

/triage accepted

k8s-ci-robot added the triage/accepted (Indicates an issue or PR is ready to be actively worked on.) label and removed the needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.) label on Jun 4, 2026
Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Projects

Status: Needs Triage

4 participants