Conversation

@ffromani ffromani commented Aug 4, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

Backport of the upstream fix for https://issues.redhat.com/browse/OCPBUGS-56785

Which issue(s) this PR fixes:

Fixes https://issues.redhat.com/browse/OCPBUGS-56785

Special notes for your reviewer:

NA

Does this PR introduce a user-facing change?

Change the node-local podresources API endpoint to consider only active pods. Because this fix changes a long-established behavior, users who observe a regression can disable the KubeletPodResourcesListUseActivePods feature gate (default on) to restore the old behavior. Please file an issue if you encounter problems and have to use the feature gate.

The podresources API `List` implementation uses the internal data of the
resource managers as the source of truth.
Looking at the implementation here:
https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/pkg/kubelet/apis/podresources/server_v1.go#L60
we take care of syncing the device allocation data before querying the
device manager to return its pod->devices assignment.
This is needed because otherwise the device manager (and all the other
resource managers) would do the cleanup asynchronously, so the `List` call
would return incorrect data.

But we do this syncing for neither CPUs nor memory,
so when we report these we get stale data, as issue kubernetes#132020 demonstrates.

The CPU manager, however, has a reconcile loop which periodically cleans up the stale data.
It turns out this timing interplay is the reason the existing issue kubernetes#119423 seemed fixed
(see: kubernetes#119423 (comment)).
But it only seemed fixed because of timing: if in the reproducer we set `cpuManagerReconcilePeriod` to a
very high value (>= 5 minutes), the issue still reproduces against the current master branch
(https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/test/e2e_node/podresources_test.go#L983).
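
The timing dependency described above can be illustrated with a minimal sketch. It is a simplified model, not the kubelet code: `cpuManager`, `removeStaleState`, and `cpusFor` are hypothetical names standing in for a resource manager whose stale entries are dropped only when a reconcile pass runs.

```go
package main

import "fmt"

// cpuManager is a hypothetical, simplified stand-in for a resource manager.
// It maps pod names to the CPU set assigned to them.
type cpuManager struct {
	assignments map[string]string
}

// removeStaleState drops every assignment whose pod is not in the active set.
// In this model it is the only place where stale entries are cleaned up.
func (m *cpuManager) removeStaleState(active map[string]bool) {
	for pod := range m.assignments {
		if !active[pod] {
			delete(m.assignments, pod)
		}
	}
}

// cpusFor returns the CPUs recorded for a pod, if any.
func (m *cpuManager) cpusFor(pod string) (string, bool) {
	cpus, ok := m.assignments[pod]
	return cpus, ok
}

func main() {
	m := &cpuManager{assignments: map[string]string{
		"running":   "2-3",
		"completed": "4-5", // pod already terminated, entry not yet cleaned up
	}}
	active := map[string]bool{"running": true}

	// Query before the reconcile pass: the terminated pod still has CPUs.
	_, found := m.cpusFor("completed")
	fmt.Println("stale entry visible before reconcile:", found)

	// Query after the reconcile pass: the stale entry is gone.
	m.removeStaleState(active)
	_, found = m.cpusFor("completed")
	fmt.Println("stale entry visible after reconcile:", found)
}
```

With a long reconcile period, the gap between the two queries is the window in which `List` can report CPUs for a pod that is no longer running.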

Taking a step back, we can see multiple problems:
1. we don't sync the resource managers' internal data before querying for
   pod assignments (no removeStaleState calls), but most importantly
2. the `List` call iterates over all the pods known to the kubelet. But the
   resource managers do NOT hold resources for non-running pods, so it is
   better, and actually correct, to iterate only over the active pods.
   This will also avoid issue 1 above.
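
To make problem 2 concrete, here is a small sketch of the two iteration strategies. All names (`Pod`, `activePods`, `listAssignments`) are hypothetical simplifications and do not match the kubelet's real types; the map stands in for a resource manager's internal state that still holds an entry for a terminated pod.

```go
package main

import "fmt"

// Pod is a hypothetical, simplified stand-in for the kubelet's pod object.
type Pod struct {
	Name     string
	Terminal bool // true once the pod has run to completion or failed
}

// activePods filters out pods in a terminal state.
func activePods(all []Pod) []Pod {
	active := make([]Pod, 0, len(all))
	for _, p := range all {
		if !p.Terminal {
			active = append(active, p)
		}
	}
	return active
}

// listAssignments reports the CPUs recorded for each of the given pods.
func listAssignments(pods []Pod, cpuState map[string]string) map[string]string {
	report := make(map[string]string, len(pods))
	for _, p := range pods {
		if cpus, ok := cpuState[p.Name]; ok {
			report[p.Name] = cpus
		}
	}
	return report
}

func main() {
	all := []Pod{{Name: "running"}, {Name: "completed", Terminal: true}}
	// The entry for "completed" is stale: it was never cleaned up.
	cpuState := map[string]string{"running": "2-3", "completed": "4-5"}

	fmt.Println("all pods:   ", listAssignments(all, cpuState))
	fmt.Println("active pods:", listAssignments(activePods(all), cpuState))
}
```

Iterating over the active pods never looks up the terminated pod, so the stale entry cannot leak into the report even if the cleanup has not run yet.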

Furthermore, the resource managers all iterate over the active pods
anyway.
`List` uses all the pods known to the kubelet:
1. https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/pkg/kubelet/kubelet.go#L3135 which calls into
2. https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/pkg/kubelet/pod/pod_manager.go#L215

But all the resource managers use the list of active pods:
1. https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/pkg/kubelet/kubelet.go#L1666 which calls into
2. https://github.com/kubernetes/kubernetes/blob/v1.34.0-alpha.0/pkg/kubelet/kubelet_pods.go#L198

So this change will also make the `List` view consistent with the
resource managers' view, which is also a promise of the API that is
currently broken.

We also need to acknowledge the warning in the docstring of GetActivePods.
Arguably, having the endpoint use a different pod set than the resource managers, with the
related desync, causes way more harm than good.
And arguably, it's better to fix this issue in just one place instead of
having `List` use a different pod set for unclear reasons.
For these reasons, while important, I don't think the warning per se
invalidates this change.

We need to further acknowledge that the `List` endpoint has used the full pod
list since its inception. So, we will add a Feature Gate to disable this
fix and restore the old behavior. We plan to keep this Feature Gate for
quite a long time (at least 4 more releases) considering how stable this
change was. Should a consumer of the API be broken by this change,
we have the option to restore the old behavior and to craft a more
elaborate fix.
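
A gate-controlled choice of pod set might look like the sketch below. It is an illustration under stated assumptions: the boolean `useActivePods` stands in for a check of the `KubeletPodResourcesListUseActivePods` feature gate, and `podsProvider` is a hypothetical type, not the kubelet's real interface.

```go
package main

import "fmt"

// podsProvider is a hypothetical stand-in for the component that knows
// both the full pod list and the active pod list.
type podsProvider struct {
	all    []string
	active []string
}

// podsForList returns the pod set that List should iterate over.
// useActivePods models the feature gate: true selects the fixed behavior,
// false restores the old behavior of walking every known pod.
func (p podsProvider) podsForList(useActivePods bool) []string {
	if useActivePods {
		return p.active
	}
	return p.all
}

func main() {
	provider := podsProvider{
		all:    []string{"running", "completed"},
		active: []string{"running"},
	}
	fmt.Println("gate on: ", provider.podsForList(true))
	fmt.Println("gate off:", provider.podsForList(false))
}
```

Keeping the switch in a single function means the escape hatch can be removed later by deleting one branch.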

The old `v1alpha1` endpoint is intentionally left unmodified.

***RELEASE-4.19 BACKPORT NOTE***
Dropped the versioned feature gate entry, as we don't have versioned
feature gates in this version.

Signed-off-by: Francesco Romani <[email protected]>
In order to facilitate backports (see OCPBUGS-56785) we prefer
to remove the feature gate that upstream added as a safety measure,
thereby disabling this escape hatch.

This commit must be dropped once we rebase on top of 1.34.

Signed-off-by: Francesco Romani <[email protected]>
@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 4, 2025
@openshift-ci-robot openshift-ci-robot added the backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. label Aug 4, 2025
@openshift-ci-robot

@ffromani: the contents of this pull request could not be automatically validated.

The following commits are valid:

The following commits could not be validated and must be approved by a top-level approver:

Comment /validate-backports to re-evaluate validity of the upstream PRs, for example when they are merged upstream.

@ffromani ffromani changed the title Podresources list active pods backport 4.19 UPSTREAM: 132028: podresources: list: use active pods Aug 4, 2025
@ffromani
Author

ffromani commented Aug 4, 2025

/jira cherrypick OCPBUGS-56785

@openshift-ci-robot

@ffromani: An error was encountered cloning bug for cherrypick for bug OCPBUGS-56785 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details.

Full error message. No Link Issue Permission for issue 'OCPBUGS-60074': request failed. Please analyze the request body for more details. Status code: 401:

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.
/retitle : UPSTREAM: 132028: podresources: list: use active pods

In response to this:

/jira cherrypick OCPBUGS-56785

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot changed the title UPSTREAM: 132028: podresources: list: use active pods : UPSTREAM: 132028: podresources: list: use active pods Aug 4, 2025
@openshift-ci openshift-ci bot requested review from mrunalp and sjenning August 4, 2025 06:33
@ffromani ffromani changed the title : UPSTREAM: 132028: podresources: list: use active pods OCPBUGS-60074: UPSTREAM: 132028: podresources: list: use active pods Aug 4, 2025
@openshift-ci-robot

@ffromani: An error was encountered adding this pull request to the external tracker bugs for bug OCPBUGS-60074 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details.

Full error message. failed to add remote link: failed to add link: No Link Issue Permission for issue 'OCPBUGS-60074'.: request failed. Please analyze the request body for more details. Status code: 403:

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

@ffromani
Author

ffromani commented Aug 4, 2025

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Aug 4, 2025
@openshift-ci-robot

@ffromani: This pull request references Jira Issue OCPBUGS-60074, which is invalid:

  • expected the bug to target the "4.19.z" version, but no target version was set
  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected Jira Issue OCPBUGS-60074 to depend on a bug targeting a version in 4.20.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@ffromani
Author

ffromani commented Aug 5, 2025

/retest-required

@ffromani
Author

ffromani commented Aug 5, 2025

/jira refresh

@openshift-ci-robot

@ffromani: This pull request references Jira Issue OCPBUGS-60074, which is invalid:

  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected Jira Issue OCPBUGS-60074 to depend on a bug targeting a version in 4.20.0 and in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@ffromani
Author

ffromani commented Aug 6, 2025

/retest

@ffromani
Author

ffromani commented Aug 6, 2025

/jira refresh

@openshift-ci-robot

@ffromani: This pull request references Jira Issue OCPBUGS-60074, which is invalid:

  • expected dependent Jira Issue OCPBUGS-56785 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is ON_QA instead
  • expected dependent Jira Issue OCPBUGS-56785 to target a version in 4.20.0, but it targets "4.20" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@ffromani
Author

ffromani commented Aug 6, 2025

/jira refresh

@openshift-ci-robot

@ffromani: This pull request references Jira Issue OCPBUGS-60074, which is invalid:

  • expected dependent Jira Issue OCPBUGS-56785 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is ON_QA instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@ffromani
Author

ffromani commented Aug 7, 2025

/retest

@ffromani
Author

/jira refresh

@ffromani
Author

/retest

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Aug 12, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 13, 2025
@mrunalp mrunalp added approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. and removed backports/unvalidated-commits Indicates that not all commits come to merged upstream PRs. labels Aug 13, 2025

openshift-ci bot commented Aug 13, 2025

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: ffromani, haircommander

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ffromani
Author

/retest-required

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 138db52 and 2 for PR HEAD 223cf89 in total


openshift-ci bot commented Aug 14, 2025

@ffromani: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-disruptive | 223cf89 | link | false | `/test e2e-aws-disruptive` |
| ci/prow/e2e-aws-single-node | 223cf89 | link | false | `/test e2e-aws-single-node` |
| ci/prow/e2e-openstack-csi-manila | 223cf89 | link | false | `/test e2e-openstack-csi-manila` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 138db52 and 2 for PR HEAD 223cf89 in total

@openshift-merge-bot openshift-merge-bot bot merged commit 97b7f2e into openshift:release-4.19 Aug 14, 2025
37 checks passed
@openshift-ci-robot

@ffromani: Jira Issue OCPBUGS-60074: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-60074 has been moved to the MODIFIED state.

@openshift-cherrypick-robot

@ffromani: #2391 failed to apply on top of branch "release-4.18":

Applying: UPSTREAM: 132028: podresources: list: use active pods in list
Using index info to reconstruct a base tree...
M	pkg/features/kube_features.go
M	pkg/kubelet/apis/podresources/server_v1.go
M	pkg/kubelet/kubelet.go
A	test/featuregates_linter/test_data/versioned_feature_list.yaml
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): test/featuregates_linter/test_data/versioned_feature_list.yaml deleted in HEAD and modified in UPSTREAM: 132028: podresources: list: use active pods in list. Version UPSTREAM: 132028: podresources: list: use active pods in list of test/featuregates_linter/test_data/versioned_feature_list.yaml left in tree.
Auto-merging pkg/kubelet/kubelet.go
Auto-merging pkg/kubelet/apis/podresources/server_v1.go
CONFLICT (content): Merge conflict in pkg/kubelet/apis/podresources/server_v1.go
Auto-merging pkg/features/kube_features.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 UPSTREAM: 132028: podresources: list: use active pods in list

In response to this:

/cherry-pick release-4.18

better luck this time?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-pod
This PR has been included in build openshift-enterprise-pod-container-v4.19.0-202508140737.p0.g97b7f2e.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: kube-proxy
This PR has been included in build kube-proxy-container-v4.19.0-202508140737.p0.g97b7f2e.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-hyperkube
This PR has been included in build openshift-enterprise-hyperkube-container-v4.19.0-202508140737.p0.g97b7f2e.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot

[ART PR BUILD NOTIFIER]

Distgit: ose-installer-kube-apiserver-artifacts
This PR has been included in build ose-installer-kube-apiserver-artifacts-container-v4.19.0-202508140737.p0.g97b7f2e.assembly.stream.el9.
All builds following this will include this PR.

@haircommander
Member

/cherry-pick release-4.18

@openshift-cherrypick-robot

@haircommander: #2391 failed to apply on top of branch "release-4.18":

Applying: UPSTREAM: 132028: podresources: list: use active pods in list
Using index info to reconstruct a base tree...
M	pkg/features/kube_features.go
M	pkg/kubelet/apis/podresources/server_v1.go
M	pkg/kubelet/kubelet.go
A	test/featuregates_linter/test_data/versioned_feature_list.yaml
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): test/featuregates_linter/test_data/versioned_feature_list.yaml deleted in HEAD and modified in UPSTREAM: 132028: podresources: list: use active pods in list. Version UPSTREAM: 132028: podresources: list: use active pods in list of test/featuregates_linter/test_data/versioned_feature_list.yaml left in tree.
Auto-merging pkg/kubelet/kubelet.go
Auto-merging pkg/kubelet/apis/podresources/server_v1.go
CONFLICT (content): Merge conflict in pkg/kubelet/apis/podresources/server_v1.go
Auto-merging pkg/features/kube_features.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 UPSTREAM: 132028: podresources: list: use active pods in list

@openshift-merge-robot

Fix included in accepted release 4.19.0-0.nightly-2025-08-14-135013

shajmakh added a commit to shajmakh/numaresources-operator that referenced this pull request Sep 4, 2025
openshift/kubernetes#2391 has landed in
openshift, which means that the behavior described is enabled by default and we want
the scheduler to adapt to this behavior by default iff the user didn't
explicitly set the informer mode in the CR.

Until now, the scheduler has worked in the Dedicated informer mode, which is meant to take into
account the pods in terminal state (such as pods that ran and completed)
in the PFP computation to ensure it matches the PFP computed by the RTE
and reported in NRT.

The new kubelet behavior ignores such pods and accounts only for active
pods, so if this behavior is enabled in the RTE-NRT while the scheduler
keeps the default Dedicated mode, the PFP computed from the scheduler's
POV will not match the one expected from NRT's POV, and the scheduler
will stall and never recover.

In this commit we adjust the informer default value to Shared (instead
of Dedicated) only if both conditions below are met:
1. the cluster version supports the fixed kubelet, which is the case if the
   cluster version is greater than or equal to the known-to-be-fixed OCP version.
2. the user didn't set the Spec.SchedulerInformer field in the NRS CR.
This modification enables the Shared mode, which in turn includes only
running pods (= active pods) in the PFP computation from the scheduler's POV,
ultimately allowing PFP alignment with NRT.

Signed-off-by: Shereen Haj <[email protected]>
(cherry picked from commit 6a56840)
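
The defaulting rule described in the referenced commit can be sketched as follows. This is a minimal illustration, not the numaresources-operator code: `InformerMode`, `effectiveInformerMode`, and the `kubeletFixed` flag (standing in for the cluster version comparison) are hypothetical names chosen for the example.

```go
package main

import "fmt"

// InformerMode is a hypothetical type modeling the scheduler informer setting.
type InformerMode string

const (
	Dedicated InformerMode = "Dedicated"
	Shared    InformerMode = "Shared"
)

// effectiveInformerMode picks the informer mode to use.
// userValue is nil when the user left the field unset in the CR.
// kubeletFixed is assumed to be true when the cluster version is greater
// than or equal to the known-to-be-fixed version.
func effectiveInformerMode(userValue *InformerMode, kubeletFixed bool) InformerMode {
	if userValue != nil {
		// An explicit user choice always wins.
		return *userValue
	}
	if kubeletFixed {
		// The kubelet reports only active pods, so the scheduler should too.
		return Shared
	}
	return Dedicated
}

func main() {
	explicit := Dedicated
	fmt.Println(effectiveInformerMode(nil, true))       // unset, fixed kubelet
	fmt.Println(effectiveInformerMode(nil, false))      // unset, older kubelet
	fmt.Println(effectiveInformerMode(&explicit, true)) // explicit user choice
}
```

Using a pointer for the user value keeps "unset" distinguishable from an explicit `Dedicated`, which is what the second condition relies on.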
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged.