add CPU request validation in NRI CreateContainer hook #110

Open
AutuSnow wants to merge 1 commit into kubernetes-sigs:main from AutuSnow:feat/add_req_validation

@AutuSnow
Contributor

@AutuSnow AutuSnow commented Apr 6, 2026

Validation rules:

  1. If a container CPU request is specified, it must exactly match the claim's CPU allocation.
  2. Pod-level resources (PLR) validation is a placeholder for a future implementation.
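Rule 1 can be sketched as follows. This is a minimal sketch, not the PR's code. It assumes that the container's CPU shares arrive as an optional `uint64` (nil when unset), that claims allocate whole CPUs, and that the kubelet derives shares as `milliCPU * 1024 / 1000` with a floor of 2. Under those assumptions a request of exactly N CPUs always produces N*1024 shares, and every other milliCPU value produces a different number, so the comparison can be done exactly on integers. The name `validateCPURequest` is hypothetical.

```go
package main

import "fmt"

const (
	// minCPUShares is the value assumed to be set for containers with no
	// CPU request (the floor of the milliCPU-to-shares conversion).
	minCPUShares uint64 = 2
	// sharesPerCPU is the cgroup v1 shares value corresponding to one CPU.
	sharesPerCPU uint64 = 1024
)

// validateCPURequest is a hypothetical helper implementing rule 1: if the
// container carries an explicit CPU request, it must match the number of
// CPUs allocated by its claims. Unset shares, or the minimum value used for
// best-effort containers, skip validation.
func validateCPURequest(shares *uint64, claimCPUs uint64) error {
	if shares == nil || *shares <= minCPUShares {
		return nil // no explicit CPU request: nothing to validate
	}
	// A request of exactly claimCPUs whole CPUs maps to exactly this value.
	want := claimCPUs * sharesPerCPU
	if *shares != want {
		return fmt.Errorf("CPU request validation failed: container CPU request (%.3f CPUs) does not match claim allocation (%d CPUs)",
			float64(*shares)/float64(sharesPerCPU), claimCPUs)
	}
	return nil
}
```

Comparing integer shares avoids a floating-point tolerance, but this only holds if claims always allocate whole CPUs.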

Fixes #108

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 6, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: AutuSnow
Once this PR has been reviewed and has the lgtm label, please assign klueska for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 6, 2026
@AutuSnow AutuSnow changed the title add CPU request validation in NRI CreateContainer hook [WIP] add CPU request validation in NRI CreateContainer hook Apr 6, 2026
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 6, 2026
@AutuSnow AutuSnow force-pushed the feat/add_req_validation branch 6 times, most recently from 3f4ac1e to 6564d4f April 12, 2026 08:19
@AutuSnow AutuSnow changed the title [WIP] add CPU request validation in NRI CreateContainer hook add CPU request validation in NRI CreateContainer hook Apr 12, 2026
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 12, 2026
@AutuSnow
Contributor Author

AutuSnow commented Apr 12, 2026

/cc @pravk03 @ffromani

Contributor

@ffromani ffromani left a comment

Thanks! I see the point here and I'm supportive of this change. Mostly improvement suggestions inline.

Comment thread pkg/driver/validation.go
@@ -0,0 +1,75 @@
/*
Copyright The Kubernetes Authors.
Contributor

Do we want a pkg/validate or pkg/driver/validate (sub)package, with this API made public?
Perhaps it is time to start our own internal hierarchy?

I'm genuinely asking; I have no strong objection to keeping this code here, except that the validation function is (correctly) private, and we should try not to test private functions directly.

Contributor Author

Actually, I have been thinking about introducing a new level in the package hierarchy. Because version 0.1 was not yet complete during development, I kept adding and modifying files at the existing level.

Contributor Author

I think we should keep the current structure, because this is the first validation function and it is unclear how much validation logic will be added in the future. If more validations (memory, devices, etc.) are added later, they can be refactored into a pkg/validation package (if shared among multiple packages).
Regarding testing private functions: I believe that for pure-logic validation functions, unit testing private functions is reasonable, because the validation logic is complex and needs detailed unit test coverage. E2E tests already cover the integration scenarios, and testing only through the public API would make the test cases too complex.

Contributor

We may end up accepting tests of private functions, but as a general rule I want to try hard to avoid testing private functions directly (yes, this means there's some tech debt to clear over time). Each instance should be a documented exception, not a habit we gradually develop.
So let's think a bit harder. Perhaps we can turn the return value of parseDRAEnvToClaimAllocations into a proper type and add a Validate method to it containing the current logic?
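One way to read this suggestion, sketched under assumptions: the real return type of `parseDRAEnvToClaimAllocations` is not shown in this thread, so `ClaimAllocations` below is a hypothetical stand-in that maps claim names to allocated CPU IDs. Making `Validate` an exported method lets tests exercise the logic through the public API instead of through a private function. `TotalCPUs` mirrors the `ca.TotalCPUs()` call visible later in the diff, but its signature here is a guess.

```go
package main

import "fmt"

// ClaimAllocations is a hypothetical stand-in for the value returned by
// parseDRAEnvToClaimAllocations: claim name -> CPU IDs allocated to it.
type ClaimAllocations map[string][]int

// TotalCPUs returns the number of CPUs allocated across all claims.
func (ca ClaimAllocations) TotalCPUs() int {
	total := 0
	for _, cpus := range ca {
		total += len(cpus)
	}
	return total
}

// Validate checks a container's CPU shares against the allocation.
// A nil pointer or the best-effort minimum (2 shares) means no explicit
// CPU request was made, so validation passes.
func (ca ClaimAllocations) Validate(cpuShares *uint64) error {
	const minCPUShares, sharesPerCPU = 2, 1024
	if cpuShares == nil || *cpuShares <= minCPUShares {
		return nil
	}
	want := uint64(ca.TotalCPUs()) * sharesPerCPU
	if *cpuShares != want {
		return fmt.Errorf("CPU request validation failed: got %d shares, claims allocate %d CPUs (%d shares)",
			*cpuShares, ca.TotalCPUs(), want)
	}
	return nil
}
```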

Comment thread pkg/driver/nri_hooks.go Outdated
klog.Infof("No guaranteed CPUs found in DRA env for pod %s/%s container %s. Using shared CPUs %s", pod.Namespace, pod.Name, ctr.Name, sharedCPUs.String())
adjust.SetLinuxCPUSetCPUs(sharedCPUs.String())
} else {
// Validate CPU requests match claim allocations
Contributor

The comment restates in English what the fairly explicit code on the next line already says, so I'd remove it or repurpose it to explain the "why", if it deserves an explanation at all.

Comment thread pkg/driver/validation.go Outdated
Comment on lines +40 to +42
// minCPUShares is the Kubernetes minimum for best-effort containers (no CPU request).
// Shares == 2 means no explicit CPU request was set; skip validation in that case.
const minCPUShares = uint64(2)
Contributor

This should be a file-level constant; it is important enough.

Comment thread pkg/driver/validation.go Outdated
containerCPUShares := ctr.Linux.Resources.Cpu.Shares.Value
containerCPURequest := float64(containerCPUShares) / 1024.0

const tolerance = 0.01
Contributor

Likewise. And how was this value computed? If it's just our first educated guess, fine, but let's document that explicitly.
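To make the tolerance question concrete, here is a small sketch of what `tolerance = 0.01` admits. It assumes that the kubelet converts a request in milliCPU to shares as `milliCPU * 1024 / 1000` (integer division, minimum 2), which is our understanding of its `MilliCPUToShares` helper. It also assumes that the diff rejects the request when `math.Abs(request - claim)` exceeds the tolerance. The functions are local re-implementations for illustration only. Under these assumptions, a 0.01 CPU tolerance accepts any request from 991m to 1010m against a 1-CPU claim, while an exact comparison on shares accepts only 1000m.

```go
package main

import "math"

// milliCPUToShares mirrors (by assumption) the kubelet's conversion from a
// CPU request in milliCPU to cgroup v1 CPU shares.
func milliCPUToShares(milliCPU int64) uint64 {
	const minShares, sharesPerCPU, milliCPUToCPU = 2, 1024, 1000
	shares := milliCPU * sharesPerCPU / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	return uint64(shares)
}

// passesWithTolerance approximates the float comparison in the diff:
// shares/1024 must be within tolerance of the claimed CPU count.
func passesWithTolerance(milliCPU, claimCPUs int64, tolerance float64) bool {
	request := float64(milliCPUToShares(milliCPU)) / 1024.0
	return math.Abs(request-float64(claimCPUs)) <= tolerance
}

// passesExact compares shares as integers; for whole-CPU claims this
// accepts only a request of exactly claimCPUs * 1000 milliCPU.
func passesExact(milliCPU, claimCPUs int64) bool {
	return milliCPUToShares(milliCPU) == uint64(claimCPUs)*1024
}
```

If claims only ever allocate whole CPUs, the exact comparison needs no tolerance constant to document.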

Comment thread pkg/driver/validation.go Outdated
Comment on lines +59 to +72
if pod.Linux != nil && pod.Linux.PodResources != nil && pod.Linux.PodResources.Cpu != nil {
if pod.Linux.PodResources.Cpu.Shares != nil && pod.Linux.PodResources.Cpu.Shares.Value > 0 {
podLevelCPUShares := pod.Linux.PodResources.Cpu.Shares.Value
podLevelCPURequest := float64(podLevelCPUShares) / 1024.0

klog.V(4).InfoS("pod has pod-level CPU request",
"namespace", pod.Namespace,
"pod", pod.Name,
"podLevelCPURequest", podLevelCPURequest,
"container", ctr.Name,
"claimCPUs", totalClaimCPUs,
)
}
}
Contributor

Is this just logging? If so, why are we doing it inside the validation? Should it be its own small function?

Contributor Author

Good question! The purpose of this code is:

  1. Current status: it is a placeholder that records the presence of Pod Level Resources (PLR), in preparation for future PLR validation.
  2. Why it is inside the validation function: we already access the pod and container resource information here, so this avoids traversing it again.
  3. Why it is only a log: per the PR description, PLR validation is a "placeholder for future implementation".

},
}

for _, tc := range tests {
Contributor

The existing tests exercise large differences; it would be nice to add tests that exercise the smallest possible difference and any edge cases (I can't think of any right now, but I haven't tried hard enough yet).
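Here is a sketch of the kind of table-driven test being asked for. It covers the smallest detectable difference and one boundary that seems worth pinning down. Under the same assumed kubelet conversion (`milliCPU * 1024 / 1000`, floor 2), a request of 1m rounds to 1 share and is clamped to 2. That makes it indistinguishable from a best-effort container, so it skips validation. `ValidateCPURequest` is a hypothetical exported function, which fits the preference stated above for not testing private functions. It is inlined here so the example compiles on its own.

```go
package main

import (
	"fmt"
	"testing"
)

// ValidateCPURequest is a hypothetical exported validator: unset or
// minimum (2) shares skip validation; otherwise shares must equal
// claimCPUs * 1024.
func ValidateCPURequest(shares *uint64, claimCPUs uint64) error {
	if shares == nil || *shares <= 2 {
		return nil
	}
	if *shares != claimCPUs*1024 {
		return fmt.Errorf("CPU request validation failed: %d shares, want %d", *shares, claimCPUs*1024)
	}
	return nil
}

func ptr(v uint64) *uint64 { return &v }

// cpuRequestCases lists boundary cases for a 1-CPU claim; the milliCPU
// values in the names assume the kubelet conversion described above.
var cpuRequestCases = []struct {
	name    string
	shares  *uint64
	wantErr bool
}{
	{"exact match (1000m -> 1024)", ptr(1024), false},
	{"smallest excess (1001m -> 1025)", ptr(1025), true},
	{"smallest deficit (999m -> 1022)", ptr(1022), true},
	{"best-effort, no request (2)", ptr(2), false},
	{"tiny request (1m) clamps to 2", ptr(2), false},
	{"smallest validated value (3m -> 3)", ptr(3), true},
	{"shares unset", nil, false},
}

func TestValidateCPURequestBoundaries(t *testing.T) {
	for _, tc := range cpuRequestCases {
		t.Run(tc.name, func(t *testing.T) {
			err := ValidateCPURequest(tc.shares, 1)
			if (err != nil) != tc.wantErr {
				t.Fatalf("ValidateCPURequest() error = %v, wantErr %v", err, tc.wantErr)
			}
		})
	}
}
```

In the repository this would live in a `_test.go` file next to the validator; the inlined copy of `ValidateCPURequest` only keeps the example self-contained.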

Contributor

@ffromani ffromani left a comment

added missing notes about the e2e tests

Comment on lines +110 to +117
ginkgo.By("waiting for pod to fail with CreateContainerError")
gomega.Eventually(ctx, func(ctx context.Context) (*v1.Pod, error) {
return fxt.K8SClientset.CoreV1().Pods(fxt.Namespace.Name).Get(ctx, pod.Name, metav1.GetOptions{})
}).
WithTimeout(2*time.Minute).
WithPolling(5*time.Second).
Should(BeFailedToCreate(fxt), "pod should fail to create container")
})
Contributor

the test LGTM, but I wonder if we have a documented API (not the reason/error message text format) to catch this specific rejection

Contributor Author

Thank you for the reminder. I understand your concern: CreateContainerError is not a formally documented constant in the Kubernetes API; it is a string value set by the kubelet at runtime. In practice, though, it is a de facto standard, because it is the reason value the kubelet uses whenever container creation fails. If a more robust check is needed, we can additionally match specific error text in the Message field.

Contributor Author

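if cntSt.State.Waiting != nil &&
	cntSt.State.Waiting.Reason == "CreateContainerError" &&
	strings.Contains(cntSt.State.Waiting.Message, "CPU request validation failed") {
	return true // container creation was rejected by the driver's CPU request validation
}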
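Here is a slightly fuller sketch of that check. It uses minimal stand-in types that mirror the shape of `v1.ContainerStatus` (`State.Waiting.Reason` and `State.Waiting.Message`), so it compiles without client-go. In the real e2e test the function would take `pod.Status.ContainerStatuses` directly. The `CreateContainerError` reason and the `"CPU request validation failed"` text come from this thread. The helper name is hypothetical.

```go
package main

import "strings"

// Minimal stand-ins for the relevant corev1 fields.
type ContainerStateWaiting struct {
	Reason  string
	Message string
}

type ContainerState struct {
	Waiting *ContainerStateWaiting
}

type ContainerStatus struct {
	Name  string
	State ContainerState
}

// isRejectedByCPUValidation reports whether any container is waiting
// because container creation failed with the driver's CPU validation error.
func isRejectedByCPUValidation(statuses []ContainerStatus) bool {
	for _, cntSt := range statuses {
		w := cntSt.State.Waiting
		if w == nil {
			continue // running or terminated: not a creation failure
		}
		if w.Reason == "CreateContainerError" &&
			strings.Contains(w.Message, "CPU request validation failed") {
			return true
		}
	}
	return false
}
```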

Comment on lines +137 to +143
ginkgo.By("waiting for pod to fail with CreateContainerError")
gomega.Eventually(ctx, func(ctx context.Context) (*v1.Pod, error) {
return fxt.K8SClientset.CoreV1().Pods(fxt.Namespace.Name).Get(ctx, pod.Name, metav1.GetOptions{})
}).
WithTimeout(2*time.Minute).
WithPolling(5*time.Second).
Should(BeFailedToCreate(fxt), "pod should fail to create container")
Contributor

ditto

Comment on lines +147 to +167
ginkgo.Context("without CPU requests specified", func() {
ginkgo.It("should successfully create container", func(ctx context.Context) {
fxt := rootFxt.WithPrefix("no-request")
gomega.Expect(fxt.Setup(ctx)).To(gomega.Succeed())
ginkgo.DeferCleanup(fxt.Teardown)

claimCPUs := int64(1)

ginkgo.By("creating a ResourceClaim")
claim := makeResourceClaim(fxt.Namespace.Name, "test-claim", claimCPUs, cpuDeviceMode)
claim, err := fxt.K8SClientset.ResourceV1().ResourceClaims(fxt.Namespace.Name).Create(ctx, claim, metav1.CreateOptions{})
gomega.Expect(err).ToNot(gomega.HaveOccurred())

ginkgo.By("creating a Pod without CPU request")
pod := makePodWithClaim(fxt.Namespace.Name, "test-pod", claim.Name, nil, nil)
pod, err = fxt.K8SClientset.CoreV1().Pods(fxt.Namespace.Name).Create(ctx, pod, metav1.CreateOptions{})
gomega.Expect(err).ToNot(gomega.HaveOccurred())

ginkgo.By("waiting for pod to be running")
err = e2epod.WaitToBeRunning(ctx, fxt.K8SClientset, pod.Namespace, pod.Name)
gomega.Expect(err).ToNot(gomega.HaveOccurred(), "pod should be running")
Contributor

I see why you added this, but I'm wondering whether this case should actually fail, which would change the current driver behavior.

Contributor Author

Thanks for the question about the "without CPU requests specified" test case.
After discussing it with @pravk03, we decided to keep the validation in the CreateContainer hook lightweight:

  • Validate when a CPU request is set: enforce that container.resources.requests.cpu matches the claim allocation.
  • Skip validation when there is no CPU request: allow best-effort containers with claims (shares=2) to pass through.
  • Rationale: avoid making CreateContainer too heavy. Full PLR validation will be handled by the scheduler (KEP-5517); the driver serves as a "final line of defense" against the most common misconfiguration (an explicit request mismatch).

This approach balances validation coverage with performance. WDYT?

@AutuSnow AutuSnow force-pushed the feat/add_req_validation branch 2 times, most recently from b4cdc44 to a59ffbd April 13, 2026 15:12
Comment thread pkg/driver/validation.go
totalClaimCPUs := ca.TotalCPUs()

if ctr.Linux != nil && ctr.Linux.Resources != nil && ctr.Linux.Resources.Cpu != nil {
if ctr.Linux.Resources.Cpu.Shares != nil && ctr.Linux.Resources.Cpu.Shares.Value > minCPUShares {
Contributor

Let's also document that if shares are not set at the container level, validation passes. This is a valid scenario when pod-level resources are set.

Comment thread pkg/driver/validation.go Outdated
}
}

if pod.Linux != nil && pod.Linux.PodResources != nil && pod.Linux.PodResources.Cpu != nil {
Contributor

I think this is incorrect. From reviewing the kubelet code, this check doesn't strictly confirm that Pod Level Resources (PLR) are enabled. If PLR isn't specified, the kubelet defaults this value to the sum of container requests.

Contributor

I wonder if there is a way to distinguish explicit PLR set in the pod. If we can, we could add an additional validation step - fail the validation if neither a container-level request nor a pod-level resource is specified.

Contributor Author

From the NRI hook, pod.Linux.PodResources is always populated by the kubelet: either from explicit PLR, or as the sum of container requests (when the PLR feature gate is disabled). There is no reliable way to distinguish the two at this layer. The current approach skips validation when container-level shares are not set (shares <= 2), which correctly handles both best-effort containers and PLR scenarios. Full PLR validation is deferred to the scheduler per KEP-5517. I removed the incorrect PLR detection block accordingly.

@AutuSnow AutuSnow force-pushed the feat/add_req_validation branch 2 times, most recently from e0b4db8 to 96f2585 April 14, 2026 12:59
@AutuSnow
Contributor Author

/retest

Signed-off-by: qiuxue <liuyutao36@gmail.com>
@AutuSnow AutuSnow force-pushed the feat/add_req_validation branch from 96f2585 to c9ff84b April 14, 2026 14:17
@AutuSnow
Contributor Author

/test pull-dra-driver-cpu-e2e-device-mode-grouped-arm64
/test pull-dra-driver-cpu-e2e-device-mode-individual-arm64

@pravk03
Contributor

pravk03 commented Apr 17, 2026

I've been thinking a bit more about this. Since KEP-5517 is currently in alpha, I'm concerned that the new validation introduced in this PR might prevent experimentation when the alpha feature gate is enabled.

Initially, I was hoping we could include some validation for Pod Level Resources, specifically checking that the pod-level budget is at least equal to the sum of container-level standard requests plus DRA claims. However, during the PR's implementation and review, it has become clear that we can't reliably perform these pod-level validations.

Given this, I'm wondering if we should hold off on adding this validation in code for now, and instead focus on improving our documentation around workload requirements (both with and without KEP-5517)?

@AutuSnow, I know we previously discussed having this validation and I was initially on board with it, but I hadn't fully thought through the KEP-5517 implications at the time. Sorry for the back and forth here!

I would love to hear your thoughts on this @AutuSnow and @ffromani.

/hold
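The pod-level budget check described in this comment reduces to a single inequality. The sketch below is illustrative only. It assumes an explicitly set pod-level CPU budget is known, which, as the thread concludes, the NRI hook cannot reliably determine. Requests and the budget are in milliCPU, claims are in whole CPUs, and every name is hypothetical.

```go
package main

// podBudgetCoversRequests reports whether an explicit pod-level CPU budget
// is at least the sum of the containers' standard CPU requests plus the
// CPUs allocated through DRA claims (all compared in milliCPU).
func podBudgetCoversRequests(podBudgetMilli int64, containerRequestsMilli []int64, claimCPUs []int64) bool {
	var needed int64
	for _, r := range containerRequestsMilli {
		needed += r
	}
	for _, c := range claimCPUs {
		needed += c * 1000 // whole CPUs -> milliCPU
	}
	return podBudgetMilli >= needed
}
```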

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 17, 2026
@ffromani
Contributor

My very first thought is that the conflict between this validation and KEP-5517 largely depends on the version skew we allow and support. IOW, which driver versions are compatible with which Kubernetes versions?
If this validation had been merged in time for 0.1.0, I reckon the merge process would have been much more straightforward, because the (implicit) pairing would have been Kubernetes 1.35 with driver 0.1.0.

I think it is worth clarifying the interactions we expect. We probably just need a simple table (e.g. kube <= 1.35 -> driver 0.1.0) and/or a few version ranges.

That said, I'll dive into the PR and provide more informed comments.

@AutuSnow
Contributor Author

This question has been debated for several days. Could we add an --enable-cpu-request-validation flag (default true) that can be disabled while experimenting with KEP-5517, so that we keep the safety net for regular scenarios?
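Here is a minimal sketch of the proposed opt-out using Go's standard `flag` package. The flag name `--enable-cpu-request-validation` follows the proposal and is not an existing driver option. This thread does not show how the driver actually parses its command line, so the `FlagSet` name and the `parseValidationFlag` helper are assumptions.

```go
package main

import (
	"flag"
	"io"
)

// parseValidationFlag parses args and returns whether CPU request
// validation should run. It defaults to true so the safety net stays on
// unless explicitly disabled (e.g. when experimenting with KEP-5517).
func parseValidationFlag(args []string) (bool, error) {
	fs := flag.NewFlagSet("dra-driver-cpu", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	enabled := fs.Bool("enable-cpu-request-validation", true,
		"validate that container CPU requests match DRA claim allocations")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *enabled, nil
}
```

The CreateContainer hook would then call the validator only when the parsed value is true, leaving the default behavior unchanged for regular deployments.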

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 5, 2026
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA.
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

4 participants