add CPU request validation in NRI CreateContainer hook#110
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: AutuSnow The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
3f4ac1e to
6564d4f
Compare
ffromani
left a comment
There was a problem hiding this comment.
Thanks! I see the point here and I'm supportive of this change. Mostly improvement suggestions inline.
| @@ -0,0 +1,75 @@ | |||
| /* | |||
| Copyright The Kubernetes Authors. | |||
There was a problem hiding this comment.
do we want to have a pkg/validate or pkg/driver/validate (sub)package, and make this API public?
perhaps it is time to start our own internal hierarchy?
I'm actually asking, I don't have strong objections keeping this code here, except the fact the validation function is correctly private and we should try to not test directly private functions.
There was a problem hiding this comment.
Actually, I have been thinking about creating a new level of the project because the 0.1 version was not completed during the development phase, so I still added and modified files on the original level
There was a problem hiding this comment.
I think we should maintain the current structure because this is the first verification function, and it is uncertain how much verification logic will be added in the future. If more verifications (such as memory, device, etc.) are added in the future, they can be refactored as pkg/validation/packages (if shared among multiple packages)
Regarding the issue of testing private functions: I believe that for pure logic validation functions, unit testing private functions is reasonable because the validation logic is complex and requires detailed unit testing coverage. E2E testing already covers integration scenarios, and if only public API testing is used, it will make the test cases too complex
There was a problem hiding this comment.
We may end up accepting testing of private function but as general rule I want to try hard against testing private functions directly (yes, this means there's some tech debt to clear over time). Each time should be a documented exception, not a habit we gradually develop.
So let's think a bit harder indeed. Perhaps we can turn the return value of parseDRAEnvToClaimAllocations into a proper type and add a Validate method to it with the current logic within?
| klog.Infof("No guaranteed CPUs found in DRA env for pod %s/%s container %s. Using shared CPUs %s", pod.Namespace, pod.Name, ctr.Name, sharedCPUs.String()) | ||
| adjust.SetLinuxCPUSetCPUs(sharedCPUs.String()) | ||
| } else { | ||
| // Validate CPU requests match claim allocations |
There was a problem hiding this comment.
the comment reiterate in english what the pretty explicit code is down a line below, so I'd remove it or repurpose to explain the "why", if it deserves explanation at all
| // minCPUShares is the Kubernetes minimum for best-effort containers (no CPU request). | ||
| // Shares == 2 means no explicit CPU request was set; skip validation in that case. | ||
| const minCPUShares = uint64(2) |
There was a problem hiding this comment.
this should be a file-level constant, it is important enough.
| containerCPUShares := ctr.Linux.Resources.Cpu.Shares.Value | ||
| containerCPURequest := float64(containerCPUShares) / 1024.0 | ||
|
|
||
| const tolerance = 0.01 |
There was a problem hiding this comment.
likewise. And how this value was computed? If it's just our first educated guess, fine, but let's explicitly document it.
| if pod.Linux != nil && pod.Linux.PodResources != nil && pod.Linux.PodResources.Cpu != nil { | ||
| if pod.Linux.PodResources.Cpu.Shares != nil && pod.Linux.PodResources.Cpu.Shares.Value > 0 { | ||
| podLevelCPUShares := pod.Linux.PodResources.Cpu.Shares.Value | ||
| podLevelCPURequest := float64(podLevelCPUShares) / 1024.0 | ||
|
|
||
| klog.V(4).InfoS("pod has pod-level CPU request", | ||
| "namespace", pod.Namespace, | ||
| "pod", pod.Name, | ||
| "podLevelCPURequest", podLevelCPURequest, | ||
| "container", ctr.Name, | ||
| "claimCPUs", totalClaimCPUs, | ||
| ) | ||
| } | ||
| } |
There was a problem hiding this comment.
If this just logging? If so, why are we doing inside the validation? Should it be its own little function?
There was a problem hiding this comment.
Good question! The purpose of this code is to:
- Current status: This is a placeholder used to record the existence of Pod Level Resources, in preparation for future PLR (Pod Level Resources) validation
- Why is it inside the validation function: Because we have already accessed the resource information of pods and containers here to avoid repeated traversal
- Why is it just a log: According to the PR description, PLR validation is "placeholder for future implementation"
| }, | ||
| } | ||
|
|
||
| for _, tc := range tests { |
There was a problem hiding this comment.
the code has tests which exercise significant differences, it would be nice to add tests which exercise the smallest difference and any possible edge cases (can't really think of anything atm, but I haven't tried hard enough yet).
ffromani
left a comment
There was a problem hiding this comment.
added missing notes about the e2e tests
| ginkgo.By("waiting for pod to fail with CreateContainerError") | ||
| gomega.Eventually(ctx, func(ctx context.Context) (*v1.Pod, error) { | ||
| return fxt.K8SClientset.CoreV1().Pods(fxt.Namespace.Name).Get(ctx, pod.Name, metav1.GetOptions{}) | ||
| }). | ||
| WithTimeout(2*time.Minute). | ||
| WithPolling(5*time.Second). | ||
| Should(BeFailedToCreate(fxt), "pod should fail to create container") | ||
| }) |
There was a problem hiding this comment.
the test LGTM, but I wonder if we have a documented API (not the reason/error message text format) to catch this specific rejection
There was a problem hiding this comment.
Thank you for your reminder. I understand your concerns, but although CreateContainerError is not formally documented, it is not a formally documented constant in the Kubernetes API. It is a string value dynamically set by kubelet at runtime. But in the Kubernetes ecosystem, it is a factual standard, which is the standard reason value used by kubelets when container creation fails. If a more robust solution is needed, specific error text in the Message field can be checked additionally
There was a problem hiding this comment.
if strings.Contains(cntSt.State.Waiting.Message, "CPU request validation failed") {
xxxxx
}
| ginkgo.By("waiting for pod to fail with CreateContainerError") | ||
| gomega.Eventually(ctx, func(ctx context.Context) (*v1.Pod, error) { | ||
| return fxt.K8SClientset.CoreV1().Pods(fxt.Namespace.Name).Get(ctx, pod.Name, metav1.GetOptions{}) | ||
| }). | ||
| WithTimeout(2*time.Minute). | ||
| WithPolling(5*time.Second). | ||
| Should(BeFailedToCreate(fxt), "pod should fail to create container") |
| ginkgo.Context("without CPU requests specified", func() { | ||
| ginkgo.It("should successfully create container", func(ctx context.Context) { | ||
| fxt := rootFxt.WithPrefix("no-request") | ||
| gomega.Expect(fxt.Setup(ctx)).To(gomega.Succeed()) | ||
| ginkgo.DeferCleanup(fxt.Teardown) | ||
|
|
||
| claimCPUs := int64(1) | ||
|
|
||
| ginkgo.By("creating a ResourceClaim") | ||
| claim := makeResourceClaim(fxt.Namespace.Name, "test-claim", claimCPUs, cpuDeviceMode) | ||
| claim, err := fxt.K8SClientset.ResourceV1().ResourceClaims(fxt.Namespace.Name).Create(ctx, claim, metav1.CreateOptions{}) | ||
| gomega.Expect(err).ToNot(gomega.HaveOccurred()) | ||
|
|
||
| ginkgo.By("creating a Pod without CPU request") | ||
| pod := makePodWithClaim(fxt.Namespace.Name, "test-pod", claim.Name, nil, nil) | ||
| pod, err = fxt.K8SClientset.CoreV1().Pods(fxt.Namespace.Name).Create(ctx, pod, metav1.CreateOptions{}) | ||
| gomega.Expect(err).ToNot(gomega.HaveOccurred()) | ||
|
|
||
| ginkgo.By("waiting for pod to be running") | ||
| err = e2epod.WaitToBeRunning(ctx, fxt.K8SClientset, pod.Namespace, pod.Name) | ||
| gomega.Expect(err).ToNot(gomega.HaveOccurred(), "pod should be running") |
There was a problem hiding this comment.
I see why you added this but I'm thinking if this should actually fail - changing the current driver behavior
There was a problem hiding this comment.
Thanks for the question about the "without CPU requests specified" test case.
After discussion with @pravk03 , we decided to keep the validation lightweight in the CreateContainer hook:
- Validate when CPU request is set: Enforce that container.resources.requests.cpu matches the claim allocation
- Skip validation when no CPU request: Allow best-effort containers with claims (shares=2) to pass through
- Rationale: Avoid making CreateContainer too heavy. Full PLR validation will be handled by the scheduler (KEP-5517). The driver serves as a "final line of defense" for the most common misconfiguration
(explicit request mismatch).
This approach balances validation coverage with performance. WDYT?
b4cdc44 to
a59ffbd
Compare
| totalClaimCPUs := ca.TotalCPUs() | ||
|
|
||
| if ctr.Linux != nil && ctr.Linux.Resources != nil && ctr.Linux.Resources.Cpu != nil { | ||
| if ctr.Linux.Resources.Cpu.Shares != nil && ctr.Linux.Resources.Cpu.Shares.Value > minCPUShares { |
There was a problem hiding this comment.
Lets also document that if shares are not set at container level, validation will pass. This is a valid scenario when pod level resources are set.
| } | ||
| } | ||
|
|
||
| if pod.Linux != nil && pod.Linux.PodResources != nil && pod.Linux.PodResources.Cpu != nil { |
There was a problem hiding this comment.
I think this is incorrect. From reviewing the kubelet code, this check doesn't strictly confirm that Pod Level Resources (PLR) are enabled. If PLR isn't specified, the kubelet defaults this value to the sum of container requests.
There was a problem hiding this comment.
I wonder if there is a way to distinguish explicit PLR set in the pod. If we can, we could add an additional validation step - fail the validation if neither a container-level request nor a pod-level resource is specified.
There was a problem hiding this comment.
From the NRI hook, pod.Linux.PodResources is always populated by kubelet — either from explicit PLR or as the sum of container requests (when PLR feature gate is disabled). There's no reliable way to distinguish the two at this layer. The current approach skips validation when container-level shares are not set (shares <= 2), which correctly handles both best-effort containers and PLR scenarios. Full PLR validation is deferred to the scheduler per KEP-5517.Removed the incorrect PLR detection block accordingly.
e0b4db8 to
96f2585
Compare
|
/retest |
Signed-off-by: qiuxue <liuyutao36@gmail.com>
96f2585 to
c9ff84b
Compare
|
/test pull-dra-driver-cpu-e2e-device-mode-grouped-arm64 |
|
I've been thinking a bit more about this. Since KEP-5517 is currently in alpha, I'm concerned that the new validation introduced in this PR might prevent experimentation when the alpha feature gate is enabled. Initially, I was hoping we could include some validation for Pod Level Resources—specifically checking that the pod-level budget is at least equal to the sum of container-level standard requests + DRA claims. However, during the PR's implementation and review, it has become clear that we can't reliably perform these pod-level validations. Given this, I'm wondering if we should hold off on adding this validation in code for now, and instead focus on improving our documentation around workload requirements (both with and without KEP-5517)? @AutuSnow, I know we previously discussed having this validation and I was initially on board with it, but I hadn't fully thought through the KEP-5517 implications at the time. Sorry for the back and forth here! I would love to hear your thoughts on this @AutuSnow and @ffromani. /hold |
My very initial thought is that the conflict between validation and KEP-5517 largely depends on the version skew we allow and support. IOW, which versions of the driver is compatible with the kubernetes version? I think is worth clarifying the interactions we expect. We probably need just a simple table (e.g. kube <= 1.35 -> driver 0.1.0) and/or few versions range. That said, I'll deep dive in the PR and provide more informed comments. |
|
This question has been debated for several days. Can we add the -- enable cpu request validation flag (default true) to be disabled during the KEP-5517 experiment, so as to preserve the safety net of regular scenarios |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Validation rules:
fixs:#108