Skip CPU allocation when available CPU pool is empty#135
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: johnahull The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Welcome @johnahull! |
|
Hi @johnahull. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Thanks @johnahull for filing the issue.
I am a little confused by this part. The Kubelet and the DRA driver scan the machine topology independently and I am not clear as to how enabling static CPU policy is making the DRA driver's pool empty? Are you passing a specific Right now, the DRA driver and the Kubelet's static CPU policy do not work well together. There is not way to prevent Kubelet (with static policy) allocate the CPUs already allocated to the claim by the driver. The current recommendation is to disable the static CPU policy in Kubelet. When DRA driver is used with Kubelet static policy disabled
Does this mean you want to use the DRA driver just to guide the scheduler (i'e ensure the numa domain / socket in |
I think we somehow missed to make this explicitly and loud in the docs. People already heavily invested and/or working in this area took that for granted and we always worked with this assumption in mind but it was so obvious to us that we failed to record. I think this is the real true bug here. The contributed fix seems to have true merit (didn't review it yet had a quite crazy past few days), but the doc gaps seems to be the real thing IMO. I'll file a PR to fix the docs and the README.md |
Woah. I thought this was already covered in the docs. For some reason, I had it in my head that this limitation was documented since it's been discussed several times. Definitely agree - let's update the docs to make this clear. |
|
@pravk03 Thanks for the feedback. ▎ I am a little confused by this part. Are you passing a specific --reserved-cpus setting to the driver? No custom --reserved-cpus. The pool is empty because cpuManagerPolicy: static takes ownership of all non-reserved CPUs. ▎ Does this mean you want to use the DRA driver just to guide the scheduler... If yes, why are we enabling Kubelet static CPU policy? I had the DRA CPU driver still deployed from earlier testing when I switched to cpuManagerPolicy: static to test a different approach. In that approach, the kubelet CPU manager handles pinning and a kubelet patch (#138732) reads numaNode from GPU/NIC ResourceSlices to align CPU pinning with DRA devices. The DRA CPU driver isn't needed for that — but it was still running and crashed on the empty pool. The driver shouldn't crash if it happens to be deployed alongside cpuManagerPolicy: static. I think a skip with a warning log is better than a hard failure. But if you'd prefer to just document the incompatibility and close this, I have no problem. |
Here: #136 @johnahull the doc bug was clear and major. I'm open to improve the code and I'll review your PR carefully ASAP. |
| klog.Infof("NUMA node %d CPUs:%s available CPUs: %s", numaNodeID, numaCPUs.String(), availableCPUsForDevice.String()) | ||
| } | ||
|
|
||
| if availableCPUsForDevice.Size() == 0 { |
There was a problem hiding this comment.
I think there's a merit in adding this safety guard, it's probably the last line of defence we can add to detect a misconfigured cluster. But we should hard fail in this path rather than carry on.
@pravk03 thoughts?
There was a problem hiding this comment.
Another thought. Since we are moving towards a golang-based setup helper, which users can run when they wish to (e.g. as kubernetes job) and not necessarily as init container, would it make sense to validate the kubelet config?
There was a problem hiding this comment.
I am ok with adding this check, but I wonder if this check would ever get triggered in practice.
From my understanding, the only reason it may be triggered is when a pod with scheduled on the node, but we do not have enough CPUs available on the node (or socket/numa if we use some form of filtering in the claim). But this should not happen in the first place, because the scheduler should not select the node.
Even with a misconfiguration (i.e., enabling the static CPU policy while using the driver), kubelet might have assigned the CPUs to a different pod. However, from the DRA driver's point of view, if a CPU is not assigned to another claim, it should still be considered available. The DRA driver does not know what is assigned by kubelet, and hence we would not trigger this check.
|
/ok-to-test |
| klog.Infof("NUMA node %d CPUs:%s available CPUs: %s", numaNodeID, numaCPUs.String(), availableCPUsForDevice.String()) | ||
| } | ||
|
|
||
| if availableCPUsForDevice.Size() == 0 { |
There was a problem hiding this comment.
I am ok with adding this check, but I wonder if this check would ever get triggered in practice.
From my understanding, the only reason it may be triggered is when a pod with scheduled on the node, but we do not have enough CPUs available on the node (or socket/numa if we use some form of filtering in the claim). But this should not happen in the first place, because the scheduler should not select the node.
Even with a misconfiguration (i.e., enabling the static CPU policy while using the driver), kubelet might have assigned the CPUs to a different pod. However, from the DRA driver's point of view, if a CPU is not assigned to another claim, it should still be considered available. The DRA driver does not know what is assigned by kubelet, and hence we would not trigger this check.
When cpuManagerPolicy: static is active, the kubelet CPU manager owns all non-reserved CPUs. The DRA CPU driver's shared pool is empty because no CPUs are available for DRA allocation. In this case, TakeByTopologyNUMAPacked fails with "not enough cpus available to satisfy request." Skip the allocation attempt when availableCPUsForDevice is empty. When no CPUs are assigned for the entire claim, return a PrepareResult with Device entries (pool, device, request) but no CDI cpuset injection. The NRI CreateContainer hook handles missing DRA cpuset env vars by falling through to the shared CPU pool, and the kubelet CPU manager's cgroup cpuset takes precedence for dedicated pods. This allows the DRA CPU driver to coexist with cpuManagerPolicy: static without failing prepare for pods that have DRA CPU claims.
f3fa402 to
84fa2d3
Compare
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
When
cpuManagerPolicy: staticis active, the kubelet CPU manager owns all non-reserved CPUs. The DRA CPU driver's shared pool is empty, causingTakeByTopologyNUMAPackedto fail with "not enough cpus available to satisfy request."This fix:
availableCPUsForDeviceis emptyPrepareResultwith Device entries (pool, device, request) but no CDI cpuset injectionThe NRI
CreateContainerhook already handles missing DRA cpuset env vars by falling through to the shared CPU pool. The kubelet CPU manager's cgroup cpuset takes precedence for dedicated pods.This allows the DRA CPU driver to coexist with
cpuManagerPolicy: staticwithout failing prepare for pods that have DRA CPU claims used purely as topology markers formatchAttributealignment.Which issue(s) this PR is related to:
Fixes #134
Does this PR introduce a user-facing change?