Skip CPU allocation when available CPU pool is empty#135

Open
johnahull wants to merge 1 commit into
kubernetes-sigs:mainfrom
johnahull:fix/skip-allocation-dedicated-cpus

Conversation

@johnahull

What type of PR is this?

/kind bug

What this PR does / why we need it:

When cpuManagerPolicy: static is active, the kubelet CPU manager owns all non-reserved CPUs. The DRA CPU driver's shared pool is empty, causing TakeByTopologyNUMAPacked to fail with "not enough cpus available to satisfy request."

This fix:

  1. Skips the allocation attempt when availableCPUsForDevice is empty
  2. When no CPUs are assigned for the entire claim, returns a PrepareResult with Device entries (pool, device, request) but no CDI cpuset injection

The NRI CreateContainer hook already handles missing DRA cpuset env vars by falling through to the shared CPU pool. The kubelet CPU manager's cgroup cpuset takes precedence for dedicated pods.

This allows the DRA CPU driver to coexist with cpuManagerPolicy: static without failing prepare for pods that have DRA CPU claims used purely as topology markers for matchAttribute alignment.
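
A minimal sketch of the two steps listed above, written with hypothetical stand-in types. The names `device`, `prepared`, `take`, and `prepareClaim` are illustrative assumptions and are not the driver's actual code in pkg/driver/dra_hooks.go; the real allocator is TakeByTopologyNUMAPacked, which is replaced here by a trivial "first n CPUs" helper.

```go
package main

import (
	"errors"
	"fmt"
)

// device is a hypothetical stand-in for one allocated device in a claim.
type device struct {
	Pool, Name, Request string
	Available           []int // CPUs the driver still considers free for this device
	Want                int   // number of CPUs requested
}

// prepared is a hypothetical stand-in for the prepare result: device entries
// plus the CPUs to inject. An empty CPUs slice means no cpuset injection.
type prepared struct {
	Devices []string
	CPUs    []int
}

var errNotEnough = errors.New("not enough cpus available to satisfy request")

// take is a simplified allocator: it returns the first n CPUs or an error.
func take(avail []int, n int) ([]int, error) {
	if n > len(avail) {
		return nil, errNotEnough
	}
	return avail[:n], nil
}

// prepareClaim records every device entry but skips allocation for devices
// whose available CPU set is empty, instead of failing the whole claim.
func prepareClaim(devs []device) (prepared, error) {
	var res prepared
	for _, d := range devs {
		res.Devices = append(res.Devices, d.Pool+"/"+d.Name+"/"+d.Request)
		if len(d.Available) == 0 {
			continue // step 1: nothing to allocate from, skip the attempt
		}
		cpus, err := take(d.Available, d.Want)
		if err != nil {
			return prepared{}, err
		}
		res.CPUs = append(res.CPUs, cpus...)
	}
	// step 2: if res.CPUs is still empty, the caller emits no cpuset injection.
	return res, nil
}

func main() {
	res, err := prepareClaim([]device{{Pool: "node-a", Name: "cpudev0", Request: "req-0", Want: 4}})
	fmt.Println(res.Devices, res.CPUs, err)
}
```

Note that a partially filled pool still fails in this sketch; only the fully empty case is skipped, which matches the scope described above.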

Which issue(s) this PR is related to:

Fixes #134

Does this PR introduce a user-facing change?

Fix DRA CPU driver to skip allocation when kubelet CPU manager owns all CPUs (cpuManagerPolicy: static).

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. labels May 1, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: johnahull
Once this PR has been reviewed and has the lgtm label, please assign klueska for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Welcome @johnahull!

It looks like this is your first PR to kubernetes-sigs/dra-driver-cpu 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/dra-driver-cpu has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @johnahull. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 1, 2026
@johnahull johnahull marked this pull request as ready for review May 1, 2026 19:25
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 1, 2026
@k8s-ci-robot k8s-ci-robot requested a review from ffromani May 1, 2026 19:25
@pravk03
Contributor

pravk03 commented May 1, 2026

Thanks @johnahull for filing the issue.

> When cpuManagerPolicy: static is active, the kubelet CPU manager owns all non-reserved CPUs. The DRA CPU driver's shared pool is empty, causing TakeByTopologyNUMAPacked to fail with "not enough cpus available to satisfy request."

I am a little confused by this part. The Kubelet and the DRA driver scan the machine topology independently, so I am not clear how enabling the static CPU policy makes the DRA driver's pool empty. Are you passing a specific --reserved-cpus setting to the driver?

Right now, the DRA driver and the Kubelet's static CPU policy do not work well together. There is no way to prevent the Kubelet (with static policy) from allocating CPUs already allocated to a claim by the driver. The current recommendation is to disable the static CPU policy in the Kubelet.

When the DRA driver is used with the Kubelet static policy disabled:

  1. It pins CPUs to containers with claims.
  2. It maintains a shared pool and assigns its CPUs to containers without claims.
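
As a rough illustration of the second point, the shared pool can be thought of as every CPU that is neither reserved nor pinned to a claim. This is a sketch under that assumption; `sharedPool` and its parameters are hypothetical names, not the driver's API.

```go
package main

import (
	"fmt"
	"sort"
)

// sharedPool returns the CPUs that are neither reserved nor pinned to any
// claim. Containers without claims would run on this set.
func sharedPool(all, reserved []int, claimed map[string][]int) []int {
	taken := make(map[int]bool)
	for _, c := range reserved {
		taken[c] = true
	}
	for _, cpus := range claimed {
		for _, c := range cpus {
			taken[c] = true
		}
	}
	var pool []int
	for _, c := range all {
		if !taken[c] {
			pool = append(pool, c)
		}
	}
	sort.Ints(pool)
	return pool
}

func main() {
	all := []int{0, 1, 2, 3, 4, 5, 6, 7}
	claimed := map[string][]int{"claim-a": {2, 3}, "claim-b": {6}}
	// CPU 0 is reserved; CPUs 2, 3 and 6 are pinned to claims.
	fmt.Println(sharedPool(all, []int{0}, claimed))
}
```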

> DRA CPU claims used purely as topology markers for matchAttribute alignment.

Does this mean you want to use the DRA driver just to guide the scheduler (i.e., ensure the NUMA domain / socket in matchAttribute has enough capacity), without the driver actually pinning the CPUs? If yes, why are we enabling the Kubelet static CPU policy? I am not sure I understand the use case here; could you please share more details?

@ffromani
Contributor

ffromani commented May 4, 2026

> Right now, the DRA driver and the Kubelet's static CPU policy do not work well together. There is no way to prevent the Kubelet (with static policy) from allocating CPUs already allocated to a claim by the driver. The current recommendation is to disable the static CPU policy in the Kubelet.

I think we somehow failed to make this explicit and loud in the docs. People already heavily invested or working in this area took it for granted. We always worked with this assumption in mind, but it was so obvious to us that we failed to record it. I think this is the real bug here. The contributed fix seems to have merit (I haven't reviewed it yet; I've had a quite crazy past few days), but the doc gap seems to be the real issue, IMO.

I'll file a PR to fix the docs and the README.md

@pravk03
Contributor

pravk03 commented May 4, 2026

> I think we somehow failed to make this explicit and loud in the docs.

Woah. I thought this was already covered in the docs. For some reason, I had it in my head that this limitation was documented since it's been discussed several times. Definitely agree - let's update the docs to make this clear.

@johnahull
Author

@pravk03 Thanks for the feedback.

▎ I am a little confused by this part. Are you passing a specific --reserved-cpus setting to the driver?

No custom --reserved-cpus. The pool is empty because cpuManagerPolicy: static takes ownership of all non-reserved CPUs.

▎ Does this mean you want to use the DRA driver just to guide the scheduler... If yes, why are we enabling Kubelet static CPU policy?

I had the DRA CPU driver still deployed from earlier testing when I switched to cpuManagerPolicy: static to test a different approach. In that approach, the kubelet CPU manager handles pinning and a kubelet patch (#138732) reads numaNode from GPU/NIC ResourceSlices to align CPU pinning with DRA devices. The DRA CPU driver isn't needed for that — but it was still running and crashed on the empty pool.

The driver shouldn't crash if it happens to be deployed alongside cpuManagerPolicy: static. I think a skip with a warning log is better than a hard failure. But if you'd prefer to just document the incompatibility and close this, I have no problem.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 4, 2026
@ffromani
Contributor

ffromani commented May 5, 2026

> > I think we somehow failed to make this explicit and loud in the docs.
>
> Woah. I thought this was already covered in the docs. For some reason, I had it in my head that this limitation was documented since it's been discussed several times. Definitely agree - let's update the docs to make this clear.

Here: #136

@johnahull the doc bug was clear and major. I'm open to improving the code, and I'll review your PR carefully ASAP.

Comment thread pkg/driver/dra_hooks.go
klog.Infof("NUMA node %d CPUs:%s available CPUs: %s", numaNodeID, numaCPUs.String(), availableCPUsForDevice.String())
}

if availableCPUsForDevice.Size() == 0 {
Contributor

I think there's merit in adding this safety guard; it's probably the last line of defence we can add to detect a misconfigured cluster. But we should hard fail in this path rather than carry on.
@pravk03 thoughts?

Contributor

Another thought: since we are moving towards a golang-based setup helper, which users can run whenever they wish (e.g. as a Kubernetes Job) and not necessarily as an init container, would it make sense to validate the kubelet config?
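
One possible shape for such a check, sketched under assumptions: the helper has already obtained the kubelet configuration as JSON (for example from the kubelet's configz endpoint, whose payload is assumed here to nest the config under a "kubeletconfig" key), and `validateKubeletConfig` is a hypothetical function name, not an existing helper in this repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// configz models only the field this check needs. The "kubeletconfig"
// wrapper key is an assumption about the payload shape.
type configz struct {
	KubeletConfig struct {
		CPUManagerPolicy string `json:"cpuManagerPolicy"`
	} `json:"kubeletconfig"`
}

// validateKubeletConfig returns an error when the kubelet is configured with
// the static CPU manager policy, which conflicts with the DRA CPU driver.
func validateKubeletConfig(raw []byte) error {
	var cfg configz
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return fmt.Errorf("parsing kubelet config: %w", err)
	}
	if cfg.KubeletConfig.CPUManagerPolicy == "static" {
		return fmt.Errorf("cpuManagerPolicy %q is not supported alongside the DRA CPU driver", "static")
	}
	return nil
}

func main() {
	raw := []byte(`{"kubeletconfig":{"cpuManagerPolicy":"static"}}`)
	if err := validateKubeletConfig(raw); err != nil {
		fmt.Println("misconfigured node:", err)
	}
}
```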

Contributor

I am ok with adding this check, but I wonder if this check would ever get triggered in practice.

From my understanding, the only reason it may be triggered is when a pod is scheduled on the node, but we do not have enough CPUs available on the node (or socket/NUMA node, if we use some form of filtering in the claim). But this should not happen in the first place, because the scheduler should not select the node.

Even with a misconfiguration (i.e., enabling the static CPU policy while using the driver), kubelet might have assigned the CPUs to a different pod. However, from the DRA driver's point of view, if a CPU is not assigned to another claim, it should still be considered available. The DRA driver does not know what is assigned by kubelet, and hence we would not trigger this check.
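
To make the argument concrete, here is a small sketch assuming the driver computes availability as "CPUs on the NUMA node minus CPUs assigned to other claims". `availableForDevice` is a hypothetical helper, not the driver's actual function. CPUs pinned by the kubelet never enter the computation, so the set stays non-empty and the `Size() == 0` guard would not fire.

```go
package main

import "fmt"

// availableForDevice returns the NUMA node's CPUs that are not assigned to
// another claim. Kubelet assignments are not an input: the driver cannot see them.
func availableForDevice(numaCPUs []int, claimed map[int]bool) []int {
	var out []int
	for _, c := range numaCPUs {
		if !claimed[c] {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	numa0 := []int{0, 1, 2, 3}
	claimed := map[int]bool{0: true}
	// CPUs the kubelet static policy pinned to other pods (invisible to the driver).
	kubeletPinned := map[int]bool{1: true, 2: true, 3: true}

	avail := availableForDevice(numa0, claimed)
	overlap := 0
	for _, c := range avail {
		if kubeletPinned[c] {
			overlap++
		}
	}
	fmt.Printf("available=%v overlapWithKubelet=%d empty=%t\n", avail, overlap, len(avail) == 0)
}
```

In this scenario every CPU the driver considers available is already pinned by the kubelet, which is the double-allocation problem discussed earlier in the thread rather than an empty-pool condition.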

@ffromani
Contributor

ffromani commented May 6, 2026

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 6, 2026
Comment thread pkg/driver/dra_hooks.go
When cpuManagerPolicy: static is active, the kubelet CPU manager owns
all non-reserved CPUs. The DRA CPU driver's shared pool is empty
because no CPUs are available for DRA allocation. In this case,
TakeByTopologyNUMAPacked fails with "not enough cpus available to
satisfy request."

Skip the allocation attempt when availableCPUsForDevice is empty.
When no CPUs are assigned for the entire claim, return a
PrepareResult with Device entries (pool, device, request) but no CDI
cpuset injection. The NRI CreateContainer hook handles missing DRA
cpuset env vars by falling through to the shared CPU pool, and the
kubelet CPU manager's cgroup cpuset takes precedence for dedicated
pods.

This allows the DRA CPU driver to coexist with cpuManagerPolicy:
static without failing prepare for pods that have DRA CPU claims.
@johnahull johnahull force-pushed the fix/skip-allocation-dedicated-cpus branch from f3fa402 to 84fa2d3 Compare May 14, 2026 21:48
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 14, 2026
@k8s-ci-robot
Contributor

PR needs rebase.

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

prepareGroupedResourceClaim fails when kubelet CPU manager owns all CPUs

4 participants