prepareGroupedResourceClaim fails when kubelet CPU manager owns all CPUs #134

@johnahull

Description

What happened?

When cpuManagerPolicy: static is configured, the kubelet CPU manager owns all non-reserved CPUs. The DRA CPU driver's shared pool (GetSharedCPUs()) is empty because no CPUs are available for DRA allocation. TakeByTopologyNUMAPacked is called with an empty available set and fails with "not enough cpus available to satisfy request", causing NodePrepareResources to fail.

This happens when a pod has DRA CPU claims but the kubelet CPU manager is responsible for CPU pinning (option B in the DRA topology-aware placement design). The DRA CPU device is used only as a topology marker for matchAttribute: resource.kubernetes.io/numaNode alignment — the actual CPU pinning is handled by the kubelet.

What did you expect to happen?

When the available CPU pool is empty, the driver should skip allocation and return a PrepareResult with the device entries but no CDI cpuset injection. The NRI CreateContainer hook already handles this case — it falls through to the shared CPU pool when no DRA_CPUSET_* env var is set.
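A minimal sketch of the proposed guard, written as a standalone model rather than a patch against the driver. Everything here is an assumption made for illustration: `preparedClaim` is a simplified stand-in for the real PrepareResult, `prepareClaim` and `usesSharedPool` are hypothetical names, and the suffix after `DRA_CPUSET_` is invented because the issue only gives the `DRA_CPUSET_*` prefix.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// preparedClaim is a hypothetical, simplified stand-in for a PrepareResult.
type preparedClaim struct {
	Devices []string // device entries returned to the kubelet
	Env     []string // CDI env edits; empty means no cpuset injection
}

// prepareClaim skips allocation when the shared pool is empty, so the claim
// still prepares and the device acts purely as a topology marker.
// sharedPool is assumed to be sorted in ascending order.
func prepareClaim(claimUID string, devices []string, sharedPool []int, want int) (preparedClaim, error) {
	res := preparedClaim{Devices: devices}
	if len(sharedPool) == 0 {
		// Nothing to allocate: return the devices without a cpuset.
		return res, nil
	}
	if want > len(sharedPool) {
		return preparedClaim{}, errors.New("not enough cpus available to satisfy request")
	}
	ids := make([]string, 0, want)
	for _, id := range sharedPool[:want] {
		ids = append(ids, strconv.Itoa(id))
	}
	// The env var suffix is illustrative; only the DRA_CPUSET_ prefix comes from the issue.
	res.Env = []string{fmt.Sprintf("DRA_CPUSET_%s=%s", claimUID, strings.Join(ids, ","))}
	return res, nil
}

// usesSharedPool models the NRI CreateContainer fall-through: a container
// without any DRA_CPUSET_* env var stays on the shared CPU pool.
func usesSharedPool(env []string) bool {
	for _, kv := range env {
		if strings.HasPrefix(kv, "DRA_CPUSET_") {
			return false
		}
	}
	return true
}

func main() {
	// Empty pool: the claim prepares successfully with no cpuset injection.
	res, err := prepareClaim("claim0", []string{"cpudev0"}, nil, 2)
	fmt.Println(res.Devices, len(res.Env), err, usesSharedPool(res.Env))
}
```

One design question this sketch leaves open: an empty pool can also mean genuine exhaustion when the kubelet CPU manager is not using the static policy, so a real fix may need an explicit signal for the kubelet-owned case instead of inferring it from pool size alone.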

How to reproduce it?

  1. Configure kubelet with cpuManagerPolicy: static and reservedSystemCPUs: "0-3"
  2. Deploy the DRA CPU driver
  3. Create a ResourceClaim requesting a CPU device
  4. Create a Guaranteed QoS pod referencing the claim
  5. Observe that NodePrepareResources fails with "not enough cpus available to satisfy request"

Environment

  • Kubernetes v1.36+
  • DRA CPU driver with --cpu-device-mode=grouped or --cpu-device-mode=individual
  • cpuManagerPolicy: static

/kind bug
