Skip to content

KEP-5517: DRA Node Allocatable Resources alpha2 changes#6082

Open
pravk03 wants to merge 1 commit into
kubernetes:masterfrom
pravk03:native-dra-alpha2
Open

KEP-5517: DRA Node Allocatable Resources alpha2 changes#6082
pravk03 wants to merge 1 commit into
kubernetes:masterfrom
pravk03:native-dra-alpha2

Conversation

@pravk03
Copy link
Copy Markdown
Contributor

@pravk03 pravk03 commented May 15, 2026

  • One-line PR description: DRA Node Allocatable Resources update for alpha2
  • Other comments:

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels May 15, 2026
@k8s-ci-robot k8s-ci-robot requested review from dom4ha and macsko May 15, 2026 06:30
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 15, 2026
@github-project-automation github-project-automation Bot moved this to Needs Triage in SIG Scheduling May 15, 2026
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 15, 2026
@pravk03 pravk03 force-pushed the native-dra-alpha2 branch 2 times, most recently from a1ba25a to e2e59ce Compare May 15, 2026 17:55
@pravk03 pravk03 marked this pull request as ready for review May 15, 2026 18:00
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 15, 2026
@pravk03
Copy link
Copy Markdown
Contributor Author

pravk03 commented May 15, 2026

/sig node
/cc @johnbelamaric @liggitt @pohly (for DRA API review)
/cc @dom4ha @macsko (for sig-scheduling)
/cc @tallclair @ffromani @kad (for sig-node)
/cc @ndixita @natasha41575 (for IPPR and PLR integrations)

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label May 15, 2026
@pravk03 pravk03 force-pushed the native-dra-alpha2 branch 4 times, most recently from 2961cb8 to 78582e3 Compare May 19, 2026 23:42
Copy link
Copy Markdown
Contributor

@natasha41575 natasha41575 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For posterity: we'd previously discussed a potential implementation where we use the allocation manager as the source of truth, by overwriting the allocated pod spec limits to be the aggregate spec limits + dra resources, and leaving the kuberuntime manager code untouched. But I think the proposal as-is cleanly handles the complex use cases of shared claims. It does however mean we have to be more careful to ensure we cover every part of kubelet that may be taking assumptions about the pod allocation.

We might need to revisit some of this when dra is made mutable, but we don't need to worry about that now.

Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md
* **Behavior with DRA**: No change. To ensure steady-state reconciliation loops (`computePodResizeAction`)
do not trigger unnecessary CRI updates or cgroup resets, Kubelet maintains the internal `actuated`
checkpoint strictly limited to standard Spec requests and limits. DRA allocations are excluded from
the checkpoint.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution is very elegant. However, we need to be mindful that this introduces an intentional, hidden abstraction layer where the internal actuation checkpoint deliberately diverges from the actual cgroup limits. It works perfectly for preventing reconciliation loops, but we should explicitly call out and carefully document this structural divergence.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a note to capture that this divergence is a deliberate design choice.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why the divergence exist. However this could result in observability gap. Do you think it makes sense to add the DRA-allocated resources to the PodSpec resources in PodStatus though?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it makes sense to add the DRA-allocated resources to the PodSpec resources in PodStatus though?

DRA allocations are currently surfaced in PodStatus under status.nodeAllocatableResourceClaimStatuses and the status limits will also reflect the DRA resources.

Are you suggesting we should also include them in the allocatedResources (e.g., status.containerStatuses[*].allocatedResources), or did you have a different tracking field in mind?

If yes, I think we'd have to nest it under a new field, since existing allocatedResources field only use requests (not limits). Do you think it's necessary to duplicate the DRA limits under allocatedResources?

We emit events when a pod is resized, and the event emits the entire pod allocation. Do we emit any events when a new pod is admitted? It might be enough for observability to ensure that when such an event is emitted, the emitted allocation in the event includes DRA resources.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If yes, I think we'd have to nest it under a new field, since existing allocatedResources field only use requests (not limits). Do you think it's necessary to duplicate the DRA limits under allocatedResources?

+1. Because allocatedResources is a flat map (map[ResourceName]resource.Quantity), introducing DRA specific fields inside it is not possible, and adding a new struct field like status.allocatedDRAResources would introduce a redundant overlap with .status.nodeAllocatableResourceClaimStatuses. I would prefer for allocatedResources status (at both the pod and container level) to remain strictly Spec based, and resources status (at both the pod and container level) reflects what is read from the cgroup (which includes the DRA driver's enforcement). To make this clear, we could update the API documentation to explicitly state that allocatedResources does not include DRA-based requests and a reference to status.nodeAllocatableResourceClaimStatuses for DRA information.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +1352 to +1354
* **Behavior with DRA**: No change. Because Kubelet does not update cgroup requests based on DRA claims
(keeping CPU shares pure to standard Spec), the `allocated` checkpoint and reported `allocatedResources`
remain strictly limited to standard Spec requests and limits. DRA allocations are completely excluded.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my other comment: This solution is simple and elegant but may be a bit unexpected. We are not checkpointing DRA resources as part of the allocation, but DRA resources are inherently evaluated when kubelet performs its allocation (resource fit) checks via canAdmitPod. This means that the DRA resources are practically part of the pod's allocation; just not recorded explicitly in the checkpoint.

This works because DRA resources are immutable. We will need to take care to audit all parts of kubelet that may be making assumptions that the kubelet allocation checkpoint is the source of truth for the entire pod footprint. I think your KEP does a good job covering it, but it should be documented very clearly in the code too.

We'd also have to audit the output messages in Pending / Deferred conditions and events to make sure they are still saying something reasonable and meaningful to users too.

I considered suggesting that we add the DRA resources to the allocation / actuation checkpoint for the sake of cleanly tracking, but given that it's not necessary at this point in time, it's probably not worth it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Yes, we rely on the fact that DRA resources are (currently) immutable after allocation and this simplifies the design a lot. I have updated the ### Kubelet Admission Control section to include the above details. I will also make sure to include these design decisions as comments in the code.

@pravk03 pravk03 force-pushed the native-dra-alpha2 branch 3 times, most recently from c98aec5 to 2354869 Compare May 21, 2026 01:25
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md
@pravk03 pravk03 force-pushed the native-dra-alpha2 branch from 2354869 to 4aecbc2 Compare May 21, 2026 17:14
Copy link
Copy Markdown
Contributor

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made a first full pass. Great work here. The only unavoidable challenge is that the sheer size of this work makes it hard to keep everything in mind. But I'm confident few more passes will help me.
I may have added comments that actually have answers later in the doc. Sorry about that, I can only say this is another byproduct of the size of the KEP. Please bear with me :)
The part I'm most wary atm (and the main/only reason is my limited experience in the area) is interaction with IPPR, but I see SMEs already engaging, so I think we're good it.

Overall this is great and I didn't spot any obvious issue, but more passes are warranted anyway to ensure the work gets the careful review it deserves.

* Until Kubelet is made DRA-aware for node allocatable resources (a non-goal for Alpha), QoS and node-level
enforcement will not fully reflect DRA allocations. This is an accepted limitation for the initial
Alpha scope.
* While the Kubelet considers DRA for cgroup enforcement, QoS class classification remains purely based on the standard Spec.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, but not clear why this is a risk and, if so, what's the mitigation

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving my comment from elsewhere to here

As we change the resource model, I feel existing QoS class classification is becoming outdated and no longer honors its original intention. In the past, QoS class could tell you something about how the pod is consuming physical resources and therefore the Kubelet could make a reasonable decision about who to evict, but between this KEP and the proposal to change QoS shape I feel that our direction is making QoS class a bit meaningless / arbitrary.

It's been proposed before, but a mitigation could be to allow users to explicitly set a QoS class. I think revisiting what QoS class means and how it should be enforced should be discussed as a separate effort / KEP though.

cc @tallclair

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ffromani. I can include a section about risks of not updating QOS and potential sectiosn.

@natasha41575 I completely agree. This has come up multiple times in the initial discussions on this KEP. Instead of adding even more variables to how QoS class is inferred (container spec, pod-level resources, IPPR, DRA), I would prefer we revisit the decision to implicitly infer QoS and explore making it explicitly definable in the pod spec.

@kad @dchen1107

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@natasha41575 from my side, the current implicit determination of QoS is not a good design in overall, but it is something that is deep in tech debt and can't be easily changed without breakages. My intention few years back with KEP 3008 was exactly to have QoS explicitly declarable, and this KEP would be able to benefit from it... but for sake of scope of this kep, I think better not to touch QoS calculations. let it be compatible with existing ecosystem.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but for sake of scope of this kep, I think better not to touch QoS calculations. let it be compatible with existing ecosystem.

100% agree that it should be left out of scope of this KEP.

Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md Outdated
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md Outdated
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md Outdated
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md Outdated
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md Outdated
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md Outdated
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md Outdated
containers: ["c1", "c2"]
overhead:
- name: cpu
quantity: "2" # Pre-resolved total sum: 1 CPU per pod + (500m * 2 containers)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the example made me realize this value is slightly confusing because we kinda conflate the originating values from NodeAllocatableOverhead with their computed value quantity which is IMO more akin a status field or anyway a derived value of sorts.
Honestly: maybe it's just me. I can't say if it's worth another level of nesting to separate the computed values. Probably not?

Copy link
Copy Markdown
Contributor Author

@pravk03 pravk03 May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I don't have a strong preference, but here are a few more options we could consider:

  1. We could remove the quantity field entirely from this strict. perPodReference and perContainerReference are sufficient to calculate the cgroup settings. But this forces us to do the math to figure out the total allocated quantity: perPodReference + numContianerRef * perContianerReference.
  2. Would renaming the quantity field make it more obvious ?
    • Alternatives: total, allocated

cc @liggitt (for API review feedback on this)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking a bit more, having duplicate information is confusing. I have removed the quantity field, we can get all the information from perPodReference and perContainerReference fields.

* Until Kubelet is made DRA-aware for node allocatable resources (a non-goal for Alpha), QoS and node-level
enforcement will not fully reflect DRA allocations. This is an accepted limitation for the initial
Alpha scope.
* While the Kubelet considers DRA for cgroup enforcement, QoS class classification remains purely based on the standard Spec.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving my comment from elsewhere to here

As we change the resource model, I feel existing QoS class classification is becoming outdated and no longer honors its original intention. In the past, QoS class could tell you something about how the pod is consuming physical resources and therefore the Kubelet could make a reasonable decision about who to evict, but between this KEP and the proposal to change QoS shape I feel that our direction is making QoS class a bit meaningless / arbitrary.

It's been proposed before, but a mitigation could be to allow users to explicitly set a QoS class. I think revisiting what QoS class means and how it should be enforced should be discussed as a separate effort / KEP though.

cc @tallclair

Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md Outdated
@pravk03 pravk03 force-pushed the native-dra-alpha2 branch from 4aecbc2 to d50b636 Compare May 22, 2026 23:23
2. **Kube-Scheduler Changes**: Modifications in `NodeResourcesFit` and `DynamicResources` plugins to synchronize node resource usage tracking, delegating authoritative node-fit checks to the `DynamicResources` plugin when a pod utilizes DRA claims.
3. **Kubelet Changes**: Updates in Kubelet to take into account resources allocated through DRA in the cgroup enforcement.

### Conceptual Mapping: Pod Spec Requests and Limits with DRA
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnbelamaric @liggitt FYI. I have updated the KEP and added this section since our last discussion. The DRA based allocation would be considered as both requests and limits at the scheduler and the node.

@pravk03
Copy link
Copy Markdown
Contributor Author

pravk03 commented May 26, 2026

/cc @mrunalp (for SIG Node approvals)

@pravk03
Copy link
Copy Markdown
Contributor Author

pravk03 commented May 26, 2026

/assign @tallclair

@pravk03 pravk03 force-pushed the native-dra-alpha2 branch 2 times, most recently from 2d86e86 to f811f47 Compare May 27, 2026 02:23
@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation May 27, 2026
// blocks sharing direct-mapped device claims across multiple pods.
// +optional
// +oneOf=MappingType
Direct *NodeAllocatableDirectMapping `json:"direct,omitempty" protobuf:"bytes,1,opt,name=direct"`
Copy link
Copy Markdown
Member

@johnbelamaric johnbelamaric May 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this name is OK. An alternative could be to name the parent field nodeAllocatableResources and rename "direct" to "mapping", something like:

device:
  nodeAllocatableResources:
    "cpu":
      mapping:
        capacityKey: "dra.example.com/cores"
        allocationMultiplier: "2"
    "memory":
      overhead:
        perPodReference: "1Gi"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Good suggestion. I am ok with either option here.

Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md Outdated
fields, such as `AccountingPolicy`, to the `NodeAllocatableResourceMapping` struct to specify the desired policy. The impact of
these accounting policies on existing features like Pod Level Resources and In-Place Pod Vertical Scaling also
needs more consideration.
Once the scheduler selects a node and resolves DRA claim allocations, it sums the pod spec standard requests with the newly calculated DRA cgroup-burst resource requests. It evaluates this unified footprint against the remaining namespace `ResourceQuota`. If the computed usage exceeds the remaining quota, the node is filtered out during the scheduling cycle.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems quota probably should be taken into account during the Prioritized List device evaluation process. cc @mortent @pohly

@pravk03 pravk03 force-pushed the native-dra-alpha2 branch from f811f47 to 61a223e Compare May 28, 2026 00:36
Copy link
Copy Markdown
Contributor

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did another pass to the node enforcement portion.
Apologies if I'm missing something in the doc about the cpu quota disable for integral exclusive CPUS

CPU Shares = MilliCPUToShares( Sum(Spec.Requests[cpu]) + DRADirectMapped(cpu) + DRAOverheadMappedPodTotal(cpu) )
CPU Quota = Sum(Spec.Limits[cpu]) + DRADirectMapped(cpu) + DRAOverheadMappedPodTotal(cpu)
Memory Limit = Sum(Spec.Limits[memory]) + DRADirectMapped(memory) + DRAOverheadMappedPodTotal(memory)
HugePages Limit = Sum(Spec.Limits[hugepages-<size>]) + DRADirectMapped(hugepages-<size>) + DRAOverheadMappedPodTotal(hugepages-<size>)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm torn if we should mention that hugepages limit translates to the hugetlb.<size>.max setting AND the hugetlb.<size>.rsvd.max setting. Could be just a note somewhere once. Probably not worth?

Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md Outdated
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md

###### Long-Term Mitigation - Explicit QoS Class

A robust long-term solution would be to allow workloads to declare an explicit QoS class directly in the Pod Spec, rather than relying on implicit derivations inside Kubelet.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: is KEP 3008 relevant here also? cc @marquiz @kad

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't aware of this work. Thanks for sharing. Yes, It seems very relevant, and I will read more.

Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md
@pravk03
Copy link
Copy Markdown
Contributor Author

pravk03 commented May 28, 2026

/assign @dom4ha @johnbelamaric @ffromani
/label api-review

@k8s-ci-robot k8s-ci-robot added the api-review Categorizes an issue or PR as actively needing an API review. label May 28, 2026
@pravk03 pravk03 force-pushed the native-dra-alpha2 branch from 61a223e to 47dbb5e Compare May 29, 2026 01:17
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pravk03
Once this PR has been reviewed and has the lgtm label, please ask for approval from dom4ha. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pravk03 pravk03 force-pushed the native-dra-alpha2 branch 2 times, most recently from 8e162eb to eb1a85e Compare May 29, 2026 17:24
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md Outdated
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md Outdated
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md Outdated
Comment thread keps/sig-scheduling/5517-dra-node-allocatable-resources/README.md Outdated
Comment on lines +1287 to +1289
**Risk**: With Kubelet enforcing quotas while the DRA driver allocates exclusive physical CPUs, the workload could experience the same throttling issues as in [issue 70585](https://github.com/kubernetes/kubernetes/issues/70585).
**Mitigation**: While the DRA driver can use container-level hooks to override Kubelet's defaults and set the container cgroup to unlimited, it cannot modify Kubelet-managed
pod-level parent cgroups. To mitigate this, the container requesting exclusive CPUs through the DRA claim can skip setting limits in the container spec. Under this configuration, Kubelet's cgroup manager natively skips quota configuration at both container and pod levels and they remain unlimited (`cpu.max = -1`).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this section. While the mitigation won't avoid the regression, I concur there's not much we can do in the scope of this already massive KEP. We can perhaps revisit this point in the beta cycle, because after all it's a regression whose best mitigation creates a awkward UX.

I believe a proper fix would involve some more handshakes between the DRA drivers and the kubelet, handshakes which we will need to design.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. I left a note about the long term mitigation.

@pravk03 pravk03 force-pushed the native-dra-alpha2 branch from eb1a85e to 18c6a4a Compare June 3, 2026 22:39
@pravk03 pravk03 force-pushed the native-dra-alpha2 branch from 18c6a4a to 59429a1 Compare June 3, 2026 22:56
// Direct is used when the device directly models a node allocatable resource like standard CPU or memory
// (e.g., with a CPU DRA driver). The calculated quantity is accounted for exactly once per claim instance
// on the node. To prevent node cgroup isolation friction, the scheduler explicitly
// blocks sharing direct-mapped device claims across multiple pods.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This blocking is not really good. Use case: fabric attached memory. It is mapped 1:1 practically to native memory resource, it can be shared between pods that are sharing same claim, but it is not an "overhead" type as it not "overflows" or not "shared" with rest of the system (separate NUMA node).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for blocking sharing right now is that we currently do not have a mechanism to update cgroup settings for already-running pods when a new pod attaches to an existing claim.

The topic of sharable and non-sharable claims seems to extend beyond node allocatable devices, as it came up in few other threads as well.

@pohly @johnbelamaric Do you think its worth exploring a common solution for configuring this ?. In this specific case, we would need the configuration at the device level, and another control we want is the ability to prevent sharing across pods, or even within containers of the same pod.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it keeps coming up. We have allowMultipleAllocations at the device level for sharing devices between claims. This would be some control that decides if a claim is sharable between containers and pods. Maybe we need a sharingPolicy: {None, SamePod, MultiPod} or something, at the Device level, that controls the shareability of claims that contain that device?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be a separate KEP, I think

@ffromani
Copy link
Copy Markdown
Contributor

ffromani commented Jun 4, 2026

provisional LGTM from my side for the node enforcement portion. Will have a final pass ASAP, hopefully the end of this week.

@ffromani
Copy link
Copy Markdown
Contributor

ffromani commented Jun 5, 2026

LGTM for the node enforcement portion. This is a good incremental update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-review Categorizes an issue or PR as actively needing an API review. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.

Projects

Status: No status
Status: 👀 In review
Status: Needs Triage

Development

Successfully merging this pull request may close these issues.