[WIP] Host-managed IMEX #1163
Adds an alpha, install-wide HostManagedIMEX feature gate for clusters where the operator owns the host nvidia-imex daemon lifecycle. When enabled, the driver keeps the ComputeDomain API and the DRA channel-0 injection path but stops creating per-ComputeDomain IMEX DaemonSets, daemon ResourceClaimTemplates, daemon DeviceClasses/RBAC, and ComputeDomain node labels.

- featuregates: register HostManagedIMEX (alpha, default false) and force IMEXDaemonsWithDNSNames and ComputeDomainCliques off before dependency validation runs.
- controller: reconcile only the workload ResourceClaimTemplate and the ComputeDomain finalizer; report Ready without per-node daemon tracking.
- kubelet plugin: accept only allocationMode Single/empty, reject daemon claims, require a non-empty NVLink clique, skip node-label add/remove, and omit daemon devices from the published ResourceSlice.
- helm: hide the daemon DeviceClass and daemon RBAC when the gate is on, using an explicit "true" check so --set-string ...=false is not treated as enabled.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
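The Helm bullet depends on Go template truthiness. Helm renders charts with Go's `text/template` (plus Sprig functions), and there `if` treats any non-empty string as true. A value passed with `--set-string` as `"false"` would therefore enable a naive check. Below is a minimal, standard-library-only sketch of the pitfall and one explicit alternative. The `HostManagedIMEX` values key and the output strings are hypothetical, and the chart's real helper may be written differently.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// naiveTmpl gates on plain truthiness. In text/template any non-empty string
// is true, so the string "false" (what --set-string produces) counts as enabled.
const naiveTmpl = `{{ if .HostManagedIMEX }}hide-daemon-objects{{ else }}render-daemon-objects{{ end }}`

// explicitTmpl formats the value with printf and compares it to "true", so only
// a boolean true or the string "true" turns the gate on.
const explicitTmpl = `{{ if eq (printf "%v" .HostManagedIMEX) "true" }}hide-daemon-objects{{ else }}render-daemon-objects{{ end }}`

// render executes tmpl against a values map holding one hypothetical key.
func render(tmpl string, value any) (string, error) {
	t, err := template.New("gate").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := t.Execute(&out, map[string]any{"HostManagedIMEX": value}); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Compare both checks for real booleans and for strings from --set-string.
	for _, v := range []any{true, false, "true", "false"} {
		naive, _ := render(naiveTmpl, v)
		explicit, _ := render(explicitTmpl, v)
		fmt.Printf("value=%#v naive=%s explicit=%s\n", v, naive, explicit)
	}
}
```

Only the string `"false"` case differs. The naive check renders `hide-daemon-objects` for it, while the explicit comparison renders `render-daemon-objects`.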
Operator-facing artifacts for the HostManagedIMEX alpha gate (no driver code change):

- docs/prerequisites.md: a "Host-managed IMEX" subsection covering the prerequisites. The host nvidia-imex service must be running (not masked), the channel-0 device prerequisites must be met, the two incompatible gates are auto-forced off, and workloads should follow Single-only / numNodes:0 guidance.
- demo/specs/imex/host-managed/: a smoke spec (channel-0 injection), a negative allocationMode:All spec, a DGXC GB200 Helm values overlay (skyhook toleration, arm64 controller pin, nvidiaDriverRoot=/run/nvidia/driver for the containerized driver), and a README runbook.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
A provisional, KEP-style proposal (per docs/proposals/README.md) for an alpha, install-wide HostManagedIMEX feature gate: for clusters where the operator owns the host nvidia-imex daemon, the driver keeps the ComputeDomain API and channel-0 DRA injection but stops creating per-ComputeDomain IMEX DaemonSets. Written forward-looking; status provisional.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected. Please follow our release note process to remove it.
✅ Deploy Preview for dra-driver-nvidia-gpu ready!
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dims
/retest
@dims, we need to merge this fix for the mock NVML test to work again: NVIDIA/go-nvlib#89

@shivamerla Merged!
/retest
> ## Summary
>
> This proposal introduces an alpha, install-wide `HostManagedIMEX` feature gate

Is the NVIDIA driver also supposed to be host-managed in this case?
> - Introduces `HostManagedIMEX` at **Alpha** stability, default `false`, applied
>   install-wide.
> - The gate is operationally mutually exclusive with driver-managed daemon

This is important. Can we add a check to enforce it?
> ### Feature gate & graduation
>
> - Introduces `HostManagedIMEX` at **Alpha** stability, default `false`, applied

It would be good to add a table differentiating HostManaged vs. DriverManaged: the design differences, components involved, scalability expectations, latency, etc.
By default, I would assume that having a host-managed IMEX daemon implies the DRA driver does channel management on top of it, meaning that each workload would be assigned a per-clique channel ID to inject into its pods.

It's fine if we (additionally) want to support a mode where users don't care about channel isolation (as this PR does, by only injecting channel 0 into all workloads), but that should then be its own flag. If this second variant is what you want to support first, you can create the flag and make it default to per-workload channel allocation, but error out in this default mode (forcing users to explicitly set it to "no-isolation" or whatever you want to call it).

So ...
HostManagedIMEX = false --> ignore IsolationStrategy

With all of that said, it feels like these should be actual Helm options of some sort and not just feature gates. Feature gates are meant to eventually have a path to being always on by default. You can protect the setting of the Helm options with the feature gate, but you shouldn't use the feature gate by itself as a (forever) optional toggle.
A Helm option makes sense here, something like below. The various possible combinations are:
So I think these should be Helm options, but with a clear upgrade rule that changing any of

YES, please go for it.
What type of PR is this?
Please see `docs/proposals/0001-host-managed-imex.md`.
What this PR does / why we need it:
Which issue(s) this PR is related to:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation (design docs, usage docs, etc.):
Checklist
- `make check test` passes locally
- `make check-generate` passes if `api/` changed (CRDs, deepcopy, informers, listers, clientset)
- `make check-modules` passes if `go.mod`/`go.sum` changed
- deployments/helm) updated if flags, RBAC, or defaults changed