Problem Statement:
On systems where GPUs are pre-bound to vfio-pci at boot via the kernel cmdline parameter vfio-pci.ids=<vendor>:<device> (e.g., vfio-pci.ids=10de:2330), the VFIO lifecycle doesn't recognize these GPUs as intentionally pre-bound. This causes two issues:
Unconfigure rebinds them to nvidia: When a VM using a pre-bound GPU is deleted, Unconfigure rebinds the GPU back to the nvidia driver. On NVLink systems (e.g., H100 SXM5), this triggers NVLink fabric reconfiguration that hangs for 30+ seconds and can leave the GPU in a bad state. The admin explicitly configured these GPUs for VFIO passthrough at boot, so Unconfigure should respect that.
No way to distinguish intentional VFIO from runtime-bound VFIO: The preConfigureDriver tracking in #1090 ("Fix VFIO discovery and Unconfigure for pre-bound GPUs") handles GPUs that were already on vfio-pci when Configure ran, but this state is lost across plugin restarts. The kernel cmdline vfio-pci.ids parameter is a persistent, authoritative signal that the GPU should stay on vfio-pci.
Proposed Solution:
Add an isVfioPciPrebound() check that reads /proc/cmdline for the vfio-pci.ids parameter and matches the GPU's device ID:
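A minimal sketch of such a check in Go, under several assumptions: that info.deviceID is a hexadecimal PCI device ID such as 2330 (with or without a 0x prefix), that only NVIDIA's vendor ID (10de) needs to be matched, and that an unreadable /proc/cmdline should count as "not pre-bound". The helper cmdlineHasVfioPciID and the constant nvidiaVendorID are hypothetical names introduced here so the parsing can be tested without touching /proc. They are not taken from the plugin's code.

```go
package main

import (
	"os"
	"strings"
)

// nvidiaVendorID is the PCI vendor ID for NVIDIA (assumed to be the only
// vendor of interest, as in the vfio-pci.ids=10de:2330 example).
const nvidiaVendorID = "10de"

// cmdlineHasVfioPciID reports whether cmdline carries a vfio-pci.ids entry
// for the given NVIDIA device ID. Hypothetical helper, split out for testing.
func cmdlineHasVfioPciID(cmdline, deviceID string) bool {
	want := strings.TrimPrefix(strings.ToLower(strings.TrimSpace(deviceID)), "0x")
	if want == "" {
		return false
	}
	for _, field := range strings.Fields(cmdline) {
		key, value, found := strings.Cut(field, "=")
		// The kernel treats '-' and '_' in parameter names as equivalent,
		// so vfio_pci.ids= is accepted as well.
		if !found || strings.ReplaceAll(key, "_", "-") != "vfio-pci.ids" {
			continue
		}
		for _, entry := range strings.Split(value, ",") {
			// Entries look like vendor:device[:subvendor[:subdevice[...]]];
			// only the first two fields are compared in this sketch.
			parts := strings.Split(strings.ToLower(entry), ":")
			if len(parts) >= 2 && parts[0] == nvidiaVendorID && parts[1] == want {
				return true
			}
		}
	}
	return false
}

// isVfioPciPrebound reads /proc/cmdline and checks it for the device ID.
// A read error is treated as "not pre-bound" (an assumption of this sketch).
func isVfioPciPrebound(deviceID string) bool {
	data, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		return false
	}
	return cmdlineHasVfioPciID(string(data), deviceID)
}
```

This sketch does not handle wildcard IDs or the optional subvendor, subdevice, and class fields. A module option set in modprobe.d rather than on the kernel cmdline would also go undetected.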
This would be used in Unconfigure as a fallback when preConfigureDriver is empty (e.g., after plugin restart):
if info.preConfigureDriver == vfioPciDriver || isVfioPciPrebound(info.deviceID) {
	klog.Infof("GPU %s is pre-bound to vfio-pci, leaving on vfio-pci", info.PciBusID)
	return nil
}
Alternatives Considered:
Checkpoint the pre-Configure driver — persist preConfigureDriver to disk so it survives plugin restarts. More complex and still doesn't cover the case where the plugin is freshly installed on a system with pre-bound GPUs.
Never rebind on NVLink systems — too broad; some users want dynamic bind/unbind for non-NVLink GPUs.
Scope: Small: CLI flag, config option, minor behavior change
Additional Context:
This was originally part of #1089 / #1090 but @varunrsekar asked for it to be tracked independently as a feature request. The core bug fix (preConfigureDriver tracking) is in #1090. This feature request covers the persistent detection via kernel cmdline.
Test environment: Dell XE8640 with 4x NVIDIA A40, kernel cmdline vfio-pci.ids=10de:2235 to pre-bind 2 GPUs for KubeVirt VM passthrough.
Component: gpu-kubelet-plugin