Component: gpu-kubelet-plugin
Bug Description:
Four related bugs in the VFIO passthrough lifecycle that prevent GPU passthrough to VMs on multi-GPU and NVLink systems:
- CDI spec missing /dev/vfio/vfio: GetCommonEdits only includes /dev/vfio/vfio when enableAPIDevice=true. Libvirt requires it to detect VFIO support regardless of the API device setting.
- VFIO discovery advertises non-vfio GPUs: enumerateGpuVfioDevices treats any GPU not on the nvidia driver as a VFIO candidate, including driverless GPUs (stuck after a failed unbind). The scheduler allocates them, and prepare fails or hangs.
- Unconfigure rebinds pre-bound GPUs: On H100 SXM5 with NVLink, Unconfigure tries to rebind vfio-pci GPUs back to nvidia, which hangs indefinitely during NVLink fabric reconfiguration. GPUs pre-bound to vfio-pci at boot (via the vfio-pci.ids kernel cmdline parameter) should stay on vfio-pci.
- Sysfs checks fail inside containers: checkVfioPCIModuleLoaded and checkIommuEnabled check /host-root/sys/, which doesn't expose host sysfs inside containers. The VfioPciManager fails to initialize even though vfio_pci and IOMMU are working on the host.
Steps to Reproduce:
- System with H100 SXM5 GPUs (NVLink), GPUs pre-bound to vfio-pci via vfio-pci.ids=10de:2330
- Deploy NVIDIA DRA driver with PassthroughSupport=true
- Create a ResourceClaim requesting a VFIO GPU
- Observe one of the failures described above, depending on which bug is hit
Expected Behavior:
VFIO prepare should succeed for GPUs already bound to vfio-pci. CDI spec should always include /dev/vfio/vfio. Only GPUs actually on vfio-pci should be advertised. Sysfs checks should work inside containers.
DRA Driver Version: v25.12.0
Kubernetes Version: v1.36.0
GPU Model: NVIDIA H100 SXM5 80GB HBM3
NVIDIA Driver Version: 595.58
OS / Kernel: Fedora 44, kernel 6.19.14
Container Runtime: containerd 2.2.3
Feature Gates: PassthroughSupport=true, DeviceMetadata=true