[Feature]: Support GPUs pre-bound to vfio-pci via kernel cmdline (vfio-pci.ids) #1099

@johnahull

Component: gpu-kubelet-plugin

Problem Statement:

On systems where GPUs are pre-bound to vfio-pci at boot via the kernel cmdline parameter vfio-pci.ids=<vendor>:<device> (e.g., vfio-pci.ids=10de:2330), the VFIO lifecycle doesn't recognize these GPUs as intentionally pre-bound. This causes two issues:

  1. Unconfigure rebinds them to nvidia: When a VM using a pre-bound GPU is deleted, Unconfigure rebinds the GPU to the nvidia driver. On NVLink systems (e.g., H100 SXM5), this triggers an NVLink fabric reconfiguration that hangs for 30+ seconds and can leave the GPU in a bad state. The admin explicitly configured these GPUs for VFIO passthrough at boot, so Unconfigure should respect that.

  2. No way to distinguish intentional VFIO from runtime-bound VFIO: The preConfigureDriver tracking in #1090 (Fix VFIO discovery and Unconfigure for pre-bound GPUs) handles GPUs that were already on vfio-pci when Configure ran, but that state is held only in memory and is lost across plugin restarts. The kernel cmdline vfio-pci.ids parameter, by contrast, is a persistent, authoritative signal that the GPU should stay on vfio-pci.

Proposed Solution:

Add an isVfioPciPrebound() check that reads /proc/cmdline for the vfio-pci.ids parameter and matches the GPU's device ID:

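// isVfioPciPrebound reports whether the kernel cmdline lists deviceID in
// vfio-pci.ids, i.e. the admin bound the GPU to vfio-pci at boot.
// Only the device ID is compared; the vendor field is not checked.
// Needs the "os", "path/filepath", and "strings" imports.
func isVfioPciPrebound(deviceID string) bool {
    cmdline, err := os.ReadFile(filepath.Join(hostRoot, "/proc/cmdline"))
    if err != nil {
        return false
    }
    id := strings.TrimPrefix(deviceID, "0x")
    for _, param := range strings.Fields(string(cmdline)) {
        // The kernel treats '-' and '_' in parameter names as equivalent,
        // so accept vfio_pci.ids= as well.
        key, ids, ok := strings.Cut(param, "=")
        if !ok || strings.ReplaceAll(key, "_", "-") != "vfio-pci.ids" {
            continue
        }
        for _, entry := range strings.Split(ids, ",") {
            // Entries are vendor:device, optionally followed by
            // :subvendor:subdevice:class:class_mask.
            parts := strings.Split(entry, ":")
            if len(parts) >= 2 && strings.EqualFold(parts[1], id) {
                return true
            }
        }
    }
    return false
}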
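
For unit testing, the parsing could be separated from the file read. The following is only a sketch under stated assumptions: cmdlineHasVfioID is a hypothetical helper name, not something that exists in the plugin, and it additionally compares the vendor ID, which the proposal above does not do.

```go
package main

import (
	"fmt"
	"strings"
)

// cmdlineHasVfioID is a hypothetical helper (not part of the plugin). It
// reports whether a kernel cmdline string lists vendorID:deviceID in a
// vfio-pci.ids parameter. Both IDs may carry an optional "0x" prefix.
func cmdlineHasVfioID(cmdline, vendorID, deviceID string) bool {
	vendor := strings.TrimPrefix(strings.ToLower(vendorID), "0x")
	device := strings.TrimPrefix(strings.ToLower(deviceID), "0x")
	for _, param := range strings.Fields(cmdline) {
		key, ids, ok := strings.Cut(param, "=")
		// The kernel accepts '-' and '_' interchangeably in parameter names.
		if !ok || strings.ReplaceAll(key, "_", "-") != "vfio-pci.ids" {
			continue
		}
		for _, entry := range strings.Split(ids, ",") {
			// Each entry starts with vendor:device; any further fields
			// (subvendor, subdevice, class, class mask) are ignored here.
			parts := strings.Split(entry, ":")
			if len(parts) >= 2 &&
				strings.EqualFold(parts[0], vendor) &&
				strings.EqualFold(parts[1], device) {
				return true
			}
		}
	}
	return false
}

func main() {
	// Made-up cmdline for illustration; only vfio-pci.ids matters.
	cmdline := "BOOT_IMAGE=/vmlinuz root=/dev/sda1 intel_iommu=on vfio-pci.ids=10de:2235"
	fmt.Println(cmdlineHasVfioID(cmdline, "0x10de", "0x2235")) // listed
	fmt.Println(cmdlineHasVfioID(cmdline, "0x10de", "0x2330")) // not listed
}
```

Taking the cmdline as a string keeps the matcher free of hostRoot and filesystem access, so table-driven tests can cover multi-entry lists and the underscore spelling without a fake /proc.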

This would be used in Unconfigure as a fallback when preConfigureDriver is empty (e.g., after plugin restart):

if info.preConfigureDriver == vfioPciDriver || isVfioPciPrebound(info.deviceID) {
    klog.Infof("GPU %s is pre-bound to vfio-pci, leaving on vfio-pci", info.PciBusID)
    return nil
}
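
To make the fallback explicit, the decision can be written as a small pure function. This is a sketch only: gpuState and shouldStayOnVfio are hypothetical names standing in for the plugin's real per-GPU info and Unconfigure logic, which may be structured differently.

```go
package main

import "fmt"

const vfioPciDriver = "vfio-pci"

// gpuState is a hypothetical, trimmed-down stand-in for the plugin's
// per-GPU bookkeeping.
type gpuState struct {
	preConfigureDriver string // driver seen when Configure ran; "" after a plugin restart
	listedOnCmdline    bool   // result of a cmdline check such as isVfioPciPrebound
}

// shouldStayOnVfio reports whether Unconfigure should skip the rebind to
// nvidia. Either signal is sufficient: the in-memory record from Configure,
// or the persistent kernel cmdline entry.
func shouldStayOnVfio(s gpuState) bool {
	return s.preConfigureDriver == vfioPciDriver || s.listedOnCmdline
}

func main() {
	cases := []struct {
		name  string
		state gpuState
	}{
		{"pre-bound, plugin still running", gpuState{vfioPciDriver, true}},
		{"pre-bound, after plugin restart", gpuState{"", true}},
		{"bound at runtime by Configure", gpuState{"nvidia", false}},
	}
	for _, c := range cases {
		fmt.Printf("%-34s stay=%v\n", c.name, shouldStayOnVfio(c.state))
	}
}
```

The second case is the one this feature request targets: preConfigureDriver is empty after a restart, so only the cmdline check keeps the GPU on vfio-pci.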

Alternatives Considered:

  • Checkpoint the pre-Configure driver — persist preConfigureDriver to disk so it survives plugin restarts. More complex and still doesn't cover the case where the plugin is freshly installed on a system with pre-bound GPUs.
  • Never rebind on NVLink systems — too broad; some users want dynamic bind/unbind for non-NVLink GPUs.

Scope: Small: CLI flag, config option, minor behavior change

Additional Context:

This was originally part of #1089 / #1090 but @varunrsekar asked for it to be tracked independently as a feature request. The core bug fix (preConfigureDriver tracking) is in #1090. This feature request covers the persistent detection via kernel cmdline.

Test environment: Dell XE8640 with 4x NVIDIA A40, kernel cmdline vfio-pci.ids=10de:2235 to pre-bind 2 GPUs for KubeVirt VM passthrough.

Status: Backlog
