
NVIDIA GPU scheduling issue with multiple models #26584

@RainbowHerbicides

Description

Nomad version

Nomad v1.9.5
BuildDate 2025-01-14T18:35:12Z
Revision 0b7bb8b60758981dae2a78a0946742e09f8316f5+CHANGES

Issue

I am not entirely sure whether this is a legitimate limitation of Nomad + the nomad-device-nvidia plugin or a genuine bug. According to the documentation at https://developer.hashicorp.com/nomad/docs/job-specification/device#multiple-nvidia-gpu, multiple GPUs are supported, but it does not say whether those GPUs have to be the same model as well as on the same node, or whether the models can differ and only same-node placement matters. In our case we have 2 NVIDIA GPUs installed and available on one node. Requesting them specifically, like:

        device "nvidia/gpu/NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition" {
          count = 1
        }

or

        device "nvidia/gpu/NVIDIA RTX 5000 Ada Generation" {
          count = 1
        }

works without any issues, same as running a container manually on the node: nvidia-smi reports that both cards are visible and can be utilised. But requesting them as:

        device "nvidia/gpu" {
          count = 2
        }

results in a placement failure.
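For context, a stripped-down sketch of a job that reproduces this for us looks roughly like the following; the job, group, and task names and the Docker image are placeholders rather than our real configuration:

job "gpu-test" {
  datacenters = ["dc1"]

  group "llm" {
    task "ollama" {
      driver = "docker"

      config {
        # placeholder image, not our real one
        image = "ollama/ollama"
      }

      resources {
        cpu    = 2000
        memory = 4096

        device "nvidia/gpu" {
          # works with count = 1, fails to place with count = 2
          count = 2
        }
      }
    }
  }
}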

Reproduction steps

Have 2 NVIDIA GPUs that are correctly fingerprinted:

❯ nomad node status -json fc9077e8 | jq '.NodeResources.Devices'
[
  {
    "Attributes": {
      "cores_clock": {
        "Int": 210,
        "Unit": "MHz"
      },
      "pci_bandwidth": {
        "Int": 32768,
        "Unit": "MB/s"
      },
      "driver_version": {
        "String": "580.65.06",
        "Unit": ""
      },
      "memory": {
        "Int": 32760,
        "Unit": "MiB"
      },
      "bar1": {
        "Int": 256,
        "Unit": "MiB"
      },
      "display_state": {
        "String": "0",
        "Unit": ""
      },
      "power": {
        "Int": 14,
        "Unit": "W"
      },
      "memory_clock": {
        "Int": 405,
        "Unit": "MHz"
      },
      "persistence_mode": {
        "String": "0",
        "Unit": ""
      }
    },
    "Instances": [
      {
        "HealthDescription": "",
        "Healthy": true,
        "ID": "GPU-45bc2781-22da-689e-59d5-f3778161164f",
        "Locality": {
          "PciBusID": "00000000:00:1B.0\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"
        }
      }
    ],
    "Name": "NVIDIA RTX 5000 Ada Generation",
    "Type": "gpu",
    "Vendor": "nvidia"
  },
  {
    "Attributes": {
      "memory": {
        "Int": 97887,
        "Unit": "MiB"
      },
      "memory_clock": {
        "Int": 405,
        "Unit": "MHz"
      },
      "power": {
        "Int": 8,
        "Unit": "W"
      },
      "pci_bandwidth": {
        "Int": 49152,
        "Unit": "MB/s"
      },
      "cores_clock": {
        "Int": 180,
        "Unit": "MHz"
      },
      "bar1": {
        "Int": 256,
        "Unit": "MiB"
      },
      "persistence_mode": {
        "String": "0",
        "Unit": ""
      },
      "driver_version": {
        "String": "580.65.06",
        "Unit": ""
      },
      "display_state": {
        "String": "0",
        "Unit": ""
      }
    },
    "Instances": [
      {
        "HealthDescription": "",
        "Healthy": true,
        "ID": "GPU-fb44165c-1a4f-a9dd-aa1b-30fd8c5658e3",
        "Locality": {
          "PciBusID": "00000000:00:10.0\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"
        }
      }
    ],
    "Name": "NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition",
    "Type": "gpu",
    "Vendor": "nvidia"
  }
]

Create a job whose device block names just <vendor>/<type>, just <vendor>, or just <type>, with count = 2 (the broader name forms are sketched after the snippet below):

        device "nvidia/gpu" {
          count = 2
        }
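For reference, the vendor-only and type-only name forms mentioned above look like this (each used on its own, not both at once); in our tests they behaved the same as nvidia/gpu:

        device "nvidia" {
          count = 2
        }

or

        device "gpu" {
          count = 2
        }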

Expected Result

The job is evaluated, the scheduler sees that there are 2 GPUs on the node matching the <vendor>/<type>, <vendor>, or <type> name, reserves both GPUs, and creates the container with the correct NVIDIA_VISIBLE_DEVICES env variable.
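For the node above, I would expect something roughly like the following in the container environment (the exact format and ordering of the list is an assumption on my side):

NVIDIA_VISIBLE_DEVICES=GPU-45bc2781-22da-689e-59d5-f3778161164f,GPU-fb44165c-1a4f-a9dd-aa1b-30fd8c5658e3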

Actual Result

Evaluation + placement failure

I tried adding constraints pointing at one of the fingerprinted IDs, the model name, or even the resource attributes, but all of those also resulted in a placement failure. As soon as count was reduced from 2 to 1, the job evaluated without any problem onto one of the cards. I also checked whether some process was holding a card and making it inaccessible, but no, nothing. We run a recent version of the nomad-device-nvidia plugin, 1.1.0, and we also compiled the code from the master branch (version reported as 1.2.0), with pretty much the same result.
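For illustration, one of the constraint shapes I experimented with looked roughly like this (the attribute and value here are just an example of the pattern, not a verbatim copy of our job; I also tried matching ${device.ids} and ${device.model}):

        device "nvidia/gpu" {
          count = 2

          # example device constraint; both fingerprinted cards satisfy it,
          # yet placement still failed with count = 2
          constraint {
            attribute = "${device.attr.memory}"
            operator  = ">="
            value     = "16 GiB"
          }
        }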

The question is: is this a legitimate limitation, with the documentation simply not stating that "multiple GPUs" means multiple GPUs of the same model, or is this a bug? If the question is "why put different GPU models in the same container": we work heavily with ollama. A couple of our servers are configured with a more and a less power-hungry card on the same node. Because different models come in different sizes, ollama can automatically pick whichever of the available cards is currently sufficient for the current model/task.
