Changing just the node_pool of a job will not result in allocation moving if there is an ephemeral disk #26600

@regner

Description
Nomad version

Nomad v1.10.1
BuildDate 2025-05-13T07:40:43Z
Revision 3431f13e8036b4716aac0e3b8c5854ddca212e5c

Operating system and Environment details

Nomad cluster is a mix of Linux and Windows machines. All the machines relevant to the issue are running Ubuntu 24.04.

Issue

Jobs with an ephemeral disk will not migrate to a new client when the node_pool value is changed. I don't know whether it is relevant, but our scheduler algorithm is spread rather than binpack.

Reproduction steps

Assuming you have multiple nodes, with at least one node in a pool called "pool1" and one node in a pool called "pool2", the following steps should reproduce the issue.
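For context, a client is assigned to a pool via the node_pool option in the client block of its agent configuration. A minimal sketch of one such agent config (the surrounding options are assumptions, not taken from this report):

```hcl
# Sketch of a Nomad client agent config assigning the node to "pool1".
# Only node_pool is relevant to this issue; enabled is shown for completeness.
client {
  enabled   = true
  node_pool = "pool1"
}
```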

Run this job:

job "example" {
  region      = "global"
  datacenters = ["dc1"]
  node_pool   = "pool1"
  type        = "service"

  group "example" {
    ephemeral_disk {
      migrate = true
    }

    task "example" {
      driver = "docker"

      config {
        image = "busybox"

        command = "sleep"
        args = [
          "infinity",
        ]
      }
    }
  }
}

Then change the node_pool to "pool2" and run the job again:

job "example" {
  region      = "global"
  datacenters = ["dc1"]
  node_pool   = "pool2"
  type        = "service"

  group "example" {
    ephemeral_disk {
      migrate = true
    }

    task "example" {
      driver = "docker"

      config {
        image = "busybox"

        command = "sleep"
        args = [
          "infinity",
        ]
      }
    }
  }
}

Note that once the allocation is running again, it is still on the same node it was previously running on: the node in pool1, rather than a node in pool2.

We have found that, to get the allocation to migrate to a new node in the correct pool, we need to add a constraint to the job. For example, if we update the job as follows, it migrates correctly to a node in pool2:

job "example" {
  region      = "global"
  datacenters = ["dc1"]
  node_pool   = "pool2"
  type        = "service"

  constraint {
    attribute = "${node.pool}"
    operator  = "!="
    value     = "pool1"
  }

  group "example" {
    ephemeral_disk {
      migrate = true
    }

    task "example" {
      driver = "docker"

      config {
        image = "busybox"

        command = "sleep"
        args = [
          "infinity",
        ]
      }
    }
  }
}
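As an aside, a variant of this workaround (an untested sketch, not something we verified) would be to constrain on equality with the target pool instead of excluding the old one, which avoids having to enumerate every pool the job previously ran in. Nomad constraints default to the "=" operator when none is given:

```hcl
# Hedged sketch: pin the job to the target pool directly.
# The operator defaults to "=" when omitted.
constraint {
  attribute = "${node.pool}"
  value     = "pool2"
}
```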

Expected Result

I would expect that changing the node_pool attribute of a job results in the scheduler migrating its allocations to nodes in the new pool.

Actual Result

The allocation stays on the same node, in the old pool, rather than being migrated to a node in the new pool.
