scheduler: skip setting preferred node when nodepool changes #26662

Merged
mismithhisler merged 8 commits into main from f-update-node-preference-with-ephemeral-disks
Oct 8, 2025

Conversation

@mismithhisler
Member

@mismithhisler mismithhisler commented Aug 29, 2025

Description

A preferred node is used when a task group has an ephemeral disk, so that the allocation ideally stays on the same node. However, if the job's node pool changes, we should not select the current node as the preferred node; instead, the scheduler should pick a node from the correct node pool.
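
The rule described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `SketchJob`, `SketchAlloc`, their fields, and `preferredNodeID` are hypothetical stand-ins, not Nomad's actual scheduler types or functions.

```go
package main

// SketchJob is a hypothetical, pared-down job description (not Nomad's struct).
type SketchJob struct {
	NodePool      string
	EphemeralDisk bool // assumption: true when the task group has an ephemeral disk
}

// SketchAlloc is a hypothetical record of an existing allocation.
type SketchAlloc struct {
	NodeID string
	Job    *SketchJob // the job version this allocation was placed with
}

// preferredNodeID returns the node to prefer when replacing prev, or ""
// when the scheduler should choose freely among eligible nodes.
func preferredNodeID(prev *SketchAlloc, updated *SketchJob) string {
	if prev == nil || prev.Job == nil || updated == nil {
		return ""
	}
	// Staying put only matters when there is ephemeral disk data to keep.
	if !updated.EphemeralDisk {
		return ""
	}
	// The node pool changed, so the previous node sits in the wrong pool
	// and must not be preferred.
	if prev.Job.NodePool != updated.NodePool {
		return ""
	}
	return prev.NodeID
}
```

With an allocation on `node-1` placed from a job in pool `default`, an updated job that stays in `default` keeps `node-1` as the preference, while one that moves to another pool yields an empty preference.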

Testing & Reproduction steps

Links

Fixes GH #26600

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.
  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

Member

@jrasell jrasell left a comment

The change looks good to me, but seeing as @pkazmierczak has been deep in this code, I'd like for him to take a quick look.

It would be nice if we could tighten the use of must for the added test cases, such as replacing:

		if err := h.Process(NewServiceScheduler, eval); err != nil {
			t.Fatalf("err: %v", err)
		}

with:

		must.NoError(t, h.Process(NewServiceScheduler, eval))

PR also needs a changelog entry.

pkazmierczak
pkazmierczak previously approved these changes Sep 2, 2025
Contributor

@pkazmierczak pkazmierczak left a comment

Great work, @mismithhisler, LGTM. As noted by James, a small refactoring of the tests and a changelog entry would be nice.

jrasell
jrasell previously approved these changes Sep 3, 2025
Comment thread scheduler/generic_sched_test.go Outdated
Comment thread scheduler/generic_sched_test.go Outdated
Comment thread scheduler/generic_sched_test.go
tgross
tgross previously approved these changes Sep 3, 2025
Member

@tgross tgross left a comment

LGTM!

@tgross
Member

tgross commented Sep 3, 2025

I'm now realizing this bug probably happens with datacenters too (if they're not overlapping sets between versions), because we use both pool and DC to get the set of eligible nodes.
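
To illustrate the point about eligibility depending on both pool and datacenter, here is a small sketch. It rests on assumptions: `SketchNode`, `SketchSpec`, and `eligible` are hypothetical names, and datacenters are compared by exact string match, which ignores any wildcard patterns a real job might use.

```go
package main

// SketchNode is a hypothetical, simplified view of a client node.
type SketchNode struct {
	ID         string
	NodePool   string
	Datacenter string
}

// SketchSpec is a hypothetical, simplified view of a job's placement scope.
type SketchSpec struct {
	NodePool    string
	Datacenters []string
}

// eligible reports whether node is inside both the job's node pool and
// one of its datacenters. Exact matching is an assumption of this sketch.
func eligible(node SketchNode, job SketchSpec) bool {
	if node.NodePool != job.NodePool {
		return false
	}
	for _, dc := range job.Datacenters {
		if dc == node.Datacenter {
			return true
		}
	}
	return false
}
```

Under this model, a previous node in `dc1` stops being eligible when the new job version lists only `dc2`, even if the node pool is unchanged, so preferring it would have the same effect as the node pool case.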

tgross
tgross previously approved these changes Sep 8, 2025
Member

@tgross tgross left a comment

LGTM!

Comment thread scheduler/generic_sched_test.go Outdated
Comment thread scheduler/generic_sched_test.go Outdated
Comment thread scheduler/generic_sched_test.go
Comment thread scheduler/generic_sched.go Outdated
Member

@tgross tgross left a comment

LGTM!

@mismithhisler mismithhisler merged commit a825ee3 into main Oct 8, 2025
40 checks passed
@mismithhisler mismithhisler deleted the f-update-node-preference-with-ephemeral-disks branch October 8, 2025 16:28
mismithhisler added a commit that referenced this pull request Oct 8, 2025
Preferred node is used when a task group has an ephemeral disk, so we
ideally stay on the same node. However if the jobs node pool changes, we
should not select the current node as the preferred node, and let the
scheduler decide which node to pick from the correct node pool.
mismithhisler added a commit that referenced this pull request Oct 8, 2025
…nges (#26662)

Preferred node is used when a task group has an ephemeral disk, so we
ideally stay on the same node. However if the jobs node pool changes, we
should not select the current node as the preferred node, and let the
scheduler decide which node to pick from the correct node pool.
mismithhisler added a commit that referenced this pull request Oct 8, 2025
…nges (#26662) (#26916)

Preferred node is used when a task group has an ephemeral disk, so we
ideally stay on the same node. However if the jobs node pool changes, we
should not select the current node as the preferred node, and let the
scheduler decide which node to pick from the correct node pool.
@github-actions

github-actions Bot commented Feb 6, 2026

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Feb 6, 2026

Labels

backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent

Development

Successfully merging this pull request may close these issues.

Changing just the node_pool of a job will not result in allocation moving if there is an ephemeral disk

5 participants