Conversation

allisonlarson commented May 14, 2025

Description

When running a system job with a constraint, any run after the initial one returns exit code 2 and a warning about allocations left unplaced due to constraints, an error that is not encountered on the initial run even though the constraint stays the same. This is because the node that satisfies the constraint is already running the allocation, so that placement is ignored. Another placement is attempted, but the only node(s) left are the ones that do not satisfy the constraint. Nomad views this case (none of the attempted placements could be made successfully) as an error, and reports it as such. In reality, no allocations should be placed or updated in this case, and it should not be treated as an error.

This change uses the ignored and in-place updated placements from diffSystemAlloc to determine whether the case encountered is an error (no ignored or in-place updated placements means nothing is already running, which is an error) or not (an ignored placement means the task is already running on some node). It does this at the point where failedTGAlloc is populated, so placement functionality isn't changed, only the field that reports the error.

There is functionality that should be preserved which (correctly) notifies a user when a job's constraints filter out all available nodes, so it cannot run anywhere. This should still behave as expected, and an explicit test has been added for it.
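
To make the decision concrete, here is a minimal, self-contained sketch of the rule in Go. This is not the PR's actual code: placementResult and shouldReportFailure are illustrative stand-ins for the scheduler state that feeds failedTGAlloc.

package main

import "fmt"

// placementResult summarizes one task group's outcome from a scheduler
// pass. The struct and its fields are illustrative, not Nomad's actual
// internal types.
type placementResult struct {
    failedPlacements int // placements attempted but excluded by the constraint
    ignored          int // allocs already running on satisfying nodes, left alone
    inPlaceUpdated   int // allocs updated in place on satisfying nodes
}

// shouldReportFailure captures the decision described above: a failed
// placement is only an error when nothing for the task group is already
// running, i.e. there are no ignored or in-place updated allocations.
func shouldReportFailure(r placementResult) bool {
    if r.failedPlacements == 0 {
        return false // nothing failed, nothing to report
    }
    return r.ignored == 0 && r.inPlaceUpdated == 0
}

func main() {
    // Second run of the example job below: one alloc already on
    // nomad-client01 (ignored), two nodes excluded by the constraint.
    fmt.Println(shouldReportFailure(placementResult{failedPlacements: 2, ignored: 1})) // false

    // A job whose constraint excludes every node: nothing is running and
    // every placement failed, so the warning must still be surfaced.
    fmt.Println(shouldReportFailure(placementResult{failedPlacements: 3})) // true
}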

Testing & Reproduction steps

Define a system jobspec with a constraint matching one node in the node pool, and run it. Once an allocation is running on the node that satisfies the constraint, run (or plan) the job again. In the example below, there are 3 nodes and the constraint on the job is defined as:

constraint {
    attribute = "${attr.unique.hostname}"
    operator  = "="
    value     = "nomad-client01"
}

Previous behavior (on second run):

$ nomad job run job.nomad.hcl
==> 2025-05-14T11:04:05-07:00: Monitoring evaluation "da38faeb"
    2025-05-14T11:04:05-07:00: Evaluation triggered by job "example"
    2025-05-14T11:04:06-07:00: Evaluation status changed: "pending" -> "complete"
==> 2025-05-14T11:04:06-07:00: Evaluation "da38faeb" finished with status "complete" but failed to place all allocations:
    2025-05-14T11:04:06-07:00: Task Group "cache" (failed to place 1 allocation):
      * Constraint "${attr.unique.hostname} = nomad-client01": 2 nodes excluded by filter

Nomad reports a failure to place an allocation due to the constraint filtering out the remaining nodes.

New behavior (on second run):

$ nomad job run job.nomad.hcl
==> 2025-05-14T11:08:27-07:00: Monitoring evaluation "446123ac"
    2025-05-14T11:08:27-07:00: Evaluation triggered by job "example"
    2025-05-14T11:08:28-07:00: Evaluation status changed: "pending" -> "complete"
==> 2025-05-14T11:08:28-07:00: Evaluation "446123ac" finished with status "complete"

Links

Fixes #12748 #12016 #19413 #12366

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.

When running `nomad job run <JOB>` multiple times with constraints
defined, there should be no error as a result of filtering out nodes
that do not, and never have, satisfied the constraints.

allisonlarson added the backport/1.10.x (backport to 1.10.x release line) label May 14, 2025
allisonlarson added the backport/ent/1.8.x+ent (Changes are backported to 1.8.x+ent) and backport/ent/1.9.x+ent (Changes are backported to 1.9.x+ent) labels May 14, 2025
allisonlarson marked this pull request as ready for review May 14, 2025 22:53
allisonlarson requested review from a team as code owners May 14, 2025 22:53
pkazmierczak previously approved these changes May 15, 2025
pkazmierczak left a comment:

LGTM!

tgross left a comment:

Looks great! I've left a few small comments, but once those are resolved / dismissed we should be good-to-go here.

Comment on lines -1440 to -1443
// Ensure `groupA` fails to be placed due to its constraint, but `groupB` doesn't
require.Len(t, h.Evals[2].FailedTGAllocs, 1)
require.Contains(t, h.Evals[2].FailedTGAllocs, "groupA")
require.NotContains(t, h.Evals[2].FailedTGAllocs, "groupB")
tgross:

If we're only suppressing the error in the case where a specific task group has an alloc, shouldn't these assertions and the ones in scheduler/scheduler_sysbatch_test.go still work? Or am I misunderstanding why we're removing these? (totally a possibility! 😁 )

allisonlarson (author):

I think these assertions are a bit of a misdirect from what is actually being tested. The test is checking that a node can be added to an existing node pool where allocations are already running, that the node is correctly evaluated against the defined constraints, and that the new node only gets allocs that match the constraint. Since the allocs are already running in this case, the new behavior says none of them should be marked as failed.

There's an assertion later in the test that the allocations are only running on the nodes they are expected to run on, which covers the desired behavior.

tgross:

Ok, sounds good! 👍


// Test that the system scheduler can handle a job with a constraint on
// subsequent runs, and report the outcome appropriately
func TestSystemSched_JobConstraint_RunMultipleTimes(t *testing.T) {
tgross:

This is a great test!

Co-authored-by: Piotr Kazmierczak <[email protected]>
tgross left a comment:

LGTM!
