
server: Validate config num schedulers is between 0 and num CPUs.#25441

Merged
jrasell merged 2 commits intomainfrom
b-config-validate-server-num-schedulers
Mar 20, 2025


@jrasell
Member

@jrasell jrasell commented Mar 19, 2025

The server.num_schedulers configuration value should be between 0 and the number of CPUs on the machine. The Nomad agent was not validating this parameter, which meant you could set a negative value or a value much larger than the number of available CPUs. This change enforces validation of the configuration value both on server startup and when the agent is reloaded.
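As a rough illustration of the bounds check described above, here is a minimal Go sketch. ServerConfig and validateNumSchedulers are hypothetical names for this example, not Nomad's actual types. The CPU count is passed in rather than read inside the helper, so the check can be exercised independently of the host, and an unset value is treated as valid on the assumption that a default is applied elsewhere.

```go
package main

import (
	"fmt"
	"runtime"
)

// ServerConfig is a hypothetical, trimmed-down stand-in for the agent's
// server configuration block; only the field relevant here is modelled.
type ServerConfig struct {
	NumSchedulers *int
}

// validateNumSchedulers checks that num_schedulers falls in [0, numCPU].
// A nil value means the operator did not set it, so nothing is validated.
func validateNumSchedulers(cfg *ServerConfig, numCPU int) error {
	if cfg == nil || cfg.NumSchedulers == nil {
		return nil
	}
	n := *cfg.NumSchedulers
	if n < 0 || n > numCPU {
		return fmt.Errorf("num_schedulers must be between 0 and %d, got %d", numCPU, n)
	}
	return nil
}

func main() {
	n := 10
	cfg := &ServerConfig{NumSchedulers: &n}
	// The same check would run at startup and again on reload.
	if err := validateNumSchedulers(cfg, runtime.NumCPU()); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config ok")
}
```

Running the check in one helper means the startup and reload paths cannot drift apart in what they accept.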

The Nomad API was only validating against negative values when the number of schedulers was updated via the API. This change extends that validation to also ensure the number is not greater than the number of CPUs on the machine.
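A sketch of what the extended API-side check amounts to, assuming a hypothetical poolArgs type standing in for SchedulerWorkerPoolArgs (the real struct and the signature of its IsValid method may differ). The existing lower-bound test is kept and an upper bound against the CPU count is added.

```go
package main

import (
	"fmt"
	"runtime"
)

// poolArgs is a hypothetical stand-in for the arguments accepted by the
// scheduler worker pool update API; it is not Nomad's actual type.
type poolArgs struct {
	NumSchedulers     int
	EnabledSchedulers []string
}

// IsValid reports whether the requested worker count is usable on a machine
// with numCPU cores.
func (a poolArgs) IsValid(numCPU int) bool {
	if a.NumSchedulers < 0 {
		return false // the lower bound that was already enforced
	}
	if a.NumSchedulers > numCPU {
		return false // the upper bound this change adds
	}
	return true
}

func main() {
	req := poolArgs{
		NumSchedulers:     1000,
		EnabledSchedulers: []string{"service", "batch", "system", "sysbatch", "_core"},
	}
	fmt.Println("valid:", req.IsValid(runtime.NumCPU()))
}
```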

Our documentation already correctly states the expected value bounds: https://developer.hashicorp.com/nomad/docs/configuration/server#num_schedulers

Testing & Reproduction steps

Using the configuration example below, try starting a Nomad dev agent with various values, both with and without this change. Once the agent is running, you can alter the value and send a SIGHUP signal to the agent to test reload validation.

server {
  num_schedulers = 10
}
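For the reload path, the Go sketch below shows one way an agent could re-validate the value when it receives SIGHUP. It assumes, as a design choice for the example, that an invalid value is rejected and the running configuration is kept. serverConfig, applyReload and loadConfig are hypothetical; loadConfig stands in for re-parsing the config files. The use of syscall.Kill to signal itself makes the demo Unix-only.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
)

// serverConfig is a hypothetical minimal config type for this sketch.
type serverConfig struct {
	NumSchedulers int
}

// applyReload returns the config the agent should run with after a reload.
// An invalid candidate is rejected and the current config is kept.
func applyReload(current, candidate serverConfig, numCPU int) (serverConfig, error) {
	if candidate.NumSchedulers < 0 || candidate.NumSchedulers > numCPU {
		return current, fmt.Errorf("num_schedulers must be between 0 and %d, got %d",
			numCPU, candidate.NumSchedulers)
	}
	return candidate, nil
}

// loadConfig is a hypothetical placeholder for re-reading the config files;
// it returns a deliberately out-of-range value.
func loadConfig() serverConfig {
	return serverConfig{NumSchedulers: 1000}
}

func main() {
	current := serverConfig{NumSchedulers: 1}

	hup := make(chan os.Signal, 1)
	signal.Notify(hup, syscall.SIGHUP)

	// Simulate an operator running `kill -HUP <pid>` so the demo terminates.
	if err := syscall.Kill(os.Getpid(), syscall.SIGHUP); err != nil {
		fmt.Println("could not signal self:", err)
		return
	}

	<-hup
	next, err := applyReload(current, loadConfig(), runtime.NumCPU())
	if err != nil {
		fmt.Println("reload rejected, keeping num_schedulers =", current.NumSchedulers, "-", err)
		return
	}
	current = next
	fmt.Println("reloaded, num_schedulers =", current.NumSchedulers)
}
```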

To test the API change, start a Nomad agent in dev mode, then attempt to write the following data using the command curl --request PUT --data @payload.json http://localhost:4646/v1/agent/schedulers/config:

{
  "enabled_schedulers": [
    "service",
    "batch",
    "system",
    "sysbatch",
    "_core"
  ],
  "num_schedulers": 1000
}

I went back and forth on whether to update only the SchedulerWorkerPoolArgs.IsValid function to get the validation we want. That would work, but the check only runs once the server has started. I therefore decided to add both config validation and backend validation, with the idea of expanding the configuration validation to cover more cases and eventually plugging it into the config validate command.

Contributor Checklist

  • Changelog Entry: If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing: Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation: If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels: Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type: Ensure the correct merge method is selected, which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs: If this is an enterprise-only PR, please add any required changelog entry
    within the public repository.

tgross previously approved these changes Mar 19, 2025
Member

@tgross tgross left a comment


LGTM

Comment thread command/agent/agent_endpoint_test.go
Comment thread nomad/config.go Outdated
jrasell added 2 commits March 19, 2025 14:29
Member

@tgross tgross left a comment


LGTM!

@jrasell jrasell merged commit 5a157eb into main Mar 20, 2025
31 checks passed
@jrasell jrasell deleted the b-config-validate-server-num-schedulers branch March 20, 2025 07:29
@github-actions

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Jul 19, 2025

Labels

backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent


3 participants