@tgross tgross commented Jun 25, 2025

As part of ongoing work to make the scheduler more legible and more robustly tested, we're implementing property testing of at least the reconciler. This changeset provides some infrastructure we'll need for generating the test cases using pgregory.net/rapid, without building out any of the property assertions yet (that'll be in upcoming PRs over the next couple weeks).

The alloc reconciler generator produces a job, a previous version of the job, a set of tainted nodes, and a set of existing allocations. The node reconciler generator produces a job, a set of nodes, and allocations on those nodes. Reconnecting allocs are not yet well-covered by these generators, and with ~40 dimensions covered so far we may need to pull those out to their own tests in order to get good coverage. We can do that in subsequent PRs as well.

Note the scenarios only randomize fields of interest; fields like the job name that don't impact the reconciler would use up available shrink cycles on failed tests without actually reducing the scope of the scenario.
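A minimal sketch of that idea, not the generators from this changeset: it uses the standard library's `math/rand` as a stand-in for `pgregory.net/rapid`, and the `scenario` type, its fields, the node names, and the value ranges are all hypothetical. Only the inputs that could change a reconciler decision are drawn at random; the job name is pinned to a constant.

```go
package main

import (
	"fmt"
	"math/rand"
)

// scenario is a hypothetical, trimmed-down stand-in for the generated inputs
// described above; the real generators build Nomad structs instead.
type scenario struct {
	JobName      string   // fixed: has no effect on reconciler decisions
	Count        int      // randomized: desired count in the current job version
	PrevCount    int      // randomized: desired count in the previous job version
	TaintedNodes []string // randomized: subset of the known nodes
	AllocNodes   []string // randomized: node assigned to each existing alloc
}

// genScenario draws only the fields of interest from r and pins everything else.
func genScenario(r *rand.Rand) scenario {
	nodes := []string{"node-0", "node-1", "node-2", "node-3"}
	s := scenario{
		JobName:   "example", // constant on purpose, so it is never shrunk
		Count:     r.Intn(6),
		PrevCount: r.Intn(6),
	}
	// Each node is tainted with probability 1/4.
	for _, n := range nodes {
		if r.Intn(4) == 0 {
			s.TaintedNodes = append(s.TaintedNodes, n)
		}
	}
	// Place between 0 and 7 existing allocations on randomly chosen nodes.
	for i, numAllocs := 0, r.Intn(8); i < numAllocs; i++ {
		s.AllocNodes = append(s.AllocNodes, nodes[r.Intn(len(nodes))])
	}
	return s
}

// genScenarioFromSeed makes a scenario reproducible from a single seed value.
func genScenarioFromSeed(seed int64) scenario {
	return genScenario(rand.New(rand.NewSource(seed)))
}

func main() {
	for seed := int64(0); seed < 3; seed++ {
		fmt.Printf("%+v\n", genScenarioFromSeed(seed))
	}
}
```

With a fixed field there is one less dimension for a shrinker to explore, which is the motivation given above; the seed-based helper is only there so a failing scenario can be regenerated.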

Ref: https://hashicorp.atlassian.net/browse/NMD-814
Ref: https://github.com/flyingmutant/rapid


- name: Run property tests
  run: |
    go test -v -cover ./scheduler/reconciler -rapid.checks=100000 -run PropTest

@tgross tgross Jun 25, 2025

Note for reviewers: as it turns out, the -rapid flags won't get detected properly if we try to use a package spec like ./scheduler/..., so we need a separate line for each package we want to cover and then we can filter down to the property tests with the -run flag.

Also, running 100k iterations doesn't take very long but I figure we may want to keep this short for the moment just until we've got some useful work for them to do, and then we can crank it up to 1M or whatever.

fwiw I ran it locally with 1m checks and it really wasn't long at all:

PASS
coverage: 78.5% of statements
ok  	github.com/hashicorp/nomad/scheduler/reconciler	130.909s	coverage: 78.5% of statements

jrasell previously approved these changes Jun 26, 2025

@jrasell jrasell left a comment

LGTM! Very exciting.

I've left a couple of inline comments but they are by no means blocking and just me thinking aloud.

local macOS run
$ go test -v -cover ./scheduler/reconciler -rapid.checks=100000 -run PropTest
=== RUN   TestAllocReconciler_PropTest
=== RUN   TestAllocReconciler_PropTest/batch_jobs
2025-06-26T08:20:13.580+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:13.609+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:13.614+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:13.627+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:13.738+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:13.749+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:13.761+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:13.930+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.022+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.044+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.095+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.165+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.201+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.420+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.493+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.548+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.557+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.569+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.629+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.748+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.900+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.926+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.952+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.954+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:14.962+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:15.038+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:15.054+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:15.155+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:15.298+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:15.348+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:15.364+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:15.394+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:15.575+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:15.611+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:15.656+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:15.779+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:15.809+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:15.954+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:15.969+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.079+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.107+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.111+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.117+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.241+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.291+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.320+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.404+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.423+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.453+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.510+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.547+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.594+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.689+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.741+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.807+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.873+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:16.993+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:17.044+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
    reconcile_cluster_prop_test.go:24: [rapid] OK, passed 100000 tests (3.718572958s)
=== RUN   TestAllocReconciler_PropTest/service_jobs
2025-06-26T08:20:17.183+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:17.256+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:17.267+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:17.273+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:17.383+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:17.385+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:17.546+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:17.575+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:17.581+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:17.684+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:17.698+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:17.879+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:17.928+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:17.958+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:18.123+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:18.127+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:18.423+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:18.486+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:18.568+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:18.605+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:18.689+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:18.855+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:18.874+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:18.910+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:18.968+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.002+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.009+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.030+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.035+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.130+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.284+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.461+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.506+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.545+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.557+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.628+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.710+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.746+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.751+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.788+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.827+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.838+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:19.936+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:20.042+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:20.493+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:20.548+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:20.586+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:20.658+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:20.694+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:20.712+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
2025-06-26T08:20:20.783+0100 [DEBUG] reconciler/reconnecting_picker.go:44: reconnecting-picker: picking according to strategy: strategy=best_score
    reconcile_cluster_prop_test.go:33: [rapid] OK, passed 100000 tests (3.665426458s)
--- PASS: TestAllocReconciler_PropTest (7.38s)
    --- PASS: TestAllocReconciler_PropTest/batch_jobs (3.72s)
    --- PASS: TestAllocReconciler_PropTest/service_jobs (3.67s)
=== RUN   TestNodeReconciler_PropTest
=== RUN   TestNodeReconciler_PropTest/system_jobs
    reconcile_node_prop_test.go:15: [rapid] OK, passed 100000 tests (2.792266959s)
=== RUN   TestNodeReconciler_PropTest/sysbatch_jobs
    reconcile_node_prop_test.go:25: [rapid] OK, passed 100000 tests (2.796401333s)
--- PASS: TestNodeReconciler_PropTest (5.59s)
    --- PASS: TestNodeReconciler_PropTest/system_jobs (2.79s)
    --- PASS: TestNodeReconciler_PropTest/sysbatch_jobs (2.80s)
PASS
coverage: 78.5% of statements
ok  	github.com/hashicorp/nomad/scheduler/reconciler	13.729s	coverage: 78.5% of statements

pkazmierczak previously approved these changes Jun 26, 2025

@pkazmierczak pkazmierczak left a comment

I think this is a great start for property testing in the scheduler, great work on the generators!

@tgross tgross merged commit ec8250e into main Jun 26, 2025
39 checks passed
@tgross tgross deleted the testing-reconciler-proptest-generation branch June 26, 2025 15:09