feat: SkipMachinePoolModelReconciliation #6325
base: main
jackfrancis marked this conversation as resolved.
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -97,6 +97,32 @@ spec: | |
| type: RollingUpdate | ||
| ``` | ||
|
|
||
| ### Skipping Model Reconciliation | ||
| - **Feature status:** Experimental (Alpha) | ||
| - **Feature gate:** SkipMachinePoolModelReconciliation=false | ||
|
Contributor
Should this be
||
| - **Default value:** false (disabled) | ||
|
|
||
| By default, when the VMSS model changes (for example, when the OS image, VM SKU, or any other field that maps to the | ||
| underlying VMSS model is updated), CAPZ will progressively replace existing VMSS instances so that every instance is | ||
| running the latest model. This is implemented in the `AzureMachinePool` reconciler by requeueing as long as any instance | ||
| is detected to be running a stale model, and by prioritizing stale-model instances when selecting machines to delete. | ||
|
|
||
| The `SkipMachinePoolModelReconciliation` feature gate disables that automatic convergence. When the gate is enabled: | ||
|
|
||
| - `AzureMachinePool` will not requeue solely because one or more VMSS instances are running a stale model. | ||
| - The rolling update strategy will not preferentially delete stale-model instances; deletion is driven only by the | ||
| configured `deletePolicy` (e.g., `Oldest`, `Newest`, `Random`) and `maxUnavailable` / `maxSurge` budgets. | ||
| - Existing instances on a previous model will persist until the pool is explicitly scaled, an instance is manually | ||
| deleted, or the user otherwise triggers replacement. | ||
|
|
||
| This gate does **not** prevent CAPZ from updating the underlying VMSS template when the `AzureMachinePool` spec | ||
| changes, and it does **not** block surge behavior in the VMSS reconciler itself. It only controls whether CAPZ will | ||
| proactively replace instances running an older model with new instances (running the latest model). | ||
|
|
||
| This is useful for testing scenarios and for operators who want to manage instance refresh on their own schedule | ||
| without disabling other reconciliation behavior. To enable it, set `EXP_SKIP_MACHINE_POOL_MODEL_RECONCILIATION=true` | ||
|
Contributor
Where is this environment variable wired? It doesn't look like it's being read in `manager.yaml` like the other feature gates.
||
| in the CAPZ controller-manager environment (or pass `--feature-gates=SkipMachinePoolModelReconciliation=true`). | ||
|
|
||
| ### AzureMachinePoolMachines | ||
| `AzureMachinePoolMachine` represents a virtual machine in the scale set. `AzureMachinePoolMachines` are created by the | ||
| `AzureMachinePool` controller and are used to track the life cycle of a virtual machine in the scale set. When a | ||
|
|
||
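For reference, the rolling-update bullet in the added docs section (stale-model instances are no longer preferred for deletion when the gate is enabled) could be sketched roughly as below. This is a minimal illustration, not the CAPZ implementation: the `instance` type, the `orderForDeletion` function, and the `skipModelReconciliation` parameter are hypothetical names, and only an `Oldest`-style delete policy is modeled, with no `maxUnavailable` / `maxSurge` budget.

```go
package main

import "sort"

// instance is a hypothetical stand-in for a VMSS instance tracked by CAPZ.
type instance struct {
	Name        string
	CreatedUnix int64 // creation time, seconds since epoch
	LatestModel bool  // true if the instance runs the current VMSS model
}

// orderForDeletion returns a copy of the instances in the order they would be
// considered for deletion under an "Oldest"-style policy.
//
// skipModelReconciliation stands in for the feature gate (an assumption of
// this sketch). When it is false, stale-model instances sort ahead of
// up-to-date ones; when it is true, only age determines the order.
func orderForDeletion(in []instance, skipModelReconciliation bool) []instance {
	out := make([]instance, len(in))
	copy(out, in) // leave the caller's slice untouched

	sort.SliceStable(out, func(i, j int) bool {
		if !skipModelReconciliation && out[i].LatestModel != out[j].LatestModel {
			return !out[i].LatestModel // stale model first
		}
		return out[i].CreatedUnix < out[j].CreatedUnix // then oldest first
	})
	return out
}
```

With the gate disabled, a stale instance is picked first even if it is newer than an up-to-date one; with the gate enabled, the same input is ordered purely by age.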
Can you also add a test case to `TestMachinePoolScope_NeedsRequeue` to test the new path as well? Something like "should not requeue if an instance VM image does not match the VMSS when SkipMachinePoolModelReconciliation is enabled". Sorry I missed this in the initial review.
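As a rough idea of the shape such a case could take, here is a self-contained table-driven sketch. It does not use the real `MachinePoolScope` or its test fixtures: `vm`, `needsRequeue`, `requeueCase`, `requeueCases`, and `failedCases` are hypothetical stand-ins, the `skip` field stands in for the feature gate, and only the stale-image check is modeled.

```go
package main

// vm is a hypothetical stand-in for a VMSS instance as seen by the scope.
type vm struct {
	Image string
}

// needsRequeue reports whether reconciliation should be requeued because at
// least one instance image differs from the VMSS image. The skip parameter
// stands in for the SkipMachinePoolModelReconciliation feature gate.
func needsRequeue(vmssImage string, vms []vm, skip bool) bool {
	if skip {
		return false // gate enabled: a stale model alone never requeues
	}
	for _, v := range vms {
		if v.Image != vmssImage {
			return true
		}
	}
	return false
}

// requeueCase is one row of the table.
type requeueCase struct {
	name      string
	vmssImage string
	vms       []vm
	skip      bool
	want      bool
}

// requeueCases returns the existing-style case plus the suggested new one.
func requeueCases() []requeueCase {
	stale := []vm{{Image: "v2"}, {Image: "v1"}}
	fresh := []vm{{Image: "v2"}, {Image: "v2"}}
	return []requeueCase{
		{"should requeue if an instance VM image does not match the VMSS", "v2", stale, false, true},
		{"should not requeue if an instance VM image does not match the VMSS when SkipMachinePoolModelReconciliation is enabled", "v2", stale, true, false},
		{"should not requeue if all instance VM images match the VMSS", "v2", fresh, false, false},
	}
}

// failedCases runs every case and returns the names of those whose result
// differs from the expected value.
func failedCases(cases []requeueCase) []string {
	var failed []string
	for _, c := range cases {
		if got := needsRequeue(c.vmssImage, c.vms, c.skip); got != c.want {
			failed = append(failed, c.name)
		}
	}
	return failed
}
```

In the real test the gate would presumably be toggled through whatever feature-gate test helper the repository already uses, rather than passed as a boolean.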