
✨ Add in-place update hooks to API #12343

Merged
k8s-ci-robot merged 1 commit into kubernetes-sigs:main from alexander-demicev:inplacehooks on Sep 26, 2025

@alexander-demicev
Contributor

@alexander-demicev alexander-demicev commented Jun 11, 2025

What this PR does / why we need it:
This PR introduces the runtime hooks for in-place updates. See the in-place updates proposal and design doc for more details.

umbrella issue #12291

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

/area runtime-sdk

@k8s-ci-robot k8s-ci-robot added area/runtime-sdk Issues or PRs related to Runtime SDK cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 11, 2025
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 11, 2025
Member

@fabriziopandini fabriziopandini left a comment

Thanks for this PR

TBH, it is a bit hard to think through all the implications without a deeper understanding of when and how those hooks are called, e.g.:

  • is []string the best way to represent changes? (e.g. how will this work when there are changes to items in an array?)
  • do we need more than []string? e.g. a reference to the KCP/MD, the corresponding templates, maybe a list of or reference to the controlled machines impacted by the change
  • do we need something to link a set of proposed/accepted changes to in-place updates of the corresponding machines? (e.g. what will happen if the spec changes again in the meantime)
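To make the first question concrete, here is a minimal sketch of one way a list of changed fields could be derived as a `[]string`. It is not the API added by this PR: the `changedPaths` helper, the dotted-path format, and the sample field names are all assumptions for illustration. The sketch recurses into nested objects but reports a slice as a single changed path, because per-item paths such as `files[1]` shift when an item is inserted.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// changedPaths returns the dotted paths of fields that differ between two
// unstructured objects (hypothetical helper, not part of Cluster API).
// It recurses into nested maps but treats a slice as one value: if any item
// differs, only the path of the slice itself is reported.
func changedPaths(prefix string, oldObj, newObj map[string]any) []string {
	// Collect the union of keys so added and removed fields are both seen.
	keys := map[string]struct{}{}
	for k := range oldObj {
		keys[k] = struct{}{}
	}
	for k := range newObj {
		keys[k] = struct{}{}
	}

	var paths []string
	for k := range keys {
		path := k
		if prefix != "" {
			path = prefix + "." + k
		}
		oldVal, newVal := oldObj[k], newObj[k]
		oldMap, oldIsMap := oldVal.(map[string]any)
		newMap, newIsMap := newVal.(map[string]any)
		switch {
		case oldIsMap && newIsMap:
			paths = append(paths, changedPaths(path, oldMap, newMap)...)
		case !reflect.DeepEqual(oldVal, newVal):
			paths = append(paths, path)
		}
	}
	sort.Strings(paths) // deterministic order despite map iteration
	return paths
}

func main() {
	// Sample specs with made-up field names.
	oldSpec := map[string]any{
		"version":   "v1.33.0",
		"files":     []any{"a.conf", "b.conf"},
		"resources": map[string]any{"memoryMB": 4096, "cpus": 2},
	}
	newSpec := map[string]any{
		"version":   "v1.33.0",
		"files":     []any{"a.conf", "c.conf", "b.conf"},
		"resources": map[string]any{"memoryMB": 8192, "cpus": 2},
	}
	fmt.Println(changedPaths("", oldSpec, newSpec))
}
```

With the sample data this prints `[files resources.memoryMB]`: the inserted array item shows up only as a change to `files`, which leaves the consumer to work out what changed inside the array.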

Considering I don't have a good answer to those questions yet, my gut feeling is that we should first look into where and how the hooks are going to be called, and then use what we learn to finalize and implement them, also because without the first part the hooks are not usable.

But no strong opinions. I'm also OK with merging a first version and then iterating, but this will probably require us to make exceptions, e.g. if breaking changes are required (probably not a blocker in this case, but I just want to bring this up).

@alexander-demicev
Contributor Author

@fabriziopandini Thank you for the great feedback, all your points are valid. We need to start somewhere with the implementation, and my intention is to split it into smaller PRs to make the review process much easier. The in-place update feature will be hidden behind a feature gate and won’t be announced as alpha until we all agree on it.

I’ll start implementing a reference updater as soon as possible, and at the same time, we’re going to begin building an experimental updater in Rancher to battle-test the idea. Hopefully, this will help clarify some of the open questions as we go.

Regarding the book, I completely agree that we need to create a dedicated page. I can open an issue to track it, but I’d prefer to wait a bit until we have some functional code to document.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 23, 2025
@anmazzotti

anmazzotti commented Jul 2, 2025

* is []string the best way to represent changes? (e.g. how will this work when there are changes to items in an array?)

* do we need more than []string? e.g. a reference to the KCP/MD, the corresponding templates, maybe a list of or reference to the controlled machines impacted by the change

Good call. It probably makes sense to stop at arrays and let the updater figure it out.
The Machine object is also updated/patched before the updater starts taking action, so all data can be fetched if needed. I assume most implementations will be idempotent: they simply fetch the Machine or InfraMachine to be updated and reconcile it.
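The idempotent pattern described here can be sketched as follows. This is only an illustration under stated assumptions: `DesiredState`, `Host`, and `reconcile` are hypothetical names, not Cluster API types, and a real updater would read the desired state from the patched Machine or InfraMachine rather than from a literal.

```go
package main

import "fmt"

// DesiredState is a hypothetical subset of what an updater might read from
// the already-patched Machine or InfraMachine.
type DesiredState struct {
	KubernetesVersion string
	MemoryMB          int
}

// Host is a hypothetical stand-in for the real machine being updated.
type Host struct {
	KubernetesVersion string
	MemoryMB          int
}

// reconcile moves the host toward the desired state and returns the actions
// it took. Calling it again with the same inputs returns no actions, which
// is what makes the operation idempotent.
func reconcile(desired DesiredState, host *Host) []string {
	var actions []string
	if host.KubernetesVersion != desired.KubernetesVersion {
		host.KubernetesVersion = desired.KubernetesVersion
		actions = append(actions, "upgrade kubernetes to "+desired.KubernetesVersion)
	}
	if host.MemoryMB != desired.MemoryMB {
		host.MemoryMB = desired.MemoryMB
		actions = append(actions, fmt.Sprintf("resize memory to %d MB", desired.MemoryMB))
	}
	return actions
}

func main() {
	host := &Host{KubernetesVersion: "v1.33.0", MemoryMB: 4096}
	desired := DesiredState{KubernetesVersion: "v1.34.0", MemoryMB: 4096}

	fmt.Println(reconcile(desired, host)) // first call performs the upgrade
	fmt.Println(reconcile(desired, host)) // second call finds nothing to do
}
```

Because the function compares desired and actual state instead of replaying a list of changes, a repeated or retried hook call does no extra work.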

* do we need something to link a set of proposed/accepted changes to in-place updates of the corresponding machines? (e.g. what will happen if the spec changes again in the meantime)

I think the updater implementation has to keep its own "updating" state. If multiple updates are triggered, the updater should know that a previous update is already running and that the new request has to wait. I think the alternative is to forbid updates on resources annotated with runtime.cluster.x-k8s.io/pending-hooks: ExternalUpdate, but this may be too strict. You may want to fix a mistake while the in-place upgrade is running: for example, if you set the new memoryMB value too low and you see nodes struggling, you may want to patch it with an updated value to fix your Cluster before it's too late.
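Both options can be sketched in a few lines. Everything below is an assumption made for illustration: the `Tracker` type and its methods are hypothetical, the state is kept in memory only, and `hasPendingHook` assumes the annotation value is a comma-separated list of hook names. The tracker lets a newer request replace an older queued one, which matches the memoryMB correction scenario.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// pendingHooksAnnotation is the annotation key named in the discussion.
const pendingHooksAnnotation = "runtime.cluster.x-k8s.io/pending-hooks"

// hasPendingHook reports whether hook is listed in the annotation value.
// Assumption: the value is a comma-separated list of hook names.
func hasPendingHook(annotations map[string]string, hook string) bool {
	for _, h := range strings.Split(annotations[pendingHooksAnnotation], ",") {
		if strings.TrimSpace(h) == hook {
			return true
		}
	}
	return false
}

// Tracker is a hypothetical in-memory record of running and queued updates.
type Tracker struct {
	mu      sync.Mutex
	running map[string]bool
	queued  map[string]int // latest requested memoryMB per machine
}

func NewTracker() *Tracker {
	return &Tracker{running: map[string]bool{}, queued: map[string]int{}}
}

// Request starts an update if none is running for the machine and returns
// true. Otherwise it queues the request, replacing any older queued one,
// and returns false.
func (t *Tracker) Request(machine string, memoryMB int) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.running[machine] {
		t.queued[machine] = memoryMB
		return false
	}
	t.running[machine] = true
	return true
}

// Finish marks the running update as done. If a request is queued, it
// becomes the running update and its value is returned with true.
func (t *Tracker) Finish(machine string) (int, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.running, machine)
	next, ok := t.queued[machine]
	if ok {
		delete(t.queued, machine)
		t.running[machine] = true
	}
	return next, ok
}

func main() {
	tracker := NewTracker()
	fmt.Println(tracker.Request("machine-1", 2048)) // starts immediately
	fmt.Println(tracker.Request("machine-1", 8192)) // queued behind the running update
	next, ok := tracker.Finish("machine-1")
	fmt.Println(next, ok) // the queued correction runs next

	annotations := map[string]string{pendingHooksAnnotation: "ExternalUpdate"}
	fmt.Println(hasPendingHook(annotations, "ExternalUpdate"))
}
```

A stricter policy would reject the second request whenever `hasPendingHook` returns true, at the cost of blocking the kind of mid-update correction described above.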

All good points anyway. I also have some doubts about corner cases at the moment and I'd like to see a few different updaters implemented, so I'm all for merging and iterating as frequently as possible.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 19, 2025
@alexander-demicev alexander-demicev force-pushed the inplacehooks branch 6 times, most recently from b17efd7 to f7392dd Compare September 19, 2025 13:58
Member

@sbueringer sbueringer left a comment

Took a look with Fabrizio, first round of feedback

@alexander-demicev
Contributor Author

@sbueringer @fabriziopandini All comments should be resolved

Signed-off-by: Alexandr Demicev <alexandr.demicev@suse.com>
Co-authored-by: Stefan Büringer <buringerst@vmware.com>
@sbueringer
Member

Thank you very much!

/lgtm

/assign @fabriziopandini

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 26, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: aff3bf25ef66bdcca63524b4811be38884c50653

@fabriziopandini
Member

/lgtm
/approve

Let's keep on with the momentum on this effort 🥳

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 26, 2025
@k8s-ci-robot k8s-ci-robot merged commit f143b80 into kubernetes-sigs:main Sep 26, 2025
18 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.12 milestone Sep 26, 2025
@alexander-demicev alexander-demicev deleted the inplacehooks branch September 29, 2025 06:40

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/runtime-sdk Issues or PRs related to Runtime SDK cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

5 participants