UMBRELLA: design and refactor graceful termination #764
Comments
See some of the previous discussions about graceful termination:
I think a better approach than sleeping (and one that's been discussed a few times) is properly wiring up all the contexts (well, currently stop channels) and using either a wait group (for manager shutdown) or some sort of map + lock + shutdown mechanism (for manager shutdown + dynamically (un)loading controllers). There's a tentative desire to replace all the stop channels with context plumbed all the way through -- graceful manager termination could plausibly be done in tandem with that change (or not, but seems natural).
Sorry for my poor explanation. What I meant to propose is that, by inserting a sleep just after the signal handler, applications would not each need their own handling. When a Pod is shutting down, pod termination and endpoint deletion are executed at the same time; one way to wait for endpoint deletion is to add the sleep.
/kind design We'd need a design document in the form of a PR to the controller-runtime repository. /help
@vincepri: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Presumably the duration to sleep there would need to be configurable, with zero meaning not to sleep at all. But sleeping like this is a brittle cover for the lack of proper coordination. The chosen duration always turns out to be wrong.
Based on the discussion today I think ideally we see one design sweep out all of the related context issues:
If the manager and all dependencies respect context behavior properly, the ask here becomes a matter of usage -- wire up your signal handler to wait for the manager to cleanly exit. I don't think the design will be anything crazy, but having poked at this a bit I think it will take some time and thought to cleanly flesh out the corner cases.
It seems a small handful of issues around stopping managers and controllers (including #730, which I am fairly invested in) just got deduplicated into this one. Could we consider renaming this issue to reflect the scope? Perhaps a new issue is warranted to track all of this? The title (and to some extent content) of this issue don't immediately reflect the scope it seems to have taken on.
@negz fair point! re-titled this issue, let me know if you have further feedback |
Consider my solution in PR #805, which follows the approach used by kube-controller-manager.
The controller program's exit steps may include these:
If a type is removed from the HNCConfiguration Spec, we will set the corresponding object reconciler to "ignore" mode. Ideally, we should shut down the corresponding object reconciler. Gracefully terminating an object reconciler is still under development (kubernetes-sigs/controller-runtime#764). Once the feature is released, we will see if we can shut down the object reconciler instead of setting it to "ignore" mode.
Is it common to use contexts for this purpose, given that contexts are intended to be "request scoped"? It feels like using a context to replace a stop channel could be a misuse, depending on how you interpret a "request".
Apparently it is - kubernetes/kubernetes#57932 is an example of leader election being migrated from stop channels to contexts.
Making incremental notes on context changes required:
@alexeldeib Just for my understanding, maybe I am missing something: how does graceful termination depend on having plumbed through the usage of context?
Yeah, I think this issue arguably re-conflated two things:
To shut down a controller gracefully, containers should be terminated after endpoint deletion. If
time.Sleep(someSeconds)
is added between these lines, applications built with controller-runtime can wait for endpoint deletion by default. What do you think?