Don't use readiness condition to detect if updates are in progress #922
Comments
/cc @rthallisey
/cc @mhrivnak
Here's another point: readiness may not always reflect that the operand's update/deployment is in progress. Something else, specific to that operator, could be preventing the pod's readiness check from passing. In other words, it's feasible for an operator to be stuck with readiness failing for reasons other than the operand still updating/deploying. In those cases, the operator will never get updated.
To clarify, we are not suggesting that operators set readiness=false any time they make changes in a cluster. We are suggesting that, as part of an operator update, initialization tasks should delay readiness of the new operator.

I agree that readiness probes are not the ideal long-term solution. They make it difficult to perform long-running migrations, for the reasons you noted. We likely need a separate "upgrade probe" concept in OLM that can perform arbitrary checks for upgrade safety. CVO solves this problem with a push model (operators write to a single, shared API to indicate progress), but that has its own set of problems.

In practice, though, there are reasons why this may not be as big of an issue as you might think. There are two broad approaches to managing operands:

Many Version Approach: The operator supports running many versions (potentially any version) of an operand. For example, the etcd operator can run "any" version of etcd. Newer versions of the etcd operator bring improved operator performance and new operator features, and do not affect running etcd clusters managed by the operator. For this type of operator, upgrade readiness may involve nothing beyond controller readiness. Data migrations are triggered by user interaction with the operator's CRs. These operators publish updates under "stability" channels like alpha/beta/stable.

Tightly Coupled Approach: The operator version is tightly coupled to the operand version, and updating the operator rolls out updates to the operands. In this case, the operator needs to upgrade operands during an upgrade. Because of this tight coupling, there is likely a very small window of operand versions that can be "automatically" updated (for example, you may choose to allow an update from postgres 9.3.2 to 9.3.3, but probably not 9.3 to 9.4, since that can involve significant API and behavior changes that would not be transparent to a consumer). The pattern for operators using this approach is to publish "version" channels (e.g. a "9.3" channel). In this case it is reasonable for an operator to support an ongoing migration of all operands from all previous versions of the operator. The operator signals readiness when the controller is ready, but automated updates only go to other operator versions that can continue that same migration.
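A minimal sketch of the readiness-delay pattern described above, assuming a hypothetical migration task and a plain HTTP `/readyz` handler (none of the names here come from a real operator):

```go
// Sketch: the new operator version only reports ready once its one-time
// upgrade/initialization work has finished, so OLM waits before moving on.
package main

import (
	"log"
	"net/http"
	"sync/atomic"
	"time"
)

func main() {
	var ready atomic.Bool // flips to true once upgrade work is done

	// Placeholder for whatever migration/initialization the new version must finish.
	go func() {
		time.Sleep(10 * time.Second) // stand-in for a schema or CR migration
		ready.Store(true)
	}()

	http.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if ready.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		http.Error(w, "upgrade tasks still running", http.StatusServiceUnavailable)
	})
	log.Fatal(http.ListenAndServe(":8081", nil))
}
```

Note that this is exactly the pattern the rest of the thread pushes back on: while the handler returns 503, the pod is not-ready for every other consumer of that signal as well.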
I understand the two patterns, but I don't understand how this is related. OLM tries to establish a pattern where not-ready means that something special is happening which should not be interrupted by OLM, whereas the whole rest of k8s sees it the other way round: this pod is not yet as valuable as all the other ones which are ready. There are many ways to signal that updates are in progress, all better than using the readiness probe. But maybe I'm missing something. Having a model which generates warnings by default does not seem right.
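One such alternative signal, as a hedged sketch: recording the update state as a condition on the operator's own CR status rather than failing readiness. The status struct and condition names below are assumptions for illustration:

```go
// Sketch: mark "update in progress" on a CR status condition instead of
// flipping the readiness probe to false.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// MyOperatorStatus is a stand-in for the status sub-struct of an operator-owned CR.
type MyOperatorStatus struct {
	Conditions []metav1.Condition
}

func markUpdating(status *MyOperatorStatus, updating bool) {
	cond := metav1.Condition{
		Type:    "Progressing",
		Status:  metav1.ConditionFalse,
		Reason:  "UpToDate",
		Message: "no operand update in progress",
	}
	if updating {
		cond.Status = metav1.ConditionTrue
		cond.Reason = "OperandUpdateInProgress"
		cond.Message = "rolling out new operand version"
	}
	// SetStatusCondition only bumps LastTransitionTime when the status actually changes.
	meta.SetStatusCondition(&status.Conditions, cond)
}

func main() {
	var status MyOperatorStatus
	markUpdating(&status, true)
	fmt.Printf("%+v\n", status.Conditions[0])
}
```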
We - KubeVirt and friends - consider this a serious issue.
Do you think you could provide an alternative upgrade status probe, e.g. through some prometheus metric (which the operator would offer)? The issue is that we don't necessarily see problems right now, but we definitely will in operations, when there will be hundreds of events, alerts from prometheus, incorrect metrics (due to kube-state-metrics), and so on, just because of a long-running upgrade - which can in the end hide real issues within the cluster. And this is the real issue.
@ecordell it does seem problematic that the probe will prevent prometheus from grabbing monitoring data. Any thoughts on dealing with that? I can't think of a workaround.
What are the limits around this monitoring setup? When do alerts start firing? I can think of some normal workloads that take minutes to start under normal circumstances. That would be more surprising for a controller, but is the monitoring controller-specific? In any case, it would help to understand more specifically what will generate alerts in this prometheus setup.
While the readiness probe is used to determine if a pod can accept network traffic, and is used to prioritize which pod to delete when scaling down, I would not draw broad conclusions from those two behaviors about how all of k8s should interpret the probe. It would be interesting to ask sig-apps how this use case matches the intent of a readiness probe.
Since you can have multiple deployments in a CSV (unified CSV file), you could create a deployment that monitors the conditions of another operator and reports its status.
Interesting idea! That's a good point: in the case of OCS and CNV, the meta-operators can take this on. They just need to know how to determine if their associated operators are themselves done upgrading. For more traditional operators, though, where there's just one Pod running most of the time, it would be a shame to require a second Pod just for this purpose.
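A hedged sketch of that second-deployment idea: a small watcher that checks whether another operator's Deployment rollout has settled. The namespace, names, and the choice of Deployment conditions as the signal are all assumptions; a real meta-operator might look at the other operator's CR conditions instead:

```go
// Sketch: decide whether another operator looks done updating by inspecting
// its Deployment's Progressing condition.
package main

import (
	"context"
	"fmt"
	"log"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func operandOperatorSettled(ctx context.Context, cs kubernetes.Interface) (bool, error) {
	dep, err := cs.AppsV1().Deployments("my-operator-ns").Get(ctx, "my-operator", metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	// "Progressing" with reason NewReplicaSetAvailable roughly means the rollout has finished.
	for _, c := range dep.Status.Conditions {
		if c.Type == appsv1.DeploymentProgressing &&
			c.Status == corev1.ConditionTrue &&
			c.Reason == "NewReplicaSetAvailable" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	done, err := operandOperatorSettled(context.Background(), kubernetes.NewForConfigOrDie(cfg))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("other operator settled:", done)
}
```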
For KubeVirt we are planning to provide a prometheus endpoint and a CR status to indicate the update status. Other components could provide similar ways to provide their status. The meta operator could check all of these different ways.
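A rough sketch of the metric-based side of such a plan, assuming client_golang and a made-up metric name: the operator keeps its readiness probe passing and exposes update progress as a gauge instead.

```go
// Sketch: a gauge that is 1 while an update is running, served on /metrics so
// prometheus (and a meta operator) can observe upgrade progress.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var updateInProgress = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "operator_update_in_progress", // hypothetical metric name
	Help: "1 while the operator is updating its operands, 0 otherwise.",
})

func main() {
	prometheus.MustRegister(updateInProgress)

	// The reconcile loop would flip this as upgrade work starts and finishes.
	updateInProgress.Set(1)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```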
Upgrade gating should probably be avoided in most cases. We know that an operator can be interrupted at any time, and it needs to be able to start a new Reconcile on a resource and continue where it left off. In most cases, the next release of an operator should be able to continue any upgrade work the previous version was unable to complete. We also know that upgrades will often skip directly to the newest release in the channel, potentially skipping several releases, and thus that target release must be able to do any upgrade work that the intervening releases would have done.

In cases of upgrading to a major new release, like 2.y.z to 3.y.z, it may be useful to drop 2.y.z upgrade logic from the 3 code base. That's a use case worth considering, but a feature like the readiness gate may not be the best tool for the job. In a case where a definitive handoff needs to be made, a variation on leader election could be useful: a ConfigMap could serve as an upgrade lock that is held by either the operator itself or by OLM, to prevent or execute the upgrade respectively.

Exiting gracefully might be a more valuable behavior than upgrade gating for most cases. The default grace period for a Pod to terminate is 30s, which would often be plenty of time for a Reconcile function performing upgrade steps to reach a consistent state where it can safely return. That may not be supported by controller-runtime currently, but enabling reconciles to finish gracefully after the process receives a TERM could be a valuable step toward upgrade safety.
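A hedged sketch of the graceful-exit idea (illustrative names only, not controller-runtime API): on TERM, stop starting new reconciles but let the in-flight one reach a consistent state before exiting, well within the pod's termination grace period.

```go
// Sketch: stop picking up new work on SIGTERM, wait for the current reconcile
// to finish, then exit cleanly.
package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

func reconcileOnce(i int) {
	// Placeholder for one reconcile that may perform a step of upgrade work.
	log.Printf("reconcile %d: doing upgrade step", i)
	time.Sleep(2 * time.Second)
	log.Printf("reconcile %d: reached consistent state", i)
}

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
	defer stop()

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; ; i++ {
			select {
			case <-ctx.Done():
				return // no new reconciles after TERM
			default:
				reconcileOnce(i)
			}
		}
	}()

	<-ctx.Done()
	log.Println("TERM received; waiting for in-flight reconcile to finish")
	wg.Wait() // must complete within the ~30s default grace period
	log.Println("exiting cleanly")
}
```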
This is how the etcd-operator hands off between versions.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This is still an important issue that stakeholders would like resolved. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
We are continuing discussion / design on this issue in this enhancement PR: operator-framework/enhancements#23 |
/kind bug
Using the readiness probe to detect whether updates are in progress (by setting it to false), as described in #879, is conceptually wrong and may cause multiple issues in the long run.
Background: k8s treats not-ready components as if they were still early in their initialization phase and not yet able to fulfill their tasks. See https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/controller_utils.go#L741 for details (not-ready pods are preferred for deletion when scaling down).
Further, pods are supposed to become ready after some time.
Further, it is very common to discover prometheus scrape targets via Endpoints (prometheus supports this directly). If a target is not ready, it will not be scraped, so a not-ready operator silently drops out of monitoring for the duration of an upgrade.
All of this hints that the readiness condition is being used the wrong way, which can have a serious impact.
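For illustration, a hedged client-go sketch of where readiness surfaces in the Endpoints object that endpoint-based discovery reads; the Service name and namespace are placeholders:

```go
// Sketch: addresses of not-ready pods are listed under NotReadyAddresses in the
// Endpoints object, separate from the ready addresses that services and
// endpoint-based discovery normally target.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	ep, err := cs.CoreV1().Endpoints("my-operator-ns").Get(context.Background(), "my-operator-metrics", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, subset := range ep.Subsets {
		for _, a := range subset.Addresses {
			fmt.Println("ready address:", a.IP)
		}
		for _, a := range subset.NotReadyAddresses {
			fmt.Println("excluded while not ready:", a.IP)
		}
	}
}
```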