Operator Status <Upgrade Readiness> Enhancement #23
If an operator is in the middle of performing this critical work and dies for some reason (bug, OOM, hardware failure, etc), it's important that it be able to resume that work as soon as it can re-start. We wouldn't want to upgrade the operator in such a scenario, when OLM is not able to reach the proposed endpoint. Would you agree? If so, it's worth capturing that behavior in this proposal: OLM will not initiate an upgrade without positive confirmation that the operator is ready for it.

But then we need an escape hatch. There will be scenarios where either the operator can't function correctly until it gets updated (so the endpoint is down), or there's a bug in its implementation of this endpoint, so maybe it's always reporting not-ready-for-upgrade. There needs to be a way to force the update even when the operator either can't be reached or says it can't be updated. Any thoughts on what that mechanism would be?
Hello @mhrivnak, Thanks for the detailed review!
Agreed - I believe that if an operator defines an upgrade readiness endpoint, OLM should not upgrade the operator when it is unable to connect to that endpoint. I'll update the enhancement.

There is a manual process where the user can delete the subscription/csv to uninstall the operator and reinstall the operator at the desired version.
@mhrivnak I am also going to make sure that we call out that the Upgrade Readiness Check is opt-in, meaning OLM will default to its existing behavior in the absence of an upgrade readiness endpoint, in the interest of supporting backwards compatibility.
I think my comments are centered around one issue, and I hope we do not need to cut a new version of the CSV. These are mostly my concerns and questions. Thanks for putting this together.

I feel like at heart there is a more general issue - how does a complex component, like an operator, indicate that it's ready to be upgraded (by OLM, or any other cluster management component)? I feel like we can think about tackling this as both an upstream and downstream challenge - our solution can be a standard for different use cases.

My preference is to start simple - just a targeted annotation on the operator that indicates operator readiness. Operator authors can opt into using it by simply manipulating the annotation - adding one if the upgrade is NOT safe for some reason. OLM checks for the existence of this annotation before upgrading, otherwise proceeding as usual. I'm not sure if OLM does any existing introspection on operators it installs (that may require some design). But it's a solution that would be easy to understand by members of the community and easy to implement for operator authors.
Thanks for the review @exdx
Where is this annotation applied? It would need to be on some top-level resource representing the installation of the operator, likely the CSV or the Operator API that should be available soon. In either approach, we are specifically granting operators the opportunity to interact with OLM-owned resources. This could introduce unintended consequences; imagine an operator with permissions to edit a CSV escalates its RBAC privileges.

While simple, this is not a well defined standard. Wouldn't it be easier for users if we re-implemented the standards defined by Readiness Probes?
My idea was to have the annotation on the operator pod itself - this is something we actually did when we wrote a real operator at my last job, using annotations to indicate that something was ready to change, be torn down, or be updated. This approach would require OLM to discover the annotations on the operator pods that it manages before attempting an upgrade - which wouldn't be too bad, since we know the name of the operator from the CSV and can just query the pod directly.

I agree that asking the operator author to know about and modify the CSV or Operator object is too specific and not a good idea. The probe idea sounds good in principle, but what does it entail? kubelet performs probes, not OLM. A probe describes a health check to be performed against a container to determine whether it is alive or ready to receive traffic. I don't see how we can use them for operator upgrades.
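For illustration only, the pod-annotation approach might look something like the sketch below. The annotation key and its domain are hypothetical, not something OLM or this proposal defines:

```yaml
# Hypothetical sketch of the pod-annotation approach; the annotation key,
# value, and names are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: foo-operator-5d9c7b6f4-abcde
  namespace: operators
  annotations:
    # Set by the operator itself while critical, non-interruptible work
    # (e.g. a storage migration) is in progress; removed when it finishes.
    operators.example.com/upgrade-unsafe: "storage migration in progress"
spec:
  containers:
  - name: foo-operator
    image: quay.io/example/foo-operator:v1.2.3
```

OLM would then check the operator pod for this annotation before attempting an upgrade, as described above.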
This explores the service / probe angle well, but I'm not sure the other options are given enough time to breathe. Looking more at conditions as a way to report back status could fit really well with the Operator API work, which already aggregates conditions from operator components (a rough sketch follows the list below).

I also think that we should do this with an eye towards providing more general status than just "upgradeable" - we are essentially defining the communication channel between operators and OLM, and I think it's reasonable that we / some operators will want to expand that beyond whether or not they are upgradeable.

Some potential things we'd want to communicate:
- Metrics / key prometheus queries
- Whether the operator thinks it is in a state that can be upgraded
- Reporting "degraded" service like we do with CVO-operators
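As a rough sketch of the condition-aggregation angle mentioned above (illustrative only; the Operator API's exact schema may differ, and the `Foo` component is hypothetical):

```yaml
# Illustrative sketch: an Operator API resource surfacing a condition
# aggregated from one of the operator's components. Exact schema may differ.
apiVersion: operators.coreos.com/v1
kind: Operator
metadata:
  name: foo-operator
status:
  components:
    refs:
    - apiVersion: foo.example.com/v1
      kind: Foo
      namespace: default
      name: foo-example
      conditions:
      - type: Upgradeable
        status: "False"
        reason: Migrating
        message: "Storage migration in progress"
```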
Updated to capture some more recent thoughts - still needs to incorporate all the great feedback.
enhancements/upgrade-readiness.md
> From the presence of the `Migrating` condition, OLM will understand that it should not upgrade the operator. The Operator also communicates that the `reason` the operator cannot be upgraded is due to a "critical process" and provides a human-readable `message` for cluster admins.
>
> OLM will also highlight Conditions 1 and 2 within the new [Operator API](https://github.com/operator-framework/api/blob/master/crds/operators.coreos.com_operators.yaml) within the [conditions](https://github.com/operator-framework/api/blob/master/crds/operators.coreos.com_operators.yaml#L108) in the status.
@njhale should we create an "OLM Supported Conditions" condition array?
| Field | Description |
| ----- | ----------- |
| Status | Status is the status of the condition. Can be True, False, Unknown. |
| Reason | Unique, one-word, CamelCase reason for the condition's last transition. |
| Message | Human-readable message indicating details about last transition. Typically used by cluster admins for debugging purposes. |
| LastTransitionTime | Last time the condition transitioned from one status to another. |
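A condition carrying these fields on a CR's status might look like the following sketch (values are illustrative; the `Migrating` type matches the earlier example):

```yaml
# Illustrative values only.
status:
  conditions:
  - type: Migrating
    status: "True"
    reason: CriticalProcess
    message: "A storage migration is in progress; the operator should not be upgraded."
    lastTransitionTime: "2020-06-01T15:04:05Z"
```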
This (or a lastHeartbeatTime) might be a useful field to enforce a hard or soft grace-period. A NotInterruptible condition must be updated at least every 30 minutes, for example. If not:
- Soft grace-period - let the user know through OLM that the operator needs their attention.
- Hard grace-period - assume the process is unrecoverable and allow the upgrade to continue anyway.
This is an interesting idea. My only concern is an operator that does not update the condition within the Hard grace-period could lead to severe consequences that OLM can't anticipate. If we decide to proceed down this route, it seems like an Operator Author should be able to opt-out of the hard grace-period. Thoughts?
Allowing them to opt out is worth considering, but should probably be discouraged since it instills a false confidence. A workload being able to pick up where it left off without corrupting data is part of being a k8s workload. I see this enhancement as OLM being polite. When a node is drained, or a pod evicted, the operator will not receive the same politeness (just a normal k8s grace period before SIGKILL).
> A workload being able to pick up where it left off without corrupting data is part of being a k8s workload.

+1000, unfortunately we've seen time and again that some operators fail or are unable to deliver their operator in this format.

> I see this enhancement as OLM being polite. When a node is drained, or a pod evicted, the operator will not receive the same politeness (just a normal k8s grace period before SIGKILL).

I'm not convinced that this is the right approach. If operators are truly successful and used by our customers to drive their business, there could easily be a situation where OLM ignoring a critical condition of an operator causes direct revenue loss. Your suggestion might be the right approach, but I want others to share their opinions as well. CC @shawn-hurley @ecordell
IMO, it would be an admin's choice to enable a hard grace-period or a soft grace-period, and they would be responsible for the risks associated with the workloads they run. I could see lots of folks being very happy having a hard grace-period while others may very much want a soft grace-period. The key here would be documenting the user choice.

I don't think that the operator author would get the choice here; they get the ability to state some things, and if the user wants to ignore them or take other actions I don't think OLM should get in the way of that?

Also, I could be viewing this through the wrong lens?
If this is something we are debating, are we adding a user story for it or capturing it for a follow up somewhere? I don't want to lose the idea.
> IMO, it would be an admin's choice to enable a hard grace-period or a soft grace-period, and they would be responsible for the risks associated with the workloads they run. I could see lots of folks being very happy having a hard grace-period while others may very much want a soft grace-period. The key here would be documenting the user choice.
>
> I don't think that the operator author would get the choice here; they get the ability to state some things, and if the user wants to ignore them or take other actions I don't think OLM should get in the way of that.

100%

> If this is something we are debating, are we adding a user story for it or capturing it for a follow up somewhere? I don't want to lose the idea.

I can update this enhancement or create it as an epic we introduce later to keep the scope of this enhancement smaller - whichever you prefer.
I think we could get something similar -- w/o putting the onus on the operator to supply the heartbeat -- by supporting "condition source" combinations; e.g. `upgradeable=f(CR, pod)`, where perhaps the pod being ready is required for `f(CR, pod)=true`.
enhancements/upgrade-readiness.md
| Condition Name | Effect |
| -------------- | ------ |
| NotInterruptible | When true, OLM will not upgrade the operator. The pod associated with the operator may still be killed by external forces. |
Based on the description you provided, should this be `NotUpgradable` instead? As you noted, the process can still be interrupted by external forces.

Also, to minimize the potential for OLM to pick up conditions that are not intended for it by the operator (false positives), would it make sense to have an `OLM` prefix or something for the condition? Like `OLM.NotUpgradable = true`.
> Based on the description you provided, should this be NotUpgradable instead? As you noted, the process can still be interrupted by external forces.

I had considered naming this condition `NotUpgradeable` but was concerned that we might want to use this condition for additional purposes down the road. I believed that `NotInterruptible` gave us enough wiggle room to allow operators to express their intent by setting the condition, while giving OLM enough wiggle-room to expand on the behavior as we can make additional guarantees. An argument against this may be changing default upgrade behavior, but I assumed that the benefit of keeping "Operator Intent" clear was worth it.

> Also, to minimize the potential for OLM to pick up conditions that are not intended for it by the operator (false positives), would it make sense to have an OLM prefix or something for the condition? Like OLM.NotUpgradable = true

I believe this concern is discussed here. Basically, by requiring an operator to define the mapping there is no room for the conflict to occur.
I see where you're coming from -- we want to have some wiggle room. But semantically, `NotUpgradeable` feels like it's more related to OLM. OLM is responsible for the upgrading of the operator. `NotInterruptible` feels like something a PDB would cover - by saying `maxUnavailable=0`. Also, maybe the operator author wants to interrupt their CR process (if it breaks) but still does not want an upgrade. By requiring them to write `NotInterruptible` to their status it could potentially be confusing to them.

But I don't have my heart set on it. Curious what @njhale @ecordell think on this.

As to the second point, I see. They have to explicitly set the annotation on the CRD for the status update on the CR to have any bearing. That sounds good, as long as we document that clearly.
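For illustration, that CRD-annotation opt-in might look something like the sketch below; the annotation key shown is hypothetical, standing in for whatever key OLM ultimately defines:

```yaml
# Illustrative sketch only; the annotation key is hypothetical.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: foos.foo.example.com
  annotations:
    # Key would be defined by OLM; the value names the condition `type` on
    # the CR that OLM should treat as the NotInterruptible condition.
    operators.coreos.com/not-interruptible: Migrating
# spec omitted for brevity
```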
enhancements/upgrade-readiness.md
> OLM will also highlight Conditions 1 and 2 within the new [Operator API](https://github.com/operator-framework/api/blob/master/crds/operators.coreos.com_operators.yaml) within the [conditions](https://github.com/operator-framework/api/blob/master/crds/operators.coreos.com_operators.yaml#L108) in the status.
>
> Note that by adding the `NotInterruptible` "OLM Supported Condition" to each CR, OLM will not upgrade the operator if any CR reports this condition as `true`.
This can block operator upgrades if a particular CR ends up in a bad state and is unable to recover (for example, the DB migration fails). I think it's valid to block the upgrade in that case regardless, but it's worth pointing out - at that point only a manual cleanup will allow for an upgrade.
Suggestion here to help UX in this scenario: #23 (comment)
I think requiring cluster admin intervention is the safest approach, but @jupierce offered an interesting alternative.
Edit: Echo echo...
I think that's a very interesting idea, to have hard and soft limits for the condition and proceeding anyway. Some thoughts around this:
- By mandating a certain status gets updated every 30 minutes we are really starting to mandate how operator authors write their operator, which lowers adoption. Asking them to write a certain condition to their status is straightforward enough (they are doing it anyway) but saying to do it periodically is different. The logic definitely grows more complex, and I think we would be the ones to have to provide a library for them to use.
- Conceptually, a grace period is used when something is wrong. I don't think `NotInterruptible` is necessarily a bad condition that an operator needs to get out of - it's a transient condition that indicates a workload is in progress and has state, like an ML training job. We would expect that eventually the workload comes out of that state and the operator goes back to an upgradeable state.
- As @awgreene mentioned, OLM terminating workloads without user input is pretty risky behavior in general. Any deletion/upgrade event is initiated by the user, either by deleting the CSV or the subscription moving to a new version. The subscription makes the process seamless, but still the user updates the catalog on the cluster with the new version, either explicitly or via catalog polling. It's really up to the operator author to make sure their operands are well behaved.

All that being said, I think your suggestion brings up a need for an example, in code, of how to set and unset the condition and best practices for it. That way operator authors have a clear picture of how to use the condition to their benefit.
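As a purely illustrative sketch of that (assuming a hypothetical `Foo` CR and the `NotInterruptible` condition type discussed above), the CR's status might move between these two states:

```yaml
# Illustrative only: while the critical work is running, the operator sets
# the condition to "True"...
status:
  conditions:
  - type: NotInterruptible
    status: "True"
    reason: Migrating
    message: "Database schema migration in progress; do not upgrade the operator."
    lastTransitionTime: "2020-06-01T15:04:05Z"
---
# ...and once the work completes, it flips the condition to "False" rather
# than deleting it, preserving the transition history.
status:
  conditions:
  - type: NotInterruptible
    status: "False"
    reason: MigrationComplete
    message: "No critical work in progress."
    lastTransitionTime: "2020-06-01T15:35:00Z"
```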
enhancements/upgrade-readiness.md
> ### How Operators will opt-in
>
> Operators must opt-in to "OLM Supported Conditions" by adding an annotation to their owned CRD, where the key is defined by OLM and the value is the `type` of the condition on the CR.
What if the CR that has the not condition is from a required CRD, and not an owned CRD?
For example, vault has a requirement on etcd. Say the vault migration fails due to an etcd cluster in a bad state - the vault operator would write the condition to the vault CR? That seems to work. I feel like there is some additional complexity in this area that is worth looking at - are we covering all the cases?
This is a great point and should be discussed in greater detail in this enhancement. I believe that OLM should consider if an operator can be upgraded when calculating a new `Generation`.
/lgtm
Looking really good. I left some feedback.
I think that we need to add the second half of this, which is that the operator can write directly to the `Probe` resource.
```yaml
status: false
reason: "NotUpgradeable"
message: "The operator has communicated that the operator is not upgradeable"
probeResources:
```
This is going to get unreadable once you add three CRs, never mind 100.

Can these instead become refs to the CRs that are causing the condition to trigger?

```yaml
probeResources:
- name: foo-example
  namespace: default
  kind: Foo
  group: foo.example.com
  reason: "!Migrating"
```

I get the desire to include the entire decision, I am just worried that it will be too much for a user to read.
That's fair - I am fine removing the list of conditions and simply including the reason.
Given that a single resource may cause multiple OLM Supported Conditions to fire, I suggest making the `reason` field an array of strings.
Do we need to explain this at all? Or can we have something more opaque like `relatedObjects` that just points at the involved resources, and leaves it up to the included `message` and external docs like this enhancement to spell out the important pieces (e.g. `default/foo-example has UnhealthyDatabase and BadConnectivity` instead of the example's current, generic `The operator has communicated that the operator is not upgradeable`).
```yaml
spec:
  specOverride: crdAnnotations # Changes to the CRD's annotations should override the spec. SPO will watch the CRD for changes to the "OLM Supported Condition" annotations and update the spec accordingly.
  probeResources:
  - resource: foo.example.com
```
This should be the CRD name.
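If that suggestion were adopted, the entry would presumably reference the CRD's full name instead, something like:

```yaml
probeResources:
- resource: foos.foo.example.com # the CRD name (plural.group), rather than just the group
```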
/lgtm
enhancements/upgrade-readiness.md
> A core requirement of this enhancement is to provide operators with the ability to communicate "Upgrade Readiness", providing operators with the ability to signal to OLM that they should not be upgraded at this time. Consider the following scenario based on OLM today:
>
> > An operator installed with automatic updates is executing a critical process that could place the cluster in a bad state if interrupted. While this critical process is running, a new version of the operator becomes available. Since the operator was installed with automatic updates, OLM upgrades the operator without allowing the critical process to finish and places the cluster in an unrecoverable state.
This example conflicts with the non-goal:

> OLM cannot guarantee that some outside force will not interrupt the services of the operator and will not provide the operator with the ability to communicate that it should not be "Interrupted".

Personally, I prefer the non-goal. The solution to "things break when I do dangerous, irrecoverable things and get updated part-way through" is "don't do dangerous, irrecoverable things". A better example for this enhancement might be "later versions of my operator, which you might be considering updating to, are not compatible with the current cluster state. If you update me to those versions, I will break". Or something like that?
> Personally, I prefer the non-goal. The solution to "things break when I do dangerous, irrecoverable things and get updated part-way through" is "don't do dangerous, irrecoverable things".

This example is meant to highlight that OLM will not start the upgrade process while an operator is communicating that it cannot be upgraded to the next version.

> A better example for this enhancement might be "later versions of my operator, which you might be considering updating to, are not compatible with the current cluster state. If you update me to those versions, I will break". Or something like that?

As @ecordell discussed, this example is slightly different from the issues we are trying to address in this enhancement.
| Condition Type | Effect |
| -------------- | ------ |
| Upgradeable | When `false`, OLM will not upgrade the operator. |
In OpenShift, we pivoted `Upgradeable` to only block minor+ bumps, and to no longer block patch bumps (rhbz#1797624). That allows you to gracefully recover from:

> I released a broken patch which sets `Upgradeable=False` more aggressively than I meant, and now I want to let folks perform patch updates out of that version without having to jump through whatever hoops the overly-strict version required to clear `Upgradeable=False`.

However, it also relies on two assumptions:

- Operator workloads are always interruptible for patch bumps. Which you definitely want, but this enhancement seems maybe not ready to commit to (e.g. here).
- Patch bumps never make any changes large enough that you'd actually want to block them. Which you also want, but I have no idea how well folks SemVer their operators, if they SemVer them at all...
> ### Goals
>
> - Provide operators with the means to explicitly communicate a condition that influences OLM behavior, such as the "Upgrade Readiness" condition defined above.
> - Preserve existing behavior if the operator does not take advantage of this feature.
Is there an intent here that OLM will then propagate this status to the CVO in order to block OCP upgrades under certain conditions?
Operators saying "don't update me" and "don't update the host cluster where I live" seem like two different things.
It's somewhat orthogonal, but I bring it up because we are moving forward with documentation that says "make sure all your operators are at the latest version before you upgrade your cluster", so if we know it's physically impossible for your operators to be at the latest version for some reason, it seems worth considering whether we should also block the cluster upgrade. (In fact, even if it's possible to upgrade your operators but you haven't done so, it would also be nice to block the upgrade.)

Does not have to be covered here, but wanted it in people's heads.
It's not explicit here (because this is an operator-framework enhancement) but one goal here is to allow more communication between OLM <> Operators, and in the future use that to inform the communication between OLM <> CVO and decisions around cluster upgrades. That will be spelled out in a future openshift enhancement, though
The communication channel with OLM is valuable independent of CVO/upgrades, so we're tackling that here.
It might also be worth changing `Upgradeable` to `OperatorUpgradeable` or some such in this enhancement to distinguish between the two cases, so we don't end up with `ClusterUpgradeable` about "please don't bump the host system" and an unqualified `Upgradeable` about "please don't bump me, the operator".
> It might also be worth changing Upgradeable to OperatorUpgradeable or some such in this enhancement to distinguish between the two cases, so we don't end up with ClusterUpgradeable about "please don't bump the host system" and an unqualified Upgradeable about "please don't bump me, the operator".
I don't believe that this will be an issue given that this communication channel will exist between OLM and the operator that it manages. OLM would still communicate to CVO using the existing clusteroperator resource.
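For context, that existing channel is the `ClusterOperator` resource's conditions; a simplified, illustrative `Upgradeable=False` condition there looks roughly like this (the `reason` and `message` values are made up):

```yaml
# Simplified illustration of the existing OLM -> CVO channel; reason and
# message are invented for the example.
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  name: operator-lifecycle-manager
status:
  conditions:
  - type: Upgradeable
    status: "False"
    reason: ManagedOperatorNotUpgradeable
    message: "An OLM-managed operator reports it cannot be interrupted right now."
    lastTransitionTime: "2020-06-01T15:04:05Z"
```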
So how would OLM know which sub-operator `Upgradeable=False` signals were "hold the core, my current version is incompatible with future core APIs" and should be passed along in the ClusterOperator, and which were "don't bump me, my future versions will break on the current cluster" and should not be passed on?
The OLM team has discussed the scenarios that @wking is describing but I don't believe that we've made any final decisions.
That being said, I don't know that operators will be responsible for handling either of these scenarios. They certainly can be, but it places a lot of burden on the operators to understand the requirements of an operator that hasn't been released yet (version n of an operator must know the requirements of operator version n+1).
In the past, we had discussed having OLM make intelligent decisions when upgrading an operator by looking at available APIs and the RBAC associated with the operator. In this scenario, OLM could identify both of the cases @wking mentioned.
Again, I haven't kept up with this aspect of OLM and would welcome a comment from @ecordell if I am operating off stale info.
@wking that's the "other side" of this problem (and isn't something that should be addressed in this operator-framework enhancement).
This enhancement is just for operators communicating to OLM that they're in the middle of something important, which has been a long-standing problem for upgrades.
With this enhancement done, OLM could later use that to inform CVO that an operator is currently not in an interruptible state (which likely should mean reporting itself `Upgradeable=false`, with the ability for an admin to indicate that they don't care about interrupting certain operators).

But that's not sufficient to determine whether a cluster upgrade is safe. The latest thing we've discussed on that front is that we probably want to default to OLM reporting `Upgradeable=false` to CVO for any operator that's installed that doesn't have an explicit "I work on OCP version X" where X is the next minor release (the mechanism of how we get that information is still under some discussion) - but that will be a new enhancement in openshift/enhancements, since it's OCP-specific.
> With this enhancement done, OLM could later use that to inform CVO that an operator is currently not in an interruptible state (which likely should mean reporting itself Upgradeable=false with the ability for an admin to indicate that they don't care about interrupting certain operators).

On the CVO side, `Upgradeable=False` only blocks minor updates, not patch level updates. If a particular workload wants to mark itself uninterruptible, or otherwise make claims about how interruptible it is, it should use pod disruption budgets. That will also protect you from things like deschedulers. So maybe the only thing that needs a new, update-specific channel is "I'm not comfortable with the host environment or other external dependencies updating right now".
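For example, a pod disruption budget along these lines (names are illustrative) would block voluntary evictions of the operator pod, though not OLM-driven upgrades or involuntary failures:

```yaml
# Illustrative: blocks voluntary disruptions (node drains, deschedulers) of
# the operator pod; it does not block OLM upgrades or involuntary failures.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: foo-operator-pdb
  namespace: operators
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: foo-operator
```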
New changes are detected. LGTM label has been removed.
Closing this in favor of #42 with the expectation that we will eventually revisit the discussion on how to provide operators with the means to communicate complex states to OLM without writing to an OLM CRD directly.