-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Fix a bug in olm install #6490
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a bug in olm install #6490
Conversation
Result of running the $ operator-sdk olm install --verbose
DEBU[0000] Debug logging is set
INFO[0000] Fetching CRDs for version "latest"
INFO[0000] Fetching resources for resolved version "latest"
INFO[0001] Creating CRDs and resources
INFO[0001] Creating CustomResourceDefinition "catalogsources.operators.coreos.com"
INFO[0001] Creating CustomResourceDefinition "clusterserviceversions.operators.coreos.com"
INFO[0001] Creating CustomResourceDefinition "installplans.operators.coreos.com"
INFO[0001] Creating CustomResourceDefinition "olmconfigs.operators.coreos.com"
INFO[0001] Creating CustomResourceDefinition "operatorconditions.operators.coreos.com"
INFO[0001] Creating CustomResourceDefinition "operatorgroups.operators.coreos.com"
INFO[0001] Creating CustomResourceDefinition "operators.operators.coreos.com"
INFO[0001] Creating CustomResourceDefinition "subscriptions.operators.coreos.com"
INFO[0001] Creating Namespace "olm"
INFO[0001] Creating Namespace "operators"
INFO[0001] Creating ServiceAccount "olm/olm-operator-serviceaccount"
INFO[0001] Creating ClusterRole "system:controller:operator-lifecycle-manager"
INFO[0001] Creating ClusterRoleBinding "olm-operator-binding-olm"
INFO[0001] Creating OLMConfig "cluster"
FATA[0001] Failed to install OLM version "latest": failed to create CRDs and resources: no matches for kind "OLMConfig" in version "operators.coreos.com/v1" Result after this fix; notice the $ build/operator-sdk olm install
INFO[0000] Fetching CRDs for version "latest"
INFO[0000] Fetching resources for resolved version "latest"
INFO[0001] Creating CRDs and resources
INFO[0001] Creating CustomResourceDefinition "catalogsources.operators.coreos.com"
INFO[0001] CustomResourceDefinition "catalogsources.operators.coreos.com" created
INFO[0001] Creating CustomResourceDefinition "clusterserviceversions.operators.coreos.com"
INFO[0001] CustomResourceDefinition "clusterserviceversions.operators.coreos.com" created
INFO[0001] Creating CustomResourceDefinition "installplans.operators.coreos.com"
INFO[0001] CustomResourceDefinition "installplans.operators.coreos.com" created
INFO[0001] Creating CustomResourceDefinition "olmconfigs.operators.coreos.com"
INFO[0001] CustomResourceDefinition "olmconfigs.operators.coreos.com" created
INFO[0001] Creating CustomResourceDefinition "operatorconditions.operators.coreos.com"
INFO[0001] CustomResourceDefinition "operatorconditions.operators.coreos.com" created
INFO[0001] Creating CustomResourceDefinition "operatorgroups.operators.coreos.com"
INFO[0001] CustomResourceDefinition "operatorgroups.operators.coreos.com" created
INFO[0001] Creating CustomResourceDefinition "operators.operators.coreos.com"
INFO[0001] CustomResourceDefinition "operators.operators.coreos.com" created
INFO[0001] Creating CustomResourceDefinition "subscriptions.operators.coreos.com"
INFO[0001] CustomResourceDefinition "subscriptions.operators.coreos.com" created
INFO[0001] Creating Namespace "olm"
INFO[0001] Namespace "olm" created
INFO[0001] Creating Namespace "operators"
INFO[0001] Namespace "operators" created
INFO[0001] Creating ServiceAccount "olm/olm-operator-serviceaccount"
INFO[0001] ServiceAccount "olm/olm-operator-serviceaccount" created
INFO[0001] Creating ClusterRole "system:controller:operator-lifecycle-manager"
INFO[0001] ClusterRole "system:controller:operator-lifecycle-manager" created
INFO[0001] Creating ClusterRoleBinding "olm-operator-binding-olm"
INFO[0001] ClusterRoleBinding "olm-operator-binding-olm" created
INFO[0001] Creating OLMConfig "cluster"
INFO[0001] the CRD for OLMConfig "cluster" is not ready yet. Retrying in one second
INFO[0004] OLMConfig "cluster" created
INFO[0004] Creating Deployment "olm/olm-operator"
INFO[0004] Deployment "olm/olm-operator" created
INFO[0004] Creating Deployment "olm/catalog-operator"
INFO[0004] Deployment "olm/catalog-operator" created
INFO[0004] Creating ClusterRole "aggregate-olm-edit"
INFO[0004] ClusterRole "aggregate-olm-edit" created
INFO[0004] Creating ClusterRole "aggregate-olm-view"
INFO[0004] ClusterRole "aggregate-olm-view" created
INFO[0004] Creating OperatorGroup "operators/global-operators"
INFO[0004] OperatorGroup "operators/global-operators" created
INFO[0004] Creating OperatorGroup "olm/olm-operators"
INFO[0004] OperatorGroup "olm/olm-operators" created
INFO[0004] Creating ClusterServiceVersion "olm/packageserver"
INFO[0004] ClusterServiceVersion "olm/packageserver" created
INFO[0004] Creating CatalogSource "olm/operatorhubio-catalog"
INFO[0004] CatalogSource "olm/operatorhubio-catalog" created
INFO[0004] Waiting for deployment/olm-operator rollout to complete
INFO[0004] Waiting for Deployment "olm/olm-operator" to rollout: 0 of 1 updated replicas are available
INFO[0005] Deployment "olm/olm-operator" successfully rolled out
INFO[0005] Waiting for deployment/catalog-operator rollout to complete
INFO[0005] Deployment "olm/catalog-operator" successfully rolled out
INFO[0005] Waiting for deployment/packageserver rollout to complete
INFO[0005] Waiting for Deployment "olm/packageserver" to rollout: 0 of 2 updated replicas are available
INFO[0012] Deployment "olm/packageserver" successfully rolled out
INFO[0012] Successfully installed OLM version "latest"
NAME NAMESPACE KIND STATUS
catalogsources.operators.coreos.com CustomResourceDefinition Installed
clusterserviceversions.operators.coreos.com CustomResourceDefinition Installed
installplans.operators.coreos.com CustomResourceDefinition Installed
olmconfigs.operators.coreos.com CustomResourceDefinition Installed
operatorconditions.operators.coreos.com CustomResourceDefinition Installed
operatorgroups.operators.coreos.com CustomResourceDefinition Installed
operators.operators.coreos.com CustomResourceDefinition Installed
subscriptions.operators.coreos.com CustomResourceDefinition Installed
olm Namespace Installed
operators Namespace Installed
olm-operator-serviceaccount olm ServiceAccount Installed
system:controller:operator-lifecycle-manager ClusterRole Installed
olm-operator-binding-olm ClusterRoleBinding Installed
cluster OLMConfig Installed
olm-operator olm Deployment Installed
catalog-operator olm Deployment Installed
aggregate-olm-edit ClusterRole Installed
aggregate-olm-view ClusterRole Installed
global-operators operators OperatorGroup Installed
olm-operators olm OperatorGroup Installed
packageserver olm ClusterServiceVersion Installed
operatorhubio-catalog olm CatalogSource Installed |
internal/olm/client/client.go
Outdated
if err != nil { | ||
if !apierrors.IsAlreadyExists(err) { | ||
|
||
for i := 0; i < 5; i++ { // try to create 5 times before giving up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like if we get the NoMatchError
5 times, we'll swallow the error and return nil. We probably would want to surface the error in this case instead of swallowing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 good point. But in that case, what do you say about making it a lop of ten, just for case?
We probably want a thorough set of unit tests to both reproduce the original error and exercise the edge cases of the new retry loop. |
log.Infof(" the CRD for %s %q is not ready yet. Retrying in one second", kind, resourceName) | ||
time.Sleep(time.Second) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be a bit more idiomatic (at least from our kubernetes adjacency) to use the client-go retry package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to using the retry package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @nunnatsa !
I have a general question - does retrying actually resolve the error?
I am wondering if part of the problem is that there is no logic in the installation method in
operator-sdk/internal/olm/installer/client.go
Lines 88 to 91 in 861f524
log.Print("Creating CRDs and resources") | |
if err := c.DoCreate(ctx, objs...); err != nil { | |
return nil, fmt.Errorf("failed to create CRDs and resources: %v", err) | |
} |
CustomResourceDefinition
resources and the instances of the CRDs so the ordering in which these resources are being created could be out of whack. This makes me think that retrying won't suffice in the event the CRD hasn't even been created yet due to the ordering of resource creation. I am thinking it might make more sense to have a distinct step for creating the CRDs that verifies all the CRDs have been created and then attempts to create the resources. (although it does appear to be the case that the list of resources is sorted such that CRDs come first here: operator-sdk/internal/olm/installer/client.go
Line 210 in 861f524
resources := append(crdResources, olmResources...) |
Also leaving a couple comments:
# release notes and/or the migration guide | ||
entries: | ||
- description: > | ||
Fix a bug where `olm install` command is failed fo "no-match" error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: It might be nice to show an example of the error output
log.Infof(" the CRD for %s %q is not ready yet. Retrying in one second", kind, resourceName) | ||
time.Sleep(time.Second) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to using the retry package
Thank you all. You raised some very good points, and I was trying to address them all. The As for the ordering: this is a great point and I checked it before I even started the implementation. For now, all the CRDs are deployed before the resources. I fully agree that for the long run, it will be better to split the CRD creation and the resource creation. But this refactoring is out of my capacity right now - sorry. Anyway, since the order is right, then this solution solves the problem. In my environment I always see (only) one failure message, and then the creation succeeds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me. A couple nits:
internal/olm/client/client.go
Outdated
return false, err | ||
}) | ||
|
||
if err != nil && err == context.DeadlineExceeded { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
if err != nil && err == context.DeadlineExceeded { | |
if err != nil && errors.Is(err, context.DeadlineExceeded) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
$ operator-sdk olm install --verbose | ||
DEBU[0000] Debug logging is set | ||
INFO[0000] Fetching CRDs for version "latest" | ||
INFO[0000] Fetching resources for resolved version "latest" | ||
INFO[0001] Creating CRDs and resources | ||
INFO[0001] Creating CustomResourceDefinition "catalogsources.operators.coreos.com" | ||
INFO[0001] Creating CustomResourceDefinition "clusterserviceversions.operators.coreos.com" | ||
INFO[0001] Creating CustomResourceDefinition "installplans.operators.coreos.com" | ||
INFO[0001] Creating CustomResourceDefinition "olmconfigs.operators.coreos.com" | ||
INFO[0001] Creating CustomResourceDefinition "operatorconditions.operators.coreos.com" | ||
INFO[0001] Creating CustomResourceDefinition "operatorgroups.operators.coreos.com" | ||
INFO[0001] Creating CustomResourceDefinition "operators.operators.coreos.com" | ||
INFO[0001] Creating CustomResourceDefinition "subscriptions.operators.coreos.com" | ||
INFO[0001] Creating Namespace "olm" | ||
INFO[0001] Creating Namespace "operators" | ||
INFO[0001] Creating ServiceAccount "olm/olm-operator-serviceaccount" | ||
INFO[0001] Creating ClusterRole "system:controller:operator-lifecycle-manager" | ||
INFO[0001] Creating ClusterRoleBinding "olm-operator-binding-olm" | ||
INFO[0001] Creating OLMConfig "cluster" | ||
FATA[0001] Failed to install OLM version "latest": failed to create CRDs and resources: no matches for kind "OLMConfig" in version "operators.coreos.com/v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I would recommend only showing the full error message to keep the changelog entry as short as possible but still informative
@nunnatsa it looks like the sanity check is failing due to an out of date fork. I think you need to fetch the latest changes on master and rebase this PR on top of that. |
I think the sanity check is broken. I rebased my pr, again, just before thr lastest push. |
When installing OLM, retry creating resources, if the CRD is not ready yet. Signed-off-by: Nahshon Unna-Tsameret <[email protected]>
fea61a3
to
4e443da
Compare
@everettraven - done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
When installing OLM, retry creating resources, if the CRD is not ready yet.
Fixes: #6489
Description of the change
On olm install command, if the creation of a resource fails with "no-match" error, retry creating it, assuming the CRD is not ready yet, but is going to be ready very soon.
Motivation for the change
Bug fix. See #6489
##Checklist
If the pull request includes user-facing changes, extra documentation is required:
changelog/fragments
(seechangelog/fragments/00-template.yaml
)website/content/en/docs