Description
I am trying to reconcile these manifests using Boxcutter in an OpenShift cluster: https://github.com/openshift/cluster-api-provider-aws/blob/main/openshift/capi-operator-manifests/default/manifests.yaml
Notice that the manifests contain 21 CRDs, which is not uncommon for CAPI providers. Also notice that every CRD has the annotation:
```yaml
service.beta.openshift.io/inject-cabundle: "true"
```
This is typical in an OpenShift cluster: it causes the service-ca-operator to inject the cluster's CA bundle into the CRD. Critically, this injection happens very quickly.
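For context, the injection requested by the annotation looks roughly like this (a sketch only; the CRD name is one of CAPA's, and for CRDs the injected field is the conversion webhook's caBundle):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: awsmachines.infrastructure.cluster.x-k8s.io
  annotations:
    service.beta.openshift.io/inject-cabundle: "true"
spec:
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        # service-ca-operator patches this field shortly after the CRD is
        # created, bumping resourceVersion and adding its own field manager.
        caBundle: <injected by service-ca-operator>
```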
The problem occurs in the new object handler:
boxcutter/machinery/objects.go
Lines 231 to 257 in 79d1013
```go
case errors.IsNotFound(err):
	// Object might still already exist on the cluster,
	// either because of slow caches or because
	// label selectors exclude it from the cache.
	//
	// To be on the safe-side do a normal POST call.
	// Using SSA might patch an already existing object,
	// violating collision protection settings.
	err := e.create(
		ctx, desiredObject, options, client.FieldOwner(e.fieldOwner))
	if errors.IsAlreadyExists(err) {
		// Might be a slow cache or an object created by a different actor
		// but excluded by the cache selector.
		return nil, NewCreateCollisionError(desiredObject, err.Error())
	}
	if err != nil {
		return nil, fmt.Errorf("creating resource: %w", err)
	}
	if err := e.migrateFieldManagersToSSA(ctx, desiredObject); err != nil {
		return nil, fmt.Errorf("migrating to SSA after create: %w", err)
	}
	return newObjectResultCreated(
		desiredObject, options), nil
```
If the object does not already exist we use an alternate flow which does:
- Create (with POST) instead of Apply, followed immediately by
- migrateFieldManagersToSSA, which modifies the field managers to look as if the object had been Applied
We do this because SSA does not provide a reliable mechanism to avoid patching an existing object, and we want the collision protection to be robust.
However, when creating a CRD with inject-cabundle, migrateFieldManagersToSSA fails almost every time with a conflict, because service-ca-operator will have modified the object between the create and the apply. This causes the object reconciliation to fail, which in turn causes the phase reconciliation to fail. My controller loop returns this error to controller-runtime and waits to be rescheduled, which is idiomatic. When the phase contains 21 CRDs and every one returns an error when it is initially created, the exponential backoff escalates rapidly, and we soon hit the maximum 10 minutes between reconcile attempts. Every CAPI provider has the same issue. Clusterbot clusters only live for a few hours, and I have not yet seen one complete.
I see several workarounds and potential opportunities for improvement here:
The simplest is to always attempt to reconcile every object in a phase and return a single, aggregated error instead of stopping on the first failure. This would be a potentially breaking change if some users have carefully ordered manifests based on the assumption of this behaviour, so I would propose adding this as a PhaseReconcileOption. I will start working on a PR to implement this in any case. This works in my case because on the first reconcile attempt every CRD will return a conflict, but they will also all have been created. This will return an error and we will retry, but they are unlikely to conflict again on the second reconcile, as service-ca-operator will already have updated them.
Secondly, we could remove migrateFieldManagersToSSA from this create flow entirely. This should be safe: the controller already needs to watch Create events for its managed objects in order to reconcile on initial startup, so the creation of a managed object should trigger another reconciliation. We also call migrateFieldManagersToSSA on update (where it is usually a no-op), so we would still migrate the field owner metadata, just on the next reconciliation. At that point we should no longer be racing so predictably with service-ca-operator, so we would expect the migration to almost always succeed; certainly it should not fail so often that exponential backoff becomes a problem. A potential disadvantage is that if the installer controller is not careful with the predicates for its managed objects, this could result in extra reconciliation loops: one for the initial create, one triggered by the Create event (which would happen anyway), and an extraneous one because the previous reconcile updated the objects.
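The predicate concern above could be addressed by only requeuing when the part of the object we actually reconcile has changed. The sketch below uses simplified stand-in types, not client.Object; a real implementation would be a controller-runtime predicate.Predicate that ignores server-maintained metadata such as managedFields and resourceVersion, which is exactly what migrateFieldManagersToSSA touches.

```go
package main

import (
	"fmt"
	"reflect"
)

// object is a simplified stand-in for a Kubernetes object: the fields
// the field-manager migration touches (ResourceVersion, ManagedFields)
// plus the spec we actually reconcile.
type object struct {
	ResourceVersion string
	ManagedFields   []string
	Spec            map[string]string
}

// specChanged is a sketch of an update predicate: requeue only when the
// reconciled content changed, ignoring server-maintained metadata.
func specChanged(oldObj, newObj object) bool {
	return !reflect.DeepEqual(oldObj.Spec, newObj.Spec)
}

func main() {
	before := object{
		ResourceVersion: "1",
		ManagedFields:   []string{"kubectl"},
		Spec:            map[string]string{"replicas": "1"},
	}
	// Simulate the SSA field-manager migration: metadata changes only.
	after := before
	after.ResourceVersion = "2"
	after.ManagedFields = []string{"boxcutter"}

	fmt.Println(specChanged(before, after)) // metadata-only change: no requeue
}
```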
Lastly, we could consider whether there is a safe way to use SSA for the initial creation without violating collision protection. We could have a separate code path for when collision detection is disabled, but that only increases overall complexity, as we would then have an additional code path to maintain. I had an idea that creation with SSA without ForceOwnership might be sufficient, as any field created by another manager, including CSA, would cause a conflict. However, it is definitely possible to conceive of a case where the desired and actual objects have no overlapping fields, so I don't think this is as robust as the current behaviour unless we can find some 'trick' which guarantees a conflict. We could also take this upstream to Kubernetes and see what they say, if we haven't already. The desire is to guarantee creation of an object which we will subsequently maintain with SSA, i.e. an atomic creation with SSA metadata. This sounds like a reasonable request.
This last one would require careful thought in any case. I would not propose this in the short term, but I believe it would be the ideal solution in the long term.
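The weakness of no-force Apply as a create guard can be shown with a toy model. SSA only reports a conflict on fields both managers own, so if the desired object and a pre-existing object touch disjoint fields, Apply succeeds and silently co-manages the object. Field paths below are plain strings, not real managedFields entries.

```go
package main

import "fmt"

// conflicts models SSA conflict detection: it returns the field paths
// that appear in both the desired object's field set and another
// manager's owned set. An empty result means Apply would succeed.
func conflicts(desired, ownedByOther map[string]bool) []string {
	var out []string
	for f := range desired {
		if ownedByOther[f] {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	// Our desired object and the pre-existing object own disjoint
	// fields, so no-force Apply raises no conflict even though this is
	// exactly the collision we wanted to detect.
	desired := map[string]bool{"spec.replicas": true}
	other := map[string]bool{
		"spec.conversion.webhook.clientConfig.caBundle": true,
	}
	fmt.Println(len(conflicts(desired, other)))
}
```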