Commit 4452d90

maltesander, adwk67, fhennig, razvan and Techassi authored
ADR: Conversion webhook deployment (#524)
* initial conversion webhook adr draft
* Apply suggestions from code review
  Co-authored-by: Andrew Kenworthy <[email protected]>
* use pro in pro and not in con!
* add more pro cons
* fixes
* more fixes
* fix adr number
* added feedback and decision from adr meeting
* set status to accepted
* Update modules/contributor/pages/adr/ADR034-foundation-webhooks-crd-versioning.adoc
  Co-authored-by: Felix Hennig <[email protected]>
* improve decision drivers
* Update ADR034-foundation-webhooks-crd-versioning.adoc
  Added more content regarding OLM and downgrades.
* Update ADR034-foundation-webhooks-crd-versioning.adoc
  Updated option 5 (accepted).
* Update ADR034-foundation-webhooks-crd-versioning.adoc
  Updated conclusion
* Added more notes on OLM
* Rename file
* Add contributers, re-order by last name
* Minor text tweaks, put one sentence per line
* Add minor content changes
* Retrigger checks
* Update modules/contributor/pages/adr/ADR034-foundation-webhooks-deployment.adoc
  Co-authored-by: Andrew Kenworthy <[email protected]>

---------

Co-authored-by: Andrew Kenworthy <[email protected]>
Co-authored-by: Felix Hennig <[email protected]>
Co-authored-by: Razvan-Daniel Mihai <[email protected]>
Co-authored-by: Techassi <[email protected]>
Co-authored-by: Techassi <[email protected]>
1 parent c7e0537 commit 4452d90

1 file changed

+234
-0
lines changed

@@ -0,0 +1,234 @@
1+
= ADR034: Foundation for conversion webhooks deployment
2+
Doc Writer <doc[email protected]>
3+
v0.1
4+
:status: accepted
5+
:date: 2024-01-09
6+
7+
* Status: {status}
8+
* Deciders:
9+
** Sebastian Bernauer
10+
** Andrew Kenworthy
11+
** Sascha Lautenschlaeger
12+
** Razvan Mihai
13+
** Natalie Röijezon
14+
** Malte Sander
15+
* Date: {date}
16+
17+
Technical Story: https://github.com/stackabletech/issues/issues/361
18+
19+
== Context
20+
21+
We must version our CustomResourceDefinitions (CRDs).
22+
This step allows us to move away from unstable alpha or beta versions (like `v1alpha1`) to stable versions like `v1` or `v2`.
23+
These versions provide stable interfaces which customers can rely on.
24+
Since we cannot avoid having breaking changes in the future (which require a bump in the respective CRD version), we have to supply conversion webhooks that take care of converting older versions to the current storage version.
25+
26+
Converting custom resources between versions is a separate step, independent of webhook deployments.
27+
CRD versions should be upgraded seamlessly when operators and webhooks are upgraded. Downgrades are possible by first converting the custom resources to the old version and then downgrading the operator and webhook.
28+
29+
A conversion webhook is registered in a CRD like this:
30+
31+
[source,yaml]
32+
----
33+
spec:
34+
  conversion:
35+
    strategy: Webhook
36+
    webhook:
37+
      conversionReviewVersions: ["v1"]
38+
      clientConfig:
39+
        service:
40+
          namespace: default
41+
          name: example-conversion-webhook-server
42+
          path: /crdconvert
43+
        caBundle: "Ci0tLS0tQk...<base64-encoded PEM bundle>...tLS0K"
44+
----
45+
46+
This ADR is about the location of the webhook endpoint / server which the `spec.conversion.webhook.clientConfig.service` block is referencing.
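
As a minimal sketch, the `service` block above could resolve to a Service like the following.
The selector label and the target port are assumptions made for illustration; only the name and namespace are taken from the example above.

[source,yaml]
----
apiVersion: v1
kind: Service
metadata:
  # Must match spec.conversion.webhook.clientConfig.service.{name,namespace}
  name: example-conversion-webhook-server
  namespace: default
spec:
  selector:
    app: example-conversion-webhook # hypothetical Pod label
  ports:
    # The API server connects to port 443 unless clientConfig.service.port is set
    - port: 443
      targetPort: 8443 # hypothetical port the webhook server listens on
----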
47+
48+
49+
=== Use case: CRD downgrades
50+
51+
There can be multiple CRD versions for an operator. Exactly one of them is the storage version, while several versions can be served at the same time.
52+
53+
Setting:
54+
55+
* old CRD version "v1"
56+
* new CRD version "v2"
57+
* there is a cluster/stacklet in version "v2" running
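
In CRD terms, the setting above corresponds roughly to the following `spec.versions` fragment.
This is a sketch only: the per-version schemas are omitted, and treating "v2" as the storage version is an assumption based on the running cluster being in "v2".

[source,yaml]
----
spec:
  versions:
    - name: v1
      served: true   # still readable and writable through the API
      storage: false
    - name: v2
      served: true
      storage: true  # exactly one version may be marked as storage version
----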
58+
59+
Downgrade procedure:
60+
61+
* Step 1: request the cluster definition in "v1" and apply it again
62+
* Step 2: downgrade operator and webhook deployments
63+
64+
[NOTE]
65+
====
66+
This works because the cluster version has been downgraded before the webhook has been downgraded.
67+
This means that the webhook and the operator can be deployed in lock-step.
68+
====
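
When the cluster definition is requested in "v1" in step 1, the API server asks the webhook to convert the stored "v2" object.
The exchange could look roughly like the sketch below.
The API group `example.com`, the kind `ExampleCluster`, the object name and the `uid` are placeholders, not values from this ADR.

[source,yaml]
----
# Request sent by the API server to the webhook
apiVersion: apiextensions.k8s.io/v1
kind: ConversionReview
request:
  uid: 705ab4f5-6393-11e8-b7cc-42010a800002 # placeholder
  desiredAPIVersion: example.com/v1
  objects:
    - apiVersion: example.com/v2
      kind: ExampleCluster
      metadata:
        name: simple
        namespace: default
      spec: {}
---
# Response returned by the webhook
apiVersion: apiextensions.k8s.io/v1
kind: ConversionReview
response:
  uid: 705ab4f5-6393-11e8-b7cc-42010a800002 # must echo the request uid
  result:
    status: Success
  convertedObjects:
    - apiVersion: example.com/v1
      kind: ExampleCluster
      metadata:
        name: simple
        namespace: default
      spec: {}
----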
69+
70+
Proposal: we could implement step 1 as a convenience in stackablectl and/or document how to perform it with kubectl or the https://github.com/kubernetes-sigs/kube-storage-version-migrator[storage migrator].
71+
72+
== Problem Statement
73+
74+
There are several options on how or where to deploy a conversion webhook, e.g. coupled closely with the operator as a controller or completely decoupled via an extra deployment.
75+
76+
We need a uniform deployment across all operators to keep implementation and maintenance to a minimum and reuse code wherever possible.
77+
Additionally, it should be possible to enable or disable webhooks on demand via options such as Helm values, operator parameters or CRD flags.
78+
79+
Furthermore, in terms of downgrading, webhooks should always be deployed in their "latest" version, meaning they can convert all supported (new) versions.
80+
81+
== Discussion questions
82+
83+
- Do we want this to be HA?
84+
- Do we want this to be deployed in a decoupled way?
85+
- One operator per Kubernetes cluster: What if three operators are deployed, watching different namespaces / versions? This should be strongly discouraged!
86+
- How to abstract a common admission/conversion webhook skeleton in operator-rs, that can be implemented in the operators within a few lines of code (excluding the actual conversion code)?
87+
- How to keep maintenance, updating, pipelines or extra images to a minimum?
88+
- How to deactivate or not deploy the conversion webhook if not desired by customers? Or how to activate if opt-in?
89+
90+
== Decision Drivers
91+
92+
* Keep pipelining / maintenance / extra images / code to a minimum
93+
* Operator and webhook are deployed in lock-step
94+
* Must be deployable with Operator Lifecycle Manager (OLM)
95+
** OLM deploys webhooks together with operators in the same ClusterServiceVersion (CSV). This means webhooks and operators are NOT independently up- or downgradable. Also see the <<olm-notes>>.
96+
** Helm charts and OLM bundles should not diverge in functionality. This is to reduce maintenance costs.
97+
* The webhook has to keep working if the operator crashes
98+
99+
[[olm-notes]]
100+
=== OLM Notes
101+
102+
OLM is a Kubernetes operator that manages the lifecycle of other operators.
103+
It is used to install, update, and remove operators and their associated services.
104+
OLM uses a custom resource called a ClusterServiceVersion (CSV) to manage the lifecycle of an operator.
105+
A CSV is a manifest that describes the operator and its associated services.
106+
It contains metadata about the operator, such as its name, version, and supported Kubernetes versions.
107+
It also contains a list of resources that the operator manages, such as custom resource definitions (CRDs), roles, role bindings and, most relevant for this ADR, webhook deployments.
108+
109+
Webhooks managed by OLM are deployed together with the operator in the same ClusterServiceVersion (CSV) but as a separate Deployment.
110+
The webhook and the operator manage the same CustomResourceDefinitions marked as `owned` in the CSV.
111+
112+
Any CSV that contains conversion webhooks must support the `AllNamespaces` install mode.
113+
This is because webhooks are cluster-wide resources and must be installed in all namespaces.
114+
115+
The
116+
117+
- `spec.conversion.webhook.clientConfig.service.namespace` and
118+
- `spec.conversion.webhook.clientConfig.service.name`
119+
120+
fields of the CRD are required.
121+
For OLM, this means that the webhook must be deployed in that namespace together with the operator.
122+
This is a limitation of OLM and is not something that can be changed.
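
A CSV fragment that declares a conversion webhook could look roughly like this.
It is a sketch modelled on the OLM documentation; the deployment name, ports, webhook path and CRD name are placeholders.

[source,yaml]
----
spec:
  installModes:
    # Required for CSVs that contain conversion webhooks
    - type: AllNamespaces
      supported: true
  webhookdefinitions:
    - type: ConversionWebhook
      generateName: conversion.example.com            # placeholder
      deploymentName: example-conversion-webhook-server # a Deployment defined in the same CSV
      admissionReviewVersions:
        - v1
      sideEffects: None
      containerPort: 443
      targetPort: 8443                                # placeholder
      webhookPath: /crdconvert
      conversionCRDs:
        - exampleclusters.example.com                 # placeholder CRD name
----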
123+
124+
For more details regarding OLM constraints for webhooks, see the OpenShift Container Platform https://docs.openshift.com/container-platform/4.14/operators/operator_sdk/osdk-generating-csvs.html#olm-webhook-considerations_osdk-generating-csvs[documentation].
125+
126+
== Considered Options
127+
128+
[[option1]]
129+
=== Option 1: Deploy within the Operator as Controller
130+
131+
The operator contains another controller in a separate thread with the webhook server and conversion code.
132+
133+
==== Pros
134+
135+
- No extra bin / main file
136+
- No extra docker image (OpenShift certification)
137+
- No extra pipelines for the build process
138+
- Always up to date with the operator, no extra versioning
139+
140+
==== Cons
141+
142+
- Downgrade not possible -> older operators may not know new storage versions
143+
- Operator crash affects webhook: no custom resources can be applied during that time
144+
-> prevents writes; only reads of the current version work
145+
- Updating webhook requires updating the whole operator
146+
- (OpenShift restrictions? Restricted namespaces etc.?)
147+
148+
[[option2]]
149+
=== Option 2: Deploy within the Operator as Extra Container with Operator Image
150+
151+
The operator deployment contains another container next to the actual operator containing the webhook server and conversion code using the operator docker image.
152+
153+
==== Pros
154+
155+
- No extra pipelines for the build process
156+
- Could be enabled / disabled using Helm parameters
157+
- Operator crash does not affect webhook
158+
- Always up to date with the operator, no extra versioning
159+
160+
==== Cons
161+
162+
- Downgrade not possible -> older operators may not know new storage versions
163+
- Overhead due to operator image (not just the lightweight webhook server)
164+
- Updating webhook requires updating the whole operator
165+
- (Extra bin / main file)
166+
- (OpenShift restrictions? Restricted namespaces etc.?)
167+
168+
[[option3]]
169+
=== Option 3: Deploy within the Operator as Extra Container and Extra Image
170+
171+
The operator deployment contains another container next to the actual operator containing the webhook server and conversion code using its own docker image.
172+
173+
==== Pros
174+
175+
- No overhead due to operator image (just the lightweight webhook server)
176+
- Operator crash does not affect webhook
177+
- Could be enabled / disabled using Helm parameters
178+
- Always up to date with the operator, no extra versioning
179+
180+
==== Cons
181+
182+
- Downgrade not possible -> older operators may not know new storage versions
183+
- Updating webhook requires updating the whole operator
184+
- Extra pipelines / images for the build process
185+
- (OpenShift restrictions? Restricted namespaces etc.?)
186+
187+
[[option4]]
188+
=== Option 4: The Operator creates a Webhook Deployment
189+
190+
The operator deploys a webhook Deployment similar to how it deploys e.g. StatefulSets.
191+
192+
==== Pros
193+
194+
- Operator crash does not affect webhook
195+
- Could be enabled / disabled via custom resource
196+
- Always up to date with the operator, no extra versioning
197+
- Should not interfere with OpenShift
198+
199+
==== Cons
200+
201+
- Downgrade not possible -> older operators may not know new storage versions
202+
- Updating webhook requires updating the whole operator (bundle)
203+
- Possibly extra image
204+
- Possibly extra pipelines
205+
- Possibly more complex to test
206+
207+
[[option5]]
208+
=== Option 5: The Webhook has its own Deployment
209+
210+
The webhook and the operator are deployed in lock-step, each in its own Deployment.
211+
Both deployments are part of the same Helm Chart, OLM CSV, etc.
212+
High availability of the webhook is achieved with multiple Deployment replicas.
213+
Both are bundled in the same container image.
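
The layout described above could be sketched as two Deployments that share one image.
All names, the image reference and the `args` used to select the operator or webhook entry point are hypothetical; the ADR does not prescribe them.

[source,yaml]
----
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-operator
  template:
    metadata:
      labels:
        app: example-operator
    spec:
      containers:
        - name: operator
          image: example.com/example-operator:1.2.3 # hypothetical image
          args: ["run"]                             # hypothetical subcommand
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-conversion-webhook-server
spec:
  replicas: 2 # more than one replica for high availability
  selector:
    matchLabels:
      app: example-conversion-webhook
  template:
    metadata:
      labels:
        app: example-conversion-webhook
    spec:
      containers:
        - name: webhook
          image: example.com/example-operator:1.2.3 # same image as the operator
          args: ["webhook"]                         # hypothetical subcommand
          ports:
            - containerPort: 8443
----

Because both Deployments reference the same image tag, a Helm upgrade or a new CSV rolls them forward together, which keeps them in lock-step.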
214+
215+
==== Pros
216+
217+
- Operator crash does not affect webhook
218+
- Downgrade possible -> can adapt to new CRD storage versions
219+
- Could be enabled / disabled using Helm parameters
220+
- The webhook can be updated independently
221+
- No extra pipelines / images
222+
223+
==== Cons
224+
225+
- In OLM environments, if the operator fails to deploy, the webhook is also not deployed.
226+
227+
== Decision Outcome
228+
229+
Chosen option: <<option5>>, because it satisfies all decision drivers.
230+
231+
== Links
232+
233+
- ADR https://docs.stackable.tech/home/nightly/contributor/adr/adr034-foundation-webhooks-ca-bundle.adoc[CA bundle injection]
234+
- https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/[Kubernetes CRD versioning]
