Commit d449ca0

design for enabling existing resource policy
add design doc for existing resource policy
Signed-off-by: Shubham Pampattiwar <[email protected]>

add use-cases and update non-goals
Signed-off-by: Shubham Pampattiwar <[email protected]>

update approach-1 and add policy-action table
Signed-off-by: Shubham Pampattiwar <[email protected]>

minor updates
Signed-off-by: Shubham Pampattiwar <[email protected]>

fix typos
Signed-off-by: Shubham Pampattiwar <[email protected]>

add CLI details
Signed-off-by: Shubham Pampattiwar <[email protected]>

dump updateAll option
Signed-off-by: Shubham Pampattiwar <[email protected]>

add implementation decision
Signed-off-by: Shubham Pampattiwar <[email protected]>
1 parent 574baeb commit d449ca0

Lines changed: 262 additions & 0 deletions
@@ -0,0 +1,262 @@
# Add support for `ExistingResourcePolicy` to restore API
## Abstract
Velero currently does not support any restore policy for Kubernetes resources that already exist in the cluster. It skips the restore of a resource that already exists in the namespace/cluster, regardless of whether the copy in the backup is the same as or different from the one in the cluster. Velero should give the user the option to decide whether the resource in the backup overwrites the one present in the cluster.

## Background
Today, Velero skips the restoration of resources that already exist in the cluster. The current workflow, using a backed-up `service` as an example, is:
- Velero attempts to restore the `service`
- Fetches the `service` from the cluster
- If the `service` exists then:
    - Checks whether the `service` instance in the cluster is equal to the `service` instance present in the backup
        - If not equal, skips the restore of the `service` and adds a restore warning (except for [ServiceAccount objects](https://github.com/vmware-tanzu/velero/blob/574baeb3c920f97b47985ec3957debdc70bcd5f8/pkg/restore/restore.go#L1246))
        - If equal, skips the restore of the `service` and notes in the logs that the restore of the `service` was skipped
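
The workflow above can be sketched as a small decision function. This is only an illustration: `decision`, `currentBehaviour`, and the use of `reflect.DeepEqual` on plain maps are assumptions made for the example, not Velero's actual types or equality check, and the `ServiceAccount` special case is left out.

```go
package main

import (
	"fmt"
	"reflect"
)

// decision is a hypothetical label for what the restore does with one item.
type decision string

const (
	decisionCreate      decision = "create"
	decisionSkipLogged  decision = "skip (logged)"
	decisionSkipWarning decision = "skip (warning)"
)

// currentBehaviour mirrors the workflow described above. inCluster is nil
// when the resource does not exist in the cluster yet.
func currentBehaviour(fromBackup, inCluster map[string]interface{}) decision {
	if inCluster == nil {
		// Nothing in the cluster: the resource is restored normally.
		return decisionCreate
	}
	if reflect.DeepEqual(fromBackup, inCluster) {
		// Same content: the restore is skipped and only logged.
		return decisionSkipLogged
	}
	// Different content: the restore is skipped with a restore warning.
	return decisionSkipWarning
}

func main() {
	backup := map[string]interface{}{"kind": "Service", "port": 80}
	changed := map[string]interface{}{"kind": "Service", "port": 8080}

	fmt.Println(currentBehaviour(backup, nil))
	fmt.Println(currentBehaviour(backup, backup))
	fmt.Println(currentBehaviour(backup, changed))
}
```

In all three branches the in-cluster object is left untouched, which is the gap this proposal addresses.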

The desired functionality is to let the user specify whether the instance of the `service` in the cluster should be overwritten with the one present in the backup during the restore process.

Related issue: https://github.com/vmware-tanzu/velero/issues/4066

## Goals
- Add support for `ExistingResourcePolicy` to restore API for Kubernetes resources.

## Non Goals
- Change existing restore workflow for `ServiceAccount` objects
- Add support for `ExistingResourcePolicy` as `recreate` for Kubernetes resources. (Future scope feature)

## Unrelated Proposals (Completely different functionalities than the one proposed in the design)
- Add support for `ExistingResourcePolicy` to restore API for Non-Kubernetes resources.
- Add support for `ExistingResourcePolicy` to restore API for `PersistentVolume` data.

### Use-cases/Scenarios

### A. Production Cluster - Backup Cluster:
Suppose you have a Backup Cluster that is identical to the Production Cluster. Over time the Production Cluster changes: there might be new deployments, and some secrets might have been updated. The Backup Cluster is then no longer identical to the Production Cluster. To keep the Backup Cluster identical to the Production Cluster with respect to Kubernetes resources (excluding PV data), we would like to use Velero to schedule new backups, which in turn help us update the Backup Cluster via Velero restore.

Reference: https://github.com/vmware-tanzu/velero/issues/4066#issuecomment-954320686

### B. Help identify resource delta:
Here, delta resources are resources that were restored from a previous backup but are no longer present in the latest backup. The following sequence of steps illustrates the scenario:
- Consider two clusters, Cluster A and Cluster B. Cluster A has 3 resources: P1, P2 and P3.
- Create Backup1 from Cluster A; it contains P1, P2 and P3.
- Perform a restore on Cluster B (a new cluster) using Backup1.
- Now, let's say that in Cluster A resource P1 gets deleted and resource P2 gets updated.
- Create a new Backup2 with the new state of Cluster A. Keep in mind that Backup1 has P1, P2 and P3 while Backup2 has P2' and P3.
- So the delta here is (|Cluster B - Backup2|): delete P1 and update P2.
- At restore time, we would want the restore to help us identify this resource delta.
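
The delta in this scenario amounts to a comparison between the resources in Cluster B and those in Backup2. The sketch below is purely illustrative: the `delta` function and the use of a version string per resource are assumptions made for the example, not part of Velero.

```go
package main

import (
	"fmt"
	"sort"
)

// delta compares the resources present in the target cluster with those in
// the latest backup. Both maps go from resource name to a hypothetical
// version string that changes whenever the resource is updated.
func delta(cluster, backup map[string]string) (toDelete, toUpdate []string) {
	for name, clusterVersion := range cluster {
		backupVersion, inBackup := backup[name]
		switch {
		case !inBackup:
			// In the cluster but no longer in the backup.
			toDelete = append(toDelete, name)
		case backupVersion != clusterVersion:
			// In both, but the backup holds a different version.
			toUpdate = append(toUpdate, name)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(toDelete)
	sort.Strings(toUpdate)
	return toDelete, toUpdate
}

func main() {
	clusterB := map[string]string{"P1": "v1", "P2": "v1", "P3": "v1"} // restored from Backup1
	backup2 := map[string]string{"P2": "v2", "P3": "v1"}              // P1 deleted, P2 updated

	toDelete, toUpdate := delta(clusterB, backup2)
	fmt.Println("delete:", toDelete) // delete: [P1]
	fmt.Println("update:", toUpdate) // update: [P2]
}
```

The sketch only identifies the delta; it does not act on it.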

Reference: https://github.com/vmware-tanzu/velero/pull/4613#issuecomment-1027260446

## High-Level Design
### Approach 1: Add a new spec field `existingResourcePolicy` to the Restore API
In this approach we do *not* change existing Velero behavior. If the resource to restore is equal to the one already in the cluster, Velero does nothing, following current behavior. For resources that already exist in the cluster and are not equal to the resource in the backup (other than Service Accounts), we add a new optional spec field `existingResourcePolicy`, which can have the following values:
1. `none`: This is the existing behavior. If Velero encounters a resource that already exists in the cluster, it simply skips restoration.
2. `update`: This option provides the following behavior:
    - Unchanged resources: Velero updates the backup/restore labels on the unchanged resources. If the labels patch fails, Velero adds a restore error.
    - Changed resources: Velero first tries to patch the changed resource. If the patch:
        - succeeds: the in-cluster resource is updated with the labels as well as the resource diff.
        - fails: Velero adds a restore warning and tries to update just the backup/restore labels on the resource. If the labels patch also fails, Velero adds a restore error.
3. `recreate`: If the resource already exists, Velero deletes it and recreates the resource.

*Note:* The `recreate` option is a non-goal for this enhancement proposal, but it is considered future scope.
Also note that Velero will not delete any resources under any of the policy options proposed in this design, but it will patch resources under the `update` policy option.

Example:
A. The following Restore will apply the `existingResourcePolicy` restore type `none` to the `services` and `deployments` present in the `velero-protection` namespace.

```
Kind: Restore

includeNamespaces: velero-protection
includeResources:
  - services
  - deployments
existingResourcePolicy: none
```

B. The following Restore will apply the `existingResourcePolicy` restore type `update` to the `secrets` and `daemonsets` present in the `gdpr-application` namespace.
```
Kind: Restore

includeNamespaces: gdpr-application
includeResources:
  - secrets
  - daemonsets
existingResourcePolicy: update
```

### Approach 2: Add a new spec field `existingResourcePolicyConfig` to the Restore API
In this approach we give the user the ability to specify which resources are to be included for a particular kind of force-update behaviour. This is a more granular approach, in which the user specifies a resource:behaviour mapping. It would look like:
`existingResourcePolicyConfig`:
- `patch:`
    - `includedResources:` [ ]string
- `recreate:`
    - `includedResources:` [ ]string

*Note:*
- There is no `none` behaviour in this approach, as that corresponds to the current/default Velero restore behaviour.
- The `recreate` option is a non-goal for this enhancement proposal, but it is considered future scope.

Example:
A. The following Restore will apply the restore type `patch`, via `existingResourcePolicyConfig`, to the `secrets` and `daemonsets` present in the `inventory-app` namespace.
```
Kind: Restore

includeNamespaces: inventory-app
existingResourcePolicyConfig:
  patch:
    includedResources:
    - secrets
    - daemonsets
```

### Approach 3: Combination of Approach 1 and Approach 2

This approach combines the two approaches above. We propose adding two spec fields to the Restore API: `existingResourceDefaultPolicy` and `existingResourcePolicyOverrides`. As the names suggest, `existingResourceDefaultPolicy` describes the default Velero behaviour for the restore, and `existingResourcePolicyOverrides` explicitly overrides the default policy for specific resources.

Example:
A. The following Restore will use the restore type `patch` as the `existingResourceDefaultPolicy` but will override the default policy for `secrets` to `none` using the `existingResourcePolicyOverrides` spec.
```
Kind: Restore

includeNamespaces: inventory-app
existingResourceDefaultPolicy: patch
existingResourcePolicyOverrides:
  none:
    includedResources:
    - secrets
```
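
One way to read the default-plus-overrides idea is as a lookup that falls back to the default when no override names the resource. The sketch below is an assumption about how such a lookup could work, not part of the proposal; `policy`, `overrides`, and `effectivePolicy` are hypothetical names, and it assumes each resource appears under at most one override policy.

```go
package main

import "fmt"

// policy is a hypothetical stand-in for the proposed policy type.
type policy string

const (
	policyNone  policy = "none"
	policyPatch policy = "patch"
)

// overrides maps a policy to the resources it applies to, mirroring the
// shape of the existingResourcePolicyOverrides example above.
type overrides map[policy][]string

// effectivePolicy returns the override policy that lists the resource, or
// the default policy when no override mentions it.
func effectivePolicy(def policy, ov overrides, resource string) policy {
	for p, resources := range ov {
		for _, r := range resources {
			if r == resource {
				return p
			}
		}
	}
	return def
}

func main() {
	ov := overrides{policyNone: {"secrets"}}

	fmt.Println(effectivePolicy(policyPatch, ov, "secrets"))    // none
	fmt.Println(effectivePolicy(policyPatch, ov, "daemonsets")) // patch
}
```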

## Detailed Design
### Approach 1: Add a new spec field `existingResourcePolicy` to the Restore API
The `existingResourcePolicy` spec field will be of type `PolicyType`.

Restore API:
```
type RestoreSpec struct {
	// ...

	// ExistingResourcePolicy specifies the restore behaviour for the Kubernetes resource to be restored
	// +optional
	ExistingResourcePolicy PolicyType
}
```
PolicyType:
```
type PolicyType string
const PolicyTypeNone PolicyType = "none"
const PolicyTypePatch PolicyType = "update"
```

### Approach 2: Add a new spec field `existingResourcePolicyConfig` to the Restore API
The `existingResourcePolicyConfig` will be a spec of type `PolicyConfiguration`, which gets added to the Restore API.

Restore API:
```
type RestoreSpec struct {
	// ...

	// ExistingResourcePolicyConfig specifies the restore behaviour for a particular Kubernetes resource, or a list of resources, to be restored
	// +optional
	ExistingResourcePolicyConfig []PolicyConfiguration
}
```

PolicyConfiguration:
```
type PolicyConfiguration struct {
	PolicyTypeMapping map[PolicyType]ResourceList
}
```

PolicyType:
```
type PolicyType string
const PolicyTypePatch PolicyType = "patch"
const PolicyTypeRecreate PolicyType = "recreate"
```

ResourceList:
```
type ResourceList struct {
	IncludedResources []string
}
```

### Approach 3: Combination of Approach 1 and Approach 2

Restore API:
```
type RestoreSpec struct {
	// ...

	// ExistingResourceDefaultPolicy specifies the default restore behaviour for the Kubernetes resource to be restored
	// +optional
	ExistingResourceDefaultPolicy PolicyType

	// ExistingResourcePolicyOverrides specifies the restore behaviour for a particular Kubernetes resource, or a list of resources, to be restored
	// +optional
	ExistingResourcePolicyOverrides []PolicyConfiguration
}
```

PolicyType:
```
type PolicyType string
const PolicyTypeNone PolicyType = "none"
const PolicyTypePatch PolicyType = "patch"
const PolicyTypeRecreate PolicyType = "recreate"
```
PolicyConfiguration:
```
type PolicyConfiguration struct {
	PolicyTypeMapping map[PolicyType]ResourceList
}
```
ResourceList:
```
type ResourceList struct {
	IncludedResources []string
}
```

The restore workflow changes will be made [here](https://github.com/vmware-tanzu/velero/blob/b40bbda2d62af2f35d1406b9af4d387d4b396839/pkg/restore/restore.go#L1245).

### CLI changes for Approach 1
We would introduce a new CLI flag called `existing-resource-policy` of string type. This flag accepts the policy from the user. The velero restore command would look something like this:
```
velero create restore <restore_name> --existing-resource-policy=update
```

Help message: `Restore Policy to be used during the restore workflow, can be - none, update`

The CLI changes will go in `pkg/cmd/cli/restore/create.go`.

We would also add validation that checks for invalid policy values provided to this flag.
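
A minimal sketch of such a validation follows, assuming the flag value arrives as a plain string. `validatePolicy` and `allowedPolicies` are hypothetical names, and treating an empty value as "flag not set" is an assumption; the real check would live alongside the flag definition in the restore create command.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedPolicies lists the values named in the help message above.
var allowedPolicies = []string{"none", "update"}

// validatePolicy is a hypothetical helper that rejects any value not in
// allowedPolicies. An empty value is accepted on the assumption that it
// means the flag was not set, so the existing behaviour applies.
func validatePolicy(value string) error {
	if value == "" {
		return nil
	}
	for _, p := range allowedPolicies {
		if value == p {
			return nil
		}
	}
	return fmt.Errorf("invalid existing-resource-policy %q, must be one of: %s",
		value, strings.Join(allowedPolicies, ", "))
}

func main() {
	for _, v := range []string{"update", "none", "recreate"} {
		if err := validatePolicy(v); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", v)
	}
}
```

The comparison is case-sensitive, so `Update` would be rejected; whether to normalise case is a design choice the proposal leaves open.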

The restore describer will also be updated to reflect the policy: `pkg/cmd/util/output/restore_describer.go`

### Implementation Decision
We have decided to go ahead with the implementation of Approach 1 because:
- It is easier to implement
- It is also easier to scale, leaves room for improvement, and keeps the door open to expanding to Approach 3
- It also provides an option to preserve the existing Velero restore workflow
