-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add design for enabling support for ExistingResourcePolicy
to restore API
#4613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add design for enabling support for ExistingResourcePolicy
to restore API
#4613
Conversation
This relates to #4066 |
@shubham-pampattiwar I want to include this scenario in your design We want to consider here delta resources ( resources restored by a previous backup but no longer in the latest backup ). This is the scenario The scenario :
Result: Cluster B is in an inconsistent state ; Policy1 should no longer be running on this cluster.
Proposed solution : When restore runs and finds a resource in backup which is identical with the one one the hub, it should still update the backup annotation to : velero.io/backup-name=backup2 And if we can identify restored delta, we can remove them after restore is completed. That can be done in a separate enh |
@birsanv So it sounds like for this use case we don't want the restore to delete things that aren't in the backup -- we just want the restore to update annotations on items that would have been patched but aren't because they haven't changed -- am I understanding that correctly? As for where it fits in the design, assuming the above is correct, I would think we wouldn't do this in the "none" option (i.e. preserving legacy behavior -- also it doesn't make sense to update the annotation on unchanged resources when we're simply warning on changed resources. Also, we may not want to always do this when updating, as it's extra work which may not matter for some use cases. As such, I'd propose adding a new option to the policy list:
We should also call out that any removal of dangling resources from old backups is out-of-scope for velero -- this would be done (possibly manually) by the user after restore using a delete operation with label selector or something similar. |
@sseago your proposal makes sense |
3c75638
to
de5a188
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall -- minor typo fix identified by Codespell github action, though.
42bca0e
to
e730f5b
Compare
skip restoration. | ||
2. `update` or `merge` or `patch`: In this behaviour Velero will try an attempt to patch the resource with the backed up version, | ||
if the patch fails log it as a warning and continue the restore process, user may wish specify the merge type. | ||
3. `updateAll`: Similar to `update` option but Velero will update labels on unchanged resources too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts?
3. `updateAll`: Similar to `update` option but Velero will update labels on unchanged resources too. | |
3. `updateOverrideLabels`: Similar to `update` option but Velero will update labels on unchanged resources too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3. `updateAll`: Similar to `update` option but Velero will update labels on unchanged resources too. | |
3. `updateForceUnchangedLabels`: Similar to `update` option but Velero will update labels on unchanged resources too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not so much "unchanged" labels we're forcing, it's the labels that velero adds on restore -- these labels aren't in the backup. updateForceVeleroLabels
is more accurate, but I think it's a bit clumsy as a config option since it's so long. Perhaps just updateVeleroLabels
-- that's less clumsy, but it might confuse users into thinking we're only updating velero labels. I'm thinking that updateAll
may be best, as long as the documentation is clear on what this means: "Update all resources, including unchanged ones, in order to ensure that the velero labels related to the backup and restore are appropriately updated."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When was the last time verbosity caused problems? It accelerated java adoption. You'd most likely type it once in Json/yaml or velero cli autocompletes or also have an accompanying shorthand flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to cater flags to "I haven't read the doc" folks more than the "I type this out everyday so it has to be short".. since support escalations come from the former.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updateIncludingVeleroLabelsForAll
would probably be the most accurate. Or perhaps updateAndForceVeleroLabels
. I'd still vote for updateAll
, but if we want something longer, one of these would be clearer than the others I think.
Reference: https://github.com/vmware-tanzu/velero/pull/4613#issuecomment-1027260446 | ||
|
||
## High-Level Design | ||
### Approach 1: Add a new spec field `existingResourcePolicy` to the Restore API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about call this "ResourceOverwritePolicy"? This name indicates what we want to do with this policy, that is, to decide how to overwrite (existing) resource
existingResourcePolicy: update | ||
``` | ||
|
||
### Approach 2: Add a new spec field `existingResourcePolicyConfig` to the Restore API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using the name "ResourceOverwriteExceptions" instead, same reason, to make the indication more clear (they are the exceptions of ResourceOverwritePolicy)
## Non Goals | ||
- Add support for `ExistingResourcePolicy` to restore API for Non-Kubernetes resources. | ||
- Add support for `ExistingResourcePolicy` to restore API for `PersistentVolume` data. | ||
- Change existing restore workflow for `ServiceAccount` objects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a plan to support the above "Non Goal" objects, if not, it is important to write the scope of the policy into Doc in a clear way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether it makes sense to distinguish "non-goals" that are desired bit out-of-scope for now and "non-goals" that are considered not part of this at all (i.e. they're unrelated things that would be dealt with in unrelated proposals_. For non-kube and PV data, I'd put them in the category of "this is completely different functionality". For service accounts, it's similar -- basically service accounts are a separate use case that is hard-coded with type-specific functionality, so we completely ignore them so as not to break that type-specific outcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lyndon-Li @sseago I will add non-kube and PV data part in a new section titled as Unrelated Proposals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated Proposals section added to the design document https://github.com/vmware-tanzu/velero/pull/4613/files#diff-472925e9329f9cf29ef5d1c15a734e96587e5e975d67b7b7fec53936a280c28dR25
| `none` | Existing Velero behavior. | `info` log for unchanged resource, `warn` on changed resource | | ||
| `update` | Velero tries to attempt a `patch` on changed resource | `info` log for unchanged, `warn` if patch failed | | ||
| `updateAll` | Velero updates labels on unchanged resource and attempts a `patch` on changed resource | `info` log for unchanged and `warn` if patch failed | | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleting a resource is very critical to users, so let's consider below case:
A resource is indeed in the current restore, but "update" or "updateAll" failed to update the new backup version; As our suggestion, users still get the to-be-deleted resources by comparing the version label
Suggestion: Actually, during the restore, we are able to detect the the to-be-deleted accurately, so we can print them in the log centrally, still as a reference, it is on users decide to delete these resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we are able to detect the "to-be-deleted" during restore, since we won't touch them at all. We're only processing resources that are in the current backup that's being restored. The previously-restored resources that are not in the current backup become "to-be-deleted" by default. These resources don't actually get considered/processed by the restore operation at all, so we are not in a position to log anything about them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the "patch labels" operation something that is likely to fail? I know for modified resources, failure to patch is a real possibility, if some immutable field is different in the backup vs. the cluster, but if all we're patching are the velero labels, is failure here likely? If it's really unlikely, should "unchanged resource only patching label, failed patch" be considered a restore error rather than a restore warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as my understanding, we want users to decide which resource is out-of-date thus should be deleted. So how do users tell this? - users compare the version labels we put along with the resource during restore. Eventually, it is our label that causes deletion of a resource. If my above suggestion doesn't work, we should find a way to cope with this problem.
For "patch labels", personally, I think we should consider it case by case, if the result of the patching is critical, we definitely need to fail the restore; otherwise, we can let it go with a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for unchanged resources, if we're patching labels for the purposes of allowing users to delete things (i.e. the "updateAll" functionality), we may want to make it an error since, as you point out, otherwise the unlabeled resource will look like it's out-of-date and ready to be deleted. For modified resources, it's trickier. If it's not too much trouble in the implementation, I'd propose the following:
For modified resources, if the full patch fails and the policy is "update", just warn and move on.
For modified resources, if the full patch fails and the policy is "updateAll", we have a couple options:
- error on path fail just as for unmodified resources
- attempt to patch just the label and if this succeeds, add the usual "couldn't modify resource" warning, and if label patching fails, then add the "label update failed" error and the "couldn't update resource" warning.
skip restoration. | ||
2. `update` or `merge` or `patch`: In this behaviour Velero will try an attempt to patch the resource with the backed up version, | ||
if the patch fails log it as a warning and continue the restore process, user may wish specify the merge type. | ||
3. `updateAll`: Similar to `update` option but Velero will update labels on unchanged resources too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment makes sense:
#4613 (comment)
Therefore, why not just make it default behavior of update/patch
option? i.e. if user chooses this option the annotation velero.io/backup-name
is always updated? Are there benefits for NOT updating it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only real benefit of not updating it would be a (probably minor) performance increase. I wouldn't be completely opposed to just making the "update" policy behave like "updateAll" described above, simplifying the configuration options a bit.
``` | ||
|
||
## Detailed Design | ||
### Approach 1: Add a new spec field `existingResourcePolicy` to the Restore API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's your preference in terms of approach 1 vs 2 vs 3?
Approach 1 is probably good enough? Are we good not to make it any more complicated?
Wanna point out it's not flexible for introducing more fine-grained control, but I'm totally fine with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking is that approach 1 is probably be good enough for a first release implementation. It leaves the door open to expanding in the future to approach 3. So for now, we just make one policy for the whole restore. For the future, we could add the policy override by resource type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reasonerjt @sseago I vote for approach 1 as well. Easy to implement and scale.
- `patch:` | ||
- `includedResources:` [ ]string | ||
- `recreate:` [ ]string | ||
- `includedResources:` [ ]string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the values of includeResources
of different actions have overlap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would need to be explicit in documenting this, but I think "first match" is good enough. Ideally there would be no overlap here, but if there is, take the first one in the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or error out so user fix it in the next run?
A more general question, have we discussed whether this option should be exposed via CLI? I think the answer is yes, and this design should add one more section to describe how user should set the policy via CLI and what change in the code should be made? |
Regarding the CLI, I think "approach 1" should be relatively easy to add to the CLI -- and yes, that should probably be added to the design. I think the more complex configuration of approach 3 (the resource-specific overrides) may be OK to leave out of the cli. If the user needs complex policy like that, they need to create the CR directly, but if a single per-restore policy is sufficient, the CLI should have a flag for that. |
e730f5b
to
06fee54
Compare
I would vote for approach 1 and my only concern is the confusing difference between |
@reasonerjt Lets say we dump the
Does the above sound reasonable ? |
Added details for CLI changes https://github.com/vmware-tanzu/velero/pull/4613/files#diff-472925e9329f9cf29ef5d1c15a734e96587e5e975d67b7b7fec53936a280c28dR251 |
@shubham-pampattiwar Yes, the proposed change to update/updateAll seems reasonable to me. |
Codecov Report
@@ Coverage Diff @@
## main #4613 +/- ##
=======================================
Coverage ? 41.44%
=======================================
Files ? 204
Lines ? 18094
Branches ? 0
=======================================
Hits ? 7499
Misses ? 10046
Partials ? 549 Continue to review full report at Codecov.
|
06fee54
to
3137ad6
Compare
I have updated the design proposal with merging the |
add design doc for existing resource policy Signed-off-by: Shubham Pampattiwar <[email protected]> add use-cases and update non-goals Signed-off-by: Shubham Pampattiwar <[email protected]> update approach-1 and add policy-action table Signed-off-by: Shubham Pampattiwar <[email protected]> minor updates Signed-off-by: Shubham Pampattiwar <[email protected]> fix typos Signed-off-by: Shubham Pampattiwar <[email protected]> add CLI details Signed-off-by: Shubham Pampattiwar <[email protected]> dump updateAll option Signed-off-by: Shubham Pampattiwar <[email protected]> add implementation decision Signed-off-by: Shubham Pampattiwar <[email protected]>
3137ad6
to
d449ca0
Compare
@reasonerjt @sseago I have updated the PR with |
/kind changelog-not-required |
Shubham Pampattiwar has made several contributions to Velero, most recently in designing and implementing two v1.9 features, including the following: - [Add design for enabling multiple label support[(vmware-tanzu#4619) - [Add multiple label selector support to Velero Backup and Restore APIs](vmware-tanzu#4650) - [Add design for enabling support for ExistingResourcePolicy to restore API](vmware-tanzu#4613) - [Add existingResourcePolicy to Restore API](vmware-tanzu#4628) Signed-off-by: Scott Seago <[email protected]>
Shubham Pampattiwar has made several contributions to Velero, most recently in designing and implementing two v1.9 features, including the following: - [Add design for enabling multiple label support](vmware-tanzu#4619) - [Add multiple label selector support to Velero Backup and Restore APIs](vmware-tanzu#4650) - [Add design for enabling support for ExistingResourcePolicy to restore API](vmware-tanzu#4613) - [Add existingResourcePolicy to Restore API](vmware-tanzu#4628) Signed-off-by: Scott Seago <[email protected]>
Shubham Pampattiwar has made several contributions to Velero, most recently in designing and implementing two v1.9 features, including the following: - [Add design for enabling multiple label support](vmware-tanzu#4619) - [Add multiple label selector support to Velero Backup and Restore APIs](vmware-tanzu#4650) - [Add design for enabling support for ExistingResourcePolicy to restore API](vmware-tanzu#4613) - [Add existingResourcePolicy to Restore API](vmware-tanzu#4628) Shubham has also been driving forward the data mover requirements and design discussions for velero 1.10: - [Add datamover design](vmware-tanzu#4768) Signed-off-by: Scott Seago <[email protected]>
Shubham Pampattiwar has made several contributions to Velero, most recently in designing and implementing two v1.9 features, including the following: - [Add design for enabling multiple label support](vmware-tanzu#4619) - [Add multiple label selector support to Velero Backup and Restore APIs](vmware-tanzu#4650) - [Add design for enabling support for ExistingResourcePolicy to restore API](vmware-tanzu#4613) - [Add existingResourcePolicy to Restore API](vmware-tanzu#4628) Shubham has also been driving forward the data mover requirements and design discussions for velero 1.10: - [Add datamover design](vmware-tanzu#4768) Signed-off-by: Scott Seago <[email protected]>
Shubham Pampattiwar has made several contributions to Velero, most recently in designing and implementing two v1.9 features, including the following: - [Add design for enabling multiple label support](vmware-tanzu#4619) - [Add multiple label selector support to Velero Backup and Restore APIs](vmware-tanzu#4650) - [Add design for enabling support for ExistingResourcePolicy to restore API](vmware-tanzu#4613) - [Add existingResourcePolicy to Restore API](vmware-tanzu#4628) Shubham has also been driving forward the data mover requirements and design discussions for velero 1.10: - [Add datamover design](vmware-tanzu#4768) Signed-off-by: Scott Seago <[email protected]>
This PR adds an initial design document to add support for
ExistingResourcePolicy
to Velero's restore API.Fixes : #4842