Fix finalizer removal requeue #1303

Open: jsafrane wants to merge 2 commits into base: master

jsafrane
Contributor

What type of PR is this?
/kind bug

What this PR does / why we need it:
The first commit here fixes the linked bug.

It lets syncContent() continue cleaning up the VolumeSnapshotContent (i.e. remove its finalizers) after its snapshot has been deleted from the storage backend.

Because the cleanup now runs in syncContent(), the content is requeued on every error.

Calling ctrl.updateContentInInformerCache() directly from deleteCSISnapshotOperation() does not requeue the snapshot content on error.

The second commit should be a pure refactoring: updateContentInInformerCache() is no longer needed and its name is misleading.
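
The requeue-on-error behavior described above can be illustrated with a minimal, self-contained sketch. This is not the external-snapshotter code: the `content`, `fakeAPI`, and `runWorker` names are hypothetical, and a plain slice stands in for the controller's work queue. It only shows the pattern of a sync function that returns every error to a worker, which then puts the key back in the queue.

```go
package main

import (
	"errors"
	"fmt"
)

// content is a hypothetical stand-in for a VolumeSnapshotContent.
type content struct {
	name            string
	snapshotDeleted bool // true once the backend snapshot is gone
	hasFinalizer    bool
}

// fakeAPI simulates an API server that rejects the first `failures` updates.
type fakeAPI struct{ failures int }

func (a *fakeAPI) removeFinalizer(c *content) error {
	if a.failures > 0 {
		a.failures--
		return errors.New("mock finalizer removal error")
	}
	c.hasFinalizer = false
	return nil
}

// syncContent runs whatever cleanup is still pending and returns the first
// error, so the caller can decide to requeue.
func syncContent(api *fakeAPI, c *content) error {
	if !c.snapshotDeleted {
		// Assumption: this is where a real controller would call the
		// storage backend to delete the snapshot.
		c.snapshotDeleted = true
	}
	if c.hasFinalizer {
		if err := api.removeFinalizer(c); err != nil {
			return fmt.Errorf("failed to update %s on API server: %w", c.name, err)
		}
	}
	return nil
}

// runWorker drains the queue and re-adds a key whenever its sync fails.
// It returns the total number of sync attempts.
func runWorker(api *fakeAPI, store map[string]*content, queue []string) (attempts int) {
	for len(queue) > 0 {
		key := queue[0]
		queue = queue[1:]
		attempts++
		if err := syncContent(api, store[key]); err != nil {
			queue = append(queue, key) // requeue on any error
		}
	}
	return attempts
}

func main() {
	c := &content{name: "snapcontent-example", hasFinalizer: true}
	api := &fakeAPI{failures: 3}
	attempts := runWorker(api, map[string]*content{c.name: c}, []string{c.name})
	fmt.Println(attempts, c.hasFinalizer) // prints "4 false"
}
```

Because the finalizer removal happens inside the sync function, a failed update surfaces as a sync error and the item is retried. A helper that is called from a separate operation and swallows its error would leave the finalizer in place with nothing scheduled to try again.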

Which issue(s) this PR fixes:
Fixes #1301

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Fixed removal of VolumeSnapshotContent finalizers when the API server or network has a hiccup at the wrong time.

jsafrane added 2 commits May 22, 2025 12:44
Let syncContent() continue cleaning up VolumeSnapshotContent
(i.e. remove its finalizers) after its snapshot has been deleted from the
storage backend.

It will requeue on all errors.

Calling ctrl.updateContentInInformerCache() directly from
deleteCSISnapshotOperation() does not requeue the snapshot content on
error.
The name of updateContentInInformerCache() suggests it only updates the cache, but it also resyncs the
object. There is only one caller, so remove the function completely.
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 22, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 22, 2025
@jsafrane
Contributor Author

jsafrane commented May 22, 2025

Again, I can't reproduce the API server hiccup; it requires precise timing. With a mock error injected by something like jsafrane@3da05bb, I am able to see:

Snapshot deletion works and the external-snapshotter has cleaned the VolumeSnapshotContent status:

I0522 10:40:57.234787       1 snapshot_controller_base.go:251] syncContentByKey[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f]
I0522 10:40:57.234921       1 util.go:274] storeObjectUpdate updating content "snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f" with version 27663
I0522 10:40:57.234967       1 snapshot_controller.go:60] synchronizing VolumeSnapshotContent[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f]
I0522 10:40:57.234973       1 snapshot_controller.go:630] Check if VolumeSnapshotContent[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f] should be deleted.
I0522 10:40:57.234978       1 snapshot_controller.go:63] VolumeSnapshotContent[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f]: the policy is Delete
I0522 10:40:57.234985       1 snapshot_controller.go:114] Deleting snapshot for content: snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f
I0522 10:40:57.234988       1 snapshot_controller.go:402] deleteCSISnapshotOperation [snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f] started
I0522 10:40:57.235663       1 connection.go:264] "GRPC call" method="/csi.v1.Controller/DeleteSnapshot" request="{\"snapshot_id\":\"389bb379-36f9-11f0-9fd5-0a580a80009e\"}"
I0522 10:40:57.236553       1 connection.go:270] "GRPC response" response="{}" err=null
I0522 10:40:57.236576       1 snapshot_controller.go:431] cleanVolumeSnapshotStatus content [snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f]

Finalizer removal then fails. With this PR, I can see that the snapshotter retries with increasing delays:

E0522 10:40:57.243441       1 snapshot_controller_base.go:279] could not sync content "snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f": snapshot controller failed to update snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f on API server: mock finalizer removal error
I0522 10:40:57.243469       1 snapshot_controller_base.go:232] Failed to sync content "snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f", will retry again: snapshot controller failed to update snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f on API server: mock finalizer removal error

I0522 10:40:58.244408       1 snapshot_controller_base.go:251] syncContentByKey[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f]
I0522 10:40:58.244468       1 util.go:274] storeObjectUpdate updating content "snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f" with version 27665
I0522 10:40:58.244484       1 snapshot_controller.go:60] synchronizing VolumeSnapshotContent[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f]
I0522 10:40:58.244491       1 snapshot_controller.go:630] Check if VolumeSnapshotContent[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f] should be deleted.
I0522 10:40:58.244506       1 snapshot_controller.go:63] VolumeSnapshotContent[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f]: the policy is Delete
E0522 10:40:58.244578       1 snapshot_controller_base.go:279] could not sync content "snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f": snapshot controller failed to update snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f on API server: mock finalizer removal error
I0522 10:40:58.244597       1 snapshot_controller_base.go:232] Failed to sync content "snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f", will retry again: snapshot controller failed to update snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f on API server: mock finalizer removal error

I0522 10:41:00.245628       1 snapshot_controller_base.go:251] syncContentByKey[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f]
I0522 10:41:00.245662       1 util.go:274] storeObjectUpdate updating content "snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f" with version 27665
I0522 10:41:00.245671       1 snapshot_controller.go:60] synchronizing VolumeSnapshotContent[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f]
I0522 10:41:00.245675       1 snapshot_controller.go:630] Check if VolumeSnapshotContent[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f] should be deleted.
I0522 10:41:00.245680       1 snapshot_controller.go:63] VolumeSnapshotContent[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f]: the policy is Delete
E0522 10:41:00.245712       1 snapshot_controller_base.go:279] could not sync content "snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f": snapshot controller failed to update snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f on API server: mock finalizer removal error
I0522 10:41:00.245719       1 snapshot_controller_base.go:232] Failed to sync content "snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f", will retry again: snapshot controller failed to update snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f on API server: mock finalizer removal error

...

And eventually it succeeds:

I0522 10:43:04.249246       1 snapshot_controller_base.go:251] syncContentByKey[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f]
I0522 10:43:04.249357       1 util.go:274] storeObjectUpdate updating content "snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f" with version 27665
I0522 10:43:04.249374       1 snapshot_controller.go:60] synchronizing VolumeSnapshotContent[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f]
I0522 10:43:04.249395       1 snapshot_controller.go:630] Check if VolumeSnapshotContent[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f] should be deleted.
I0522 10:43:04.249405       1 snapshot_controller.go:63] VolumeSnapshotContent[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f]: the policy is Delete
I0522 10:43:04.254431       1 snapshot_controller.go:619] Removed protection finalizer from volume snapshot content snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f
I0522 10:43:04.254450       1 util.go:274] storeObjectUpdate updating content "snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f" with version 27665
I0522 10:43:04.254665       1 snapshot_controller_base.go:210] enqueued "snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f" for sync
I0522 10:43:04.254699       1 snapshot_controller_base.go:251] syncContentByKey[snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f]
I0522 10:43:04.254708       1 snapshot_controller_base.go:361] content "snapcontent-c6d59ccb-5e3e-4c86-8110-cebd567b951f" deleted

Development

Successfully merging this pull request may close these issues.

external-snapshotter does not retry removal of VolumeSnapshotContent finalizers
2 participants