[VMOwnedVolumes][Draft] State-based guarding and failure recovery handling #4059

Open
deepakkinni wants to merge 5 commits into kubernetes-sigs:master from deepakkinni:topic/dk016388/vmown-csi-phase3_and_4

@deepakkinni
Collaborator

What this PR does / why we need it:
Phase 3 — Workflow C: Detach, CSI Side (Section 9.3)
vCenter Snapshot Tree Query (C.6): Implement a helper that fetches the snapshot property of the VirtualMachine managed object (mo:VirtualMachine { snapshot }), batch-fetches config.hardware.device for every snapshot MoRef in the tree, and builds the set of disk UUIDs still referenced by at least one remaining vCenter snapshot.
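
The following is a minimal sketch of the C.6 set-building step in Go. It assumes the property-collector round trips have already happened. SnapshotTree is a hypothetical, simplified stand-in for vSphere's snapshot tree nodes. disksBySnapshot is an assumed map from each snapshot ID to the disk UUIDs found in that snapshot's config.hardware.device. Neither is the driver's actual type.

```go
package main

// SnapshotTree is a hypothetical, simplified stand-in for a vSphere snapshot
// tree node: a snapshot MoRef (here just an ID string) plus its children.
type SnapshotTree struct {
	SnapshotRef string
	Children    []SnapshotTree
}

// collectSnapshotRefs walks the tree depth-first and returns every snapshot
// reference, so their device lists can be fetched in a single batch call.
func collectSnapshotRefs(roots []SnapshotTree) []string {
	var refs []string
	var walk func(nodes []SnapshotTree)
	walk = func(nodes []SnapshotTree) {
		for _, n := range nodes {
			refs = append(refs, n.SnapshotRef)
			walk(n.Children)
		}
	}
	walk(roots)
	return refs
}

// retainedDiskUUIDs returns the set of disk UUIDs referenced by at least one
// remaining snapshot. disksBySnapshot is assumed to be the batch-fetched
// config.hardware.device of each snapshot, already reduced to disk UUIDs.
func retainedDiskUUIDs(roots []SnapshotTree, disksBySnapshot map[string][]string) map[string]struct{} {
	retained := make(map[string]struct{})
	for _, ref := range collectSnapshotRefs(roots) {
		for _, uuid := range disksBySnapshot[ref] {
			retained[uuid] = struct{}{}
		}
	}
	return retained
}
```

The sketch collects every ref before looking up devices. This mirrors the batch-fetch idea above: one lookup pass over all snapshots rather than one query per tree node.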

Detach Reconciliation: Mark Snapshot-Retained (C.6.a): When the snapshot tree query shows the disk is still referenced, transition the CVI to VM_MANAGED with vmName="" (the snapshot-retained state), set the PVC label to retained-by-snapshot, and remove the volume from the BA status.
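
Here is a minimal sketch of the C.6.a transition on simplified in-memory objects. The CVI, PVC and BAStatus types and the ownershipLabel key are hypothetical. Only these pieces come from the description: the VM_MANAGED state, the empty vmName, and the retained-by-snapshot label value.

```go
package main

// Hypothetical constants; the driver's real label key is not given in the PR text.
const (
	stateVMManaged = "VM_MANAGED"
	ownershipLabel = "example.com/volume-ownership" // hypothetical key
	retainedValue  = "retained-by-snapshot"
)

// Hypothetical minimal shapes of the objects touched in C.6.a.
type CVI struct {
	State  string
	VMName string
}
type PVC struct{ Labels map[string]string }
type BAStatus struct{ Volumes []string }

// markSnapshotRetained applies the C.6.a transition. Every step overwrites
// or filters rather than appends, so re-running it after a partial failure
// converges on the same result.
func markSnapshotRetained(cvi *CVI, pvc *PVC, ba *BAStatus, volume string) {
	cvi.State = stateVMManaged
	cvi.VMName = "" // an empty vmName marks the disk as snapshot-retained
	if pvc.Labels == nil {
		pvc.Labels = map[string]string{}
	}
	pvc.Labels[ownershipLabel] = retainedValue
	kept := make([]string, 0, len(ba.Volumes))
	for _, v := range ba.Volumes {
		if v != volume {
			kept = append(kept, v)
		}
	}
	ba.Volumes = kept
}
```

In a controller, each assignment would be a separate API write. Because every step is an overwrite or a filter, a retry after a partial failure lands in the same final state.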

Detach Reconciliation: Re-Register as FCD (C.7): When no vCenter snapshot references the disk, reconstruct the CNS metadata from the PVC, PV, and CVI, then call CnsCreateVolume to re-register the disk as an FCD. After that, transition the CVI to CSI_MANAGED, remove the cvi-protection finalizer, label the PVC csi-owned, and clean up the BA status.

Metadata Reconstruction for Re-Registration (Section 9.4): Implement a utility that reads the PVC labels and annotations, the PV name, and the CSI config (ClusterID, SupervisorID) to build the CnsCreateVolume metadata payload shared by C.7 and D.6.
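
Below is a minimal sketch of the Section 9.4 utility using simplified types. VolumeMetadata and EntityMetadata are hypothetical and do not match the real CNS metadata structures field for field. PVC annotations are left out because the description does not say which ones feed the payload.

```go
package main

import "errors"

// Hypothetical, simplified metadata shapes; the real CNS types differ.
type EntityMetadata struct {
	Kind      string // "PersistentVolume" or "PersistentVolumeClaim"
	Name      string
	Namespace string
	Labels    map[string]string
}

type VolumeMetadata struct {
	ClusterID    string
	SupervisorID string
	Entities     []EntityMetadata
}

type PVCInfo struct {
	Name, Namespace string
	Labels          map[string]string
}

type CSIConfig struct{ ClusterID, SupervisorID string }

// reconstructMetadata rebuilds the payload from the PVC, PV name and CSI
// config. It fails fast on missing inputs so an incomplete payload is never
// sent to CNS.
func reconstructMetadata(pvc PVCInfo, pvName string, cfg CSIConfig) (VolumeMetadata, error) {
	if cfg.ClusterID == "" || cfg.SupervisorID == "" {
		return VolumeMetadata{}, errors.New("CSI config is missing ClusterID or SupervisorID")
	}
	if pvName == "" || pvc.Name == "" {
		return VolumeMetadata{}, errors.New("PV or PVC name is empty")
	}
	labels := make(map[string]string, len(pvc.Labels))
	for k, v := range pvc.Labels { // copy so the payload does not alias the PVC
		labels[k] = v
	}
	return VolumeMetadata{
		ClusterID:    cfg.ClusterID,
		SupervisorID: cfg.SupervisorID,
		Entities: []EntityMetadata{
			{Kind: "PersistentVolume", Name: pvName},
			{Kind: "PersistentVolumeClaim", Name: pvc.Name, Namespace: pvc.Namespace, Labels: labels},
		},
	}, nil
}
```

Keeping this as a pure function of its inputs lets C.7 and D.6 share it without either path depending on the other's state.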

Revert-Induced Drop Handling (C.5/C.6): Detect volumes whose BA condition is VolumeDetached=True with reason=DroppedBySnapshotRevert. If the CVI is not already TRANSFERRING_TO_CSI, set it to that state. Then run the same C.6 snapshot tree query and branching logic.
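
The following sketch covers the revert-drop detection and the "only if not already" state update. It uses a hypothetical Condition type with plain string fields. The condition type, status, reason and state name come from the description.

```go
package main

// Condition is a hypothetical, minimal mirror of a Kubernetes-style status condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

const stateTransferringToCSI = "TRANSFERRING_TO_CSI"

// droppedBySnapshotRevert reports whether the BA conditions mark this volume
// as detached because a snapshot revert dropped it.
func droppedBySnapshotRevert(conds []Condition) bool {
	for _, c := range conds {
		if c.Type == "VolumeDetached" && c.Status == "True" && c.Reason == "DroppedBySnapshotRevert" {
			return true
		}
	}
	return false
}

// ensureTransferring moves the CVI state to TRANSFERRING_TO_CSI only when it
// is not already there, and reports whether an update needs to be written.
func ensureTransferring(state *string) bool {
	if *state == stateTransferringToCSI {
		return false
	}
	*state = stateTransferringToCSI
	return true
}
```

The function returns whether the state changed. A reconciler can use that to skip the write when a previous attempt already set TRANSFERRING_TO_CSI.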

Phase 4 — Workflow D: Snapshot Deletion, CSI Phase (Section 10)
VMSnap Watch + CSI Finalizer Trigger (D.3): Add a watch on VirtualMachineSnapshot CRs. When CSI observes conditions[SnapshotDeleted]=True while its csi.vsphere.vmware.com/snapshot finalizer is still present, begin the Phase 2 disk re-evaluation.
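
The D.3 trigger can be expressed as a pure predicate that a watch handler could call; this is a sketch, not the driver's code. VMSnapshot is a hypothetical, flattened view of the CR, with its conditions reduced to a type-to-status map. The finalizer name and the SnapshotDeleted condition come from the description.

```go
package main

// csiSnapshotFinalizer is the finalizer name given in the PR text.
const csiSnapshotFinalizer = "csi.vsphere.vmware.com/snapshot"

// VMSnapshot is a hypothetical, minimal view of a VirtualMachineSnapshot CR.
type VMSnapshot struct {
	Finalizers []string
	Conditions map[string]string // condition type -> status, simplified
}

// shouldStartDiskReevaluation is the trigger predicate: the snapshot is
// reported deleted, and CSI's finalizer is still present (CSI work remains).
func shouldStartDiskReevaluation(s VMSnapshot) bool {
	if s.Conditions["SnapshotDeleted"] != "True" {
		return false
	}
	for _, f := range s.Finalizers {
		if f == csiSnapshotFinalizer {
			return true
		}
	}
	return false
}
```

The predicate requires CSI's finalizer, so it stops firing once D.7 removes that finalizer. A replayed event after completion therefore does not restart the work.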

Per-Disk Retention Evaluation (D.4/D.5): For each disk in VMSnap.status.disks, look up the CVI by the cns.vmware.com/disk-uuid label and check the remaining vCenter snapshot tree (reusing the helper from task 13). Then branch into one of three cases: the disk is still retained (no-op), it has been re-adopted by a VM (no-op), or no snapshots remain and vmName="" (proceed to D.6).
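
Here is a sketch of the per-disk lookup and the three-way branch. The CVI type and Decision values are hypothetical. Checking snapshot retention before vmName is an assumed ordering; both no-op branches end the same way. The label key cns.vmware.com/disk-uuid comes from the description.

```go
package main

import "fmt"

const diskUUIDLabel = "cns.vmware.com/disk-uuid" // label key from the PR text

// CVI is a hypothetical, minimal view of the CVI object.
type CVI struct {
	Name   string
	Labels map[string]string
	State  string
	VMName string
}

// Decision enumerates the three branches named in the description.
type Decision int

const (
	StillRetained Decision = iota // a snapshot still references the disk: no-op
	ReadoptedByVM                 // a VM owns the disk again: no-op
	ReRegister                    // no snapshot left and vmName == "": proceed to D.6
)

// findCVIByDiskUUID does the label lookup over an already-listed set of CVIs.
func findCVIByDiskUUID(cvis []CVI, uuid string) (*CVI, error) {
	for i := range cvis {
		if cvis[i].Labels[diskUUIDLabel] == uuid {
			return &cvis[i], nil
		}
	}
	return nil, fmt.Errorf("no CVI labeled %s=%s", diskUUIDLabel, uuid)
}

// evaluateDisk applies the three-way branch; retained is the set of disk
// UUIDs produced by the snapshot tree query.
func evaluateDisk(cvi *CVI, diskUUID string, retained map[string]struct{}) Decision {
	if _, ok := retained[diskUUID]; ok {
		return StillRetained
	}
	if cvi.VMName != "" {
		return ReadoptedByVM
	}
	return ReRegister
}
```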

Re-Register on Last Snapshot Deletion (D.6): Reuse the re-registration logic from C.7 (tasks 15/16), this time triggered from Workflow D. Transition the CVI from VM_MANAGED (snapshot-retained) to CSI_MANAGED, remove the cvi-protection finalizer, and label the PVC csi-owned.
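
Below is one possible state guard in front of D.6, as a hypothetical sketch. It proceeds only from the snapshot-retained state (VM_MANAGED with an empty vmName). It treats CSI_MANAGED as an already-finished retry and refuses anything else. The guardD6 name and the choice to treat CSI_MANAGED as success are assumptions, not taken from the PR.

```go
package main

import "fmt"

const (
	stateVMManaged  = "VM_MANAGED"
	stateCSIManaged = "CSI_MANAGED"
)

// CVI is a hypothetical, minimal view of the CVI object.
type CVI struct {
	State  string
	VMName string
}

// guardD6 decides whether D.6 may run and returns (proceed, error):
//   - VM_MANAGED with vmName == "" (snapshot-retained): proceed.
//   - already CSI_MANAGED: a previous attempt finished, so skip without error.
//   - anything else: refuse rather than guess.
func guardD6(cvi CVI) (bool, error) {
	switch {
	case cvi.State == stateVMManaged && cvi.VMName == "":
		return true, nil
	case cvi.State == stateCSIManaged:
		return false, nil
	default:
		return false, fmt.Errorf("refusing D.6: CVI state %q, vmName %q", cvi.State, cvi.VMName)
	}
}
```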

CSI Finalizer Removal from VMSnap (D.7): After every disk in status.disks has been processed, remove the CSI finalizer from the VMSnap CR so that Kubernetes can garbage-collect it.
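
The following sketch shows the D.7 completion check. It uses a hypothetical VMSnapshot type whose status.disks is flattened to disk UUIDs, plus a processed map that the caller maintains. The finalizer name comes from the description.

```go
package main

const csiSnapshotFinalizer = "csi.vsphere.vmware.com/snapshot"

// VMSnapshot is a hypothetical, minimal view of the VMSnap CR.
type VMSnapshot struct {
	Finalizers  []string
	StatusDisks []string // disk UUIDs from status.disks, simplified
}

// removeFinalizerIfDone drops CSI's finalizer only once every disk in
// status.disks has been processed; otherwise it leaves the CR untouched so
// the next reconcile resumes. It reports whether the finalizer was removed.
func removeFinalizerIfDone(s *VMSnapshot, processed map[string]bool) bool {
	for _, d := range s.StatusDisks {
		if !processed[d] {
			return false
		}
	}
	kept := make([]string, 0, len(s.Finalizers))
	removed := false
	for _, f := range s.Finalizers {
		if f == csiSnapshotFinalizer {
			removed = true
			continue
		}
		kept = append(kept, f)
	}
	s.Finalizers = kept
	return removed
}
```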

Deferred PVC Deletion After D.6 (Section 13.2.2): After D.6 completes, check whether the PVC has a deletionTimestamp (the webhook was bypassed during retention). If it does, release the CSI volume-protection finalizer so the standard FCD delete path can proceed.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Testing done:
A PR must be marked "[WIP]", if no test result is provided. A WIP PR won't be reviewed, nor merged.
The requester can determine a sufficient test, e.g. build for a cosmetic change, E2E test in a predeployed setup, etc.
For new features, new tests should be done, in addition to regression tests.
If jtest is used to trigger precheckin tests, paste the result after jtest completes and remove [WIP] in the PR subject.
The review cycle will start, only after "[WIP]" is removed from the PR subject.

Special notes for your reviewer:

Release note:

Signed-off-by: Deepak Kinni <deepak.kinni@broadcom.com>
Signed-off-by: Deepak Kinni <deepak.kinni@broadcom.com>
…pshot Deletion, Revert Re-adoption

Signed-off-by: Deepak Kinni <deepak.kinni@broadcom.com>
…ndling

Signed-off-by: Deepak Kinni <deepak.kinni@broadcom.com>
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deepakkinni

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the labels approved (Indicates a PR has been approved by an approver from all required OWNERS files.), cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.), and size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.) on May 31, 2026
Signed-off-by: Deepak Kinni <deepak.kinni@broadcom.com>
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the label needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) on Jun 1, 2026