-
Notifications
You must be signed in to change notification settings - Fork 143
Enable Rollback Support of VAC with Infeasible Errors #487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Enable Rollback Support of VAC with Infeasible Errors #487
Conversation
0a9dc85
to
20411cf
Compare
20411cf
to
5cc5fe8
Compare
/lgtm |
5cc5fe8
to
8b9a874
Compare
New changes are detected. LGTM label has been removed. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sunnylovestiramisu The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
fffb062
to
feae994
Compare
@@ -67,6 +67,16 @@ func (ctrl *modifyController) modify(pvc *v1.PersistentVolumeClaim, pv *v1.Persi | |||
} | |||
} | |||
|
|||
// Rollback infeasible errors for recovery | |||
if pvc.Status.ModifyVolumeStatus != nil && pvc.Status.ModifyVolumeStatus.Status == v1.PersistentVolumeClaimModifyVolumeInfeasible { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I personally like large conditions like these to be behind functions which I can readily understand when reading this code again. I made a refactor of this - gnufied@fa37fc1 would you consider pulling this in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Let me update the PR and ping you again when tests pass etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in this commit: ac09a30
// Rollback infeasible errors for recovery | ||
if pvc.Status.ModifyVolumeStatus != nil && pvc.Status.ModifyVolumeStatus.Status == v1.PersistentVolumeClaimModifyVolumeInfeasible { | ||
targetVacName := pvc.Status.ModifyVolumeStatus.TargetVolumeAttributesClassName | ||
// Case 1: rollback to nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was trying to manually test case 1 alongside EBS CSI Driver, but looks like k/k API PVC Spec validation is getting in the way of setting `PVC.VACName = "":
❯ ka .
storageclass.storage.k8s.io/ebs-sc created
persistentvolumeclaim/ebs-claim created
pod/app created
volumeattributesclass.storage.k8s.io/io2-class created
volumeattributesclass.storage.k8s.io/io3-class unchanged
❯ k patch pvc ebs-claim --patch '{"spec": {"volumeAttributesClassName": "io3-class"}}'
persistentvolumeclaim/ebs-claim patched
❯ k describe pvc
Name: ebs-claim
...
Normal VolumeModify 23s external-resizer ebs.csi.aws.com external resizer is modifying volume ebs-claim with vac io3-class
Warning VolumeModifyFailed 20s external-resizer ebs.csi.aws.com rpc error: code = InvalidArgument desc ...
❯ k patch pvc ebs-claim --patch '{"spec": {"volumeAttributesClassName": ""}}'
The PersistentVolumeClaim "ebs-claim" is invalid: spec.volumeAttributesClassName: Forbidden: update from non-nil value to an empty string is forbidden
We would have to remove the following validations on PVC and PV specs in kubernetes/pkg/apis/core/validation/validation.go at master · kubernetes/kubernetes:
len(pvc.spec.VACName) > 0
https://github.com/kubernetes/kubernetes/blob/02eb7d424ad5ccf4f00863fe861f165be0d491da/pkg/apis/core/validation/validation.go#L2381
Might also have to remove - len(pvSpec.VACName) == 0
https://github.com/kubernetes/kubernetes/blob/02eb7d424ad5ccf4f00863fe861f165be0d491da/pkg/apis/core/validation/validation.go#L2109-L2111
This PR might still be fine. If you wish I can test this resizer against a modified k/k, and then raise PR for the API validation changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are going to change validations, we might be in trouble wrt to moving feature GA in this release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VAC name in PVC spec is a pointer to string. It can be nil, but it cannot be an empty string:
https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/types.go#L403
// Name of VolumeAttributesClass to which this persistent volume belongs. Empty value is not allowed
......
VolumeAttributesClassName *string
In the code, it is possible to reset it to nil again.
We should not allow user to set VAC name to an empty string. It is not allowed by the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AndrewSirenko Can you test it by removing "VolumeAttributesClassName" from the PVC spec? It is an optional field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xing-yang This time I used kubectl edit
to either delete the line volumeAttributesClassName: io3-class
OR set it to the yaml indicator for nil volumeAttributesClassName: ~
Both times I get 14 # * spec.volumeAttributesClassName: Forbidden: update from non-nil value to nil is forbidden
Now I explicitly try to patch as well:
❯ k patch pvc ebs-claim --type=merge -p '{"spec":{"VolumeAttributesClassName": null}}'
persistentvolumeclaim/ebs-claim patched (no change)
❯ k get pvc -o yaml
....
spec:
volumeAttributesClassName: io3-class
...
type: ModifyingVolume
modifyVolumeStatus:
status: Infeasible
targetVolumeAttributesClassName: io3-class
phase: Bound
I believe this "cannot update non-nil to nil" is due to the following validation: https://github.com/kubernetes/kubernetes/blob/02eb7d424ad5ccf4f00863fe861f165be0d491da/pkg/apis/core/validation/validation.go#L2461-L2464
if oldPvc.Spec.VolumeAttributesClassName != nil {
if newPvc.Spec.VolumeAttributesClassName == nil {
allErrs = append(allErrs, field.Forbidden(field.NewPath("spec", "volumeAttributesClassName"), "update from non-nil value to nil is forbidden"))
} else if len(*newPvc.Spec.VolumeAttributesClassName) == 0 {
allErrs = append(allErrs, field.Forbidden(field.NewPath("spec", "volumeAttributesClassName"), "update from non-nil value to an empty string is forbidden"))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it is possible to allow this only during a rollback. If VAC name is set to a non-nil value successfully, maybe we should not allow user to change it back to nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for future reviewers
More discussion on k/k slack here: https://kubernetes.slack.com/archives/C8EJ01Z46/p1748628161127689
See pr here: Allow PVC VACName to go from non-nil to nil
// for rollbacking infeasible errors | ||
// Record an event to indicate that external resizer is rolling back this volume. | ||
ctrl.eventRecorder.Event(pvc, v1.EventTypeNormal, util.VolumeModify, | ||
fmt.Sprintf("external resizer is rolling back volume %s with infeasible error", pvc.Name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opinion: Thoughts on explicitly mentioning VAC name in event message? Something like "external resizer is rolling back volume with infeasible error to VAC "
Might be nice for those retro-actively auditing PVC events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in this commit: ac09a30
Manually tested both cases alongside AWS EBS CSI Driver v1.44 on Kubernetes v1.33. Rollback from VAC B to VAC A works great: https://gist.github.com/AndrewSirenko/22f62e64e604bc00cf2373aa4b2a0f4e Couldn't test rollback to no VAC because turns out we validate pvc.spec.VACName to not be len 0 :/. See comment #487 (comment) Will build k/k without those validations and retest this PR against that cluster next week. |
feae994
to
ac09a30
Compare
/retest |
pvcSpecVacName := pvc.Spec.VolumeAttributesClassName | ||
curVacName := pvc.Status.CurrentVolumeAttributesClassName | ||
targetVacName := pvc.Status.ModifyVolumeStatus.TargetVolumeAttributesClassName | ||
// Case 1: rollback to nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Infeasible condition and targetVacName not being cleared in case 1.
After updating kube-api-server to loosen PVC (and PV) Update validation, I am able to make PVC.spec.Vacname nil, but conditions and targetVacName not being cleared.
❯ k patch pvc ebs-claim --patch '{"spec": {"volumeAttributesClassName": "io3-class"}}'
persistentvolumeclaim/ebs-claim patched
❯ k get pvc -o yaml
spec:
storageClassName: ebs-sc
volumeAttributesClassName: io3-class
volumeMode: Filesystem
volumeName: pvc-e6c5e8ea-c220-4c59-8162-3ef63c3fb02f
status:
accessModes:
- ReadWriteOnce
capacity:
storage: 10Gi
conditions:
- lastProbeTime: null
lastTransitionTime: "2025-06-04T20:46:21Z"
message: ModifyVolume operation in progress.
status: "True"
type: ModifyingVolume
modifyVolumeStatus:
status: InProgress
targetVolumeAttributesClassName: io3-class
phase: Bound
kind: List
metadata:
resourceVersion: ""
❯ k patch pvc ebs-claim --patch '{"spec": {"volumeAttributesClassName": null}}'
persistentvolumeclaim/ebs-claim patched
❯ k get pvc -o yaml
...
spec:
storageClassName: ebs-sc
volumeMode: Filesystem
volumeName: pvc-e6c5e8ea-c220-4c59-8162-3ef63c3fb02f
status:
accessModes:
- ReadWriteOnce
capacity:
storage: 10Gi
conditions:
- lastProbeTime: null
lastTransitionTime: "2025-06-04T20:46:24Z"
message: 'ModifyVolume failed with errorrpc error: code = InvalidArgument desc
= Could not modify volume (invalid argument) "vol-0a79bd36e00ed821d": invalid
argument: operation error EC2: ModifyVolume, https response error StatusCode:
400, RequestID: 9de7e3e3-c512-4004-958f-6a83c9ed791c, api error UnknownVolumeType:
Unsupported volume type ''io3'' for volume creation. . Waiting for retry.'
status: "True"
type: ModifyingVolume
modifyVolumeStatus:
status: Infeasible
targetVolumeAttributesClassName: io3-class
phase: Bound
kind: List
metadata:
resourceVersion: ""
Case 2 is still fine.
See gist for longer example: K/K VAC Validation changes + Sunny's resizer PR
Will link k/k validation pr shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works with commit 9daca! Tested on cluster with api server built from kubernetes/kubernetes#132106 and EBS CSI Driver + resizer built from this commit
Proof:
❯ ka examples/kubernetes/modify-volume/manifests
Warning: storage.k8s.io/v1beta1 VolumeAttributesClass is deprecated in v1.34+, unavailable in v1.37+
volumeattributesclass.storage.k8s.io/io3-class created
storageclass.storage.k8s.io/ebs-sc created
persistentvolumeclaim/ebs-claim created
pod/app created
volumeattributesclass.storage.k8s.io/io2-class created
❯ k patch pvc ebs-claim --patch '{"spec": {"volumeAttributesClassName": "io3-class"}}'
persistentvolumeclaim/ebs-claim patched
❯ k describe pvc
Name: ebs-claim
Namespace: default
StorageClass: ebs-sc
Status: Bound
Volume: pvc-4cbbe0a7-acb4-4b99-ab9c-161848377cf3
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
volume.kubernetes.io/selected-node: i-0d40f496eeab7c80c
volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 10Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: app
Conditions:
Type Status LastProbeTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
ModifyingVolume True Mon, 01 Jan 0001 00:00:00 +0000 Tue, 15 Jul 2025 21:32:25 +0000 ModifyVolume failed with errorrpc error: code = InvalidArgument desc = Could not modify volume (invalid argument) "vol-071a09677ddaa352e": invalid argument: operation error EC2: ModifyVolume, https response error StatusCode: 400, RequestID: 2742eb7a-c695-4fb7-90fd-085d2990537c, api error UnknownVolumeType: Unsupported volume type 'io3' for volume creation. . Waiting for retry.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 21s persistentvolume-controller waiting for first consumer to be created before binding
Normal Provisioning 21s ebs.csi.aws.com_ebs-csi-controller-b4fcfd4db-5dp4t_b0c389e3-afe9-4ed5-9c4e-e2d61d19c640 External provisioner is provisioning volume for claim "default/ebs-claim"
Normal ExternalProvisioning 21s persistentvolume-controller Waiting for a volume to be created either by the external provisioner 'ebs.csi.aws.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
Normal ProvisioningSucceeded 18s ebs.csi.aws.com_ebs-csi-controller-b4fcfd4db-5dp4t_b0c389e3-afe9-4ed5-9c4e-e2d61d19c640 Successfully provisioned volume pvc-4cbbe0a7-acb4-4b99-ab9c-161848377cf3
Normal VolumeModify 4s external-resizer ebs.csi.aws.com external resizer is modifying volume ebs-claim with vac io3-class
Warning VolumeModifyFailed 1s external-resizer ebs.csi.aws.com rpc error: code = InvalidArgument desc = Could not modify volume (invalid argument) "vol-071a09677ddaa352e": invalid argument: operation error EC2: ModifyVolume, https response error StatusCode: 400, RequestID: 2742eb7a-c695-4fb7-90fd-085d2990537c, api error UnknownVolumeType: Unsupported volume type 'io3' for volume creation.
❯ k get pvc -o yaml
apiVersion: v1
items:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"ebs-claim","namespace":"default"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"10Gi"}},"storageClassName":"ebs-sc"}}
pv.kubernetes.io/bind-completed: "yes"
pv.kubernetes.io/bound-by-controller: "yes"
volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
volume.kubernetes.io/selected-node: i-0d40f496eeab7c80c
volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
creationTimestamp: "2025-07-15T21:32:05Z"
finalizers:
- kubernetes.io/pvc-protection
name: ebs-claim
namespace: default
resourceVersion: "6096"
uid: 4cbbe0a7-acb4-4b99-ab9c-161848377cf3
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: ebs-sc
volumeAttributesClassName: io3-class
volumeMode: Filesystem
volumeName: pvc-4cbbe0a7-acb4-4b99-ab9c-161848377cf3
status:
accessModes:
- ReadWriteOnce
capacity:
storage: 10Gi
conditions:
- lastProbeTime: null
lastTransitionTime: "2025-07-15T21:32:25Z"
message: 'ModifyVolume failed with errorrpc error: code = InvalidArgument desc
= Could not modify volume (invalid argument) "vol-071a09677ddaa352e": invalid
argument: operation error EC2: ModifyVolume, https response error StatusCode:
400, RequestID: 2742eb7a-c695-4fb7-90fd-085d2990537c, api error UnknownVolumeType:
Unsupported volume type ''io3'' for volume creation. . Waiting for retry.'
status: "True"
type: ModifyingVolume
modifyVolumeStatus:
status: Infeasible
targetVolumeAttributesClassName: io3-class
phase: Bound
kind: List
metadata:
resourceVersion: ""
❯ k patch pvc ebs-claim --patch '{"spec": {"volumeAttributesClassName": null}}'
persistentvolumeclaim/ebs-claim patched
❯ k describe pvc
Name: ebs-claim
Namespace: default
StorageClass: ebs-sc
Status: Bound
Volume: pvc-4cbbe0a7-acb4-4b99-ab9c-161848377cf3
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
volume.kubernetes.io/selected-node: i-0d40f496eeab7c80c
volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 10Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: app
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 57s persistentvolume-controller waiting for first consumer to be created before binding
Normal Provisioning 57s ebs.csi.aws.com_ebs-csi-controller-b4fcfd4db-5dp4t_b0c389e3-afe9-4ed5-9c4e-e2d61d19c640 External provisioner is provisioning volume for claim "default/ebs-claim"
Normal ExternalProvisioning 57s persistentvolume-controller Waiting for a volume to be created either by the external provisioner 'ebs.csi.aws.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
Normal ProvisioningSucceeded 54s ebs.csi.aws.com_ebs-csi-controller-b4fcfd4db-5dp4t_b0c389e3-afe9-4ed5-9c4e-e2d61d19c640 Successfully provisioned volume pvc-4cbbe0a7-acb4-4b99-ab9c-161848377cf3
Normal VolumeModify 40s external-resizer ebs.csi.aws.com external resizer is modifying volume ebs-claim with vac io3-class
Warning VolumeModifyFailed 37s external-resizer ebs.csi.aws.com rpc error: code = InvalidArgument desc = Could not modify volume (invalid argument) "vol-071a09677ddaa352e": invalid argument: operation error EC2: ModifyVolume, https response error StatusCode: 400, RequestID: 2742eb7a-c695-4fb7-90fd-085d2990537c, api error UnknownVolumeType: Unsupported volume type 'io3' for volume creation.
Normal VolumeModify 6s external-resizer ebs.csi.aws.com external resizer is rolling back volume ebs-claim with infeasible error to VAC nil
❯ k get pvc -o yaml
apiVersion: v1
items:
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"ebs-claim","namespace":"default"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"10Gi"}},"storageClassName":"ebs-sc"}}
pv.kubernetes.io/bind-completed: "yes"
pv.kubernetes.io/bound-by-controller: "yes"
volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
volume.kubernetes.io/selected-node: i-0d40f496eeab7c80c
volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
creationTimestamp: "2025-07-15T21:32:05Z"
finalizers:
- kubernetes.io/pvc-protection
name: ebs-claim
namespace: default
resourceVersion: "6213"
uid: 4cbbe0a7-acb4-4b99-ab9c-161848377cf3
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: ebs-sc
volumeMode: Filesystem
volumeName: pvc-4cbbe0a7-acb4-4b99-ab9c-161848377cf3
status:
accessModes:
- ReadWriteOnce
capacity:
storage: 10Gi
phase: Bound
kind: List
metadata:
resourceVersion: ""
ac09a30
to
9dacaa9
Compare
|
||
// Update PV | ||
newPV := pv.DeepCopy() | ||
newPV.Spec.VolumeAttributesClassName = pvc.Spec.VolumeAttributesClassName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because API Server prohibits PV.Spec.VACName to be empty string, we must guard against case PVC.spec.VACName="" and set PV.VACName to nil in that case
Discussed https://kubernetes.slack.com/archives/C09QZFCE5/p1752777281020219
9dacaa9
to
471edad
Compare
expectedPVVolumeAttributesClassName: &testVac, | ||
}, | ||
{ | ||
name: "modify volume rollback to nil succeeds for infeasible errors", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a modify volume to empty string rollback succeeds unit test just to be safe in case someone forgets PV.Spev.VACName must not be empty string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am planning to, but just updating the PR to test k/k change first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added test case, and the expectedPVVolumeAttributesClassName is nil.
471edad
to
58d8381
Compare
Works alongside Allow PVC VACName to go from non-nil to nil! Proof: Success: Nil VAC -> Infeasible VAC -> emptystring VACName |
58d8381
to
42bb9bf
Compare
42bb9bf
to
92a872b
Compare
@@ -73,6 +74,44 @@ func (ctrl *modifyController) markControllerModifyVolumeStatus( | |||
return updatedPVC, nil | |||
} | |||
|
|||
func (ctrl *modifyController) markCotrollerModifyVolumeRollbackCompeleted( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in markController
btw.
What type of PR is this?
/kind feature
What this PR does / why we need it:
ControllerModifyVolume should allow rollback if a modification ends in infeasible state.
Google doc VolumeAttributesClass Recovery Support
Which issue(s) this PR fixes:
Fixes #482 , kubernetes/kubernetes#130597
Special notes for your reviewer:
Does this PR introduce a user-facing change?: