Snapshot controller cannot recover from missing volume snapshot class error #333

@saikat-royc

In the current implementation of the snapshot controller, checkAndUpdateSnapshotClass() stamps an error status on the volume snapshot object when it detects a missing volume snapshot class.
The periodic sync does not clear this error status. As a side effect, even if the volume snapshot class is found in a subsequent resync, syncUnreadySnapshot() never triggers creation of the snapshot content object, because the following condition never evaluates to true: (snapshot.Status == nil || snapshot.Status.Error == nil || isControllerUpdateFailError(snapshot.Status.Error)). The volume snapshot workflow is then stuck.
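
To make the stuck state concrete, here is a minimal Go sketch of that gate. It is not the controller's code: Snapshot, SnapshotStatus, and SnapshotError are simplified stand-ins for the real API types, the controllerUpdateFailMsg marker string is hypothetical, and shouldCreateContent is a name invented here for the quoted condition.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for the real snapshot API types (assumption).
type SnapshotError struct {
	Message string
}

type SnapshotStatus struct {
	Error *SnapshotError
}

type Snapshot struct {
	Status *SnapshotStatus
}

// controllerUpdateFailMsg is a hypothetical marker; the real controller
// may identify its own update failures differently.
const controllerUpdateFailMsg = "controller update failed"

// isControllerUpdateFailError is modeled here as a substring match on the
// error message (assumption).
func isControllerUpdateFailError(err *SnapshotError) bool {
	return err != nil && strings.Contains(err.Message, controllerUpdateFailMsg)
}

// shouldCreateContent mirrors the condition quoted in the issue.
func shouldCreateContent(s *Snapshot) bool {
	return s.Status == nil || s.Status.Error == nil ||
		isControllerUpdateFailError(s.Status.Error)
}

func main() {
	fresh := &Snapshot{}
	stuck := &Snapshot{Status: &SnapshotStatus{
		Error: &SnapshotError{Message: "missing volume snapshot class"},
	}}

	fmt.Println(shouldCreateContent(fresh)) // true: no status yet
	fmt.Println(shouldCreateContent(stuck)) // false: stale error blocks creation
}
```

Once the missing-class error is stamped, nothing in this gate looks at whether the class now exists, so the second snapshot stays blocked on every resync.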

Possible fixes:

  1. Do not update any error status on the volume snapshot object, i.e. skip calling updateSnapshotErrorStatusWithEvent() from checkAndUpdateSnapshotClass() and only log an error message. The state machine would fail gracefully while creating a VS content object.

  2. Do not update the error status on the volume snapshot object, but generate an event. (This needs additional changes to ensure that only one event is generated, for example by stamping a "missing VSC" annotation on the volume snapshot object before generating the event.)

  3. Update the error status as is done today, but clear it when a VSC is detected in a subsequent resync. (This needs a check on the error message or reason so that only the "VSC missing" error status is cleared.)

  4. Update the error status for a missing VSC as is done today, but handle this "VSC missing" error in syncUnreadySnapshot() and proceed with VS content creation. VS content creation would fail gracefully if the VSC is still missing.
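
As an illustration of option 3, the following Go sketch clears the error only when it is the missing-class error and the class has since been found. Everything here is an assumption made for the example: the simplified types, the Reason field, the reasonMissingClass value, and the helper name clearMissingClassError are hypothetical. If the real error type carries only a message, the match would have to be made on the message text instead.

```go
package main

import "fmt"

// reasonMissingClass is a hypothetical reason value used to tag the error.
const reasonMissingClass = "MissingSnapshotClass"

// Simplified stand-ins for the real snapshot API types (assumption).
type SnapshotError struct {
	Reason  string // hypothetical field; see the note above
	Message string
}

type SnapshotStatus struct {
	Error *SnapshotError
}

type Snapshot struct {
	Status *SnapshotStatus
}

// clearMissingClassError removes the error from the snapshot status only if
// the class is now present and the recorded error is the missing-class one.
// It reports whether the status was changed.
func clearMissingClassError(s *Snapshot, classFound bool) bool {
	if !classFound || s.Status == nil || s.Status.Error == nil {
		return false
	}
	if s.Status.Error.Reason != reasonMissingClass {
		return false // leave unrelated errors untouched
	}
	s.Status.Error = nil
	return true
}

func main() {
	snap := &Snapshot{Status: &SnapshotStatus{Error: &SnapshotError{
		Reason:  reasonMissingClass,
		Message: "volume snapshot class not found",
	}}}

	fmt.Println(clearMissingClassError(snap, false)) // false: class still missing
	fmt.Println(clearMissingClassError(snap, true))  // true: error cleared
	fmt.Println(snap.Status.Error == nil)            // true
}
```

Matching on a narrow reason keeps other errors in place, which is the concern raised in option 3. In a real controller the cleared status would still need to be written back through the API.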
