Skip to content

Conversation

@nixpanic
Copy link
Member

There have been occasional reports where syncing an image never completes. The cause seems to come from Ceph-CSI, as it may call mirror.Resync() even when the image is in a syncing state. Every time Ceph-CSI calls mirror.Resync(), the resync is started from scratch. If this is called often enough, the resync is never able to complete.

This PR introduces two improvements:

  1. only start a resync if the status is not set to syncing already
  2. after starting a resync, wait with responding to the caller, until the status is syncing

There are a few cleanups done as well, these make it simpler to implement the proposed changes.


Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@nixpanic nixpanic requested a review from yati1998 April 17, 2025 08:42
@mergify mergify bot added the component/rbd Issues related to RBD label Apr 17, 2025
@nixpanic
Copy link
Member Author

Lots of CI jobs failed, that should be fixed once #5281 is merged.

@nixpanic nixpanic marked this pull request as ready for review April 28, 2025 11:13
@nixpanic nixpanic requested a review from a team April 28, 2025 11:13
@yati1998
Copy link
Contributor

Thanks @nixpanic this PR looks good to me. I will be waiting for Ilya to review it as per the discussion going on in jira.

@nixpanic
Copy link
Member Author

Thanks @nixpanic this PR looks good to me. I will be waiting for Ilya to review it as per the discussion going on in jira.

Hey @yati1998 , if this is an improvement, we might as well take it in. Maybe it is not a final solution yet, a follow-up can be made later?

@Rakshith-R
Copy link
Contributor

Thanks @nixpanic this PR looks good to me. I will be waiting for Ilya to review it as per the discussion going on in jira.

Hey @yati1998 , if this is an improvement, we might as well take it in. Maybe it is not a final solution yet, a follow-up can be made later?

👍 , I agree this set of change is good to have regardless.

Copy link
Contributor

@Rakshith-R Rakshith-R left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@nixpanic
Copy link
Member Author

nixpanic commented May 5, 2025

@Mergifyio queue

@mergify
Copy link
Contributor

mergify bot commented May 5, 2025

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at 86576b4

nixpanic added 4 commits May 5, 2025 15:38
By introducing a SyncInfo interface, the `getLastSyncInfo` function can
be removed from the csi-addons/rbd package. Getting details from an RBD
mirror status, should be part of the main internal/rbd package, the
CSI-Addons components should only use the internal API, and not add any
parsing logic.

This makes it easier to enhance the SyncInfo interface in the future.

Signed-off-by: Niels de Vos <[email protected]>
It may take some time for the RBD-mirror daemon to start syncing the
image. After the resync operation is executed, the status of the resync
is checked with a small delay to prevent subsequent resync calls from
re-starting the resync quickly after each other.

Signed-off-by: Niels de Vos <[email protected]>
@mergify mergify bot force-pushed the bug/DFBUGS-984 branch from 01df18d to 4a29952 Compare May 5, 2025 15:38
@mergify mergify bot added the ok-to-test Label to trigger E2E tests label May 5, 2025
@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-cephfs

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-rbd

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.32

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.32

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.30

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.32

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.30

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.30

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label May 5, 2025
@mergify mergify bot merged commit 86576b4 into ceph:devel May 5, 2025
36 of 37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/rbd Issues related to RBD

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants