Skip to content

Conversation

@nixpanic
Copy link
Member

@nixpanic nixpanic commented Oct 3, 2024

ensureImageCleanup() can cause a panic when an image was deleted, but the journal still contained a reference. By opening the IOContext before using, an error may be returned instead of a panic when using a nil or freed IOContext.

runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:24 +0x5e
runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x13
github.com/ceph/ceph-csi/internal/csi-common.panicHandler.func1()
        /go/src/github.com/ceph/ceph-csi/internal/csi-common/utils.go:336 +0xca
panic({0x21c43c0?, 0x4066500?})
        /usr/local/go/src/runtime/panic.go:770 +0x132
github.com/ceph/go-ceph/rados.(*IOContext).Pointer(...)
        /go/src/github.com/ceph/ceph-csi/vendor/github.com/ceph/go-ceph/rados/ioctx.go:121
github.com/ceph/go-ceph/rbd.cephIoctx(...)
        /go/src/github.com/ceph/ceph-csi/vendor/github.com/ceph/go-ceph/rbd/rbd.go:102
github.com/ceph/go-ceph/rbd.GetTrashList.func1.1(0x7f23d8275418?, 0x10?, 0x7f242218bf18?)
        /go/src/github.com/ceph/ceph-csi/vendor/github.com/ceph/go-ceph/rbd/rbd.go:1025 +0x12
github.com/ceph/go-ceph/rbd.GetTrashList.func1(0x41bba5?)
        /go/src/github.com/ceph/ceph-csi/vendor/github.com/ceph/go-ceph/rbd/rbd.go:1025 +0x85
github.com/ceph/go-ceph/internal/retry.WithSizes(0xc000788f28?, 0x2800, 0xc000789048)
        /go/src/github.com/ceph/ceph-csi/vendor/github.com/ceph/go-ceph/internal/retry/sizer.go:51 +0x4a
github.com/ceph/go-ceph/rbd.GetTrashList(0x0)
        /go/src/github.com/ceph/ceph-csi/vendor/github.com/ceph/go-ceph/rbd/rbd.go:1022 +0xf4
github.com/ceph/ceph-csi/internal/rbd.(*rbdImage).ensureImageCleanup(0xc0008ae908, {0x2cdc138, 0xc000d13410})
        /go/src/github.com/ceph/ceph-csi/internal/rbd/rbd_util.go:624 +0x45

Related issues

Found while working on #4502.


Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@mergify mergify bot added the component/rbd Issues related to RBD label Oct 3, 2024
@nixpanic nixpanic requested a review from a team October 3, 2024 11:16
@nixpanic nixpanic force-pushed the rbd/ensureImageCleanup branch from 5846922 to e4e183c Compare October 4, 2024 07:38
@nixpanic nixpanic changed the title rbd: open IOContext before getting the list of trashed images rbd: validate IOContext before getting the list of trashed images Oct 4, 2024
@nixpanic nixpanic requested review from Madhu-1 and Rakshith-R October 4, 2024 07:38
@nixpanic nixpanic force-pushed the rbd/ensureImageCleanup branch from e4e183c to 22c12b6 Compare October 4, 2024 08:28
Copy link
Contributor

@Rakshith-R Rakshith-R left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !

@nixpanic
Copy link
Member Author

nixpanic commented Oct 4, 2024

@Mergifyio queue

@mergify
Copy link
Contributor

mergify bot commented Oct 4, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at d33e6b1

`ensureImageCleanup()` can cause a panic when an image was deleted, but
the journal still contained a reference. By opening the IOContext before
using, an error may be returned instead of a panic when using a `nil` or
freed IOContext.

Signed-off-by: Niels de Vos <[email protected]>
@nixpanic nixpanic force-pushed the rbd/ensureImageCleanup branch from 22c12b6 to 48781a0 Compare October 4, 2024 09:00
@mergify mergify bot added the ok-to-test Label to trigger E2E tests label Oct 4, 2024
@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.30

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-cephfs

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.30

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/upgrade-tests-rbd

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.30

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.29

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.31

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.29

@ceph-csi-bot
Copy link
Collaborator

/test ci/centos/mini-e2e/k8s-1.29

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label Oct 4, 2024
@mergify mergify bot merged commit d33e6b1 into ceph:devel Oct 4, 2024
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/rbd Issues related to RBD

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants