
Conversation

@Rakshith-R (Contributor) commented:

Describe what this PR does

This commit adds a design doc for adding nodeID:userID mapping metadata on each mapped RBD image and mounted CephFS subvolume, and describes its use cases.

Do you have any questions?
No

Is the change backward compatible?
Yes

Are there concerns around backward compatibility?
No

Provide any external context for the change, if any.

refer: rook/rook#15904

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request.
  • Pending release notes updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@Rakshith-R Rakshith-R requested a review from a team July 11, 2025 09:49
@mergify mergify bot added labels ci/skip/e2e (skip running e2e CI jobs), ci/skip/multi-arch-build (skip building on multiple architectures), component/docs (Issues and PRs related to documentation) on Jul 11, 2025
* Remove the metadata from the volume:

```
.[rbd|cephfs].csi.ceph.com/userID/<NodeId>:<userID>
```
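
For illustration, a minimal sketch of how such a key/value pair could be written and read with the standard `rbd image-meta` commands. The pool, image, node and user names below are placeholders, and the split into key `.rbd.csi.ceph.com/userID/<nodeID>` with value `<userID>` is one reading of the proposed format, not a confirmed implementation detail:

```
# Record that node "worker-1" mapped this image using the (hypothetical)
# Ceph user "csi-rbd-node-a".
rbd image-meta set replicapool/csi-vol-1234 \
    ".rbd.csi.ceph.com/userID/worker-1" "csi-rbd-node-a"

# Inspect all metadata on the image, including the nodeID:userID mapping.
rbd image-meta list replicapool/csi-vol-1234
```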

Collaborator:

We will not have secrets to remove it in NodeUnstage; please specify if you are planning to use the new secret from the configmap.

Collaborator:

Also describe the side effect when the secret is not specified in the configmap.

Contributor (author):

Yes, I forgot; let's do the removal only in ControllerUnpublish.
Linked to the secrets handling section in the other design doc.
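
For reference, the manual equivalent of the cleanup that a ControllerUnpublish-time removal would perform might look like the following; the pool, image and node names are hypothetical, and the key layout follows the format quoted above:

```
# Drop the nodeID:userID mapping for node "worker-1" once the volume is
# detached from that node (all names are placeholders).
rbd image-meta remove replicapool/csi-vol-1234 \
    ".rbd.csi.ceph.com/userID/worker-1"
```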

cluster

* **Alert Management**: Generate alerts when old userIDs are detected being used
by specific PVCs on specific nodes

Collaborator:

specific PVCs to specific volumes

Comment on lines 3 to 4
In Kubernetes environments using Ceph CSI, when volumes are mounted on nodes,
each node uses a specific userID to access the volume. Currently, there is no

Collaborator:

Let's keep it generic: mention CO instead of Kubernetes.

@@ -0,0 +1,66 @@
# UserID Mapping

CephCSI mounts RBD or CephFS volumes on nodes using specific ceph userID. Currently,

Member:

Please always write Ceph-CSI.

Uhm, a Ceph userID? My first impression was that root is used for mounting... A userID can be anything, and is commonly the UID of the system user. Here you mean a Ceph user, which usually isn't a numeric ID.

Maybe phrase it differently, so that there can be no confusion.

Ceph-CSI connects to the Ceph cluster with the credentials of a Ceph user account. The account is used to communicate with the MONs, OSDs and other services to attach RBD-images and mount CephFS volumes.

there is no mechanism to track and expose this nodeID:userID mapping information per volume.

## Problem

Member:

Please mention why RBD watchers (or sysfs inspection) and CephFS sessions are not a good approach.

Contributor (author):

Added an "Alternatives" section mentioning the other approaches and their drawbacks.

* **Key Rotation**: Automate key cleanup by identifying when old userIDs are no
longer in use on any volumes, allowing safe deletion of associated old keyrings

These capabilities can be implemented by listing all RBD images or CephFS

Member:

This is not an API for gathering the information. Someone would need access to the Ceph cluster, and know which RADOS-namespaces, pools and whatnot to check.

How can this be consumed in a way that tools can use it?

Contributor (author):

Expanded the paragraph to include cases where multiple RADOS namespaces, pools, SubVolumeGroups, and CephFilesystems may need to be considered and searched.
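
As a rough sketch of such a scan (hypothetical pool, namespace, filesystem and subvolume group names; the key prefix is the one proposed in this design; `jq` is assumed for JSON parsing), a tool with access to the Ceph cluster could walk the relevant locations and read the metadata back:

```
# Scan every image in a pool/RADOS namespace for nodeID:userID metadata.
POOL=replicapool
NS=csi-ns
for img in $(rbd ls --pool "${POOL}" --namespace "${NS}"); do
    echo "== ${POOL}/${NS}/${img} =="
    rbd image-meta list "${POOL}/${NS}/${img}" | grep "csi.ceph.com/userID/" || true
done

# Same idea for CephFS subvolumes (requires a Ceph release that supports
# subvolume custom metadata).
FS=myfs
GROUP=csi
for sv in $(ceph fs subvolume ls "${FS}" --group_name "${GROUP}" --format json | jq -r '.[].name'); do
    echo "== ${FS}/${GROUP}/${sv} =="
    ceph fs subvolume metadata ls "${FS}" "${sv}" --group_name "${GROUP}"
done
```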


### ControllerUnpublishVolume()

* Remove the metadata from the volume:

Collaborator:

Will this RPC be issued if the pod is force deleted as well?

Contributor (author):

> Will this RPC be issued if the pod is force deleted as well?

I think you want to ask about force deletion of the volume attachment.
Force deletions just make the resource disappear from the cluster, and
it is an unsupported action unless the user is very sure of what they are doing.

Yes, force deletion of a volume attachment, AFAIK, makes kubelet forget about the attached volume
entirely and thus may not lead to cleanup calls like NodeUnpublish, NodeUnstage and ControllerUnpublish.

IMO, it is out of scope to account for and deal with such unsupported and disruptive actions.

Collaborator:

No, I meant to ask about force deletion of the pod as well. In some cases the user just deletes the pod and doesn't wait for garbage collection (clean unmount, etc.).

Contributor (author):

Tested locally: deleting with --force still leads to clean unpublish and unstage calls.

```
[pm@dhcp53-115 ~]$ k get po
NAME               READY   STATUS    RESTARTS   AGE
csi-rbd-demo-pod   1/1     Running   0          66s

[pm@dhcp53-115 ~]$ k get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      VOLUMEATTRIBUTESCLASS   AGE
rbd-pvc   Bound    pvc-c728d27b-be6e-4d2d-a20c-c29a7b51cc4e   1Gi        RWO            rook-ceph-block   <unset>                 72s

[pm@dhcp53-115 ~]$ k describe pvc | grep Used
Used By:       csi-rbd-demo-pod

[pm@dhcp53-115 ~]$ k get volumeattachments.storage.k8s.io
NAME                                                                   ATTACHER                     PV                                         NODE     ATTACHED   AGE
csi-3f51d33e06a6f1646ee0b741fb9c70791e4522720ebf1f4b1b3f1e54995cd50b   rook-ceph.rbd.csi.ceph.com   pvc-c728d27b-be6e-4d2d-a20c-c29a7b51cc4e   c1-m03   true       91s

[pm@dhcp53-115 ~]$ k delete po csi-rbd-demo-pod --force
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "csi-rbd-demo-pod" force deleted

[pm@dhcp53-115 ~]$ k get volumeattachments.storage.k8s.io
No resources found

[pm@dhcp53-115 ~]$ k logs -n rook-ceph csi-rbdplugin-rrb4s csi-rbdplugin | grep "Req-ID: 0001-0009-rook-ceph-0000000000000002-b34fe0de-b91e-49ce-8a76-61d0ab690721 GRPC call"
I0714 10:42:57.920899 1007627 utils.go:271] ID: 26 Req-ID: 0001-0009-rook-ceph-0000000000000002-b34fe0de-b91e-49ce-8a76-61d0ab690721 GRPC call: /csi.v1.Node/NodeStageVolume
I0714 10:42:58.695905 1007627 utils.go:271] ID: 30 Req-ID: 0001-0009-rook-ceph-0000000000000002-b34fe0de-b91e-49ce-8a76-61d0ab690721 GRPC call: /csi.v1.Node/NodePublishVolume
I0714 10:44:35.857884 1007627 utils.go:271] ID: 33 Req-ID: 0001-0009-rook-ceph-0000000000000002-b34fe0de-b91e-49ce-8a76-61d0ab690721 GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0714 10:44:35.970499 1007627 utils.go:271] ID: 35 Req-ID: 0001-0009-rook-ceph-0000000000000002-b34fe0de-b91e-49ce-8a76-61d0ab690721 GRPC call: /csi.v1.Node/NodeUnstageVolume
```

The following alternatives were considered but were not chosen for reasons
listed below:

* Extracting Ceph user account from RBD watchers:

Collaborator:

How is it possible to extract the Ceph user account from watchers?

Contributor (author):

🤦 I thought, by mistake, that it had client info.
Looks like I was wrong.
Removing this point.

```
$ rbd status ocs-storagecluster-cephblockpool/csi-vol-242c97a4-6fda-4ee2-bb82-c55da1fc0399
Watchers:
	watcher=10.128.2.2:0/1799327480 client.201789 cookie=93830608692352
	watcher=100.64.0.5:0/580989485 client.15589 cookie=18446462598732884571
```

@Rakshith-R Rakshith-R force-pushed the design-userID branch 2 times, most recently from 0daab71 to 8b5a469 on July 14, 2025 08:36
@Rakshith-R Rakshith-R requested review from Madhu-1 and nixpanic July 14, 2025 08:41

@Madhu-1 (Collaborator) left a comment:

small nits, LGTM

Comment on lines 16 to 17
* Make informed decisions about key rotation
* Clean up old userIDs safely

Collaborator:

Combine these into one, as they are the same?
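
For context, the rotation flow these bullets describe could look roughly like the sketch below once the metadata shows that no volume references the old user any more (the user and pool names are hypothetical):

```
# 1. Create the replacement Ceph user with equivalent caps.
ceph auth get-or-create client.csi-rbd-node-new \
    mon 'profile rbd' osd 'profile rbd pool=replicapool'

# 2. Roll the new credentials out to the nodes, then wait until the
#    nodeID:userID metadata no longer lists client.csi-rbd-node-old
#    on any volume.

# 3. Only then delete the old user's keyring.
ceph auth del client.csi-rbd-node-old
```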

## Important Note

> ⚠️ **WARNING**: This feature is only effective after upgrading to a Ceph-CSI
> version that supports this functionality AND rebooting all nodes in the cluster.

Collaborator:

Change `AND` to `and`.

Madhu-1 previously approved these changes Jul 17, 2025

## Important Note

> ⚠️ **WARNING**: This feature is only effective after upgrading to a Ceph-CSI

Member:

You can write warning notes like this:

> [!WARNING]  
> Details that are under discussion and/or investigation are listed here.

Contributor (author):

You can write warning notes like this:

> [!WARNING]  
> Details that are under discussion and/or investigation are listed here.

Done 👍
thanks

This commit adds design doc for adding nodeID:userID
mapping metadata on each mapped rbd image and mounted
cephfs subvolumes and its usecases.

Signed-off-by: Rakshith R <[email protected]>
@mergify mergify bot dismissed Madhu-1’s stale review July 17, 2025 13:00

Pull request has been modified.

@Rakshith-R Rakshith-R requested review from Madhu-1 and nixpanic July 17, 2025 13:01
@nixpanic nixpanic requested a review from a team July 17, 2025 13:43
@mergify mergify bot merged commit 1f2706f into ceph:devel Jul 17, 2025
15 checks passed
iPraveenParihar added a commit to iPraveenParihar/ceph-csi that referenced this pull request Jul 23, 2025
This commit adds a new flag 'setuseridmapping' false by default.
When enabled the driver will add a nodeId: userId mapping on the
volume metadata. The userId here refers to the ceph user account
used to map a volume on a node.

design doc: ceph#5425

Signed-off-by: Praveen M <[email protected]>
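
Assuming the usual Ceph-CSI style of boolean command-line flags, enabling the mapping could look like the sketch below; the exact `--setuseridmapping` spelling is inferred from the commit message above and is not verified against the merged code:

```
# Hypothetical nodeplugin invocation; only the last flag is new to this
# change, the others are existing cephcsi options shown for context.
/usr/local/bin/cephcsi \
    --type=rbd \
    --nodeserver=true \
    --setmetadata=true \
    --setuseridmapping=true
```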
@iPraveenParihar iPraveenParihar mentioned this pull request Jul 23, 2025
6 tasks
iPraveenParihar added a commit to iPraveenParihar/ceph-csi that referenced this pull request Jul 24, 2025
This commit advertise PUBLISH_UNPUBLISH_VOLUME capability
if the 'setmetadata' flag is enabled.
This is required for removing the nodeId:userId mapping metadata
in ControllerUnpublishVolume() call.

design doc: ceph#5425

Signed-off-by: Praveen M <[email protected]>