Skip to content

spec: NodePublishVolume and target_path #80

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
akutz opened this issue Aug 14, 2017 · 16 comments
Closed

spec: NodePublishVolume and target_path #80

akutz opened this issue Aug 14, 2017 · 16 comments
Assignees
Milestone

Comments

@akutz
Copy link
Contributor

akutz commented Aug 14, 2017

I have a question with regards to NodePublishVolume. Its request message, NodePublishVolumeRequest, includes the field target_path:

  // The path to which the volume will be published. It MUST be an
  // absolute path in the root filesystem of the process serving this
  // request. This is a REQUIRED field.
  string target_path = 5;

This is marked as a REQUIRED field. However, the CSI specification seems to allow for the possibility of a Node Plug-in presenting block devices directly to a container. For example, in a NodePublishVolumeRequest the volume_capability field can contain a BlockDevice message, indicating that the type of volume being published is a raw, block device.

Given that the CO will likely NOT know the filesystem device path to the block device ahead of time and thus be unable to populate, ex. /dev/sda, in the target_path field for the NodePublishVolumeRequest, shouldn't the target_path field be marked REQUIRED IFF the volume_capability field is a MountVolume message?

For example, here is a proposed revision:

  // The path to which the volume will be published. It MUST be an
  // absolute path in the root filesystem of the process serving this
  // request. This is a REQUIRED field IFF `volume_capabilities` is
  // set to a `MountVolume` message.
  string target_path = 5;
@akutz akutz changed the title NodePublishVolume and TargetPath spec: NodePublishVolume and target_path Aug 14, 2017
@julian-hj
Copy link
Contributor

julian-hj commented Aug 14, 2017

Good catch. I believe the intent behind making this field required and specified by the CO was to make the call more idempotent--in other words, when a second call comes in, if the target path is the same as the last call, then the plug-in knows for sure that it's just a repeated call and not a request to do a second thing. That's potentially a real use case for volume mounts, where we might maybe want to mount the same volume to two different paths. For block devices, I am not sure.

I think we imagined that iaas attachments would occur during ControllerPublishVolume not NodePublishVolume but some block devices need stuff to run on the node to complete the attachment. Do we have the same kind of idempotency issue when we are doing iscsi initiation or scini or something along those lines?

@akutz
Copy link
Contributor Author

akutz commented Aug 14, 2017

Hi @julian-hj,

As I understand it the proposed workflow would be for plug-ins to mount devices in a "private" area and bind-mount them to the supplied target_path, hence the idempotency.

The question then becomes is there a way to present a raw block device to a consumer multiple times -- in other words, can a raw block device participate in the idempotent nature of this call? The answer is, at least for me, "I don't know."

I'll say you have to assume that more than one NodePublishVolume is, or can be, invoked for every ControllerPublishVolume.

I think you also have to assume that NodePublishVolume can be invoked multiple times for a raw block device. Think of a VMFS device that is block but shared between multiple consumers.

Does any of this answer the question? No, I don't think so. I think all it does is propose that a block device should be capable of being presented to multiple consumers. I think that would happen by simply providing the same device path each time. I do not think you can present an "alias" for the device to each container as you do via bind mounts and FS-based storage.

In the end I think that target_path is probably not relevant for raw block devices and should likely be marked as not required.

Then again, perhaps I'm missing a way to easily present a raw, block device multiple times via some sort of alias system.

@julian-hj
Copy link
Contributor

I am not sure if it is relevant or not, but my understanding is a bit different--I was under the impression that the plugin should mount the volume directly to target_path and later, during container creation, the CO would do the final bind mount to place the volume in the container filesystem.

But again, I am not sure that's relevant to this discussion. Seems like we lack practical experience with iscsi and its ilk. Would the folks who wrote the scaleio bits for libstorage have any more insight? (assuming that wasn't you...)

@akutz
Copy link
Contributor Author

akutz commented Aug 14, 2017

Hi @julian-hj,

RE: libStorage - That was me :) Or rather I'm the lead engineer for libStorage and REX-Ray. I didn't write the ScaleIO driver directly, but I'm familiar enough with it to answer any questions you might have.

I also thought the plug-in was to mount directly to the target_path, but my colleague @clintkitson told me that the discussion around this issue was moving towards the plug-in managing mounts in a private "holding" area and that a plug-in would then bind mount to the specified target_path.

@akutz
Copy link
Contributor Author

akutz commented Aug 14, 2017

Hi @julian-hj,

I think #17 is the issue to which @clintkitson has pointed me in the past. My understanding is then that a plug-in is responsible for mounting a device once in a private area and then bind mounting it to the target path with the requested permissions (ex. ro,rw).

@akutz
Copy link
Contributor Author

akutz commented Aug 14, 2017

Hi @julian-hj,

What I think all this shows us, if anything, is the importance of settling these discussions at a much more rapid clip. Issue #17 has been lingering since May 25th. @jieyu / @saad-ali, can we make sure to address issues #17 and #80 in the next CSI meeting? The potential workflow changes as a result of #17 need to be settled ASAP in order to establish a baseline for both discussions and implementations. Because sans #17 I would create a NodePlugin that simply mounts a device to the specified target_path, not handle mounts re: the aforementioned bind mount logic.

@codenrhoden
Copy link

I would like to see the same, re: clarification re: #17. I'm already wondering what to do here for a plugin, when it comes to mounting and bind-mounting. clarification on that, and how to handle 'rw' vs 'ro' permissions would be useful. If there is a private place that volumes are mounted before being bind-mounted to the target_path, it makes sense that those bind mounts can be used to publish the same volume as 'ro' or 'rw'. But, does it imply that all mounts to the "private" area are 'rw'? If the first time a volume is mounted and the request is for 'ro', you could make both mounts 'ro'. But if a second request comes in that is 'rw', that "private" mount would have to remounted as 'rw'. Is that intended?

Just an example of the implications.

@julian-hj
Copy link
Contributor

Hmm, yeah, I am not sure that the proposed mount-then-bind-mount flow is even practical--or at any rate, if we go that way, then we will end up with a plugin that mounts a thing and bind mounts it to a place, and then the CO will proceed to bind mount it again to get it into the container and possibly also to make it read only.

From my understanding, the open container runtime has that final bind mount fairly well baked into it, and the ultimate binding happens way down in the bowels of aufs (for linux) so just factoring it out and sticking it in the plugin is probably not gonna happen anytime soon.

@codenrhoden
Copy link

I wanted to leave another note here re: NodePublishVolume and target_path, as this hasn't seen much clarification. I'm still unclear on how to proceed when dealing with BlockVolume volume types. As discussed in #60, I expect that target_path already exists when mounting a MountVolume to it. For now, I've implemented the private mount, then bind-mount work flow, and that is working fine for me in it's early stages (but I have no one to test against, but myself, so it's not worth much).

In order to mount to target_path, it has to already exist, and be a directory. But this will not work for a BlockVolume. To publish a BlockVolume to a target_path, the simplest solution is a symlink. But in order to do that, target_path should not exist already -- the plugin would create it.

If that approach is taken, I think it's fine that target_path is always required. But it needs to be interpreted depending on Volume Type. For a MountVolume, it's an existing directory where a mount call can be made to mount a filesystem at that directory. For a BlockVolume it is a path to a file that the plugin should create, that would be a symlink to a block device.

Thoughts? This is something I could really use clarification on.

For anyone who wants to see how I approached the bind-mount logic, feel free to take a look at some early code. Note that BlockVolume isn't supported yet, because of this very issue.

@clintkitson
Copy link

This sounds like its at a point that a decision needs to be made to properly get block volume support in CSI. Thoughts @saad-ali @julian-hj @jdef @jieyu?

@akutz
Copy link
Contributor Author

akutz commented Aug 29, 2017

Hi All,

I was ready to mention "Didn't I raise this topic?" when I realized it was this topic :) Yes, this is still an important issue. The provided target_path is indicated as REQUIRED, but I feel strongly that must be dependent upon the type of volume being published.

@julian-hj
Copy link
Contributor

I am going to add this to the list of stuff to discuss in the next CSI WG meeting. I will be on a plane during that meeting, so I will miss it, but I am hoping that folks will come to a good solution.

For my part, I am thinking that perhaps given the various complications we face in expecting the CO to create the mount point, it would not be better to give up on that and come up with some alternate strategy to make sure that the NodePublishVolume call is idempotent. (For example, have the CO send a publish_id string that is guaranteed to be a valid path fragment, and allow the plugin to determine its own mechanism to correlate that with a target path of its own choosing.)

@jieyu
Copy link
Member

jieyu commented Aug 31, 2017

To publish a BlockVolume to a target_path, the simplest solution is a symlink. But in order to do that, target_path should not exist already -- the plugin would create it.

This can be a bind mount too? Just that target_path has to be a regular file, not a directory.

Symlink won't be meaningful if SP is in its own mount namespace and the root filesystem is different than that of the CO.

@codenrhoden
Copy link

@jieyu You are right, you can. I did not know this. When I first tried it out, I was thrown that fs type is shown as devtmpfs. Here's what I did to try it out.

# touch /mnt/loop0
# ls -al /dev/disk/csi-blockdevices/loop0
lrwxrwxrwx. 1 root root 10 Jul 25 10:02 /dev/disk/csi-blockdevices/loop0 -> /dev/loop0
# mount -o bind /dev/disk/csi-blockdevices/loop0 /mnt/loop0
# mount | grep loop
devtmpfs on /mnt/loop0 type devtmpfs (rw,nosuid,seclabel,size=241476k,nr_inodes=60369,mode=755)
# ls -al /mnt/loop0
brw-rw----. 1 root disk 7, 0 Jul 25 10:46 /mnt/loop0
# mount /mnt/loop0 /mnt/test
# mount | grep loop
devtmpfs on /mnt/loop0 type devtmpfs (rw,nosuid,seclabel,size=241476k,nr_inodes=60369,mode=755)
/mnt/loop0 on /mnt/test type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
# ls -al /mnt/test
total 0
drwxr-xr-x. 2 root root 20 Jul 27 14:59 .
drwxr-xr-x. 4 root root 44 Jul 27 15:00 ..
-rw-r--r--. 1 root root  0 Jul 27 14:59 travis

Note that I had previously formatted /dev/loop0 with XFS, and touched a file named travis at the root of it.

Thanks for pointing that out. I can go ahead and run with this info for now, and just have the plugin ensure that target_path is a regular file rather than a directory.

@saad-ali
Copy link
Member

saad-ali commented Nov 7, 2018

Target path for block can be a bind mount to a empty file. We care clarifying that in #285

I forget, are there other action items here? @jieyu?

@jieyu
Copy link
Member

jieyu commented Nov 7, 2018

@saad-ali yes. I will create a PR today to clarify that it's CO's job to create the mount point (file or directory depending on it's block or mount volume)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants