This repository was archived by the owner on Oct 22, 2024. It is now read-only.

Persistency model changes #177

Merged
merged 6 commits into intel:devel from csi-1.0+cache on Mar 4, 2019

Conversation

avalluri
Contributor

Currently supported persistency types are:

  • cache volume
  • ephemeral volume

These changes are on top of CSI v1.0, so they might not work on Kubernetes < v1.13. The idea is that we maintain the pre-CSI-1.0 changes in a branch (csi-0.0.3). I will review the supported persistency models within the limitations of CSI v0.0.3 and submit a PR to the csi-0.0.3 branch.

Contributor

@pohly pohly left a comment

So far I've only looked at the documentation to figure out how this is meant to work.

README.md Outdated
volume gets created initially, without actually allocating space. Then
when an application starts and it is time to mount that "existing"
volume, an attempt is made to allocate the space required for the
volume locally. Only if there is enough space can application startup
Contributor

Can pmem-csi-sequence-diagram.png be deleted now?

Contributor Author

Hmm, maybe yes. I was thinking of reusing it to illustrate the normal persistent volume type, but that may not be needed, since Kubernetes users already know how normal dynamic PVs work.

Contributor

But the current diagram isn't the "normal" PVC, is it? It's the fake PVC with delayed allocation on pod startup.

Do you intend to keep this special mode? I thought we agreed to implement the normal PVC mode (allocate space in CreateVolume) with low priority, because it isn't really needed.

Contributor Author

My intention was to reuse the old diagram, with the changes needed to match the normal PVC flow. There is no fake/delayed volume creation anymore.

Please review the PR code to see how cache and ephemeral volumes are actually implemented.

Contributor

As I said in my other comment, it is easier to review how this is meant to work by reading the user-visible documentation for it.

It makes sense to update the existing diagram. Right now it still shows that CreateVolume doesn't actually create a volume. Do we agree that after merging this PR, CreateVolume will end up creating the pmem device on some node?

@pohly
Contributor

pohly commented Feb 25, 2019 via email

@avalluri
Contributor Author

@pohly, I updated the document with usage details; can you check whether it helps?

@avalluri
Contributor Author

@okartau and @pohly, I updated the PR with:

  • reverted ephemeral support
  • merged the default {identity,node}servers code with the actual implementation
  • updated the documentation with the persistency models and their usage.

@@ -48,6 +48,10 @@ TEST_PMEM_MEM_SIZE=32768 # 32GB
TEST_PMEM_SHARE=on
TEST_PMEM_LABEL_SIZE=2097152

# Kubernetes feature gates to enable/disable
# featurename=true,feature=false
TEST_FEATURE_GATES="CSINodeInfo=true,CSIDriverRegistry=true"
Contributor

The installation of the corresponding CRDs also belongs into the cluster setup. Currently the PR has that as part of the driver installation (like in some older CSI documentation), but that's conceptually the wrong place and was just a workaround for cluster setup tools not knowing about those CRDs yet.

Contributor Author

agree. Done
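
For reference, a minimal sketch of a kubeadm configuration that would enable these feature gates during cluster setup; the apiVersion and field names follow the kubeadm and kubelet v1beta1 config APIs, and the file below is an illustration rather than the exact content of the test scripts:

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.3            # illustrative version
apiServer:
  extraArgs:
    feature-gates: "CSINodeInfo=true,CSIDriverRegistry=true"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CSINodeInfo: true
  CSIDriverRegistry: true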

Contributor

@pohly pohly left a comment

Commit message e81ec79:

the newly created volume to that node by using 'Volume.Topology'. OC has to make

s/OC/The container orchestrator/

Let's avoid acronyms, the reader might not know them. It was spelled wrong, too.

Unlike normal persistent volume, Cache volume creates set of pmem volumes on

You've spelled "Cache" and further down also "Persistent" with a capital letter when talking about the model. I think "cache volume" and "persistent volume" are normal English words and not specific names, so normal capitalization seems more appropriate.

deploy/kubernetes-1.13/pmem-pvc.yaml says storageClassName: pmem-csi-sc # defined in pmem-csi.yaml. That should say deploy/kubernetes-1.13/pmem-storageclass.yaml.

I just brought up a cluster based on e81ec79 and followed the example (cat deploy/kubernetes-1.13/pmem-pvc.yaml | _work/ssh-clear-kvm kubectl create -f -). The resulting PV has three entries in its topology information:

$ kubectl get pv/pvc-b0354387-3c0d-11e9-895f-deadbeef0100 -o yaml
...
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - host-3
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - host-1
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - host-2

That doesn't look right. I did make push-images before rebuilding the cluster, so I should be running the latest driver. However, this is hard to verify because the driver does not identify itself in the log output and has no unique version number. I've filed #183 to track this.

From the logs it looks like the driver treated this volume as a cache volume?

$ kubectl log pmem-csi-controller-0 pmem-driver
...
I0301 10:35:10.049651       1 glog.go:58] GRPC call: /csi.v1.Controller/CreateVolume
I0301 10:35:10.049673       1 glog.go:58] GRPC request: name:"pvc-b0354387-3c0d-11e9-895f-deadbeef0100" capacity_range:<required_bytes:8589934592 > volume_capabilities:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > accessibility_requirements:<requisite:<segments:<key:"kubernetes.io/hostname" value:"host-0" > > requisite:<segments:<key:"kubernetes.io/hostname" value:"host-1" > > requisite:<segments:<key:"kubernetes.io/hostname" value:"host-2" > > requisite:<segments:<key:"kubernetes.io/hostname" value:"host-3" > > preferred:<segments:<key:"kubernetes.io/hostname" value:"host-0" > > preferred:<segments:<key:"kubernetes.io/hostname" value:"host-1" > > preferred:<segments:<key:"kubernetes.io/hostname" value:"host-2" > > preferred:<segments:<key:"kubernetes.io/hostname" value:"host-3" > > > 
I0301 10:35:10.049861       1 glog.go:79] CreateVolume: Name: pvc-b0354387-3c0d-11e9-895f-deadbeef0100 req.Required: 8589934592 req.Limit: 0
I0301 10:35:10.049965       1 glog.go:79] Connecting to 192.168.8.8:10001
I0301 10:35:10.114505       1 glog.go:79] node host-3 has requested capacity
I0301 10:35:10.114536       1 glog.go:79] Connecting to 192.168.8.4:10001
I0301 10:35:10.181366       1 glog.go:79] node host-1 has requested capacity
I0301 10:35:10.181404       1 glog.go:79] Connecting to 192.168.8.6:10001
I0301 10:35:10.249569       1 glog.go:79] node host-2 has requested capacity
I0301 10:35:10.249621       1 glog.go:79] Found nodes: [host-3 host-1 host-2]
I0301 10:35:10.249667       1 glog.go:79] CreateVolume: Record new volume as {b03c6245-3c0d-11e9-ba0d-26c6b0ca2305 pvc-b0354387-3c0d-11e9-895f-deadbeef0100 8589934592 1  true fsdax}

Some more log output would be useful to understand what the driver is doing (which mode, which parameters, etc.).
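
For comparison, a normal persistent volume that is pinned to a single node would be expected to carry only one node selector term in its topology, roughly along these lines (hostname purely illustrative):

  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - host-1          # a single node, not three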

README.md Outdated

Below are the volume persistency models considered for implementing by PMEM-CSI to serve different application use cases:

* Persistent Volumes</br>
Contributor

Resorting to HTML is unnecessary, plain Markdown also works.

Just use two spaces:

* Persistent Volumes  
A volume gets created...

https://meta.stackexchange.com/questions/72395/is-it-possible-to-have-definition-lists-in-markdown/298872#298872

Contributor Author

Thanks for the tip. I fixed all the </br>s.

README.md Outdated

Kubernetes cluster administrators can expose above mentioned [volume persistency types](#volume-persistency) to applications using [`StorageClass Parameters`](https://kubernetes.io/docs/concepts/storage/storage-classes/#parameters). An optional `persistencyModel` parameter differentiates how the provisioned volume can be used.

* if no `persistencyModel` parameter specified in `StorageClass` then it is treated as normal Kubernetes persistent volume. In this case PMEM-CSI creates PMEM volume on a node and the application that claims to use this volume is supposed to schedule on this node by Kubernetes. Choosing of node is depend on StorageClass `volumeBindingMode`. In case of `volumeBindingMode: Immediate` PMEM-CSI chooses a node randomly, and in case of `volumeBindingMode: WaitForFirstConsumer` Kubernetes first chooses a node for scheduling the application, and PMEM-CSI creates the volume on that node. Applications which claim a normal persistent volume has to use `ReadOnlyOnce` access mode in its `accessModes` list. This [diagram](/docs/images/sequence/pmem-csi-persistent-sequence-diagram.png) illustrates how a normal persistent volume gets provisioned in Kubernetes using PMEM-CSI driver.
Contributor

s/on this node/be scheduled onto this node/

Contributor Author

Done
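
To illustrate the paragraph quoted above, a normal persistent volume could be requested with a StorageClass and PVC roughly like the following sketch; the provisioner string and the requested size are assumptions for illustration, not the exact contents of the deploy/kubernetes-1.13 files:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pmem-csi-sc                # referenced by the PVC below
provisioner: pmem-csi.intel.com    # assumed CSI driver name
volumeBindingMode: WaitForFirstConsumer
# no persistencyModel parameter -> normal persistent volume
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pmem-pvc
spec:
  accessModes:
  - ReadWriteOnce                  # corresponds to SINGLE_NODE_WRITER in CSI
  resources:
    requests:
      storage: 8Gi
  storageClassName: pmem-csi-sc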

@avalluri
Contributor Author

avalluri commented Mar 1, 2019

That doesn't look right. I did make push-images before rebuilding the cluster, so I should be running the latest driver. However, this is hard to verify because the driver does not identify itself in the log output and has no unique version number. I've filed #183 to track this.

From the logs it looks like the driver treated this volume as a cache volume?

$ kubectl log pmem-csi-controller-0 pmem-driver
...
I0301 10:35:10.049651       1 glog.go:58] GRPC call: /csi.v1.Controller/CreateVolume
I0301 10:35:10.049673       1 glog.go:58] GRPC request: name:"pvc-b0354387-3c0d-11e9-895f-deadbeef0100" capacity_range:<required_bytes:8589934592 > volume_capabilities:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > accessibility_requirements:<requisite:<segments:<key:"kubernetes.io/hostname" value:"host-0" > > requisite:<segments:<key:"kubernetes.io/hostname" value:"host-1" > > requisite:<segments:<key:"kubernetes.io/hostname" value:"host-2" > > requisite:<segments:<key:"kubernetes.io/hostname" value:"host-3" > > preferred:<segments:<key:"kubernetes.io/hostname" value:"host-0" > > preferred:<segments:<key:"kubernetes.io/hostname" value:"host-1" > > preferred:<segments:<key:"kubernetes.io/hostname" value:"host-2" > > preferred:<segments:<key:"kubernetes.io/hostname" value:"host-3" > > > 
I0301 10:35:10.049861       1 glog.go:79] CreateVolume: Name: pvc-b0354387-3c0d-11e9-895f-deadbeef0100 req.Required: 8589934592 req.Limit: 0
I0301 10:35:10.049965       1 glog.go:79] Connecting to 192.168.8.8:10001
I0301 10:35:10.114505       1 glog.go:79] node host-3 has requested capacity
I0301 10:35:10.114536       1 glog.go:79] Connecting to 192.168.8.4:10001
I0301 10:35:10.181366       1 glog.go:79] node host-1 has requested capacity
I0301 10:35:10.181404       1 glog.go:79] Connecting to 192.168.8.6:10001
I0301 10:35:10.249569       1 glog.go:79] node host-2 has requested capacity
I0301 10:35:10.249621       1 glog.go:79] Found nodes: [host-3 host-1 host-2]
I0301 10:35:10.249667       1 glog.go:79] CreateVolume: Record new volume as {b03c6245-3c0d-11e9-ba0d-26c6b0ca2305 pvc-b0354387-3c0d-11e9-895f-deadbeef0100 8589934592 1  true fsdax}

I doubt that your cluster is built on e81ec79 . The log message you shared

node host-3 has requested capacity

is not present in this commit. Could you please verify this?

avalluri added 4 commits March 1, 2019 15:18
CSI v0.0.3 is deprecated and not supported by future Kubernetes releases.
CSI 1.0 is not backward compatible with older (<= 0.0.3) CSI releases.

Kubernetes releases earlier than v1.13 do not support CSI 1.0, so we cannot
support older Kubernetes releases.

Made changes to the driver deployment files to support CSI 1.0 on Kubernetes v1.13:
 - dropped the deployment files under kubernetes-1.{11,12}
 - updated the sidecar versions to v1.0.1

Signed-off-by: Amarnath Valluri <[email protected]>
These features are needed to make the driver fully topology aware.
external-provisioner reads topology keys from CSINodeInfo objects and uses those
keys to prepare CreateVolumeRequest.TopologyRequirement.

Made changes to the test cluster setup script to enable these feature gates via
a kubeadm configuration file while setting up the cluster.

Signed-off-by: Amarnath Valluri <[email protected]>
The motivation behind this change is:
- it makes the node controller self-contained, i.e., each node can create and
  delete volumes independently of the master controller.
- it clearly divides which parts are handled by the master and by the node; for
  example, nothing related to DeviceManager should be in the master code.
- it is an enabler for implementing fully topology-aware provisioning by
  dropping the workaround of creating the pmem device on attachment.
These default generic implementations were inherited from the deprecated
kubernetes-csi/drivers csi-common package, which is not needed anymore. Merging
them with their derived implementations simplifies the code and allows cleaning
up a lot of unused code.

The default controller server implementation is left in place, as it can be
shared by both the node controller and the controller server.

Signed-off-by: Amarnath Valluri <[email protected]>
@avalluri avalluri force-pushed the csi-1.0+cache branch 2 times, most recently from 8db4e39 to c717c66 on March 1, 2019 14:18
@pohly
Contributor

pohly commented Mar 1, 2019

Indeed, I was one commit before the persistency changes. We need this embedded version number to catch such mistakes faster :-/

I'll continue testing.

@avalluri
Contributor Author

avalluri commented Mar 1, 2019

Commit message e81ec79:

the newly created volume to that node by using 'Volume.Topology'. OC has to make

s/OC/The container orchestrator/
Done

You've spelled "Cache" and further down also "Persistent" with capital letter when talking about the model. I think "cache volume" and "persistent volume" are normal English words and not specific names, so normal capitalization seems more appropriate.

s/Cache/cache, s/Persistent/persistent. Done.

deploy/kubernetes-1.13/pmem-pvc.yaml says storageClassName: pmem-csi-sc # defined in pmem-csi.yaml. That should say deploy/kubernetes-1.13/pmem-storageclass.yaml.

Fixed the comment.

avalluri added 2 commits March 2, 2019 01:24
With this change, PMEM volumes are created in the CreateVolume() call, irrespective
of the persistency model. This means there is no 'delayed volume' creation in
ControllerPublishVolume anymore.

In the normal persistent volume case, the driver creates a pmem volume on a free
node, chosen in the order of 'CreateVolumeRequest.TopologyRequirement.Preferred',
and locks the newly created volume to that node by using 'Volume.Topology'. The
container orchestrator has to make sure that the application which claims this
volume runs on this node.

Unlike a normal persistent volume, a cache volume creates a set of pmem volumes
on a number of different nodes, each with its own local data, chosen in the order
of 'CreateVolumeRequest.TopologyRequirement.Preferred'. Applications are started
on those nodes and then get to use the volume on their node. Data persists across
application restarts. This is useful when the data is only cached information
that can be discarded and reconstructed at any time and the application can reuse
existing local data when restarting.

Introduced new volume/StorageClass parameters:
- "persistencyModel", which shall be set to "cache" for cache volumes, and
- "cacheSize", to request the number of nodes on which the created volume can be used.

Updated README with different possible persistency models and Kubernetes usage
for cache and persistent volumes.

Removed node locking from the volume tests, as full topology support is now
implemented and provisioning anywhere in the cluster is supposed to work.

protosanitizer supports suppressing secrets while logging gRPC messages.

Signed-off-by: Amarnath Valluri <[email protected]>
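
A minimal sketch of a StorageClass requesting the cache model described in the commit message above; the provisioner string and the cache size are assumptions for illustration:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pmem-csi-sc-cache          # illustrative name
provisioner: pmem-csi.intel.com    # assumed CSI driver name
parameters:
  persistencyModel: cache          # parameter introduced by this change
  cacheSize: "2"                   # number of nodes that get their own pmem volume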
@avalluri avalluri requested a review from pohly March 1, 2019 23:26
Contributor

@pohly pohly left a comment

Looks good enough for merging. I found some more typos in the new documentation, but instead of iterating over this again and again I suggest that we merge and then let a docs writer proof-read everything and fix those.

Regarding the code: I tried both examples manually and also ran make test_e2e. That worked. We don't have good tests for aspects of the persistency model, though: for example, accidentally creating a cache volume when the user asked for a normal persistent volume won't get caught. Let's add E2E tests separately. I can look into that.

@okartau if you agree, can you merge and rebase your docs PR?

@avalluri
Contributor Author

avalluri commented Mar 4, 2019

Offline, @okartau agreed to merge this change.

@avalluri avalluri merged commit 9cd0dc9 into intel:devel Mar 4, 2019