Conversation
So far I've only looked at the documentation to figure out how this is meant to work.
README.md (outdated excerpt):
> volume gets created initially, without actually allocating space. Then
> when an application starts and it is time to mount that "existing"
> volume, an attempt is made to allocate the space required for the
> volume locally. Only if there is enough space can application startup
Can pmem-csi-sequence-diagram.png be deleted now?
Hmm, maybe yes. I was thinking of reusing it to illustrate the normal persistent volume type, but that may not be needed, since Kubernetes users already know how normal dynamic PVs work.
But the current diagram isn't the "normal" PVC, is it? It's the fake PVC with delayed allocation on pod startup.
Do you intend to keep this special mode? I thought we agreed to implement the normal PVC mode (allocate space in `CreateVolume`) with low priority, because it isn't really needed.
My intention was to reuse the old diagram, making the changes needed to show the normal PVC flow. There is no fake/delayed volume creation anymore.
Please review the PR code to see how cache and ephemeral volumes are actually implemented.
As I said in my other comment, it is easier to review how this is meant to work by reading the user-visible documentation for it.
It makes sense to update the existing diagram. Right now it still shows that `CreateVolume` doesn't actually create a volume. Do we agree that after merging this PR, `CreateVolume` will end up creating the pmem device on some node?
Amarnath Valluri <[email protected]> writes:
> > What you are saying is that users need to rewrite their app deployment logic and can't use controllers like `ReplicaSet`.
>
> "cache" volumes are best suited for `ReplicaSet`/`Deployment`: pre-create the cache PVC with the number of replicas and ask the ReplicaSet to use that PVC.
Whether such a ReplicaSet should use "cache" or "ephemeral" volumes
depends on the application. The app might expect empty volumes on
startup and might get confused when it finds old data when using "cache"
volumes. It's also a waste of disk space to keep the data around after
the app has terminated if the data is not going to be used again.
and "ephemeral" volumes are supposed to use by `StatefulSet`, because
StatefulSets supports `volumeClaimTemplate` where each volume tied to
a PVC, the volumes are not shared.
StatefulSet is meant for "stateful" applications, i.e. those which need
to retain data across restarts. I can imagine that those apps may also
need local, ephemeral storage, but the way
https://akomljen.com/kubernetes-persistent-volumes-with-deployment-and-statefulset/
describes it, volumeClaimTemplate was meant to provide each pod with its
own persistent volume.
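For illustration, a StatefulSet using `volumeClaimTemplates` with the PMEM storage class might look roughly like this (a sketch: the storage class name `pmem-csi-sc` is taken from the deployment files discussed later, all other names and sizes are illustrative):

```yaml
# Sketch only: a StatefulSet where each replica gets its own PVC via
# volumeClaimTemplates. The storage class name matches the deployment
# files; pod/image names and sizes are illustrative.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-app
spec:
  serviceName: example-app
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app
        image: busybox
        command: ["sleep", "3600"]
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: pmem-csi-sc
      resources:
        requests:
          storage: 8Gi
```

Each replica gets its own PVC (`data-example-app-0`, `data-example-app-1`), which matches the point above that the volumes are tied to individual PVCs and not shared.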
(force-pushed from 046cadf to 238618b)
@pohly, updated the document with Usage details, can you check if it helps?
(force-pushed from 238618b to 3a1e63c)
@@ -48,6 +48,10 @@ TEST_PMEM_MEM_SIZE=32768 # 32GB
TEST_PMEM_SHARE=on
TEST_PMEM_LABEL_SIZE=2097152

# Kubernetes feature gates to enable/disable
# featurename=true,feature=false
TEST_FEATURE_GATES="CSINodeInfo=true,CSIDriverRegistry=true"
The installation of the corresponding CRDs also belongs into the cluster setup. Currently the PR has that as part of the driver installation (like in some older CSI documentation), but that's conceptually the wrong place and was just a workaround for cluster setup tools not knowing about those CRDs yet.
Agree. Done.
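For reference, one of the commits in this PR mentions that these feature gates are enabled via a kubeadm configuration file during test cluster setup. A minimal sketch of such a file could look like this (the exact contents in the setup scripts may differ; only the gate names come from `TEST_FEATURE_GATES` above):

```yaml
# Sketch: enabling the CSI feature gates through kubeadm's ClusterConfiguration.
# Gate names come from TEST_FEATURE_GATES; other fields are omitted or illustrative.
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.0
apiServer:
  extraArgs:
    feature-gates: "CSINodeInfo=true,CSIDriverRegistry=true"
```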
(force-pushed from 3a1e63c to 05d1a19)
(force-pushed from 05d1a19 to 02e7c51)
Commit message e81ec79:
> the newly created volume to that node by using 'Volume.Topology'. OC has to make
s/OC/The container orchestrator/
Let's avoid acronyms, the reader might not know them. It was spelled wrong, too.
> Unlike normal persistent volume, Cache volume creates set of pmem volumes on
You've spelled "Cache" and further down also "Persistent" with a capital letter when talking about the model. I think "cache volume" and "persistent volume" are normal English words and not specific names, so normal capitalization seems more appropriate.
`deploy/kubernetes-1.13/pmem-pvc.yaml` says `storageClassName: pmem-csi-sc # defined in pmem-csi.yaml`. That should say `deploy/kubernetes-1.13/pmem-storageclass.yaml`.
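For context, the relevant part of that PVC with the corrected comment would be roughly (a sketch; only the `storageClassName` line and its comment come from the file under review, the rest is illustrative):

```yaml
# Sketch of deploy/kubernetes-1.13/pmem-pvc.yaml with the corrected comment;
# metadata name and size are illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pmem-csi-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: pmem-csi-sc # defined in deploy/kubernetes-1.13/pmem-storageclass.yaml
```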
I just brought up a cluster based on e81ec79 and followed the example (`cat deploy/kubernetes-1.13/pmem-pvc.yaml | _work/ssh-clear-kvm kubectl create -f -`). The resulting PV has three entries in its topology information:
$ kubectl get pv/pvc-b0354387-3c0d-11e9-895f-deadbeef0100 -o yaml
...
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - host-3
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - host-1
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - host-2
That doesn't look right. I did `make push-images` before rebuilding the cluster, so I should be running the latest driver. However, this is hard to verify because the driver does not identify itself in the log output and has no unique version number. I've filed #183 to track this.
From the logs it looks like the driver treated this volume as a cache volume?
$ kubectl log pmem-csi-controller-0 pmem-driver
...
I0301 10:35:10.049651 1 glog.go:58] GRPC call: /csi.v1.Controller/CreateVolume
I0301 10:35:10.049673 1 glog.go:58] GRPC request: name:"pvc-b0354387-3c0d-11e9-895f-deadbeef0100" capacity_range:<required_bytes:8589934592 > volume_capabilities:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > accessibility_requirements:<requisite:<segments:<key:"kubernetes.io/hostname" value:"host-0" > > requisite:<segments:<key:"kubernetes.io/hostname" value:"host-1" > > requisite:<segments:<key:"kubernetes.io/hostname" value:"host-2" > > requisite:<segments:<key:"kubernetes.io/hostname" value:"host-3" > > preferred:<segments:<key:"kubernetes.io/hostname" value:"host-0" > > preferred:<segments:<key:"kubernetes.io/hostname" value:"host-1" > > preferred:<segments:<key:"kubernetes.io/hostname" value:"host-2" > > preferred:<segments:<key:"kubernetes.io/hostname" value:"host-3" > > >
I0301 10:35:10.049861 1 glog.go:79] CreateVolume: Name: pvc-b0354387-3c0d-11e9-895f-deadbeef0100 req.Required: 8589934592 req.Limit: 0
I0301 10:35:10.049965 1 glog.go:79] Connecting to 192.168.8.8:10001
I0301 10:35:10.114505 1 glog.go:79] node host-3 has requested capacity
I0301 10:35:10.114536 1 glog.go:79] Connecting to 192.168.8.4:10001
I0301 10:35:10.181366 1 glog.go:79] node host-1 has requested capacity
I0301 10:35:10.181404 1 glog.go:79] Connecting to 192.168.8.6:10001
I0301 10:35:10.249569 1 glog.go:79] node host-2 has requested capacity
I0301 10:35:10.249621 1 glog.go:79] Found nodes: [host-3 host-1 host-2]
I0301 10:35:10.249667 1 glog.go:79] CreateVolume: Record new volume as {b03c6245-3c0d-11e9-ba0d-26c6b0ca2305 pvc-b0354387-3c0d-11e9-895f-deadbeef0100 8589934592 1 true fsdax}
Some more log output would be useful to understand what the driver is doing (which mode, which parameters, etc.).
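For comparison, I would expect a normal persistent volume to be pinned to exactly one node, with node affinity roughly like this (a sketch; the node name is just an example):

```yaml
# Expected shape for a normal persistent volume: a single entry in the
# PV's node affinity (sketch, node name illustrative).
nodeAffinity:
  required:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - host-1
```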
README.md (outdated excerpt):
> Below are the volume persistency models considered for implementing by PMEM-CSI to serve different application use cases:
>
> * Persistent Volumes</br>
Resorting to HTML is unnecessary, plain Markdown also works.
Just use two spaces:
* Persistent Volumes
A volume gets created...
Thanks for the tip. I fixed all `</br>`s.
README.md (outdated excerpt):
> Kubernetes cluster administrators can expose above mentioned [volume persistency types](#volume-persistency) to applications using [`StorageClass Parameters`](https://kubernetes.io/docs/concepts/storage/storage-classes/#parameters). An optional `persistencyModel` parameter differentiates how the provisioned volume can be used.
>
> * if no `persistencyModel` parameter specified in `StorageClass` then it is treated as normal Kubernetes persistent volume. In this case PMEM-CSI creates PMEM volume on a node and the application that claims to use this volume is supposed to schedule on this node by Kubernetes. Choosing of node is depend on StorageClass `volumeBindingMode`. In case of `volumeBindingMode: Immediate` PMEM-CSI chooses a node randomly, and in case of `volumeBindingMode: WaitForFirstConsumer` Kubernetes first chooses a node for scheduling the application, and PMEM-CSI creates the volume on that node. Applications which claim a normal persistent volume has to use `ReadOnlyOnce` access mode in its `accessModes` list. This [diagram](/docs/images/sequence/pmem-csi-persistent-sequence-diagram.png) illustrates how a normal persistent volume gets provisioned in Kubernetes using PMEM-CSI driver.
s/on this node/be scheduled onto this node/
Done
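For reference, a storage class for the normal persistent case described in the excerpt above might look roughly like this (a sketch: the class name matches the deployment files and `volumeBindingMode` follows the text above, but the provisioner name is an assumption):

```yaml
# Sketch: a StorageClass without persistencyModel, i.e. normal persistent
# volumes. The provisioner name is an assumption; volumeBindingMode follows
# the README text quoted above.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pmem-csi-sc
provisioner: pmem-csi.intel.com   # assumed driver name
volumeBindingMode: WaitForFirstConsumer
```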
I doubt that your cluster is built on e81ec79. The log messages you shared are not present in that commit. Can you please verify this?
CSI v0.0.3 is deprecated and not supported by future Kubernetes releases. CSI 1.0 is not backward compatible with older (<= 0.0.3) CSI releases. Kubernetes releases earlier than v1.13 do not support CSI 1.0, so we cannot support older Kubernetes releases. Made changes to the driver deployment files to support CSI 1.0 on Kubernetes v1.13:
- dropped deployment files under kubernetes-1.{11,12}
- updated sidecar versions to v1.0.1
Signed-off-by: Amarnath Valluri <[email protected]>
These features are needed to make the driver fully topology aware. external-provisioner reads topology keys from CSINodeInfo objects and uses those keys to prepare CreateVolumeRequest.TopologyRequirement. Made changes to the test cluster setup script to enable these feature gates via a kubeadm configuration file while setting up the cluster. Signed-off-by: Amarnath Valluri <[email protected]>
The motivation behind this change is:
- we can make the node controller self-contained, i.e. each node can create and delete volumes independently of the controller.
- it clearly divides which parts are handled by master and node; for example, nothing related to DeviceManager should be in master code.
- it is an enabler for implementing fully topology-aware provisioning by dropping the workaround of creating the pmem device on attachment.
These default generic implementations were inherited from the deprecated kubernetes-csi/drivers/csi-common package, which is not needed anymore. Merged them with their derived implementations. This simplifies the code and leads to cleaning up a lot of unused code. Left the controller server default implementation, as it can be shared by both the node controller and the controller server. Signed-off-by: Amarnath Valluri <[email protected]>
(force-pushed from 8db4e39 to c717c66)
Indeed, I was one commit before the persistency changes. We need this embedded version number to catch such mistakes faster :-/ I'll continue testing.
s/Cache/cache, s/Persistent/persistent. Done.
Fixed the comment.
With this change, PMEM volumes are created in the CreateVolume() call, irrespective of the persistency model. This means there is no 'delayed volume' creation in ControllerPublishVolume anymore. In the normal persistent volume case the driver creates a pmem volume on a free node in the order of 'CreateVolumeRequest.TopologyRequirement.Preferred' and locks the newly created volume to that node by using 'Volume.Topology'. The container orchestrator has to make sure that the application which claims this volume runs on that node. Unlike a normal persistent volume, a cache volume creates a set of pmem volumes on a number of different nodes, each with its own local data, in the order of 'CreateVolumeRequest.TopologyRequirement.Preferred'. Applications are started on those nodes and then get to use the volume on their node. Data persists across application restarts. This is useful when the data is only cached information that can be discarded and reconstructed at any time, and the application can reuse existing local data when restarting.
Introduced new volume/StorageClass parameters:
- "persistencyModel", which shall be set to "cache" for cache volumes, and
- "cacheSize", to request the number of nodes on which the created volume can be used.
Updated the README with the different possible persistency models and the Kubernetes usage of cache and persistent volumes. Removed node locking from the volume tests, since full topology support is now implemented and provisioning in a cluster is supposed to work.
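To make the new parameters concrete, a cache storage class as described in this commit might look roughly like this (a sketch: `persistencyModel` and `cacheSize` are the parameters introduced here; the class name and provisioner name are assumptions):

```yaml
# Sketch: a StorageClass requesting cache volumes on up to 3 nodes.
# persistencyModel and cacheSize are the parameters introduced by this change;
# the provisioner name is an assumption. Parameter values must be strings.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pmem-csi-sc-cache
provisioner: pmem-csi.intel.com   # assumed driver name
parameters:
  persistencyModel: cache
  cacheSize: "3"
```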
protosanitizer supports suppressing secrets while logging gRPC messages. Signed-off-by: Amarnath Valluri <[email protected]>
Looks good enough for merging. I found some more typos in the new documentation, but instead of iterating over this again and again I suggest that we merge and then let a docs writer proof-read everything and fix those.
Regarding the code: I tried both examples manually and also ran `make test_e2e`. That worked. We don't have good tests for aspects of the persistency model, though: for example, accidentally creating a cache volume when the user asked for a normal persistent volume won't get caught. Let's add E2E tests separately. I can look into that.
@okartau if you agree, can you merge and rebase your docs PR?
Offline, @okartau agreed to merge this change.
Currently supported persistency types are:
These changes are on top of CSI v1.0, so they might not work on Kubernetes < v1.13. The idea is that we should maintain pre-CSI-1.0 changes in a branch (csi-0.0.3). I will review the supported persistency models within the limitations of CSI v0.0.3 and submit a PR to the csi-0.0.3 branch.