feat(chart): add nodeSelector, tolerations, affinity #141
Conversation
Signed-off-by: Tasko Olevski <[email protected]>
The build of the ppc64le image errored out. It seems to have just stalled or timed out. I think just a simple rerun should help because I did not touch any of the Docker images with this PR.
Thanks @olevski for the PR, will have a look and let you know!
About NodeAffinity, I've got the following error when trying to mount a volume in a pod with Affinity and/or Tolerations set:

I tried setting the nodeAffinity in the Persistent Volume associated with the dataset. If I clear the nodeAffinity, the error persists.
@leonardonormando I am not sure I fully follow what you did. This PR adds NodeAffinity for several different components. For some of them (like the plugins) this is probably not very useful: the plugins are daemonsets and they run on all nodes, so setting the NodeAffinity there is a bit redundant. But setting the toleration there is useful, especially if you have nodes with specific taints but you would like to use datashim in pods that run on those nodes. Can you provide more details on what you tried?

Huh, I also did not even know you could set a NodeAffinity on a PV (https://kubernetes.io/docs/concepts/storage/persistent-volumes/#node-affinity), but it turns out that indeed you can. This PR does not assign affinities on the PVs, only on the deployments/daemonsets/statefulsets. According to the docs, when you set a NodeAffinity on a volume it limits which nodes can access said volume. So if you specified a nodeAffinity for specific nodes on the PV, only pods scheduled on those nodes can use that volume. If you share your values file and the manifests for the pod and the dataset you are trying to run, it will be helpful. Otherwise, without more info I can only guess what the problem is.
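(For context, a node affinity on a PV, per the linked docs, looks roughly like the sketch below. Everything here is illustrative and not part of this PR: the PV name, the local path, and the hostname are made up.)

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv          # illustrative name
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  local:
    path: /mnt/disks/ssd1         # illustrative local path
  nodeAffinity:                   # restricts which nodes can access this volume
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1          # only pods scheduled on node-1 can use this PV
```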
Sorry! I should have been clearer about the setup: I have not tested the helm chart (yet) and used the manifest in the repository as is (dlf.yaml) to deploy Datashim. I know this PR is about setting nodeAffinity via the helm chart, but I thought it would be beneficial to bring this discussion over here. If I use these manifests for dataset and pod creation, it works as expected:

```yaml
apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  namespace: sandbox
  name: dataset-sandbox
spec:
  local:
    type: 'COS'
    accessKeyID: 'XXX'
    secretAccessKey: 'XXX'
    endpoint: 'http://minio.minio.svc.cluster.local:9000'
    bucket: 'sandbox-bucket'
---
apiVersion: v1
kind: Pod
metadata:
  name: test-bucket-mount
  namespace: sandbox
spec:
  volumes:
    - name: test-mount
      persistentVolumeClaim:
        claimName: dataset-sandbox
  containers:
    - name: test
      image: ubuntu:20.04
      command: ['/bin/sleep', '3650d']
      volumeMounts:
        - mountPath: '/test'
          name: test-mount
```

However, if I set any Affinity or Toleration on the pod, the volume can't be mounted and it fails with the error reported above. Am I missing something? Thanks!
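(For reference, a sketch of the failing variant described above: the same test pod with a toleration added. The taint key is hypothetical; any toleration or affinity that lands the pod on a node without the datashim node plugins reproduces the error.)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-bucket-mount
  namespace: sandbox
spec:
  tolerations:
    - key: example.com/dedicated       # hypothetical taint key
      operator: Exists
      effect: NoSchedule
  volumes:
    - name: test-mount
      persistentVolumeClaim:
        claimName: dataset-sandbox
  containers:
    - name: test
      image: ubuntu:20.04
      command: ['/bin/sleep', '3650d']
      volumeMounts:
        - mountPath: '/test'
          name: test-mount
```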
@leonardonormando to me what you describe makes sense. I am pretty sure that setting the affinity and toleration on the pod that uses the dataset makes it get scheduled on a node where the daemonset (which is supposed to run on all nodes) is actually not running. And because the daemonset pod is not running on that node, things don't work. When you remove the affinity and toleration from your test pod, it gets scheduled on a node where the datashim daemonset pods are running and things work as expected. This PR aims to fix exactly this problem by adding the ability to assign a toleration to the daemonsets so they truly run on all nodes. In the meantime you can use kustomize on the rendered manifests you deploy to add a toleration so that they get scheduled on all nodes, and then things should work as expected.
The daemonset that I am referring to is here: https://github.com/datashim-io/datashim/blob/master/release-tools/manifests/dlf.yaml#L1059. This is where you should add the same toleration you are adding in your pod that is trying to use the dataset.
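(A minimal kustomize sketch of that workaround, assuming the rendered manifests are saved as dlf.yaml next to the kustomization.yaml and using a hypothetical taint key; adjust the toleration to match your own taints.)

```yaml
# kustomization.yaml (sketch)
resources:
  - dlf.yaml
patches:
  # Add the same toleration to every DaemonSet so the node plugins
  # also run on the tainted nodes.
  - target:
      kind: DaemonSet
    patch: |-
      - op: add
        path: /spec/template/spec/tolerations
        value:
          - key: example.com/dedicated   # hypothetical taint key
            operator: Exists
            effect: NoSchedule
```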
Thank you very much, @olevski, for the detailed explanation. Indeed that solved the problem. As soon as I can, I will also test the helm chart for this PR. Thanks @YiannisGkoufas and @olevski for such great work!
Hi @olevski, thanks for the PR and apologies for taking so long to get around to it. The ability to add tolerations is very useful. I also like how the values are distributed over the different components. However, I have a few issues:
However, if the controller were aware that some nodes are not running the required csi-* plugins, then it could add a nodeSelector to the dataset/PV. We could then use the webhook to mutate the Pods with a nodeSelector. This could be a potentially useful feature, though I cannot imagine a scenario for it right now. Please add an issue if you think this is useful.
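(For illustration only, since this feature does not exist in the PR: the mutation described here would boil down to the webhook injecting something like the following into the Pod spec, with a hypothetical node label marking nodes that run the csi-* plugins.)

```yaml
spec:
  nodeSelector:
    example.com/datashim-plugins: "true"   # hypothetical node label
```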
If you can modify and resubmit, I'll review and accept the PR. Thanks again! (@leonardonormando thank you for the comments and please do let us know if you have tested the helm chart)
src/csi-driver-nfs/chart/templates/csi-nodeplugin-nfsplugin.yaml (review comment on an outdated diff, resolved)
@srikumar003 thanks for the review. I am making the changes right now. On a slightly unrelated note, is it possible to have a quick chat with you and/or whoever else is maintaining this project? We are very interested in using this over at https://renkulab.io/ but we are a bit unclear on the long-term goals/plans for datashim.io. Can we touch base via email in this regard? My email is [email protected].
Signed-off-by: Tasko Olevski <[email protected]>
Force-pushed from 669e10a to f7d2b4a
Tested on minikube as follows and it works!
$ minikube start --nodes 2 -p multinode-demo
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
multinode-demo Ready control-plane,master 8m1s v1.22.3
multinode-demo-m02 Ready <none> 7m4s v1.22.3
$ kubectl taint node multinode-demo node-role.kubernetes.io/master:NoSchedule
$ make deployment
$ kubectl get pods -n dlf -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-attacher-nfsplugin-0 2/2 Running 0 3m15s 10.244.1.5 multinode-demo-m02 <none> <none>
csi-attacher-s3-0 1/1 Running 0 3m15s 10.244.1.7 multinode-demo-m02 <none> <none>
csi-controller-h3-0 2/3 CrashLoopBackOff 1 (11s ago) 3m15s 10.244.1.6 multinode-demo-m02 <none> <none>
csi-nodeplugin-h3-9gz8t 2/2 Running 0 3m15s 192.168.58.3 multinode-demo-m02 <none> <none>
csi-nodeplugin-nfsplugin-m7s55 2/2 Running 0 3m15s 192.168.58.3 multinode-demo-m02 <none> <none>
csi-provisioner-s3-0 1/1 Running 0 3m15s 10.244.1.8 multinode-demo-m02 <none> <none>
csi-s3-6mztz 2/2 Running 0 3m15s 10.244.1.3 multinode-demo-m02 <none> <none>
dataset-operator-644f75fcbb-qz9nb 1/1 Running 0 3m15s 10.244.1.4 multinode-demo-m02 <none> <none>
$ cat chart/charts/dataset-operator-chart/values.yaml
…
nodeSelector:
  node-role.kubernetes.io/master: ''
tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
affinity: {}
$ cat chart/charts/csi-s3-chart/values.yaml
…
tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
$ make manifests
$ make deployment
$ kubectl get pods -n dlf -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-attacher-nfsplugin-0 2/2 Running 0 4s 10.244.1.16 multinode-demo-m02 <none> <none>
csi-attacher-s3-0 0/1 ContainerCreating 0 4s <none> multinode-demo <none> <none>
csi-controller-h3-0 2/2 Running 0 4s 10.244.1.17 multinode-demo-m02 <none> <none>
csi-nodeplugin-h3-g8l7q 2/2 Running 0 104s 192.168.58.3 multinode-demo-m02 <none> <none>
csi-nodeplugin-nfsplugin-29r4r 2/2 Running 0 104s 192.168.58.3 multinode-demo-m02 <none> <none>
csi-provisioner-s3-0 0/1 ContainerCreating 0 4s <none> multinode-demo <none> <none>
csi-s3-4zdxj 2/2 Running 0 103s 10.244.1.15 multinode-demo-m02 <none> <none>
csi-s3-zmjnv 2/2 Running 0 103s 10.244.0.3 multinode-demo <none> <none>
dataset-operator-7654bd6c44-h8cd7 0/1 Init:0/1 0 4s <none> multinode-demo <none> <none>
* feat(chart): add nodeSelector, tolerations, affinity
  Signed-off-by: Tasko Olevski <[email protected]>
* squashme: address comments
  Signed-off-by: Tasko Olevski <[email protected]>
I noticed that there does not seem to be an option to add tolerations, affinities, etc.
This is definitely useful when you have nodes with taints but still want the daemonsets (for the node plugins) to run on all of your nodes. I could not find any way to do this in the current state of the code.

So I followed how affinities/tolerations/nodeSelectors are added when you create a helm chart with the helm CLI (i.e. helm create foo). Unlike the helm CLI template, though, I separated things for the DaemonSets and everything else (i.e. controllers/attachers): for example, you may want to run the controller on a specific node but the DaemonSet on all nodes (or on more nodes). Let me know what you think. I am happy to make adjustments.
Signed-off-by: Tasko Olevski [email protected]
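(For reference, the pattern generated by helm create, which this PR follows per component, looks roughly like the sketch below; the exact key names and indentation used in the chart may differ.)

```yaml
# values.yaml (sketch)
nodeSelector: {}
tolerations: []
affinity: {}
```

```yaml
# templates/deployment.yaml or daemonset.yaml (sketch)
spec:
  template:
    spec:
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
```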