
Conversation

@olevski
Contributor

@olevski olevski commented Oct 25, 2021

I noticed that there does not seem to be an option to add tolerations, affinities, etc.

This is definitely useful when you have nodes with taints but still want the daemonSets (for the node plugins) to run on all of your nodes. I could not find any way to do this in the current state of the code.

So I followed how affinities/tolerations/nodeSelectors are added when you create a helm chart with the helm CLI (e.g. helm create foo). Unlike the helm CLI template, though, I separated the values for the DaemonSets from everything else (i.e. the controllers/attachers). For example, you may want to run the controller on a specific node but the DaemonSet on all nodes (or on more nodes).
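
Roughly, the idea on the values side looks like this (the key names follow the standard helm create scaffolding; the taint key below is only a placeholder):

# values.yaml sketch for a controller-style component (e.g. the dataset operator)
nodeSelector: {}
affinity: {}
tolerations: []

# values.yaml sketch for a node-plugin DaemonSet, so it can also run on tainted nodes
tolerations:
  - key: example.com/dedicated   # placeholder taint key
    operator: Exists
    effect: NoSchedule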

Let me know what you think. I am happy to make adjustments.

Signed-off-by: Tasko Olevski [email protected]

@olevski
Contributor Author

olevski commented Oct 25, 2021

The build of the ppc64le image errored out. It seems to have just stalled or timed out. I think just a simple rerun should help because I did not touch any of the Docker images with this PR.

@YiannisGkoufas
Collaborator

Thanks @olevski for the PR, I will have a look and let you know!

@leonardonormando

About NodeAffinity: I've got the following error when trying to mount a volume in a pod with Affinity and/or Tolerations set:

MountVolume.MountDevice failed for volume "pvc-XXX" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name ch.ctrox.csi.s3-driver not found in the list of registered CSI drivers

I tried setting the nodeAffinity in the Persistent Volume associated with the dataset, but no luck.

If I clear the affinity and tolerations from the pod, the volume is mounted correctly.

@olevski
Contributor Author

olevski commented Dec 9, 2021

About NodeAffinity: I've got the following error when trying to mount a volume in a pod with Affinity and/or Tolerations set:

MountVolume.MountDevice failed for volume "pvc-XXX" : kubernetes.io/csi: attacher.MountDevice failed to create newCsiDriverClient: driver name ch.ctrox.csi.s3-driver not found in the list of registered CSI drivers

I tried setting the nodeAffinity in the Persistent Volume associated with the dataset, but no luck.

If I clear the affinity and tolerations from the pod, the volume is mounted correctly.

@leonardonormando I am not sure I fully follow what you did.

This PR adds NodeAffinity for several different components. For some of them (like the plugins) this is probably not very useful. The plugins are DaemonSets and they run on all nodes, so setting the NodeAffinity there is a bit redundant. But setting the toleration there is useful, especially if you have nodes with specific taints but you would like to use datashim in the pods that run on those nodes.

Can you provide the values.yaml file you used to deploy datashim so we are clear which affinities and tolerations you set?

I tried setting the nodeAffinity in the Persistent Volume associated with the dataset, but no luck.

Huh, I did not even know you could set a NodeAffinity on a PV (https://kubernetes.io/docs/concepts/storage/persistent-volumes/#node-affinity), but it turns out that indeed you can. This PR does not assign affinities on the PVs, only on the deployments/daemonsets/statefulsets. According to the docs, when you set a NodeAffinity on a volume it limits which nodes can access said volume. So if you specified a nodeAffinity for node A on the PV but the pod is running on some other node B, then it makes sense why you could not access it.
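
For reference, per the linked docs, node affinity on a PV looks roughly like this (a fragment under the PV's spec; node-a is just a placeholder):

spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-a   # only pods running on node-a can use this PV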

If you share your values file and the manifests for the pod and the dataset you are trying to run, that will be helpful. Otherwise, without more info, I can only guess what the problem is.

@leonardonormando

Sorry! I should have been more clear about the setup:

I have not tested the helm chart (yet) and used the manifest in the repository as-is (dlf.yaml) to deploy Datashim.

I know this PR is about setting nodeAffinity via the helm chart, but I thought it would be beneficial to bring this discussion over here.

If I use these manifests for dataset and pod creation it works as expected:

apiVersion: com.ie.ibm.hpsys/v1alpha1
kind: Dataset
metadata:
  namespace: sandbox
  name: dataset-sandbox
spec:
  local:
    type: 'COS'
    accessKeyID: 'XXX'
    secretAccessKey: 'XXX'
    endpoint: 'http://minio.minio.svc.cluster.local:9000'
    bucket: 'sandbox-bucket'

---
apiVersion: v1
kind: Pod
metadata:
  name: test-bucket-mount
  namespace: sandbox
spec:
  volumes:
    - name: test-mount
      persistentVolumeClaim:
        claimName: dataset-sandbox
  containers:
    - name: test
      image: ubuntu:20.04
      command: ['/bin/sleep', '3650d']
      volumeMounts:
        - mountPath: '/test'
          name: test-mount

However, if I set any affinity or toleration on the pod, the volume can't be mounted and I get the error reported above.

Am I missing something?

Thanks!

@olevski
Contributor Author

olevski commented Dec 14, 2021

However, if I set any affinity or toleration on the pod, the volume can't be mounted and I get the error reported above.

@leonardonormando to me what you describe makes sense. I am pretty sure that setting the affinity and toleration on the pod that is using the dataset makes it get scheduled on a node where the daemonset (which is supposed to run on all nodes) is actually not running. And because the daemonset pod is not running on that node, things don't work.

When you remove the affinity and toleration from your test pod, it gets scheduled on a node where the datashim daemonset pods are running, and things work as expected.

This PR aims to fix exactly this problem by adding the ability to assign a toleration on the daemonsets so they truly run on all nodes.

You can use kustomize on the rendered manifests you deploy to add a toleration so that the daemonsets are scheduled on all nodes. Then things should work as expected.
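
For example, a kustomization along these lines should do it (assuming dlf.yaml is saved locally; the csi-s3 daemonset is shown here and the other node-plugin daemonsets can be patched the same way; the taint key is a placeholder):

# kustomization.yaml
resources:
  - dlf.yaml
patches:
  - target:
      kind: DaemonSet
      name: csi-s3
      namespace: dlf
    patch: |-
      - op: add
        path: /spec/template/spec/tolerations
        value:
          - key: example.com/dedicated   # replace with the taint key on your nodes
            operator: Exists
            effect: NoSchedule

Then kubectl apply -k . deploys the patched manifests.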

@olevski
Contributor Author

olevski commented Dec 14, 2021

The daemonset that I am referring to is here: https://github.com/datashim-io/datashim/blob/master/release-tools/manifests/dlf.yaml#L1059. This is where you should add the same toleration you are adding in your pod that is trying to use the dataset.
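
Concretely, the toleration goes under the daemonset's pod template spec, something like this (placeholder key; it must match the taint on your nodes):

spec:
  template:
    spec:
      tolerations:
        - key: example.com/dedicated   # the same taint your pod tolerates
          operator: Exists
          effect: NoSchedule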

@leonardonormando

leonardonormando commented Dec 15, 2021

Thank you very much, @olevski, for the detailed explanation. Indeed, that solved the problem.

As soon as I can, I will also test the helm chart for this PR.

Thanks @YiannisGkoufas and @olevski for such great work!

@srikumar003
Collaborator

Hi @olevski thanks for the PR and apologies for taking so long to get around to it. The ability to add tolerations is very useful. I also like how the values are distributed over the different components.

However, I have a few issues:

  1. The nodeSelector is useful for the operator and generate-keys. In the case of the csi-* plugins, however, it excludes the daemonSets from some nodes. This could lead to the situation demonstrated by @leonardonormando, where a pod with a dataset label is scheduled to a node without the required daemonSets. I suggest we remove nodeSelector from the charts (except for dataset-operator).

     However, if the controller were aware that some nodes are not running the required csi-* plugins, it could add a nodeSelector to the dataset/PV. We could then use the webhook to mutate the Pods with a nodeSelector. This could be a potentially useful feature, though I cannot currently imagine a scenario for it. Please add an issue if you think this is useful.

  2. As you mentioned, affinities are redundant for daemonSets. Let's remove them.

  3. The attachers and provisioners of the plugins have separate blocks for taints. This could lead to problems with deployment configuration. I'll mention this in my review comments on the files.

If you can modify and resubmit, I'll review and accept the PR. Thanks again! (@leonardonormando thank you for the comments and please do let us know if you have tested the helm chart)

@olevski
Contributor Author

olevski commented Jan 18, 2022

@srikumar003 thanks for the review. I am making the changes right now.

On a slightly unrelated note, is it possible to have a quick chat with you and/or whoever else is maintaining this project? We are very interested in using this over at https://renkulab.io/ but we are a bit unclear on the long-term goal/plan for datashim.io. Can we touch base via email in this regard? My email is [email protected].

Signed-off-by: Tasko Olevski <[email protected]>
@olevski olevski force-pushed the add-affinities-tollerations branch from 669e10a to f7d2b4a on January 18, 2022 16:43
@olevski olevski requested a review from srikumar003 January 18, 2022 16:44
Collaborator

@srikumar003 srikumar003 left a comment

Tested on minikube as follows and it works!

$ minikube start --nodes 2 -p multinode-demo

$ kubectl get nodes
NAME                 STATUS   ROLES                  AGE    VERSION
multinode-demo       Ready    control-plane,master   8m1s   v1.22.3
multinode-demo-m02   Ready    <none>                 7m4s   v1.22.3

$ kubectl taint node multinode-demo node-role.kubernetes.io/master:NoSchedule

$ make deployment

$ kubectl get pods -n dlf -o wide
NAME                                READY   STATUS             RESTARTS      AGE     IP             NODE                 NOMINATED NODE   READINESS GATES
csi-attacher-nfsplugin-0            2/2     Running            0             3m15s   10.244.1.5     multinode-demo-m02   <none>           <none>
csi-attacher-s3-0                   1/1     Running            0             3m15s   10.244.1.7     multinode-demo-m02   <none>           <none>
csi-controller-h3-0                 2/3     CrashLoopBackOff   1 (11s ago)   3m15s   10.244.1.6     multinode-demo-m02   <none>           <none>
csi-nodeplugin-h3-9gz8t             2/2     Running            0             3m15s   192.168.58.3   multinode-demo-m02   <none>           <none>
csi-nodeplugin-nfsplugin-m7s55      2/2     Running            0             3m15s   192.168.58.3   multinode-demo-m02   <none>           <none>
csi-provisioner-s3-0                1/1     Running            0             3m15s   10.244.1.8     multinode-demo-m02   <none>           <none>
csi-s3-6mztz                        2/2     Running            0             3m15s   10.244.1.3     multinode-demo-m02   <none>           <none>
dataset-operator-644f75fcbb-qz9nb   1/1     Running            0             3m15s   10.244.1.4     multinode-demo-m02   <none>           <none>

$ cat chart/charts/dataset-operator-chart/values.yaml
…
nodeSelector:
  node-role.kubernetes.io/master: ''
tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
affinity: {}

$ cat chart/charts/csi-s3-chart/values.yaml 
…
tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master

$ make manifests

$ make deployment

$  kubectl get pods -n dlf -o wide
NAME                                READY   STATUS              RESTARTS   AGE    IP             NODE                 NOMINATED NODE   READINESS GATES
csi-attacher-nfsplugin-0            2/2     Running             0          4s     10.244.1.16    multinode-demo-m02   <none>           <none>
csi-attacher-s3-0                   0/1     ContainerCreating   0          4s     <none>         multinode-demo       <none>           <none>
csi-controller-h3-0                 2/2     Running             0          4s     10.244.1.17    multinode-demo-m02   <none>           <none>
csi-nodeplugin-h3-g8l7q             2/2     Running             0          104s   192.168.58.3   multinode-demo-m02   <none>           <none>
csi-nodeplugin-nfsplugin-29r4r      2/2     Running             0          104s   192.168.58.3   multinode-demo-m02   <none>           <none>
csi-provisioner-s3-0                0/1     ContainerCreating   0          4s     <none>         multinode-demo       <none>           <none>
csi-s3-4zdxj                        2/2     Running             0          103s   10.244.1.15    multinode-demo-m02   <none>           <none>
csi-s3-zmjnv                        2/2     Running             0          103s   10.244.0.3     multinode-demo       <none>           <none>
dataset-operator-7654bd6c44-h8cd7   0/1     Init:0/1            0          4s     <none>         multinode-demo       <none>           <none>

@srikumar003 srikumar003 merged commit c2f4492 into datashim-io:master Jan 19, 2022
srikumar003 pushed a commit that referenced this pull request Jan 19, 2022
* feat(chart): add nodeSelector, tolerations, affinity

Signed-off-by: Tasko Olevski <[email protected]>

* squashme: address comments

Signed-off-by: Tasko Olevski <[email protected]>