Conversation

@Lirt (Contributor) commented Feb 5, 2025

Fixes: #5125

All information is included in the linked issue.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow the guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request.
  • Pending release notes updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

mergify bot added the component/deployment (Helm chart, kubernetes templates and configuration Issues/PRs) label Feb 5, 2025
@nixpanic (Member) left a comment

lgtm, these permissions are also listed in deploy/cephfs/kubernetes/csi-provisioner-rbac.yaml
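(For reference, the rule being restored in the Helm chart's provisioner ClusterRole looks roughly like the sketch below; the exact verb list in the chart template may differ.)

# node read access used by the external-provisioner when the StorageClass
# uses volumeBindingMode: WaitForFirstConsumer (sketch; verbs may differ)
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]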

nixpanic added the ci/skip/multi-arch-build (skip building on multiple architectures) label Feb 6, 2025
nixpanic requested a review from a team February 6, 2025 08:10
nixpanic added the bug (Something isn't working) and backport-to-release-v3.13 (backport from devel to release-v3.13 branch) labels Feb 6, 2025
@Madhu-1 (Collaborator) commented Feb 6, 2025

I think something else is missing here. If these RBACs are missing, how is CI passing for the Helm charts?

@nixpanic @iPraveenParihar any idea?

@Lirt (Contributor, Author) commented Feb 6, 2025

That one is not logging anything useful. I tried to label/unlabel the PVC and also recreate it (same for the other 3 provisioner pods I have).

kl ceph-csi-fs-ceph-csi-cephfs-provisioner-68c847d56b-ptfk6

I0206 08:58:28.442213       1 utils.go:266] ID: 2407 GRPC call: /csi.v1.Identity/Probe
I0206 08:59:28.500902       1 utils.go:266] ID: 2408 GRPC call: /csi.v1.Identity/Probe
I0206 09:00:28.442390       1 utils.go:266] ID: 2409 GRPC call: /csi.v1.Identity/Probe
I0206 09:01:28.442180       1 utils.go:266] ID: 2410 GRPC call: /csi.v1.Identity/Probe
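
(The ProvisioningFailed error itself typically only shows up in the csi-provisioner sidecar logs and in the PVC events, not in the csi-cephfsplugin container shown above. A sketch, assuming the default sidecar container name and the storage namespace used later in this thread:)

# check the external-provisioner sidecar rather than the plugin container
kubectl -n storage logs ceph-csi-fs-ceph-csi-cephfs-provisioner-68c847d56b-ptfk6 -c csi-provisioner

# the failure is also recorded as an event on the pending PVC
kubectl describe pvc <pvc-name>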

@iPraveenParihar (Contributor) commented:

AFAIK, cephfs provisioner doesn't require node resource access. Let me try it on my machine.

@iPraveenParihar (Contributor) commented:

Using the release-v3.13 branch, it worked for me:

$ k get po --show-labels
NAME                                            READY   STATUS    RESTARTS   AGE     LABELS
csi-cephfsplugin-4fnqj                          3/3     Running   0          6m52s   app.kubernetes.io/managed-by=helm,app.kubernetes.io/name=ceph-csi-cephfs,app=ceph-csi-cephfs,chart=ceph-csi-cephfs-3-canary,component=nodeplugin,controller-revision-hash=5b98b59465,heritage=Helm,pod-template-generation=1,release=ceph-csi-cephfs
csi-cephfsplugin-provisioner-6b94b86f4d-cscs9   5/5     Running   0          3m24s   app.kubernetes.io/managed-by=helm,app.kubernetes.io/name=ceph-csi-cephfs,app=ceph-csi-cephfs,chart=ceph-csi-cephfs-3-canary,component=provisioner,heritage=Helm,pod-template-hash=6b94b86f4d,release=ceph-csi-cephfs
csi-rbdplugin-provisioner-74c9864df6-tmf55      7/7     Running   0          3m24s   app.kubernetes.io/managed-by=helm,app.kubernetes.io/name=ceph-csi-rbd,app=ceph-csi-rbd,chart=ceph-csi-rbd-3-canary,component=provisioner,heritage=Helm,pod-template-hash=74c9864df6,release=ceph-csi-rbd
csi-rbdplugin-xtfpt                             3/3     Running   0          6m39s   app.kubernetes.io/managed-by=helm,app.kubernetes.io/name=ceph-csi-rbd,app=ceph-csi-rbd,chart=ceph-csi-rbd-3-canary,component=nodeplugin,controller-revision-hash=765fc779c5,heritage=Helm,pod-template-generation=1,release=ceph-csi-rbd

$ k get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    VOLUMEATTRIBUTESCLASS   AGE
csi-cephfs-pvc   Bound    pvc-f6697a91-52e8-4bf2-9f58-6b956e318333   1Gi        RWX            csi-cephfs-sc   <unset>                 3m15s

$ k get clusterrole csi-cephfsplugin-provisioner -oyaml | grep "node"
$

lgtm, these permissions are also listed in deploy/cephfs/kubernetes/csi-provisioner-rbac.yaml

It was added in PR #3460. But I'm not sure why it was added; I don't find any requirement for it 😕.

@nixpanic (Member) commented Feb 6, 2025

I think something else is missing here. If these RBACs are missing, how is CI passing for the Helm charts?

@nixpanic @iPraveenParihar any idea?

I wondered about that as well. Possibly minikube does not require RBACs?

@iPraveenParihar (Contributor) commented Feb 6, 2025

@nixpanic, found this: rook/rook#11697 by @Madhu-1.
It seems node access is required for StorageClasses with volumeBindingMode: WaitForFirstConsumer.

Verified it:

 Warning  ProvisioningFailed    56s (x8 over 2m)      cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-6b94b86f4d-cscs9_8a3521ba-e903-4f81-a048-af8164a4174c  failed to get target node: nodes "dr1" is forbidden: User "system:serviceaccount:test:csi-cephfsplugin-provisioner" cannot get resource "nodes" in API group "" at the cluster scope
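
(A quick way to confirm the missing permission, assuming the ServiceAccount and namespace from the error above; adjust the names for your deployment:)

# prints "no" while the nodes rule is missing from the provisioner ClusterRole
kubectl auth can-i get nodes \
  --as=system:serviceaccount:test:csi-cephfsplugin-provisioner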

@Madhu-1 (Collaborator) commented Feb 6, 2025

I think something else is missing here. If these RBACs are missing, how is CI passing for the Helm charts?

@nixpanic @iPraveenParihar any idea?

I wondered about that as well. Possibly minikube does not require RBACs?

IMO it's not related to minikube; it could be related to the external-provisioner version.

@Lirt, what is the external-provisioner version in your cluster? Also, can you paste the YAML output of the cephfs deployment?
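
(One way to answer both questions, a sketch assuming the namespace and Deployment name used elsewhere in this thread:)

# list sidecar container images; the csi-provisioner tag is the external-provisioner version
kubectl -n storage get deploy ceph-csi-fs-ceph-csi-cephfs-provisioner \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'

# dump the full Deployment manifest
kubectl -n storage get deploy ceph-csi-fs-ceph-csi-cephfs-provisioner -o yaml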

@Madhu-1 (Collaborator) commented Feb 6, 2025

@Mergifyio queue

mergify bot (Contributor) commented Feb 6, 2025

queue

✅ The pull request has been merged automatically at 72b9d5a

mergify bot added the ok-to-test (Label to trigger E2E tests) label Feb 6, 2025
@ceph-csi-bot (Collaborator) commented with the E2E test triggers:

/test ci/centos/upgrade-tests-cephfs
/test ci/centos/upgrade-tests-rbd
/test ci/centos/k8s-e2e-external-storage/1.32
/test ci/centos/k8s-e2e-external-storage/1.31
/test ci/centos/mini-e2e-helm/k8s-1.32
/test ci/centos/k8s-e2e-external-storage/1.30
/test ci/centos/mini-e2e-helm/k8s-1.31
/test ci/centos/mini-e2e/k8s-1.32
/test ci/centos/mini-e2e-helm/k8s-1.30
/test ci/centos/mini-e2e/k8s-1.31
/test ci/centos/mini-e2e/k8s-1.30

@Madhu-1 (Collaborator) commented Feb 6, 2025

@nixpanic, found this: rook/rook#11697 by @Madhu-1. It seems node access is required for StorageClasses with volumeBindingMode: WaitForFirstConsumer.

Verified it:

 Warning  ProvisioningFailed    56s (x8 over 2m)      cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-6b94b86f4d-cscs9_8a3521ba-e903-4f81-a048-af8164a4174c  failed to get target node: nodes "dr1" is forbidden: User "system:serviceaccount:test:csi-cephfsplugin-provisioner" cannot get resource "nodes" in API group "" at the cluster scope

We need to cover this in our E2E as well :)

ceph-csi-bot removed the ok-to-test (Label to trigger E2E tests) label Feb 6, 2025
@Lirt (Contributor, Author) commented Feb 6, 2025

I used all default image tags from the 3.13.0 Helm chart (unless I have a mistake in my values.yaml). Here is the ceph-csi-fs-ceph-csi-cephfs-provisioner Deployment. I am also using WaitForFirstConsumer in the StorageClass.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ceph-csi-fs-ceph-csi-cephfs-provisioner
  namespace: storage
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ceph-csi-cephfs
      component: provisioner
      release: ceph-csi-fs
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 50%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: ceph-csi-cephfs
        chart: ceph-csi-cephfs-3.13.0
        component: provisioner
        heritage: Helm
        release: ceph-csi-fs
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - ceph-csi-cephfs
              - key: component
                operator: In
                values:
                - provisioner
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - --nodeid=$(NODE_ID)
        - --type=cephfs
        - --controllerserver=true
        - --pidlimit=-1
        - --endpoint=$(CSI_ENDPOINT)
        - --v=4
        - --drivername=$(DRIVER_NAME)
        - --setmetadata=true
        - --logslowopinterval=30s
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: DRIVER_NAME
          value: cephfs.csi.ceph.com
        - name: NODE_ID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: CSI_ENDPOINT
          value: unix:///csi/csi-provisioner.sock
        image: artifactory.devops.telekom.de/quay.io/cephcsi/cephcsi:v3.13.0
        imagePullPolicy: IfNotPresent
        name: csi-cephfsplugin
        resources:
          limits:
            cpu: 500m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /csi
          name: socket-dir
        - mountPath: /sys
          name: host-sys
        - mountPath: /lib/modules
          name: lib-modules
          readOnly: true
        - mountPath: /dev
          name: host-dev
        - mountPath: /etc/ceph/
          name: ceph-config
        - mountPath: /etc/ceph-csi-config/
          name: ceph-csi-config
        - mountPath: /tmp/csi/keys
          name: keys-tmp-dir
      - args:
        - --csi-address=$(ADDRESS)
        - --v=1
        - --timeout=60s
        - --leader-election=true
        - --retry-interval-start=500ms
        - --extra-create-metadata=true
        - --feature-gates=HonorPVReclaimPolicy=true
        - --prevent-volume-mode-conversion=true
        env:
        - name: ADDRESS
          value: unix:///csi/csi-provisioner.sock
        image: artifactory.devops.telekom.de/registry.k8s.io/sig-storage/csi-provisioner:v5.0.1
        imagePullPolicy: IfNotPresent
        name: csi-provisioner
        resources:
          limits:
            cpu: 250m
            memory: 128Mi
          requests:
            cpu: 50m
            memory: 64Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /csi
          name: socket-dir
      - args:
        - --csi-address=$(ADDRESS)
        - --v=1
        - --timeout=60s
        - --leader-election=true
        - --extra-create-metadata=true
        - --enable-volume-group-snapshots=false
        env:
        - name: ADDRESS
          value: unix:///csi/csi-provisioner.sock
        image: artifactory.devops.telekom.de/registry.k8s.io/sig-storage/csi-snapshotter:v8.0.1
        imagePullPolicy: IfNotPresent
        name: csi-snapshotter
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /csi
          name: socket-dir
      - args:
        - --v=1
        - --csi-address=$(ADDRESS)
        - --timeout=60s
        - --leader-election
        - --retry-interval-start=500ms
        - --handle-volume-inuse-error=false
        - --feature-gates=RecoverVolumeExpansionFailure=true
        env:
        - name: ADDRESS
          value: unix:///csi/csi-provisioner.sock
        image: artifactory.devops.telekom.de/registry.k8s.io/sig-storage/csi-resizer:v1.11.1
        imagePullPolicy: IfNotPresent
        name: csi-resizer
        resources:
          limits:
            cpu: 500m
            memory: 256Mi
          requests:
            cpu: 50m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /csi
          name: socket-dir
      - args:
        - --type=liveness
        - --endpoint=$(CSI_ENDPOINT)
        - --metricsport=8080
        - --metricspath=/metrics
        - --polltime=60s
        - --timeout=3s
        env:
        - name: CSI_ENDPOINT
          value: unix:///csi/csi-provisioner.sock
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: artifactory.devops.telekom.de/quay.io/cephcsi/cephcsi:v3.13.0
        imagePullPolicy: IfNotPresent
        name: liveness-prometheus
        ports:
        - containerPort: 8080
          name: metrics
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 256Mi
          requests:
            cpu: 100m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /csi
          name: socket-dir
      dnsPolicy: ClusterFirst
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: ceph-csi-fs-ceph-csi-cephfs-provisioner
      serviceAccountName: ceph-csi-fs-ceph-csi-cephfs-provisioner
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
        operator: Exists
      volumes:
      - emptyDir:
          medium: Memory
        name: socket-dir
      - hostPath:
          path: /sys
          type: ""
        name: host-sys
      - hostPath:
          path: /lib/modules
          type: ""
        name: lib-modules
      - hostPath:
          path: /dev
          type: ""
        name: host-dev
      - configMap:
          defaultMode: 420
          name: ceph-config-cephfs
        name: ceph-config
      - configMap:
          defaultMode: 420
          name: ceph-csi-config-cephfs
        name: ceph-csi-config
      - emptyDir:
          medium: Memory
        name: keys-tmp-dir

StorageClass:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-rwx-retain
parameters:
  clusterID: ...
  csi.storage.k8s.io/controller-expand-secret-name: ceph-rwx-pool-01
  csi.storage.k8s.io/controller-expand-secret-namespace: storage-namespace
  csi.storage.k8s.io/node-stage-secret-name: ceph-rwx-pool-01
  csi.storage.k8s.io/node-stage-secret-namespace: storage-namespace
  csi.storage.k8s.io/provisioner-secret-name: ceph-rwx-pool-01
  csi.storage.k8s.io/provisioner-secret-namespace: storage-namespace
  fsName: ...
  pool: ...
provisioner: cephfs.csi.ceph.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
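
(Since this StorageClass uses volumeBindingMode: WaitForFirstConsumer, provisioning only starts once a Pod consuming the PVC is scheduled, which is when the external-provisioner needs to read the selected node. A minimal reproducer sketch with hypothetical names:)

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-wffc-test   # hypothetical name
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: standard-rwx-retain
---
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-wffc-consumer   # hypothetical name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: cephfs-wffc-test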

mergify bot merged commit 72b9d5a into ceph:devel Feb 6, 2025
49 of 50 checks passed

Labels

  • backport-to-release-v3.13 (backport from devel to release-v3.13 branch)
  • bug (Something isn't working)
  • ci/skip/multi-arch-build (skip building on multiple architectures)
  • component/deployment (Helm chart, kubernetes templates and configuration Issues/PRs)


Development

Successfully merging this pull request may close these issues.

Missing "nodes" RBAC for cephfs provisioner clusterrole (v3.13.0)

5 participants