Device Cache initialization slowing down csi-node when `--node-name` isn't specified #2200

PR https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/2141 introduced a new deviceCache that relies on the `--node-name` flag. However, nothing prevents a user from leaving this flag unset, and when that happens, node pod initialization is delayed significantly because it has to wait for the full retry backoff to expire:

```
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:24.784762       1 main.go:125] Operating compute environment set to: production and computeEndpoint is set to: <nil>
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:24.785339       1 main.go:134] Sys info: NumCPU: 15 MAXPROC: 1
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:24.785357       1 main.go:139] Driver vendor version v1.21.4-dd.202541
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:24.786456       1 mount_linux.go:316] Cannot create temp dir to detect safe 'not mounted' behavior: mkdir /tmp/kubelet-detect-safe-umount3379456297: read-only file system
...
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:24.790026       1 request.go:1178] Error in request: resource name may not be empty
gcp-pd-csi-driver-node-vbl4c csi-plugin W1014 07:32:24.790071       1 node.go:37] Error getting node : resource name may not be empty, retrying...
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:25.790198       1 request.go:1178] Error in request: resource name may not be empty
gcp-pd-csi-driver-node-vbl4c csi-plugin W1014 07:32:25.790245       1 node.go:37] Error getting node : resource name may not be empty, retrying...
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:27.791374       1 request.go:1178] Error in request: resource name may not be empty
gcp-pd-csi-driver-node-vbl4c csi-plugin W1014 07:32:27.791414       1 node.go:37] Error getting node : resource name may not be empty, retrying...
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:31.793285       1 request.go:1178] Error in request: resource name may not be empty
gcp-pd-csi-driver-node-vbl4c csi-plugin W1014 07:32:31.793325       1 node.go:37] Error getting node : resource name may not be empty, retrying...
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:39.793657       1 request.go:1178] Error in request: resource name may not be empty
gcp-pd-csi-driver-node-vbl4c csi-plugin W1014 07:32:39.793724       1 node.go:37] Error getting node : resource name may not be empty, retrying...
gcp-pd-csi-driver-node-vbl4c csi-plugin E1014 07:32:39.793731       1 node.go:46] Failed to get node  after retries: timed out waiting for the condition
gcp-pd-csi-driver-node-vbl4c csi-plugin W1014 07:32:39.793767       1 main.go:283] Failed to create device cache: failed to get node : timed out waiting for the condition
...
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:39.794316       1 gce-pd-driver.go:187] Driver: pd.csi.storage.gke.io
```

That's 15s lost that could be avoided, and it significantly slows down the rollout of the driver on large clusters.
I know this can be avoided by configuring the flag, but I think the situation could also be improved with a saner default.

I have 2 ideas:
1. Check that nodeName isn't an empty string before calling [NewDeviceCacheForNode](https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/2141/files#diff-a81f078f88d79f44c944252605b241dc262580acf27c26d395a981c1b9f6241eR284)
2. Or, if the deviceCache needs to run in all cases, make the flag mandatory

What do you think?
