PR #2141 introduced a new deviceCache that relies on the --node-name flag to work. However, nothing prevents the user from leaving this flag unset, and when that happens, node pod initialization is delayed significantly because it has to wait for the full backoff:
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:24.784762 1 main.go:125] Operating compute environment set to: production and computeEndpoint is set to: <nil>
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:24.785339 1 main.go:134] Sys info: NumCPU: 15 MAXPROC: 1
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:24.785357 1 main.go:139] Driver vendor version v1.21.4-dd.202541
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:24.786456 1 mount_linux.go:316] Cannot create temp dir to detect safe 'not mounted' behavior: mkdir /tmp/kubelet-detect-safe-umount3379456297: read-only file system
...
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:24.790026 1 request.go:1178] Error in request: resource name may not be empty
gcp-pd-csi-driver-node-vbl4c csi-plugin W1014 07:32:24.790071 1 node.go:37] Error getting node : resource name may not be empty, retrying...
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:25.790198 1 request.go:1178] Error in request: resource name may not be empty
gcp-pd-csi-driver-node-vbl4c csi-plugin W1014 07:32:25.790245 1 node.go:37] Error getting node : resource name may not be empty, retrying...
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:27.791374 1 request.go:1178] Error in request: resource name may not be empty
gcp-pd-csi-driver-node-vbl4c csi-plugin W1014 07:32:27.791414 1 node.go:37] Error getting node : resource name may not be empty, retrying...
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:31.793285 1 request.go:1178] Error in request: resource name may not be empty
gcp-pd-csi-driver-node-vbl4c csi-plugin W1014 07:32:31.793325 1 node.go:37] Error getting node : resource name may not be empty, retrying...
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:39.793657 1 request.go:1178] Error in request: resource name may not be empty
gcp-pd-csi-driver-node-vbl4c csi-plugin W1014 07:32:39.793724 1 node.go:37] Error getting node : resource name may not be empty, retrying...
gcp-pd-csi-driver-node-vbl4c csi-plugin E1014 07:32:39.793731 1 node.go:46] Failed to get node after retries: timed out waiting for the condition
gcp-pd-csi-driver-node-vbl4c csi-plugin W1014 07:32:39.793767 1 main.go:283] Failed to create device cache: failed to get node : timed out waiting for the condition
...
gcp-pd-csi-driver-node-vbl4c csi-plugin I1014 07:32:39.794316 1 gce-pd-driver.go:187] Driver: pd.csi.storage.gke.io
That's 15s lost that could be avoided, and it significantly slows down the rollout of the driver on large clusters.
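For reference, the 15s matches the gaps between the retries in the log above (roughly 1s, 2s, 4s, 8s). Here is a minimal sketch of that arithmetic, assuming an exponential backoff that starts at 1s and doubles each time. These parameters are inferred from the timestamps, not taken from the driver source, and totalBackoff is a hypothetical helper:

```go
package main

import (
	"fmt"
	"time"
)

// totalBackoff is a hypothetical helper: it sums the waits of an exponential
// backoff that starts at initial, multiplies by factor after each wait, and
// sleeps the given number of times.
func totalBackoff(initial time.Duration, factor float64, sleeps int) time.Duration {
	total := time.Duration(0)
	wait := initial
	for i := 0; i < sleeps; i++ {
		total += wait
		wait = time.Duration(float64(wait) * factor)
	}
	return total
}

func main() {
	// Assumed parameters (read off the log timestamps): 1s + 2s + 4s + 8s.
	fmt.Println(totalBackoff(time.Second, 2.0, 4)) // prints "15s"
}
```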
I know this can be avoided by configuring the flag, but I think the situation could also be improved with a saner default.
I have 2 ideas:
- check that nodeName isn't an empty string before running NewDeviceCacheForNode
- or, if the deviceCache needs to run in all cases, make the flag mandatory
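A rough sketch of both ideas, to make the proposal concrete. I haven't matched this against the real code: deviceCache, newDeviceCacheForNode, setupDeviceCache and errNodeNameRequired are hypothetical stand-ins, and the actual NewDeviceCacheForNode signature may well differ.

```go
package main

import (
	"errors"
	"fmt"
)

// deviceCache is a hypothetical stand-in for the driver's real cache type.
type deviceCache struct{ nodeName string }

// newDeviceCacheForNode is a hypothetical stand-in for NewDeviceCacheForNode;
// the real function talks to the API server and retries with backoff.
func newDeviceCacheForNode(nodeName string) (*deviceCache, error) {
	return &deviceCache{nodeName: nodeName}, nil
}

// errNodeNameRequired is a hypothetical error for the "mandatory flag" idea.
var errNodeNameRequired = errors.New("--node-name must be set")

// setupDeviceCache covers both ideas:
//   - required == false: skip the cache when nodeName is empty (idea 1)
//   - required == true:  fail fast when nodeName is empty (idea 2)
//
// Either way, the lookup that can never succeed is not attempted.
func setupDeviceCache(nodeName string, required bool) (*deviceCache, error) {
	if nodeName == "" {
		if required {
			return nil, errNodeNameRequired
		}
		return nil, nil // no cache, no backoff
	}
	return newDeviceCacheForNode(nodeName)
}

func main() {
	cache, err := setupDeviceCache("", false)
	fmt.Println(cache == nil, err == nil)

	_, err = setupDeviceCache("", true)
	fmt.Println(err)
}
```

With either variant the empty-name case returns immediately instead of waiting for the retries to time out.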
What do you think?