Add daemon set as a way to deploy device plugin. #77

base: main
Conversation
"without NFD" is already possible. We need a mechanism that covers all plugins and is low maintenance. Can you explain what problems the existing setup has?
AFAIK we need to install the operator in order to have it working, but as in the mentioned docs it is possible to just deploy a DaemonSet. Note: I created this PR because installation of apps in our cluster is done via Helm charts, so instead of deploying the DaemonSet manually I wanted to use the official Helm chart to do the same thing.
Please also add a note to README about this alternative install method.
charts/gpu-device-plugin/values.yaml (outdated)

```diff
@@ -21,3 +21,10 @@ nodeSelector:
 tolerations: []
+
+nodeFeatureRule: true
+
+# to preserve backward compatibility
+operator: true
```
Suggested change:

```diff
-operator: true
+deployWithoutOperator: false
```
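With the suggested rename, the chart's values could look like the sketch below. This is only an illustration of the naming proposed in this review; the final keys and defaults are still under discussion in the thread:

```yaml
# charts/gpu-device-plugin/values.yaml (sketch; key names follow the review suggestion)
nodeSelector: {}
tolerations: []

# false (default): deploy the GpuDevicePlugin CR and rely on the operator.
# true: render a plain DaemonSet instead, so no operator (or NFD) is required.
deployWithoutOperator: false
```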
charts/gpu-device-plugin/values.yaml (outdated)

```yaml
# to deploy the device plugin as a DaemonSet
daemonSet:
  enabled: false
```
Remove these.
```diff
@@ -0,0 +1,79 @@
+{{- if .Values.daemonSet.enabled }}
```
Use `deployWithoutOperator` here.
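With the reviewer's suggestion applied, the DaemonSet template would be guarded as sketched below. The manifest fields shown are illustrative, not the PR's actual template; only the `{{- if }}` guard is the point here:

```yaml
# templates/daemonset.yaml (sketch; deployWithoutOperator is the key proposed in this review)
{{- if .Values.deployWithoutOperator }}
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: intel-gpu-plugin
spec:
  selector:
    matchLabels:
      app: intel-gpu-plugin
  template:
    metadata:
      labels:
        app: intel-gpu-plugin
    spec:
      containers:
      - name: intel-gpu-plugin
        image: intel/intel-gpu-plugin:{{ .Chart.AppVersion }}  # illustrative image reference
{{- end }}
```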
```diff
@@ -2,7 +2,7 @@
 based on
 deployments/operator/samples/deviceplugin_v1_gpudeviceplugin.yaml
 */}}

+{{- if .Values.operator }}
```
Use `! deployWithoutOperator` here.
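In Helm template syntax, the inverted check the reviewer asks for would read as sketched below, using the `deployWithoutOperator` key proposed above (the surrounding CR fields are illustrative):

```yaml
# templates/gpudeviceplugin.yaml (sketch)
{{- if not .Values.deployWithoutOperator }}
apiVersion: deviceplugin.intel.com/v1
kind: GpuDevicePlugin
metadata:
  name: gpudeviceplugin
{{- end }}
```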
```yaml
          path: /var/run/cdi
          type: DirectoryOrCreate
      nodeSelector:
        kubernetes.io/arch: amd64
```
This should use `nodeSelector` from values.
Okay, but the default value is `intel.feature.node.kubernetes.io/gpu: 'true'`, and I could not find a way to replace this selector with a different one.
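One way to reconcile both comments would be to render the selector from values while falling back to the hard-coded default, as in this sketch (the exact keys and indentation depend on the final template):

```yaml
      # sketch: honor .Values.nodeSelector, else keep the current default
      nodeSelector:
        {{- if .Values.nodeSelector }}
        {{- toYaml .Values.nodeSelector | nindent 8 }}
        {{- else }}
        kubernetes.io/arch: amd64
        {{- end }}
```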
```yaml
    metadata:
      labels:
        app: intel-gpu-plugin
    spec:
```
I'd like to have the tolerations defined here as well.
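Rendering the tolerations from values could look like this sketch, mirroring the `tolerations: []` default already present in the chart's values.yaml:

```yaml
    spec:
      # sketch: pass through tolerations from values
      tolerations:
        {{- toYaml .Values.tolerations | nindent 8 }}
```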
```yaml
        - name: HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
```
You can drop this. It's only used with `resourceManager`, which is being EoL'd.
```yaml
    spec:
      containers:
      - name: intel-gpu-plugin
        env:
```
Please also add support for configuring the GPU plugin's different modes: `sharedDevNum`, `enableMonitoring`, `allocationPolicy` and `logLevel`. No need for `resourceManager`, as it's being EoL'd.
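Wiring those values through to the plugin's command line might look like the sketch below. The flag names (`-shared-dev-num`, `-enable-monitoring`, `-allocation-policy`, `-v`) are assumed from the upstream GPU plugin's manifests and should be verified against the target plugin version:

```yaml
# values.yaml (sketch)
sharedDevNum: 1
enableMonitoring: false
allocationPolicy: none
logLevel: 2
```

```yaml
# templates/daemonset.yaml container args (sketch)
        args:
          - "-shared-dev-num={{ .Values.sharedDevNum }}"
          - "-v={{ .Values.logLevel }}"
          {{- if .Values.enableMonitoring }}
          - "-enable-monitoring"
          {{- end }}
          {{- if .Values.allocationPolicy }}
          - "-allocation-policy={{ .Values.allocationPolicy }}"
          {{- end }}
```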
I don't see enough justification to accept the maintenance burden, especially in this repo, which is decoupled from the original reference YAML we have, and as long as it's GPU-only.
Based on the docs:
https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/gpu_plugin/advanced-install.md#install-to-all-nodes
this adds the possibility to deploy the device plugin as a DaemonSet, without NFD and the operator.
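If this PR were merged with the naming suggested in review, installation from this repo's chart might look like the following hypothetical invocation (the release name and flag are assumptions from this thread, not a documented command):

```shell
# sketch: install the chart with the operator-less DaemonSet path enabled
helm install gpu-plugin ./charts/gpu-device-plugin \
  --set deployWithoutOperator=true
```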