feat: add Helm chart for driver deployment #83

Merged
k8s-ci-robot merged 7 commits into kubernetes-sigs:main from Nordix:devel/helm-charts
May 12, 2026

@fmuyassarov

@fmuyassarov fmuyassarov commented Mar 11, 2026

Member

Add a Helm chart for driver installation. This PR adds:

  • Helm chart for driver installation
  • Documentation to describe installation and available values
  • Linter CI for the charts & schema validation

Follow-up (TODO)

  • chart packaging and publishing to ghcr.io/kubernetes-sigs/dra-driver-cpu/charts/dra-driver-cpu
  • versioned releases from tags, 0.0.0-main from main branch
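A minimal sketch of how this follow-up could look, assuming Helm 3.8 or newer (OCI registry support), a chart named dra-driver-cpu, and Git refs in the GitHub Actions form (refs/tags/vX.Y.Z or refs/heads/main). The `_charts` output directory, the `GIT_REF` variable, and the `run`, `chart_version`, and `publish` helpers are hypothetical and not part of this PR. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

CHART_DIR="./deployment/helm/dra-driver-cpu"
# helm push appends the chart name, so the chart would end up at
# ghcr.io/kubernetes-sigs/dra-driver-cpu/charts/dra-driver-cpu:<version>.
REGISTRY="oci://ghcr.io/kubernetes-sigs/dra-driver-cpu/charts"
OUT_DIR="_charts" # hypothetical output directory

# run prints a command and executes it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# chart_version maps a Git ref to a chart version:
# refs/tags/vX.Y.Z becomes X.Y.Z, anything else becomes 0.0.0-main.
chart_version() {
  case "$1" in
    refs/tags/v*) echo "${1#refs/tags/v}" ;;
    *) echo "0.0.0-main" ;;
  esac
}

# publish packages the chart at the derived version and pushes it.
publish() {
  version="$(chart_version "$1")"
  run helm package "$CHART_DIR" --version "$version" --destination "$OUT_DIR"
  run helm push "$OUT_DIR/dra-driver-cpu-${version}.tgz" "$REGISTRY"
}

publish "${GIT_REF:-refs/heads/main}"
```

Run as-is, it prints the two helm commands for the 0.0.0-main version. Pushing for real would also need a prior `helm registry login ghcr.io`.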

Note: currently all the templates (DaemonSet, ServiceAccount, etc.) are based on what is available in install.yaml.

Fixes: #72

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 11, 2026
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 11, 2026
@fmuyassarov fmuyassarov force-pushed the devel/helm-charts branch 2 times, most recently from 9598847 to 3ba52b2 Compare March 12, 2026 16:47
@fmuyassarov

Member Author

I’ll keep this PR as a draft until we’re ready to land it (post 0.1.0?). In the meantime, please feel free to take a look and share any thoughts.
/cc @ffromani @pravk03

@AutuSnow

Contributor

@fmuyassarov Can you add the configurations for livenessProbe and readinessProbe

@fmuyassarov

Member Author

livenessProbe

Yes sure.

@fmuyassarov

Member Author

@fmuyassarov Can you add the configurations for livenessProbe and readinessProbe

@AutuSnow added #84 for the install.yaml and soon will add here as well.

@AutuSnow

Contributor

@fmuyassarov Can you add the configurations for livenessProbe and readinessProbe

@AutuSnow added #84 for the install.yaml and soon will add here as well.

Thanks !!

@fmuyassarov fmuyassarov force-pushed the devel/helm-charts branch 2 times, most recently from d6efb7b to cbb81c8 Compare March 22, 2026 10:57
@fmuyassarov

fmuyassarov commented Mar 22, 2026

Member Author

Similar health probes as in #84 are added here to the chart.

@fmuyassarov fmuyassarov marked this pull request as ready for review March 22, 2026 10:59
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 22, 2026
@ffromani

Contributor

will review again shortly, thanks for the patience

@fmuyassarov

fmuyassarov commented Apr 13, 2026

Member Author

Thanks. But I would ask don't do it yet, because I'm about to add few more improvements in an hour or two. Will ping you once ready.

@fmuyassarov

fmuyassarov commented Apr 13, 2026

Member Author

Thanks. But I would ask don't do it yet, because I'm about to add few more improvements in an hour or two. Will ping you once ready.

@ffromani
This should be ready now. I added a few more changes I had in mind.

Comment thread deployment/helm/dra-driver-cpu/Chart.yaml
@fmuyassarov

Member Author

/test pull-dra-driver-cpu-e2e-device-mode-grouped-arm64

@fmuyassarov

Member Author

/test pull-dra-driver-cpu-e2e-device-mode-grouped-arm64
/test pull-dra-driver-cpu-e2e-device-mode-individual-arm64

@fmuyassarov

Member Author

I have already run the project to the production environment and it is very important. I do want to make a compromise, but I feel that the above migration plan did not meet expectations. Let it go, that's it

Thanks for sharing your concerns. If the migration plan didn’t meet your expectations, I’d really appreciate hearing more about what you think could be improved and more importantly how. Given the current state of the project (still in v0.x.x), we do expect some level of iteration and change. This PR isn’t intended to introduce anything major, but rather to make the driver installation process smoother than it is today.
That said, I’m open to suggestions and happy to improve things further. But as said, let's take smaller steps at a time.

@AutuSnow

Contributor

I have already run the project to the production environment and it is very important. I do want to make a compromise, but I feel that the above migration plan did not meet expectations. Let it go, that's it

Thanks for sharing your concerns. If the migration plan didn’t meet your expectations, I’d really appreciate hearing more about what you think could be improved and more importantly how. Given the current state of the project (still in v0.x.x), we do expect some level of iteration and change. This PR isn’t intended to introduce anything major, but rather to make the driver installation process smoother than it is today. That said, I’m open to suggestions and happy to improve things further. But as said, let's take smaller steps at a time.

Thank you for your reply. Let me clarify: what I am concerned about is not the Helm chart itself, but the migration path from the existing manifests/install.yaml to Helm.
Gap:
Users currently running the driver via kubectl apply -f manifests/install.yaml do not have a clear Helm upgrade path. The PR does not address the following issues:

  1. Coexistence: Can Helm-managed resources coexist with existing resources created by kubectl apply? (Identical resource names will result in conflicts.)
  2. Migration guide: There are no documented steps for users to move from install.yaml to Helm without downtime.
  3. Backward compatibility: install.yaml may be deprecated, but migration tooling is not addressed.

What I am looking for:
How can existing deployments be migrated safely, or transitioned smoothly? Is running kubectl delete -f install.yaml followed by helm install enough?

@fmuyassarov

Member Author

The driver is stateless on restart, so I think existing workloads should not be disrupted (I might be wrong, since I haven't tested it myself). Given that, the expected migration is literally two commands:

 kubectl delete -f dist/install.yaml
 helm install dra-driver-cpu ./deployment/helm/dra-driver-cpu -n kube-system

I've listed some additional options here, though I haven't validated them on a production setup.

I don't see a way to avoid a brief downtime window for new allocations during the rollover. If you know of one, I would love to hear it and we can document it. Meanwhile, I think we can add a short migration guide from install.yaml to Helm to the README. Would that address your concern?
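As a rough sketch, the two commands above could be wrapped so the rollover is followed by a basic check. The `--wait` and `--timeout` flags and the final `kubectl get resourceslices` check are additions for illustration, not something validated in this thread, and the `run` and `migrate` helpers are hypothetical. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

# Values taken from the two commands in the comment above.
MANIFEST="dist/install.yaml"
RELEASE="dra-driver-cpu"
CHART="./deployment/helm/dra-driver-cpu"
NAMESPACE="kube-system"

# run prints a command and executes it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

migrate() {
  # Remove the resources created from the static manifest.
  run kubectl delete -f "$MANIFEST"
  # Install the chart; --wait asks Helm to block until the release's
  # resources report ready (assumption: 5 minutes is enough).
  run helm install "$RELEASE" "$CHART" -n "$NAMESPACE" --wait --timeout 5m
  # List ResourceSlices as a basic sign that the driver publishes devices again.
  run kubectl get resourceslices
}

migrate
```

New allocations would still be unavailable between the delete and the moment the Helm-installed driver is ready, which is the downtime window mentioned above.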

@fmuyassarov

Member Author

What I can do is test the migration and see what breaks in practice. That said, I don't think this should block the PR though - not everyone will be migrating; some users will simply start fresh from the latest release.

Would it make sense to continue the migration discussion on the original issue, where it has more context? That way this PR can move forward and we can iterate on the migration story separately?

@ffromani

Contributor

@fmuyassarov I agree on the general sentiment: we can and should do due diligence to ensure no obvious breakage, but this should not block this PR.

My proposal is to add automated tests, which perhaps we can hook on CI, to ensure that install.yaml and the helm chart deliver both a functioning plugin. TL;DR: variants of the existing lanes.

We don't need to use prow: we can just set up in such a way the tests run against kind or minikube and temporarily hook those in the github actions. This is probably more convenient than a temporary prow setup.

@AutuSnow will this proposal + the other ideas @fmuyassarov contributed address your concern?

@AutuSnow

Contributor

I think I would agree

@fmuyassarov

Member Author

@fmuyassarov I agree on the general sentiment: we can and should do due diligence to ensure no obvious breakage, but this should not block this PR.

My proposal is to add automated tests, which perhaps we can hook on CI, to ensure that install.yaml and the helm chart deliver both a functioning plugin. TL;DR: variants of the existing lanes.

We don't need to use prow: we can just set up in such a way the tests run against kind or minikube and temporarily hook those in the github actions. This is probably more convenient than a temporary prow setup.

Thanks. I think we can cook up something relatively easy for that purpose. I will prepare it tomorrow.

@ffromani

ffromani commented May 4, 2026

Contributor

@fmuyassarov I agree on the general sentiment: we can and should do due diligence to ensure no obvious breakage, but this should not block this PR.
My proposal is to add automated tests, which perhaps we can hook on CI, to ensure that install.yaml and the helm chart deliver both a functioning plugin. TL;DR: variants of the existing lanes.
We don't need to use prow: we can just set up in such a way the tests run against kind or minikube and temporarily hook those in the github actions. This is probably more convenient than a temporary prow setup.

Thanks. I think we can cook up something relatively easy for that purpose. I will prepare it tomorrow.

SGTM, thanks. Let's run these tests (perhaps for some time in GH actions CI) and we can move forward. No pressure, just recording the next steps.

@fmuyassarov

Member Author

What I can do is test the migration and see what breaks in practice. That said, I don't think this should block the PR though - not everyone will be migrating; some users will simply start fresh from the latest release.

Would it make sense to continue the migration discussion on the original issue, where it has more context? That way this PR can move forward and we can iterate on the migration story separately?

Sorry it took me a while to find some time to test this.

I can’t fully simulate a production-like environment, but here’s what I did. The cluster is running with version v0.1.0 of the driver, installed manually using the install.yaml file. I created two workload Pods, each requesting some CPUs via resourceClaims that reference the same deviceClass (dra.cpu).

Next I deleted the deviceClass (dra.cpu). As expected, this had no effect on the already running Pods. I then created two additional Pods requesting CPUs again referencing the same deviceClass. These Pods remained in the Pending state, since the deviceClass no longer existed.

While the Pods were still pending, I deleted the driver’s daemonSet and installed a new version of the driver with a local Helm chart. The new daemonSet Pods were up and running. The previously pending Pods moved to Running, and the Pods that were already running were not affected.

I know this is not a migration scenario, but it does show that Pods with already allocated resources are not affected by such changes.

Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@est.tech>
@fmuyassarov fmuyassarov force-pushed the devel/helm-charts branch from 5cc8768 to a5782fb Compare May 5, 2026 09:14
@fmuyassarov

Member Author

@ffromani I’ve just pushed a5782fb to add a minimal workflow. It builds the image, loads it into kind, installs the driver via Helm using that image, and then primarily checks for the existence of a ResourceSlice with the existing script. Here is an example run from this PR: https://github.com/kubernetes-sigs/dra-driver-cpu/actions/runs/25367882428/job/74383380807?pr=83.
So we have a5782fb and d7b830d as a starting point for catching possible bugs in future Helm-related patches.
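The workflow steps described above can be sketched roughly as follows. This is not the content of helm-e2e.yaml: the cluster name, the image tag, the plain `docker build`, and the `image.repository` / `image.tag` value keys are assumptions for illustration, and `kubectl get resourceslices` stands in for the existing check script. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
set -eu

CLUSTER="dra-e2e"           # hypothetical kind cluster name
IMAGE_REPO="dra-driver-cpu" # hypothetical local image name
IMAGE_TAG="ci"              # hypothetical tag
CHART="./deployment/helm/dra-driver-cpu"

# run prints a command and executes it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

helm_e2e() {
  run kind create cluster --name "$CLUSTER"
  # Build the driver image locally (assumes a Dockerfile at the repo root).
  run docker build -t "$IMAGE_REPO:$IMAGE_TAG" .
  # Make the local image available to the kind nodes.
  run kind load docker-image "$IMAGE_REPO:$IMAGE_TAG" --name "$CLUSTER"
  # Install the chart with the freshly built image (value keys are assumed).
  run helm install dra-driver-cpu "$CHART" -n kube-system \
    --set "image.repository=$IMAGE_REPO" --set "image.tag=$IMAGE_TAG" --wait
  # Stand-in for the existing script that checks for a ResourceSlice.
  run kubectl get resourceslices
}

helm_e2e
```

Using a tag other than latest keeps the default image pull policy at IfNotPresent, so the kubelet uses the image loaded into kind instead of trying to pull it.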

@fmuyassarov

Member Author

Once this is in place, I’ll follow up with another patch so that charts are published alongside the images in the production registry.

@ffromani

ffromani commented May 6, 2026

Contributor

thanks for the updates. I'm reviewing but it will take me some more time to fully grok it given my very rusty helm knowledge.

@pravk03 pravk03 left a comment

Contributor


Thanks @fmuyassarov. From my limited helm knowledge, everything looked good on this PR.

Comment thread .github/workflows/helm-e2e.yaml
@fmuyassarov

Member Author

Thanks @fmuyassarov. From my limited helm knowledge, everything looked good on this PR.

No worries and thanks for your review @pravk03 .

@ffromani ffromani left a comment

Contributor


/approve

thanks for pushing this forward @fmuyassarov .

few more followups:

  1. use helm as default install, deprecate install.yaml
  2. document the update path + disruption (988529f#r3123028531)
  3. for helm support in particular but in general simplify our CI steps leveraging pre-installed software in the runner images: https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2404-Readme.md

Comment thread .github/workflows/helm-e2e.yaml
Comment thread .github/workflows/helm-e2e.yaml
Comment thread .github/workflows/helm-e2e.yaml
spec:
  selector:
    matchLabels:
      {{- include "dra-driver-cpu.selectorLabels" . | nindent 6 }}

Contributor


I tend to agree. Let's proceed considering option 1 as the most likely.

@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, fmuyassarov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2026
@ffromani

Contributor

/lgtm

from my side and also acknowledging #83 (review)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 12, 2026
@ffromani

Contributor

/approve

thanks for pushing this forward @fmuyassarov .

few more followups:

1. use helm as default install, deprecate `install.yaml`

2. document the update path + disruption ([988529f#r3123028531](https://github.com/kubernetes-sigs/dra-driver-cpu/commit/988529f3616b35b734a8fc14168d920e4a228666#r3123028531))

3. for helm support in particular but in general simplify our CI steps leveraging pre-installed software in the runner images: https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2404-Readme.md

FYI @fmuyassarov / @pravk03 let's chat about these

@k8s-ci-robot k8s-ci-robot merged commit 1b2274c into kubernetes-sigs:main May 12, 2026
11 checks passed
@fmuyassarov

Member Author

Thanks @ffromani.
I will take care of those points as a follow up.

@fmuyassarov fmuyassarov deleted the devel/helm-charts branch May 12, 2026 18:27
@fmuyassarov

Member Author

Filed a follow up in #144.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Proposal: Helm chart for dra-driver-cpu

5 participants