feat: add Helm chart for driver deployment#83
Conversation
9598847 to
3ba52b2
Compare
|
@fmuyassarov Can you add the configurations for livenessProbe and readinessProbe |
Yes sure. |
@AutuSnow added #84 for the install.yaml and soon will add here as well. |
Thanks !! |
d6efb7b to
cbb81c8
Compare
|
Similar health probes as in #84 are added here to the chart. |
|
will review again shortly, thanks for the patience |
|
Thanks. But I would ask don't do it yet, because I'm about to add few more improvements in an hour or two. Will ping you once ready. |
cbb81c8 to
ae5cb15
Compare
@ffromani |
|
/test pull-dra-driver-cpu-e2e-device-mode-grouped-arm64 |
|
/test pull-dra-driver-cpu-e2e-device-mode-grouped-arm64 |
Thanks for sharing your concerns. If the migration plan didn’t meet your expectations, I’d really appreciate hearing more about what you think could be improved and more importantly how. Given the current state of the project (still in v0.x.x), we do expect some level of iteration and change. This PR isn’t intended to introduce anything major, but rather to make the driver installation process smoother than it is today. |
Thank you for your reply. Let me clarify - what I am concerned about is not the Helm chart itself, but the migration path from the existing manifests/install.yaml to Helm.
|
|
The driver is stateless on restart and because of this, I think (might be wrong since didn't test it myself) existing workloads should not be disrupted. Given that, the expected migration step is literally two commands: I've listed some additional options here though I haven't validated it it on a production setup. I don't see a way to avoid a brief downtime window for new allocations during the rollover, if you know of one, |
|
What I can do is test the migration and see what breaks in practice. That said, I don't think this should block the PR though - not everyone will be migrating; some users will simply start fresh from the latest release. Would it make sense to continue the migration discussion on the original issue, where it has more context? That way this PR can move forward and we can iterate on the migration story separately? |
|
@fmuyassarov I agree on the general sentiment: we can and should do due diligence to ensure no obvious breakage, but this should not block this PR. My proposal is to add automated tests, which perhaps we can hook on CI, to ensure that install.yaml and the helm chart deliver both a functioning plugin. TL;DR: variants of the existing lanes. We don't need to use prow: we can just set up in such a way the tests run against kind or minikube and temporarily hook those in the github actions. This is probably more convenient than a temporary prow setup. |
@AutuSnow will this proposal + the other ideas @fmuyassarov contributed address your concern? |
|
I think I would agree |
Thanks. I think we can cook up something relatively easy for that purpose. I will prepare it tomorrow. |
SGTM, thanks. Let's run these tests (perhaps for some time in GH actions CI) and we can move forward. No pressure, just recording the next steps. |
Sorry it took me a while to find some time to test this. I can’t fully simulate a production-like environment, but here’s what I did. The cluster is running with version v0.1.0 of the driver, installed manually using the install.yaml file. I created two workload Pods, each requesting some CPUs via resourceClaims that reference the same deviceClass ( Next I deleted the deviceClass ( While the Pods were still pending, I deleted the driver’s daemonSet and installed a new version of the driver with a local Helm chart. The new daemonSet Pods were up and running. The previously pending Pods moved to Running, and the Pods that were already running were not affected. I know this is not a migration scenario, but it does shows that Pods with already allocated resources are not affected during such changes. |
Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@est.tech>
5cc8768 to
a5782fb
Compare
|
@ffromani I’ve just pushed the a5782fb to add a minimal workflow. It builds the image, loads it into kind, installs the driver via Helm using that image, and then primarily checks for the existence of resourceSlice with the existing script. Here is the example run from this PR: https://github.com/kubernetes-sigs/dra-driver-cpu/actions/runs/25367882428/job/74383380807?pr=83. |
|
and once this is in place, I’ll do a follow up with another patch so that charts are published alongside the images in the production registry. |
|
thanks for the updates. I'm reviewing but it will take me some more time to fully groc given my very rusty helm knowledge. |
pravk03
left a comment
There was a problem hiding this comment.
Thanks @fmuyassarov. From my limited helm knowledge, everything looked good on this PR.
No worries and thanks for your review @pravk03 . |
There was a problem hiding this comment.
/approve
thanks for pushing this forward @fmuyassarov .
few more followups:
- use helm as default install, deprecate
install.yaml - document the update path + disruption (988529f#r3123028531)
- for helm support in particular but in general simplify our CI steps leveraging pre-installed software in the runner images: https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2404-Readme.md
| spec: | ||
| selector: | ||
| matchLabels: | ||
| {{- include "dra-driver-cpu.selectorLabels" . | nindent 6 }} |
There was a problem hiding this comment.
I tend to agree. Let's proceed considering option 1 as the most likely.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ffromani, fmuyassarov The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm from my side and also acknowledging #83 (review) |
FYI @fmuyassarov / @pravk03 let's chat about these |
|
Thanks @ffromani. |
|
Filed a follow up in #144. |
Add a Helm chart for driver installation. This PR adds:
Follow-up (TODO)
ghcr.io/kubernetes-sigs/dra-driver-cpu/charts/dra-driver-cpu0.0.0-mainfrom main branchNote: currently all the templates (DeamonSet, ServiceAccount, etc are based on the what is available in install.yaml).
Fixes: #72