Skip to content

WIP: add golang-based containerd setup helper#128

Draft
ffromani wants to merge 1 commit into
kubernetes-sigs:mainfrom
ffromani:setup-helper-golang
Draft

WIP: add golang-based containerd setup helper#128
ffromani wants to merge 1 commit into
kubernetes-sigs:mainfrom
ffromani:setup-helper-golang

Conversation

@ffromani
Copy link
Copy Markdown
Contributor

WIP

WIP

Signed-off-by: Francesco Romani <fromani@redhat.com>
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 21, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 21, 2026
@k8s-ci-robot k8s-ci-robot requested a review from klueska April 21, 2026 16:56
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 21, 2026
Copy link
Copy Markdown
Member

@fmuyassarov fmuyassarov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ffromani, I know it is still WIP but probably good time to ask few questions.

  1. I'm wondering if you have considered using dbus (via go-systemd) directly instead of host systemctl?
  2. Would it be rather convenient from the user's perspective to have an optional init container for runtime config (along side the driver Pod) changes rather than the Job?
  3. Should be also try to cover the CRI-O?

Comment on lines +2 to +3
# Configures containerd on all worker nodes to enable NRI and CDI.
# Only needed for runtimes older than containerd 2.0 / CRI-O 1.30.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or in case when NRI/CDI is disabled prior to driver installation.

@ffromani
Copy link
Copy Markdown
Contributor Author

Thanks @ffromani, I know it is still WIP but probably good time to ask few questions.

It surely is!

1. I'm wondering if you have considered using dbus (via [go-systemd](https://github.com/coreos/go-systemd)) directly instead of host systemctl?

Nope, I used systemctl because this is what the initial initContainer implementation did and I didn't explored options yet. I think the key factors here are reliability, simplicity and privileges required. If go-systemd gives us a better alternative along these axes, we can surely switch to it.

2. Would it be rather convenient from the user's perspective to have an optional init container for runtime config (along side the driver Pod) changes rather than the Job?

I think the job is better because separation of concerns and separation of required privileges. Unless I'm mistaken, re-introducing a initContainer will also require a privilege bump, which I'd rather avoid.
These are the main reasons I'd be not thrilled about reintroducing a initContainer, but we can surely talk about them, if there are upsides I'm missing I'd be happy to consider them and revisit the decision.

3. Should be also try to cover the CRI-O?

Yes! If we decide to go forward with this approach I will deep dive in crio and add the crio counterpart code.

@fmuyassarov
Copy link
Copy Markdown
Member

I think the job is better because separation of concerns and separation of required privileges. Unless I'm mistaken, re-introducing a initContainer will also require a privilege bump, which I'd rather avoid. These are the main reasons I'd be not thrilled about reintroducing a initContainer, but we can surely talk about them, if there are upsides I'm missing I'd be happy to consider them and revisit the decision.

If we use an init container and dbus, we would need to mount the following paths:

  • /etc/containerd/ for the config.toml
  • /etc/crio/crio.conf.d/ for CRI-O drop-in configuration
  • /var/run/dbus/system_bus_socket if we decide to use dbus, for detecting systemd units of the container runtime and related interactions.

Compared to the current Job configuration, this reduces the exposure of the host filesystem. Instead of mounting the whole /etc, we limit access to only the specific subdirectories.

With Job, we also put operational burden on the user because they need to know the node count, run the Job manually and check if it succeeded on each node. With init container, they can just pass a single flag (e.g. --set initContainer.enabled=true or something) and the chart handles the rest. If the init container fails on a node, the driver pod doesn't start there, which gives an immediate signal rather than a silent misconfiguration.

@fmuyassarov
Copy link
Copy Markdown
Member

fmuyassarov commented Apr 22, 2026

With Job, we also put operational burden on the user because they need to know the node count, run the Job manually and check if it succeeded on each node. With init container, they can just pass a single flag (e.g. --set initContainer.enabled=true or something) and the chart handles the rest. If the init container fails on a node, the driver pod doesn't start there, which gives an immediate signal rather than a silent misconfiguration.

I'm not insisting to use dbus specifically or init container over Job. I just want to make sure that
we pick the best option from what's available to us, so happy to hear other thoughts.

@ffromani
Copy link
Copy Markdown
Contributor Author

I think the job is better because separation of concerns and separation of required privileges. Unless I'm mistaken, re-introducing a initContainer will also require a privilege bump, which I'd rather avoid. These are the main reasons I'd be not thrilled about reintroducing a initContainer, but we can surely talk about them, if there are upsides I'm missing I'd be happy to consider them and revisit the decision.

If we use an init container and dbus, we would need to mount the following paths:

* `/etc/containerd/` for the config.toml

* `/etc/crio/crio.conf.d/` for CRI-O drop-in configuration

* `/var/run/dbus/system_bus_socket` if we decide to use dbus, for detecting systemd units of the container runtime and related interactions.

Compared to the current Job configuration, this reduces the exposure of the host filesystem. Instead of mounting the whole /etc, we limit access to only the specific subdirectories.

With Job, we also put operational burden on the user because they need to know the node count, run the Job manually and check if it succeeded on each node. With init container, they can just pass a single flag (e.g. --set initContainer.enabled=true or something) and the chart handles the rest. If the init container fails on a node, the driver pod doesn't start there, which gives an immediate signal rather than a silent misconfiguration.

These are all good points. The operational mode using dbus has surely upsides worth a serious look, thanks for raising. The delivery mechanism - jobs vs initContainer I'm still not thrilled but I do see your point. Let's iterate with this approach a bit more, I won't be firmly against a opt-in, disabled-by-default init container task even though I'd really prefer a stronger separation of concerns if possible. But "if possible" is the key here.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 12, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants