What happened?
Environment
- Kubespray: master branch, commit bdbfcaae8 (v2.30.0-94); bug also present on origin/master HEAD
- Cilium: 1.19.2 chart and image
- cilium-cli: v0.18.9
- K8s: 1.34.4 with containerd
- Deployment mode: offline registry (dockerhub_image_repo set)
Summary
Kubespray syncs the cilium/operator image to offline registries, but the
Cilium Helm chart requires cilium/operator-generic for non-cloud
deployments. This mismatch causes broken deployments for offline-registry
users.
Root cause
Cilium chart image naming convention
In cilium/templates/cilium-operator/_helpers.tpl, the chart constructs
the operator image name as:
{repository}-{cloud}{suffix}{tag}{digest}
where {cloud} is determined by the cilium.operator.cloud define:
- aws if eni.enabled
- azure if azure.enabled
- alibabacloud if alibabacloud.enabled
- generic otherwise (default for non-cloud, including bare-metal)
So the rendered image for a non-cloud deployment with default values is:
quay.io/cilium/operator + "-" + "generic" + "" + ":v1.19.2"
= quay.io/cilium/operator-generic:v1.19.2
The same {cloud} variable is also used in the deployment's command:
command:
- cilium-operator-{{ include "cilium.operator.cloud" . }}
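The naming rule above can be sketched in a few lines of shell. This is only an illustration of the described behavior, not the chart's template code: the operator_image function and the registry.example host are hypothetical, and digest handling is left out.

```shell
#!/bin/sh
# Sketch of the operator image naming described above.
# operator_image REPOSITORY CLOUD SUFFIX TAG
# (hypothetical helper; the optional digest part is omitted)
operator_image() {
    repository=$1
    cloud=$2
    suffix=$3
    tag=$4
    printf '%s-%s%s:%s\n' "$repository" "$cloud" "$suffix" "$tag"
}

# Chart default repository, non-cloud deployment.
operator_image "quay.io/cilium/operator" "generic" "" "v1.19.2"

# An offline override that keeps the cilium/operator base name:
# the -generic suffix is still appended to the overridden repository.
operator_image "registry.example/kubespray/cilium/operator" "generic" "" "v1.19.2"
```

The second call shows why overriding only the repository field does not change the final image name component.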
Kubespray mismatch
roles/kubespray_defaults/defaults/main/download.yml:237:
cilium_operator_image_repo: "{{ quay_image_repo }}/cilium/operator"
This value is used in two places:
- download.yml:599 (image sync entry): Kubespray pulls quay.io/cilium/operator:vX.Y.Z and pushes it to the offline registry as <registry>/cilium/operator:vX.Y.Z.
- roles/network_plugin/cilium/templates/values.yaml.j2:154-157: rendered to chart values as:
operator:
  image:
    repository: <registry>/cilium/operator
    tag: vX.Y.Z
The chart then applies its helper logic and ends up requesting image
<registry>/cilium/operator-generic:vX.Y.Z — a name not synced to
the offline registry.
Why online-registry users don't hit this
Online users get quay.io/cilium/operator-generic:vX.Y.Z directly (chart
default repository + chart helper). Kubespray's override of the
repository field reuses the same cilium/operator base, so the chart
helper still produces a valid name in quay.io/cilium/operator-generic,
which exists upstream.
Why this hasn't been reported
Offline-registry users typically have an image sync workflow that
inadvertently masks the bug:
- The sync sees cilium/operator:vX.Y.Z in Kubespray's list.
- The pull from upstream succeeds (this name exists upstream as the build base for the cloud variants).
- Some sync scripts auto-retag the pulled image to additional aliases, including cilium/operator-generic.
- The cluster pulls cilium/operator-generic successfully but gets the wrong image content (it contains the cilium-operator binary, not cilium-operator-generic).
- For some combinations of Cilium and cilium-cli versions, the resulting deployment's command happens to match the wrong binary, and the pod runs, appearing as a working deployment.
This silent failure mode can persist until a chart upgrade synchronizes
the deployment's command field with the chart's expectation
(cilium-operator-generic), at which point new pods CrashLoopBackOff:
exec: "cilium-operator-generic": executable file not found in $PATH
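A quick consistency check for this failure mode could look like the sketch below. It assumes a naming heuristic taken from the examples in this report (an image whose last path component is operator-generic is expected to ship cilium-operator-generic); both function names are hypothetical. It only compares names, so it cannot detect a retagged image whose content does not match its name.

```shell
#!/bin/sh
# expected_operator_binary IMAGE_REF
# Heuristic: derive the expected binary name from the image name.
expected_operator_binary() {
    ref=${1%%@*}        # drop any @sha256:... digest
    name=${ref##*/}     # keep the last path component, e.g. operator-generic:v1.19.2
    name=${name%%:*}    # drop the tag
    printf 'cilium-%s\n' "$name"
}

# check_operator_spec IMAGE_REF COMMAND
# Returns non-zero when the image name and the container command disagree.
check_operator_spec() {
    image=$1
    command=$2
    expected=$(expected_operator_binary "$image")
    if [ "$expected" = "$command" ]; then
        echo "consistent: $image runs $command"
    else
        echo "MISMATCH: $image suggests $expected, but command is $command"
        return 1
    fi
}

# Values as they appear in the crashing pods reported in this issue.
check_operator_spec \
    "dockerhub.kubekey.local/kubernetes-kubespray/cilium/operator:v1.19.2" \
    "cilium-operator-generic" || true
```

On a live cluster the two inputs can be read from the Deployment, for example with kubectl -n kube-system get deployment cilium-operator -o jsonpath='{.spec.template.spec.containers[0].image}'.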
Reproduction
- Configure Kubespray for offline registry:
dockerhub_image_repo: "<registry>/kubespray"
- Use Kubespray's image sync (without any extra retagging) to populate the offline registry from cilium_image_list.
- Verify that only cilium/operator (not cilium/operator-generic) ends up in the offline registry:
curl -s "https://<registry>/v2/kubespray/cilium/operator-generic/tags/list"
# 404 or empty
- Deploy cilium:
ansible-playbook cluster.yml -i inventory/... --tags cilium
- Observe operator pods stuck in ErrImagePull/ImagePullBackOff with:
Failed to pull image "<registry>/kubespray/cilium/operator-generic:vX.Y.Z": manifest unknown
Evidence
Available on request. Key data points:
- helm template rendering with default values produces quay.io/cilium/operator-generic:v1.19.2
- helm template with --set operator.image.repository=<custom>/cilium/operator still produces <custom>/cilium/operator-generic:v1.19.2 (chart helper unconditionally adds -{cloud} suffix)
- A real offline-registry deployment ended up with three different ReplicaSet specs over several upgrades; the only Ready one used image=operator + command=cilium-operator (matching by luck), while the deterministic chart output (image=operator-generic + command=cilium-operator-generic) consistently failed
What did you expect to happen?
The cilium-operator deployment should be Running, with image and command fields that match what the offline registry contains. Specifically, either:
- Kubespray should sync the upstream image under the same name the chart's helper will compute (cilium/operator-generic for non-cloud deployments), or
- Kubespray should pass operator.image.override to the chart so the chart skips the helper's suffix logic and uses the explicit image name Kubespray has synced.
In either case, the final deployed image and command should be consistent and point to a valid binary inside the image.
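Until one of these is in place, a manual mirror of the name the chart actually requests may serve as a workaround. The sketch below only prints the copy command as a dry run. The mirror_command function and the registry.example prefix are hypothetical, and skopeo is an assumption; any pull/tag/push workflow would do the same job.

```shell
#!/bin/sh
# mirror_command UPSTREAM_REF OFFLINE_PREFIX
# Prints (does not run) a skopeo command that copies the upstream image to the
# offline registry under the same repository path.
mirror_command() {
    upstream=$1
    prefix=$2
    path=${upstream#*/}   # strip the upstream registry host, e.g. quay.io/
    printf 'skopeo copy docker://%s docker://%s/%s\n' "$upstream" "$prefix" "$path"
}

# Mirror the image name the chart computes for non-cloud deployments.
mirror_command "quay.io/cilium/operator-generic:v1.19.2" "registry.example/kubespray"
```

Printing the command first makes it easy to review the destination name before anything is pushed.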
How can we reproduce it (as minimally and precisely as possible)?
- Configure Kubespray for offline registry deployment:
# inventory/<cluster>/group_vars/all/offline.yml
dockerhub_image_repo: "<your-registry>/kubespray"
quay_image_repo: "{{ dockerhub_image_repo }}"
- Sync images to your offline registry using whatever workflow you have (typically pulling from quay.io and pushing to your private registry). Do not auto-retag; only push the exact image name in cilium_image_list.
- Verify the offline registry only contains cilium/operator (the name Kubespray syncs):
curl -s "https://<registry>/v2/kubespray/cilium/operator/tags/list"
# {"name":"kubespray/cilium/operator","tags":["v1.19.2"]}
curl -s "https://<registry>/v2/kubespray/cilium/operator-generic/tags/list"
# {"errors":[{"code":"NAME_UNKNOWN", ...}]}
- Deploy with:
kube_network_plugin: cilium
cilium_version: 1.19.2
ansible-playbook -i inventory/<cluster>/hosts.yaml cluster.yml --tags cilium
- Observe cilium-operator pods failing with ErrImagePull (image name missing from the registry) or in CrashLoopBackOff with exec: "cilium-operator-generic": executable file not found.
- Verify root cause with helm template:
helm template cilium <cilium-1.19.2-chart-path> \
--namespace kube-system \
--set operator.image.repository=<registry>/kubespray/cilium/operator \
--set operator.image.tag=v1.19.2 \
--set operator.image.useDigest=false \
| grep -A2 "name: cilium-operator$"
# Shows image: <registry>/kubespray/cilium/operator-generic:v1.19.2 (chart added -generic)
OS
Ubuntu 24
Version of Ansible
ansible [core 2.18.12]
Version of Python
python version = 3.12.4 (main, Jul 5 2024, 11:37:28) [GCC 9.4.0] (/usr/local/python3.12/bin/python3.12)
Version of Kubespray (commit)
commit bdbfcaa tag: v2.30.0-94-gbdbfcaae8 (master branch, 94 commits after v2.30.0)
Network plugin used
cilium
Full inventory with variables
https://gist.github.com/Feelings0220/e531c5a94af04ecbc279314086cdfd45
Command used to invoke ansible
ansible-playbook -i inventory//hosts.yaml \
  cluster.yml \
  -b \
  --become-user=root \
  -e kube_version=v1.34.4 \
  -e cilium_version=1.19.2
Output of ansible run
Welcome to Ubuntu 24.04.4 LTS (GNU/Linux 6.8.0-100-generic x86_64)
System information as of Tue May 12 02:57:30 PM CST 2026
System load: 0.45 Processes: 840
Usage of /home: 1.9% of 10.00TB Users logged in: 1
Memory usage: 2% IPv4 address for ens1f0: 10.8.9.168
Swap usage: 0% IPv4 address for ens1f0: 10.8.9.150
Temperature: 70.0 C
Expanded Security Maintenance for Applications is not enabled.
127 updates can be applied immediately.
To see these additional updates run: apt list --upgradable
Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status
Failed to connect to https://changelogs.ubuntu.com/meta-release-lts. Check your Internet connection or proxy settings
=== cilium status (after upgrade) ===
/¯¯
/¯¯_/¯¯\ Cilium: OK
_/¯¯_/ Operator: 2 errors
/¯¯_/¯¯\ Envoy DaemonSet: OK
_/¯¯_/ Hubble Relay: 1 errors, 2 warnings
__/ ClusterMesh: disabled
DaemonSet cilium Desired: 8, Ready: 8/8, Available: 8/8
DaemonSet cilium-envoy Desired: 8, Ready: 8/8, Available: 8/8
Deployment cilium-operator Desired: 3, Ready: 1/3, Available: 1/3, Unavailable: 2/3
Deployment hubble-relay Desired: 1, Unavailable: 1/1
Containers: cilium Running: 8
cilium-envoy Running: 8
cilium-operator Running: 3
clustermesh-apiserver
hubble-relay Pending: 1
Cluster Pods: 45/45 managed by Cilium
Helm chart version: 1.19.2
Image versions cilium dockerhub.kubekey.local/kubernetes-kubespray/cilium/cilium:v1.19.2: 8
cilium-envoy dockerhub.kubekey.local/kubernetes-kubespray/cilium/cilium-envoy:v1.34.10-1762597008-ff7ae7d623be00078865cff1b0672cc5d9bfc6d5: 8
cilium-operator dockerhub.kubekey.local/kubernetes-kubespray/cilium/operator:v1.19.2: 3
hubble-relay dockerhub.kubekey.local/kubernetes-kubespray/cilium/hubble-relay:v1.19.2@sha256:9987c73bad48c987fd065185535fd15a6717cbe8a8caf7fc7ef0413532cf490e: 1
Errors: cilium-operator cilium-operator 2 pods of Deployment cilium-operator are not ready
cilium-operator cilium-operator deployment cilium-operator is rolling out - 2 out of 3 pods updated
hubble-relay hubble-relay 1 pods of Deployment hubble-relay are not ready
Warnings: hubble-relay hubble-relay-755c6b7747-xj4rw pod is pending
hubble-relay hubble-relay-755c6b7747-xj4rw pod is pending
=== cilium-operator pods ===
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cilium-operator-55bb5d64cc-j2kz6 1/1 Running 1 (30h ago) 30h 10.8.9.95 worker-a-03
cilium-operator-5d5878f8fb-pxbhn 0/1 CrashLoopBackOff 329 (28s ago) 27h 10.8.9.169 master-03
cilium-operator-5d5878f8fb-s4qrn 0/1 CrashLoopBackOff 327 (115s ago) 27h 10.8.9.94 worker-a-02
=== Most recent crash pod logs (last 30 lines) ===
Crash pod: pod/cilium-operator-5d5878f8fb-pxbhn
pod/cilium-operator-5d5878f8fb-s4qrn
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-h8m79 (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
cilium-config-path:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cilium-config
Optional: false
kube-api-access-h8m79:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
Optional: false
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: op=Exists
node.cilium.io/agent-not-ready op=Exists
Events:
Type Reason Age From Message
Normal Created 53m (x318 over 27h) kubelet Created container: cilium-operator
Warning Failed 53m (x318 over 27h) kubelet Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: exec: "cilium-operator-generic": executable file not found in $PATH
Warning BackOff 3m15s (x8043 over 27h) kubelet Back-off restarting failed container cilium-operator in pod cilium-operator-5d5878f8fb-s4qrn_kube-system(ffd8a1c5-6ede-4218-a84e-1edf44318473)
Normal Pulled 117s (x328 over 27h) kubelet Container image "dockerhub.kubekey.local/kubernetes-kubespray/cilium/operator:v1.19.2" already present on machine
=== Image present in offline registry ===
dockerhub.kubekey.local/kubernetes-kubespray/cilium/operator-generic v1.19.1 f1b5c176c6ee8 33.4MB
dockerhub.kubekey.local/kubernetes-kubespray/cilium/operator-generic v1.19.2 63ae62180908e 45.7MB
dockerhub.kubekey.local/kubernetes-kubespray/cilium/operator v1.19.2 63ae62180908e 45.7MB
dockerhub.kubekey.local/kubernetes-kubespray/cilium/operator v1.19.1 e5091458a7e48 45.6MB
user1@sz-bianyi-112:~/mao.wei11/kubespray-deploy/cilium$
Anything else we need to know
Additional context — Proposed fix
If maintainers confirm this is a real issue, I can submit a PR with the following change:
File 1: roles/kubespray_defaults/defaults/main/download.yml:237
- cilium_operator_image_repo: "{{ quay_image_repo }}/cilium/operator"
+ cilium_operator_image_repo: "{{ quay_image_repo }}/cilium/operator-generic"
File 2: roles/network_plugin/cilium/templates/values.yaml.j2:154-157
 operator:
   image:
-    repository: {{ cilium_operator_image_repo }}
+    override: "{{ cilium_operator_image_repo }}:{{ cilium_operator_image_tag }}"
     tag: {{ cilium_operator_image_tag }}
Using operator.image.override prevents the chart helper from adding another -generic suffix (since cilium_operator_image_repo already ends in -generic).
Cloud variant considerations
The current default targets only non-cloud (generic) deployments, matching the most common Kubespray scenario. For users deploying with eni.enabled, azure.enabled, or alibabacloud.enabled, the fix would need to be conditionalized. Happy to extend the PR if maintainers prefer.
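One way the conditional could be expressed is sketched below in shell. It mirrors the selection rule described under "Root cause"; the precedence order (eni, then azure, then alibabacloud) follows the order listed there and is an assumption. The operator_cloud and operator_repo_for names are hypothetical and are not Kubespray variables.

```shell
#!/bin/sh
# operator_cloud ENI_ENABLED AZURE_ENABLED ALIBABACLOUD_ENABLED
# Each argument is "true" or "false"; prints the operator variant name.
operator_cloud() {
    if [ "$1" = "true" ]; then
        echo aws
    elif [ "$2" = "true" ]; then
        echo azure
    elif [ "$3" = "true" ]; then
        echo alibabacloud
    else
        echo generic
    fi
}

# operator_repo_for REGISTRY ENI AZURE ALIBABACLOUD
# Builds the repository name that would need to be synced for that variant.
operator_repo_for() {
    printf '%s/cilium/operator-%s\n' "$1" "$(operator_cloud "$2" "$3" "$4")"
}

operator_repo_for "quay.io" false false false   # bare-metal / non-cloud
operator_repo_for "quay.io" true false false    # eni.enabled
```

In Kubespray itself the same choice would presumably be made in the Jinja default for cilium_operator_image_repo rather than in shell.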