What happened?
When cilium_tunnel_mode: disabled and cilium_routing_mode: native are explicitly set in the Kubespray inventory, the Cilium agents fail to start.
Despite the configuration in k8s-net-cilium.yml clearly stating that tunneling should be disabled, the rendered Helm chart attempts to initialize tunnel protocols (vxlan/geneve) and standalone DNS proxies. The agent logs show the application crashing due to conflicting settings: it is being forced to run in tunnel mode by the Helm chart, even though the user configuration explicitly requested native routing.
What did you expect to happen?
Kubespray should honor the inventory settings. If cilium_tunnel_mode: disabled and cilium_routing_mode: native are set, the rendered Helm chart is expected to disable all tunneling (tunnelProtocol: "") to prevent the agent from entering a conflicted state.
How can we reproduce it (as minimally and precisely as possible)?
- Use Kubespray v2.30.0.
- Configure
k8s-net-cilium.yml with the following variables:
cilium_tunnel_mode: disabled
- Run
cluster.yml or upgrade-cluster.yml.
- Observe the kube-system namespace. Cilium and Hubble relay pods will fail to start and enter
CrashLoopBackOff.
OS
RHEL 10
Version of Ansible
As bundled in quay.io/kubespray/kubespray:v2.30.0 Docker image
Version of Python
As bundled in quay.io/kubespray/kubespray:v2.30.0 Docker image
Version of Kubespray (commit)
v2.30.0
Network plugin used
cilium
Full inventory with variables
# k8s-net-cilium.yml
cilium_enable_ipv4: true
cilium_enable_ipv6: false
cilium_l2announcements: false
cilium_tunnel_mode: disabled
cilium_routing_mode: native
cilium_kube_proxy_replacement: true
cilium_auto_direct_node_routes: true
cilium_native_routing_cidr: 10.233.0.0/16
cilium_enable_hubble: true
cilium_enable_hubble_metrics: true
cilium_hubble_install: true
cilium_hubble_tls_generate: true
Command used to invoke ansible
ansible-playbook -i inventory/hosts.yml upgrade-cluster.yml -b -v -u <username>
Output of ansible run
The Ansible playbook completes successfully without errors, but the cluster state is degraded.
Running kubectl get pods -n kube-system shows:
cilium-xxxxx 0/1 CrashLoopBackOff ...
hubble-relay-xxxxx 0/1 CrashLoopBackOff ...
Anything else we need to know
roles/network_plugin/cilium/templates/values.yaml.j2 does not translate cilium_tunnel_mode: disabled into tunnelProtocol: "" in the rendered Helm values. As a result, Cilium 1.18 defaults to vxlan, which directly conflicts with routingMode: native and crashes the agent on startup.
The following manual Helm upgrade resolves the issue:
helm upgrade cilium cilium/cilium \
-n kube-system \
--version 1.18.6 \
--reuse-values \
--set tunnelProtocol="" \
--set routingMode=native
What happened?
When
cilium_tunnel_mode: disabledandcilium_routing_mode: nativeare explicitly set in the Kubespray inventory, the Cilium agents fail to start.Despite the configuration in
k8s-net-cilium.ymlclearly stating that tunneling should be disabled, the rendered Helm chart attempts to initialize tunnel protocols (vxlan/geneve) and standalone DNS proxies. The agent logs show the application crashing due to conflicting settings: it is being forced to run in tunnel mode by the Helm chart, even though the user configuration explicitly requested native routing.What did you expect to happen?
Kubespray should honor the inventory settings. If
cilium_tunnel_mode: disabledandcilium_routing_mode: nativeare set, the rendered Helm chart is expected to disable all tunneling (tunnelProtocol: "") to prevent the agent from entering a conflicted state.How can we reproduce it (as minimally and precisely as possible)?
k8s-net-cilium.ymlwith the following variables:cluster.ymlorupgrade-cluster.yml.CrashLoopBackOff.OS
RHEL 10
Version of Ansible
As bundled in
quay.io/kubespray/kubespray:v2.30.0Docker imageVersion of Python
As bundled in
quay.io/kubespray/kubespray:v2.30.0Docker imageVersion of Kubespray (commit)
v2.30.0
Network plugin used
cilium
Full inventory with variables
Command used to invoke ansible
Output of ansible run
The Ansible playbook completes successfully without errors, but the cluster state is degraded.
Running
kubectl get pods -n kube-systemshows:Anything else we need to know
roles/network_plugin/cilium/templates/values.yaml.j2does not translatecilium_tunnel_mode: disabledintotunnelProtocol: ""in the rendered Helm values. As a result, Cilium 1.18 defaults tovxlan, which directly conflicts withroutingMode: nativeand crashes the agent on startup.The following manual Helm upgrade resolves the issue:
helm upgrade cilium cilium/cilium \ -n kube-system \ --version 1.18.6 \ --reuse-values \ --set tunnelProtocol="" \ --set routingMode=native