What happened?
The "Update server field in kubelet kubeconfig" task in roles/kubernetes/kubeadm/tasks/main.yml writes the server: field at 4-space indentation. After kubeadm upgrade node processes kubelet.conf, the file uses a 4-character list prefix style (- cluster:), so a line indented by 4 spaces falls outside the cluster: block. As a result, kubelet v1.35+ crash-loops on worker nodes after an upgrade with:
invalid configuration: no server found for cluster "default-cluster"
After lineinfile (current broken behavior):
clusters:
  - cluster:
        certificate-authority: /etc/kubernetes/ssl/ca.crt
    server: https://localhost:6443   # wrong: 4 spaces, OUTSIDE cluster block
    name: kubernetes
As a result, kubelet is not running on the worker node after the upgrade, and the node still carries these taints:
Taints: node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unreachable:NoSchedule
What did you expect to happen?
clusters:
  - cluster:
        certificate-authority: /etc/kubernetes/ssl/ca.crt
        server: https://172.0.0.1:6443   # correct: 8 spaces, inside cluster block
    name: kubernetes
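
One possible way to get this result is to keep whatever indentation the existing server: line already has, instead of writing a fixed number of spaces. The task below is only a sketch, not the actual Kubespray task: the file path /etc/kubernetes/kubelet.conf is an assumption, kube_apiserver_endpoint is assumed to hold the full URL including the scheme, and the when: conditions and handler of the real task are left out.

```yaml
# Sketch only: path and variable contents are assumptions, see the note above.
- name: Update server field in kubelet kubeconfig (indentation-preserving sketch)
  ansible.builtin.lineinfile:
    path: /etc/kubernetes/kubelet.conf
    # Capture the leading whitespace of the existing server: line ...
    regexp: '^(\s*)server:.*$'
    # ... and reuse it, so the key stays at its current depth.
    line: '\1server: {{ kube_apiserver_endpoint }}'
    backrefs: true
    backup: true
```

With backrefs: true, lineinfile leaves the file unchanged when the regexp does not match, so this never appends a new server: line. It also does not repair a file where server: has already been moved to 4 spaces by an earlier run; such a file would need to be fixed separately.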
How can we reproduce it (as minimally and precisely as possible)?
Deploy a kubespray cluster with at least one worker-only node (not in kube_control_plane)
Configure with nginx-proxy (default when kube_apiserver_endpoint is localhost/127.0.0.1)
Run upgrade-cluster.yml targeting Kubernetes v1.35+
Observe the worker node become NotReady after upgrade; kubelet logs show:
invalid configuration: no server found for cluster "default-cluster"
OS
RHEL 9
Version of Ansible
ansible [core 2.18.16]
config file = None
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /opt/rbbn/ace-infra/pyenv/lib/python3.14/site-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = /opt/rbbn/ace-infra/pyenv/bin/ansible
python version = 3.14.4 (main, May 12 2026, 10:57:44) [GCC 8.5.0 20210514 (Red Hat 8.5.0-16.0.2)] (/opt/rbbn/ace-infra/pyenv/bin/python)
jinja version = 3.1.6
libyaml = True
Version of Python
Python 3.14.4
Version of Kubespray (commit)
2.31
Network plugin used
calico
Full inventory with variables
all:
  hosts:
    main-1-1:
      ansible_host: < ip address >
      access_ip: < ip address >
      ip: < ip address >
      ip6: < ip address >
      etcd_member_name: main-1-1
    main-2-1:
      ansible_host: < ip address >
      access_ip: < ip address >
      ip: < ip address >
      ip6: < ip address >
      etcd_member_name: main-2-1
    main-3-1:
      ansible_host: < ip address >
      access_ip: < ip address >
      ip: < ip address >
      ip6: < ip address >
      etcd_member_name: main-3-1
    first-single-1:
      ansible_host: < ip address >
      access_ip: < ip address >
      ip: < ip address >
      ip6: < ip address >
  children:
    kube_control_plane:
      hosts:
        main-1-1:
        main-2-1:
        main-3-1:
    etcd:
      hosts:
        main-1-1:
        main-2-1:
        main-3-1:
    kube_node:
      hosts:
        main-1-1:
        main-2-1:
        main-3-1:
        first-single-1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico_rr:
      hosts: {}
Command used to invoke ansible
ansible-playbook -i hosts.yaml --become --extra-vars ansible_user=epic --extra-vars ansible_ssh_private_key_file=/home/user/.ssh/id_user --forks 10 -v --timeout 60 ./upgrade-cluster.yml
Output of ansible run
The Ansible run completed successfully, although the worker node is tainted, kubelet is not running, and these pods are stuck in Pending:
calico-node-tjxsg 0/1 Pending 0 21m first-single-1
kube-proxy-w625q 0/1 Pending 0 2m23s first-single-1
Anything else we need to know
No response