Worker node kubelet crash-loop after upgrade: lineinfile task writes server: at wrong YAML indentation in kubelet.conf #13277

@liofko

Description

What happened?

The "Update server field in kubelet kubeconfig" task in roles/kubernetes/kubeadm/tasks/main.yml writes the server: field at a fixed 4-space indentation. After kubeadm upgrade node processes kubelet.conf, the file uses a 4-character list prefix style (- cluster:), so the keys nested under cluster: sit at 8 spaces and a server: line at 4 spaces falls outside the cluster: block. This causes kubelet v1.35+ to crash-loop on worker nodes after an upgrade with:

invalid configuration: no server found for cluster "default-cluster"

After lineinfile (current broken behavior):

clusters:
  - cluster:
        certificate-authority: /etc/kubernetes/ssl/ca.crt
    server: https://localhost:6443       # wrong: 4 spaces, OUTSIDE cluster block
    name: kubernetes

As a result, after the upgrade kubelet is not running on the worker node, and the node still has:
Taints: node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unreachable:NoSchedule

What did you expect to happen?

clusters:
  - cluster:
        certificate-authority: /etc/kubernetes/ssl/ca.crt
        server: https://127.0.0.1:6443   # correct: 8 spaces, inside cluster block
    name: kubernetes
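
One possible direction for a fix, as a sketch only and not the current kubespray task: let lineinfile capture whatever indentation the existing server: line already has and reuse it, instead of writing a fixed number of spaces. The file path below is an assumption based on the paths shown above, and the task name is only illustrative; kube_apiserver_endpoint is the variable mentioned in the reproduction steps.

```yaml
# Sketch only: preserve the indentation that is already in the file.
# The path is an assumption; adjust it to the actual kubelet.conf location.
- name: Update server field in kubelet kubeconfig (indentation-preserving sketch)
  ansible.builtin.lineinfile:
    path: /etc/kubernetes/kubelet.conf
    # Capture the leading whitespace of the existing server: line ...
    regexp: '^(\s*)server:\s.*$'
    # ... and write it back unchanged in front of the new value.
    line: '\1server: {{ kube_apiserver_endpoint }}'
    backrefs: true
    backup: true
```

With backrefs: true, lineinfile leaves the file untouched when no line matches, so this never appends a stray server: line at the end of the file. It also means a file that already has server: at the wrong depth would stay wrong, so nodes that are already broken would still need a one-time repair.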

How can we reproduce it (as minimally and precisely as possible)?

1. Deploy a kubespray cluster with at least one worker-only node (not in kube_control_plane).
2. Configure it with nginx-proxy (the default when kube_apiserver_endpoint is localhost/127.0.0.1).
3. Run upgrade-cluster.yml targeting Kubernetes v1.35+.
4. Observe the worker node become NotReady after the upgrade; kubelet logs show:
   invalid configuration: no server found for cluster "default-cluster"
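
To confirm the broken state on an affected worker without reading the file by eye, a check along these lines could be run against the node. This is a sketch under assumptions: the kubelet.conf path and the registered variable name kubelet_conf_raw are illustrative and are not taken from kubespray.

```yaml
# Sketch only: fail if server: is not nested under clusters[0].cluster.
# The path and the variable name are assumptions for illustration.
- name: Read kubelet kubeconfig from the worker
  ansible.builtin.slurp:
    src: /etc/kubernetes/kubelet.conf
  register: kubelet_conf_raw

- name: Check that server is inside the cluster block
  ansible.builtin.assert:
    that:
      # slurp returns base64 content; decode and parse it as YAML first.
      - (kubelet_conf_raw.content | b64decode | from_yaml).clusters[0].cluster.server is defined
    fail_msg: "server: is missing from clusters[0].cluster in kubelet.conf"
```

When server: sits at 4 spaces, YAML parses it as a sibling of cluster: and name:, so the assertion fails. That matches the "no server found for cluster" error kubelet reports.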

OS

RHEL 9

Version of Ansible

ansible [core 2.18.16]
config file = None
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /opt/rbbn/ace-infra/pyenv/lib/python3.14/site-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = /opt/rbbn/ace-infra/pyenv/bin/ansible
python version = 3.14.4 (main, May 12 2026, 10:57:44) [GCC 8.5.0 20210514 (Red Hat 8.5.0-16.0.2)] (/opt/rbbn/ace-infra/pyenv/bin/python)
jinja version = 3.1.6
libyaml = True

Version of Python

Python 3.14.4

Version of Kubespray (commit)

2.31

Network plugin used

calico

Full inventory with variables

all:
  hosts:
    main-1-1:
      ansible_host: < ip address >
      access_ip: < ip address >
      ip: < ip address >
      ip6: < ip address >
      etcd_member_name: main-1-1
    main-2-1:
      ansible_host: < ip address >
      access_ip: < ip address >
      ip: < ip address >
      ip6: < ip address >
      etcd_member_name: main-2-1
    main-3-1:
      ansible_host: < ip address >
      access_ip: < ip address >
      ip: < ip address >
      ip6: < ip address >
      etcd_member_name: main-3-1
    first-single-1:
      ansible_host: < ip address >
      access_ip: < ip address >
      ip: < ip address >
      ip6: < ip address >
  children:
    kube_control_plane:
      hosts:
        main-1-1:
        main-2-1:
        main-3-1:
    etcd:
      hosts:
        main-1-1:
        main-2-1:
        main-3-1:
    kube_node:
      hosts:
        main-1-1:
        main-2-1:
        main-3-1:
        first-single-1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico_rr:
      hosts: {}

Command used to invoke ansible

ansible-playbook -i hosts.yaml --become --extra-vars ansible_user=epic --extra-vars ansible_ssh_private_key_file=/home/user/.ssh/id_user --forks 10 -v --timeout 60 ./upgrade-cluster.yml

Output of ansible run

The Ansible run ended successfully, although the worker is tainted, kubelet is not running, and these pods stay Pending:
calico-node-tjxsg 0/1 Pending 0 21m first-single-1
kube-proxy-w625q 0/1 Pending 0 2m23s first-single-1

Anything else we need to know

No response
