What happened?
Hi all,
To trigger the calico_rr installation, we added a calico_rr group to our inventory.
All tasks needed to complete its installation were executed.
Inside roles/network_plugin/calico/rr/tasks/update-node.yml, the block has retry logic, which succeeded on our side after 3 iterations.
The cluster.yml playbook sets "any_errors_fatal" to "true" if it is not defined:
... ... ... ...
- name: Install Calico Route Reflector
  hosts: calico_rr
  gather_facts: false
  any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
  environment: "{{ proxy_disable_env }}"
  roles:
    - { role: kubespray_defaults }
    - { role: network_plugin/calico/rr, tags: ['network', 'calico_rr'] }
... ... ... ...
I suspect that the failures that occurred during the successfully rescued iterations, combined with the value of "any_errors_fatal", cause the whole playbook to stop prematurely, so that none of the Kubernetes apps tasks are executed.
In this output extract, you can see a successful recap, but no other tasks are triggered after the calico_rr installation ends.
In the same extract you can see that the "retry_count" var is incremented to "2" (below the limit of 10) during the rescue of the block.
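For readers unfamiliar with this kind of retry logic, the general shape of a block/rescue loop that re-includes its own task file can be sketched as follows. This is a generic illustration, not a copy of update-node.yml: the file name retry-sketch.yml, the task names, and the placeholder command are all assumptions; only the "retry_count" variable and the limit of 10 come from the description above.

```yaml
# retry-sketch.yml - hypothetical task file, meant to be loaded with include_tasks
- name: Attempt the operation with a bounded retry
  block:
    - name: Track the attempt number (0 on the first pass)
      ansible.builtin.set_fact:
        retry_count: "{{ 0 if retry_count is undefined else retry_count | int + 1 }}"

    - name: Operation that may fail transiently (placeholder that always fails)
      ansible.builtin.command: /bin/false
      changed_when: false

  rescue:
    - name: Give up once the limit is reached
      ansible.builtin.fail:
        msg: "Ended after 10 retries"
      when: retry_count | int == 10

    - name: Re-include this file to try again
      ansible.builtin.include_tasks: retry-sketch.yml
```

In a loop of this shape, every failed attempt is recorded as a task failure that is then rescued, which is the kind of event that may interact with "any_errors_fatal".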
extract_calico_rr.txt
We are not running the playbook with any tags or skip-tags, and after removing the calico_rr group from the inventory all Kubernetes apps were installed successfully.
Could you help us?
Thanks
Massimiliano
What did you expect to happen?
We expect calico_rr to be installed and the enabled Kubernetes apps, such as metallb, to be installed too.
We would like to know whether, for this specific case, the default value "true" for "any_errors_fatal" is really needed, or whether it would be better to check if the failures are real (that is, not rescued).
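Since the play only falls back to default(true) when the variable is undefined, one possible workaround while this is discussed is to define the variable explicitly. A minimal sketch, assuming it is passed as an extra var from a file; overrides.yml is a hypothetical file name, and this has not been verified to avoid the early stop:

```yaml
# overrides.yml (hypothetical file name)
# Overrides the "default(true)" fallback used in the play shown above.
any_errors_fatal: false
```

It could be appended to the ansible-playbook command as `-e @overrides.yml`. Note that this would also let the run continue after genuine failures in every play that uses the same expression, so it is a diagnostic aid rather than a fix.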
How can we reproduce it (as minimally and precisely as possible)?
In our case the issue occurs simply by installing calico_rr (adding the group to the inventory) and adding "metallb_enabled: true" to the play vars.
metallb is not installed at all if the task at "network_plugin/calico/rr/tasks/update-node.yml:34" fails on one or more hosts of the target group.
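To check whether the behaviour can be isolated from Kubespray itself, a standalone playbook along these lines might help. It is a sketch under assumptions: the file name repro.yml, the play and task names, and the use of the first host in the play as the failing host are all hypothetical, and no outcome is claimed here.

```yaml
# repro.yml - hypothetical minimal playbook, not part of Kubespray
- name: Play with a rescued failure on one host
  hosts: all
  gather_facts: false
  any_errors_fatal: true
  tasks:
    - name: Fail on one host, then recover in rescue
      block:
        - name: Simulated transient failure on the first host of the play
          ansible.builtin.fail:
            msg: "simulated transient failure"
          when: inventory_hostname == ansible_play_hosts[0]
      rescue:
        - name: Rescue handles the failure
          ansible.builtin.debug:
            msg: "rescued on {{ inventory_hostname }}"

- name: Follow-up play that is expected to run
  hosts: all
  gather_facts: false
  tasks:
    - name: Confirm that the next play was reached
      ansible.builtin.debug:
        msg: "second play executed"
```

If the second play is skipped with ansible-core 2.18.16, that would point at how "any_errors_fatal" treats rescued failures; if it runs, the cause is more likely specific to the calico_rr role.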
OS
RHEL 9
Version of Ansible
ansible [core 2.18.16]
config file = None
configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /opt/python3_12_0/venvs/kubespray_main/lib64/python3.12/site-packages/ansible
ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
executable location = ./ansible
python version = 3.12.1 (main, Nov 25 2025, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-4)] (/opt/python3_12_0/venvs/kubespray_main/bin/python3)
jinja version = 3.1.6
libyaml = True
Version of Python
Python 3.12.1
Version of Kubespray (commit)
2.31.0
Network plugin used
calico
Full inventory with variables
---
#####################################
upgrade_infra: false
reset_infra: false
#####################################
ansible_user: k8sansible
ansible_ssh_private_key_file: ./secrets/.ssh/k8sansiblersa
ansible_timeout: 120
ansible_become_timeout: 120
ansible_ssh_extra_args: >-
  -o ServerAliveInterval=30
  -o ServerAliveCountMax=10
  -o StrictHostKeyChecking=no
  -o ControlMaster=auto
  -o ControlPersist=30m
  -o ControlPath=/tmp/ansible-ssh-%h-%p-%r
# Debug var
unsafe_show_logs: true
################################
## KUBESPRAY vars
kubespray:
  python:
    version: "3.12.0"
download_run_once: true
download_localhost: true
download_force_cache: true # MUST be set to TRUE to use ansible CONTROLLER as CACHE
download_cache_dir: /opt/gitlab-runner/ansible/kubespray_cache
download_container: true
kube_image_repo: acr.azurecr.io/k8s
quay_image_repo: acr.azurecr.io/quayiok8s
docker_image_repo: acr.azurecr.io/dockeriok8s
################################
## K8S vars
bin_dir: /usr/bin
kube_version: 1.35.0
kube_network_plugin: calico
kube_log_level: 2
disable_ipv6: true
disable_ipv6_dns: true
disable_selinux: true
etcd_deployment_type: host
k8s_image_pull_policy: IfNotPresent
# Container management
container_manager: containerd
containerd_storage_dir: /opt/containerd/images
containerd_state_dir: /opt/containerd/state
containerd_registries_mirrors:
  - prefix: acr.azurecr.io
    mirrors:
      - host: https://acr.azurecr.io
        capabilities: ["pull", "resolve"]
        skip_verify: false
        header:
          Authorization: "Basic *************************************************************************************"
################################
## DNS CONFIGURATION
# NO CHANGES MUST be APPLIED to the nodes' /etc/resolv.conf, because the company DNS servers cannot be overridden.
# They ARE MANDATORY for all commands that rely on lookups to AD and DS for authentication and authorization of nodes and users.
resolvconf_mode: none
# Upstream DNS servers for early cluster deployment and fallback
# If an infrastructure service (outside of the cluster) is defined through an FQDN somewhere,
# the following DNS server will be the only one used; otherwise the lookup times out.
upstream_dns_servers:
- 172.17.8.105
# CoreDNS static host entries for Azure ACR using dns_etchosts
# This is used by both CoreDNS and NodeLocalDNS
# dns_etchosts: |
# 10.163.68.69 acr.azurecr.io
# 10.163.68.68 acrfbkpronpci01.germanywestcentral.data.azurecr.io
# /etc/hosts custom entries for static host resolutions (for node-level resolution)
custom_etc_hosts:
  # Azure ACR - static IP resolution via /etc/hosts (Private Endpoint IP)
  - domain: "acr.azurecr.io"
    ip: "10.163.68.69"
################################
## ADDONS
dashboard_enabled: true
metrics_server_enabled: true
helm_enabled: true
cert_manager_enabled: true
# Override cert-manager image repo (default: quay.io/jetstack)
jetstack_image_repo: acr.azurecr.io/quayiok8s/jetstack
################################
## METALLB configuration
# EXAMPLE: https://github.com/TayoG/Kubernetes-kubespray/blob/master/docs/metallb.md
kube_proxy_strict_arp: true
metallb_enabled: true
metallb_speaker_enabled: true
metallb_namespace: metallb
metallb_config:
address_pools:
primary:
ip_range:
- 172.17.253.8-172.17.253.10
metallb_auto_assign: true
layer2:
- primary
# Enable InPlacePodVerticalScaling
kube_feature_gates:
- InPlacePodVerticalScaling=true
Command used to invoke ansible
ansible-playbook -vv -i $inventory --become --become-user root $wd/../ansible/kubespray_setup.yml
Output of ansible run
Only the final tasks are included, because the full play output could not be uploaded, even when compressed.
extract_calico_rr.txt
Anything else we need to know
No response