Make the apiversion-upgrade management cluster HA#6329
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
c6ab190 to
0ac7656
Compare
|
/test pull-cluster-api-provider-azure-apiversion-upgrade |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #6329 +/- ##
=======================================
Coverage 43.85% 43.85%
=======================================
Files 291 291
Lines 25344 25344
=======================================
Hits 11114 11114
Misses 13457 13457
Partials 773 773 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Passed on first run: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/6329/pull-cluster-api-provider-azure-apiversion-upgrade/2061535087366770688 I'll fix the linter errors and run it again. |
0ac7656 to
fc83052
Compare
|
/test pull-cluster-api-provider-azure-apiversion-upgrade |
|
Passed again on the second try: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/6329/pull-cluster-api-provider-azure-apiversion-upgrade/2061568060594065408 Given that this job only passes ~ 20% of the time, I think this fix is working. |
What type of PR is this?
/kind flake
What this PR does / why we need it:
The
apiversion-upgradejob creates a self-hosted management cluster, upgrades allproviders on it with
clusterctl upgrade apply, then scales workload clusters. CAPI'sClusterctlUpgradeSpechardcodes a single control-plane machine for that managementcluster, so its public API server load balancer has one backend. Under the load of the
provider upgrade, that single node can briefly go unreachable, which surfaces as
"provider not ready after 5m0s" and fails the (required) job. It currently passes less
than 20% of the time.
This vendors
ClusterctlUpgradeSpecinto CAPZ with one addition, aManagementClusterControlPlaneMachineCountfield, and runs the upgrade specs with a3-node HA management control plane. The clone is a near-verbatim copy of CAPI v1.13.2;
the only functional change is marked "CAPZ ADDITION".
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
See kubernetes-sigs/cluster-api#13766.
The intent is to prove out the fix here, then cherry-pick the
ManagementClusterControlPlaneMachineCountfield upstream to cluster-api. Once it landsupstream,
test/e2e/clusterctl_upgrade.goshould be deleted and the callers switchedback to
capi_e2e.ClusterctlUpgradeSpec.TODOs:
Release note: