
Add test watching for overloaded network via etcd logging #30013

Open
wants to merge 1 commit into
base: main
from

Conversation

dgoodwin
Contributor

Outcome from OCPBUGS-56921; let's see how common this is in the wild.

@dgoodwin dgoodwin changed the title from "Add test watching overloaded network via etcd logging" to "Add test watching for overloaded network via etcd logging" Jul 23, 2025
@openshift-ci openshift-ci bot requested review from deads2k and sjenning July 23, 2025 17:00
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2025

openshift-trt bot commented Jul 24, 2025

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New tests seen in this PR at sha: f876d6a

  • "[sig-etcd] etcd should not log excessive overloaded network messages" [Total: 44, Pass: 44, Fail: 0, Flake: 0]

Output: msg,
},
}
return []*junitapi.JUnitTestCase{failure}
Contributor

Just double-checking that we don't want to flake and monitor first. Looks like we are good on presubmits.

Contributor

etcdOverloadedNetworkLimit = 10000 is rather high, so we are likely good.

Contributor Author

Yeah, I thought about it as well and came to the same conclusion: if this is going on, I'd like to know ASAP. There are 44 passes on the PR so far, though, so hopefully it is very rare.
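
For readers following the thread, here is a minimal sketch of the kind of check being discussed: count the etcd log lines that mention an overloaded network, compare the count to a limit, and report either a hard failure or a flake. It is not the code from this PR. The `JUnitTestCase` and `FailureOutput` types are simplified stand-ins for the `junitapi` types in the diff, the matched substring and the "strictly greater than the limit" comparison are assumptions, and the `flakeOnly` switch is hypothetical. The flake convention shown (returning a failure and a success with the same name) is also assumed here rather than taken from the diff.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Simplified, hypothetical stand-ins for the junitapi types seen in the diff.
type FailureOutput struct {
	Output string
}

type JUnitTestCase struct {
	Name          string
	FailureOutput *FailureOutput
}

const (
	testName = "[sig-etcd] etcd should not log excessive overloaded network messages"
	// Limit quoted in the review thread.
	etcdOverloadedNetworkLimit = 10000
	// Assumed substring to look for in etcd log lines.
	overloadedNetworkMarker = "overloaded network"
)

// countOverloaded returns how many log lines contain the marker.
func countOverloaded(logs string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		if strings.Contains(sc.Text(), overloadedNetworkMarker) {
			n++
		}
	}
	return n
}

// evaluate turns a count into JUnit results. A lone failure fails the job;
// a failure paired with a success of the same name is treated as a flake
// (assumed convention). Counts at or below the limit pass.
func evaluate(count, limit int, flakeOnly bool) []*JUnitTestCase {
	success := &JUnitTestCase{Name: testName}
	if count <= limit {
		return []*JUnitTestCase{success}
	}
	msg := fmt.Sprintf("etcd logged %d overloaded network messages, limit is %d", count, limit)
	failure := &JUnitTestCase{
		Name:          testName,
		FailureOutput: &FailureOutput{Output: msg},
	}
	if flakeOnly {
		return []*JUnitTestCase{failure, success}
	}
	return []*JUnitTestCase{failure}
}

func main() {
	// Illustrative log text only; not captured from a real cluster.
	logs := "raft heartbeat ok\n" +
		"dropped internal Raft message since sending buffer is full (overloaded network)\n" +
		"dropped internal Raft message since sending buffer is full (overloaded network)\n"
	count := countOverloaded(logs)
	results := evaluate(count, etcdOverloadedNetworkLimit, false)
	fmt.Printf("matches=%d results=%d failed=%t\n",
		count, len(results), results[0].FailureOutput != nil)
}
```

With a limit as high as 10000, a hard failure (`flakeOnly` set to false in this sketch) should only trigger on a badly overloaded cluster, which matches the reasoning in the comments above for failing outright instead of flaking.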

@neisw
Contributor

neisw commented Jul 24, 2025

/lgtm
talked myself out of reviewing for flake

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 24, 2025
@dgoodwin
Contributor Author

/hold

This morning I mis-pushed another test I was working on that is nowhere near ready; I didn't realize I hadn't switched to a new branch. Sorry about that. I will force-push an update containing just the etcd overload test.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 24, 2025
@dgoodwin dgoodwin force-pushed the overloaded-network-montest branch from f876d6a to c38bda8 Compare July 24, 2025 15:46
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 24, 2025
@neisw
Contributor

neisw commented Jul 24, 2025

/test unit

@dgoodwin
Contributor Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 24, 2025
@neisw
Contributor

neisw commented Jul 24, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 24, 2025
Contributor

openshift-ci bot commented Jul 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, neisw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

openshift-ci bot commented Jul 24, 2025

@dgoodwin: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-ovn-upgrade | c38bda8 | link | true | /test e2e-gcp-ovn-upgrade |
| ci/prow/e2e-gcp-ovn-etcd-scaling | c38bda8 | link | false | /test e2e-gcp-ovn-etcd-scaling |
| ci/prow/e2e-metal-ipi-serial-2of2 | c38bda8 | link | false | /test e2e-metal-ipi-serial-2of2 |
| ci/prow/e2e-aws-ovn-single-node-upgrade | c38bda8 | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-aws-ovn-serial-1of2 | c38bda8 | link | true | /test e2e-aws-ovn-serial-1of2 |
| ci/prow/e2e-azure-ovn-upgrade | c38bda8 | link | false | /test e2e-azure-ovn-upgrade |
| ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 | c38bda8 | link | false | /test e2e-vsphere-ovn-dualstack-primaryv6 |
| ci/prow/e2e-metal-ipi-virtualmedia | c38bda8 | link | false | /test e2e-metal-ipi-virtualmedia |
| ci/prow/e2e-metal-ipi-ovn-ipv6 | c38bda8 | link | true | /test e2e-metal-ipi-ovn-ipv6 |
| ci/prow/e2e-openstack-ovn | c38bda8 | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-vsphere-ovn-etcd-scaling | c38bda8 | link | false | /test e2e-vsphere-ovn-etcd-scaling |
| ci/prow/e2e-gcp-disruptive | c38bda8 | link | false | /test e2e-gcp-disruptive |
| ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway | c38bda8 | link | false | /test e2e-metal-ipi-ovn-dualstack-local-gateway |
| ci/prow/e2e-aws-ovn-kube-apiserver-rollout | c38bda8 | link | false | /test e2e-aws-ovn-kube-apiserver-rollout |
| ci/prow/e2e-metal-ipi-ovn | c38bda8 | link | false | /test e2e-metal-ipi-ovn |
| ci/prow/e2e-aws-ovn-fips | c38bda8 | link | true | /test e2e-aws-ovn-fips |
| ci/prow/e2e-aws-disruptive | c38bda8 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-gcp-fips-serial-2of2 | c38bda8 | link | false | /test e2e-gcp-fips-serial-2of2 |
| ci/prow/e2e-metal-ipi-serial-1of2 | c38bda8 | link | false | /test e2e-metal-ipi-serial-1of2 |
| ci/prow/okd-scos-e2e-aws-ovn | c38bda8 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-metal-ipi-serial-ovn-ipv6-2of2 | c38bda8 | link | false | /test e2e-metal-ipi-serial-ovn-ipv6-2of2 |
| ci/prow/e2e-openstack-serial | c38bda8 | link | false | /test e2e-openstack-serial |
| ci/prow/e2e-metal-ipi-ovn-kube-apiserver-rollout | c38bda8 | link | false | /test e2e-metal-ipi-ovn-kube-apiserver-rollout |
| ci/prow/e2e-azure-ovn-etcd-scaling | c38bda8 | link | false | /test e2e-azure-ovn-etcd-scaling |
| ci/prow/e2e-gcp-fips-serial-1of2 | c38bda8 | link | false | /test e2e-gcp-fips-serial-1of2 |
| ci/prow/e2e-gcp-ovn-techpreview-serial-2of2 | c38bda8 | link | false | /test e2e-gcp-ovn-techpreview-serial-2of2 |
| ci/prow/e2e-metal-ipi-serial-ovn-ipv6-1of2 | c38bda8 | link | false | /test e2e-metal-ipi-serial-ovn-ipv6-1of2 |
| ci/prow/e2e-metal-ipi-ovn-dualstack | c38bda8 | link | false | /test e2e-metal-ipi-ovn-dualstack |
| ci/prow/e2e-aws-ovn-etcd-scaling | c38bda8 | link | false | /test e2e-aws-ovn-etcd-scaling |
| ci/prow/e2e-hypershift-conformance | c38bda8 | link | false | /test e2e-hypershift-conformance |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


openshift-trt bot commented Jul 24, 2025

Job Failure Risk Analysis for sha: c38bda8

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:azure SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-gcp-disruptive IncompleteTests
Tests for this run (107) are below the historical average (214): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Open Bugs
etcd-scaling jobs failing ~60% of the time
pull-ci-openshift-origin-main-e2e-gcp-ovn-techpreview-serial-2of2 Medium
[sig-instrumentation] disruption/metrics-api connection/reused should be available throughout the test
Potential external regression detected for High Risk Test analysis
---
[sig-instrumentation] disruption/metrics-api connection/new should be available throughout the test
Potential external regression detected for High Risk Test analysis

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: c38bda8

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade High - "[sig-etcd] etcd should not log excessive overloaded network messages" is a new test that failed 1 time(s) against the current commit

New tests seen in this PR at sha: c38bda8

  • "[sig-etcd] etcd should not log excessive overloaded network messages" [Total: 33, Pass: 32, Fail: 1, Flake: 0]

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
2 participants