
antrea-agent DaemonSet not ready consistently when running e2e with all features enabled #4673

@tnqn

Description


Describe the bug

TestWireGuard fails consistently because the antrea-agent DaemonSet does not become ready within 1m30s:
https://github.com/antrea-io/antrea/actions/runs/4312092832/jobs/7522929072
https://github.com/antrea-io/antrea/actions/runs/4312423243/jobs/7524153226

--- PASS: TestTrafficControl (16.72s)
    --- PASS: TestTrafficControl/TestMirrorToRemote (2.33s)
    --- PASS: TestTrafficControl/TestMirrorToLocal (2.42s)
    --- PASS: TestTrafficControl/TestRedirectToLocal (4.01s)
=== RUN   TestUpgrade
    upgrade_test.go:30: Skipping test as we are not testing for upgrade
--- SKIP: TestUpgrade (0.00s)
=== RUN   TestVMAgent
    fixtures.go:147: Skipping test as there no Linux or Windows VMs
--- SKIP: TestVMAgent (0.00s)
=== RUN   TestWireGuard
    fixtures.go:228: Creating 'testwireguard-lxbil1rz' K8s Namespace
2023/03/02 10:27:31 Applying Antrea YAML
2023/03/02 10:27:32 Waiting for all Antrea DaemonSet Pods
2023/03/02 10:27:33 Checking CoreDNS deployment
    fixtures.go:120: The following modules have been found on Node 'kind-control-plane': [wireguard]
    fixtures.go:120: The following modules have been found on Node 'kind-worker': [wireguard]
    fixtures.go:120: The following modules have been found on Node 'kind-worker2': [wireguard]
I0302 10:27:33.837933   18943 framework.go:2379] Sending SIGINT to 'antrea-agent-coverage'
I0302 10:27:33.916128   18943 framework.go:2385] Copying coverage files from Pod 'antrea-agent-9542l'
I0302 10:27:34.223265   18943 framework.go:2379] Sending SIGINT to 'antrea-agent-coverage'
I0302 10:27:34.307809   18943 framework.go:2385] Copying coverage files from Pod 'antrea-agent-jg4tq'
I0302 10:27:34.664204   18943 framework.go:2379] Sending SIGINT to 'antrea-agent-coverage'
I0302 10:27:34.750361   18943 framework.go:2385] Copying coverage files from Pod 'antrea-agent-tsthd'
    wireguard_test.go:53: Failed to enable WireGuard tunnel: error when restarting antrea-agent Pod: antrea-agent DaemonSet not ready within 1m30s;

However, the problem is not in WireGuard. When the agent restarts, it fails to set the NO_FLOOD config on the TrafficControl ports created by the previous test (TestTrafficControl) and exits with a fatal error:

I0302 10:27:40.551956      15 log_file.go:93] Set log file max size to 104857600
I0302 10:27:40.553839      15 feature_gate.go:245] feature gates: &{map[AllAlpha:true AllBeta:true AntreaIPAM:true AntreaPolicy:true AntreaProxy:true Egress:true EndpointSlice:true ExternalNode:true FlowExporter:true IPsecCertAuth:true L7NetworkPolicy:true Multicast:false Multicluster:true NetworkPolicyStats:true NodeIPAM:true NodePortLocal:true SecondaryNetwork:true ServiceExternalIP:true SupportBundleCollection:true TopologyAwareHints:true Traceflow:true TrafficControl:true]}
I0302 10:27:40.554066      15 agent.go:99] Starting Antrea agent (version v1.11.0-dev-6441929)
I0302 10:27:40.554168      15 client.go:87] No kubeconfig file was specified. Falling back to in-cluster config
I0302 10:27:40.555155      15 prometheus.go:171] Initializing prometheus metrics
I0302 10:27:40.555510      15 ovs_client.go:71] Connecting to OVSDB at address /var/run/openvswitch/db.sock
I0302 10:27:40.558001      15 agent.go:400] Setting up node network
I0302 10:27:40.585293      15 agent.go:1017] "Setting Node MTU" MTU=1450
I0302 10:27:40.585489      15 agent.go:1036] "Configured IPv4 Subnet CIDR on this Node" subnet="10.244.2.0/24"
I0302 10:27:40.588088      15 ovs_client.go:114] Bridge exists: c00c270d-9fad-462a-b348-ac31c41e0502
I0302 10:27:40.600836      15 agent.go:372] "Adding interface to cache" interfaceName="antrea-l7-tap1"
I0302 10:27:40.601020      15 agent.go:372] "Adding interface to cache" interfaceName="antrea-tun0"
I0302 10:27:40.601203      15 agent.go:372] "Adding interface to cache" interfaceName="agnhost-5e0db3"
I0302 10:27:40.601317      15 agent.go:372] "Adding interface to cache" interfaceName="antrea-l7-tap0"
I0302 10:27:40.601463      15 agent.go:372] "Adding interface to cache" interfaceName="return1"
I0302 10:27:40.601576      15 agent.go:372] "Adding interface to cache" interfaceName="local-pa-5c132c"
I0302 10:27:40.601684      15 agent.go:372] "Adding interface to cache" interfaceName="antrea-gw0"
I0302 10:27:40.601826      15 agent.go:826] Tunnel port antrea-tun0 already exists on OVS bridge
I0302 10:27:40.601960      15 agent.go:710] Gateway port antrea-gw0 already exists on OVS bridge
I0302 10:27:40.602061      15 agent.go:716] Setting gateway interface antrea-gw0 MTU to 1450
I0302 10:27:40.603077      15 net_linux.go:176] IP configuration for interface antrea-gw0 does not need to change
I0302 10:27:40.603414      15 net_linux.go:176] IP configuration for interface antrea-gw0 does not need to change
I0302 10:27:40.611939      15 agent.go:392] "Set port no-flood successfully" PortName="antrea-l7-tap1"
I0302 10:27:40.617339      15 agent.go:392] "Set port no-flood successfully" PortName="antrea-l7-tap0"
F0302 10:27:40.624021      15 main.go:53] Error running agent: error initializing agent: failed to set port return1 with no-flood: fail to set no-food config for port -1 on bridge br-int: exit status 1, stderr: ovs-ofctl: invalid option -- '1'
goroutine 110 [running]:
k8s.io/klog/v2/internal/dbg.Stacks(0x0)
	/go/pkg/mod/k8s.io/klog/v2@v2.80.1/internal/dbg/dbg.go:35 +0x89
k8s.io/klog/v2.(*loggingT).output(0x3e3c1e0, 0x3, 0x0, 0xc0000f3b90, 0x1, {0x30d057a?, 0x1?}, 0xc0002db800?, 0x0)
	/go/pkg/mod/k8s.io/klog/v2@v2.80.1/klog.go:935 +0x686
k8s.io/klog/v2.(*loggingT).printfDepth(0x3e3c1e0, 0x0?, 0x0, {0x0, 0x0}, 0x0?, {0x266e80d, 0x17}, {0xc00046d410, 0x1, ...})
	/go/pkg/mod/k8s.io/klog/v2@v2.80.1/klog.go:736 +0x1f3
k8s.io/klog/v2.(*loggingT).printf(...)
	/go/pkg/mod/k8s.io/klog/v2@v2.80.1/klog.go:718
k8s.io/klog/v2.Fatalf(...)
	/go/pkg/mod/k8s.io/klog/v2@v2.80.1/klog.go:1621
antrea.io/antrea/cmd/antrea-agent.newAgentCommand.func1(0xc0001ee300?, {0xc0006fef00, 0x0, 0x8})
	/antrea/cmd/antrea-agent/main.go:53 +0x2f4
github.com/spf13/cobra.(*Command).execute(0xc0001ee300, {0xc0000f90f0, 0x8, 0x8})
	/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:920 +0x847
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001ee300)
	/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968
antrea.io/antrea/cmd/antrea-agent.main()
	/antrea/cmd/antrea-agent/main.go:32 +0x1e
github.com/confluentinc/bincover.RunTest(0x27898f0)
	/go/pkg/mod/github.com/confluentinc/bincover@v0.1.0/instrument_bin.go:93 +0x210
antrea.io/antrea/cmd/antrea-agent.TestBincoverRunMain(0x11?)
	/antrea/cmd/antrea-agent/bincover_run_main_test.go:27 +0x25
testing.tRunner(0xc00018f040, 0x27898b8)
	/usr/local/go/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1493 +0x35f

This is likely related to #4654. There are a few problems that need to be resolved:

  1. If a TrafficControl is deleted while antrea-agent is down, how do we ensure that the corresponding ports are deleted?
  2. Before setting NO_FLOOD, the agent should check whether the OVS port is still valid (the log above shows it passing port -1 to ovs-ofctl).
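Both points could be handled together when the agent restarts. The Go sketch below makes two assumptions: the agent can list the TrafficControl ports found in its interface cache along with their ofport, and it knows the set of port names still referenced by existing TrafficControls. `cachedPort`, `startupPlan` and `planTrafficControlPorts` are hypothetical names. Stale ports are marked for deletion, and still-referenced ports with an invalid ofport are skipped rather than passed to ovs-ofctl. How to repair those skipped ports is left open:

```go
package main

import (
	"fmt"
	"sort"
)

// cachedPort is a hypothetical view of a TrafficControl port found in the
// interface cache at agent startup (non-TrafficControl ports are assumed filtered out).
type cachedPort struct {
	Name   string
	OFPort int32
}

// startupPlan is a hypothetical summary of what to do with each cached port.
type startupPlan struct {
	SetNoFlood []string // valid ports still referenced by a TrafficControl
	Delete     []string // ports whose TrafficControl was deleted while the agent was down
	Invalid    []string // still referenced, but OVS failed to add them (ofport <= 0)
}

// planTrafficControlPorts classifies cached ports against `desired`, the set of
// port names referenced by the current TrafficControls.
func planTrafficControlPorts(cached []cachedPort, desired map[string]bool) startupPlan {
	var plan startupPlan
	for _, p := range cached {
		switch {
		case !desired[p.Name]:
			// Problem 1: clean up ports left behind by a deleted TrafficControl.
			plan.Delete = append(plan.Delete, p.Name)
		case p.OFPort <= 0:
			// Problem 2: never set NO_FLOOD on a port OVS could not add.
			plan.Invalid = append(plan.Invalid, p.Name)
		default:
			plan.SetNoFlood = append(plan.SetNoFlood, p.Name)
		}
	}
	// Sort for deterministic output.
	sort.Strings(plan.SetNoFlood)
	sort.Strings(plan.Delete)
	sort.Strings(plan.Invalid)
	return plan
}

func main() {
	// "return1" matches the port in the log above; "tc-target" is a made-up example.
	cached := []cachedPort{{Name: "return1", OFPort: -1}, {Name: "tc-target", OFPort: 7}}
	plan := planTrafficControlPorts(cached, map[string]bool{"tc-target": true})
	fmt.Printf("%+v\n", plan)
}
```

In the failure above, assuming the previous test's TrafficControl had already been removed, `return1` would end up in `Delete` instead of crashing the agent.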

Labels: kind/bug

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions