Describe the bug
In an AWS EKS 1.31 cluster using Antrea 2.3.0, multicast route and cache entries are cleaned up prematurely when a receiver pod stops its subscription process without the pod itself being terminated. As a result, restarting the subscription inside the same pod does not restore multicast traffic flow until an additional receiver joins, or the Antrea agent or the node is restarted.
To Reproduce
- Deploy three pods across different nodes:
  - One multicast traffic sender
  - Two multicast traffic receivers subscribing to the same multicast group
- Start multicast traffic generation.
- Confirm that both receivers are successfully receiving traffic.
- In the first receiver pod, manually stop the subscription process (e.g., using CTRL+C), without terminating the pod.
- Observe that the second receiver continues to receive traffic.
- Wait approximately 7 minutes.
- Attempt to restart the subscription process in the first receiver pod.
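For reference, the "subscription process" in the steps above can be approximated with a minimal Python sketch like the one below. This is not the exact tool used in this report. The group address `239.1.2.3`, the port `5007`, and the helper names `make_mreq`, `receive` and `send` are all assumptions chosen for illustration. Only standard `socket` calls are used.

```python
import socket
import time

GROUP = "239.1.2.3"  # assumption: any administratively scoped group works
PORT = 5007          # assumption: arbitrary UDP port


def make_mreq(group: str, local_ip: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq value: group address, then local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def receive(group: str = GROUP, port: int = PORT) -> None:
    """Join the group and print received datagrams until interrupted with CTRL+C."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group makes the kernel send an IGMP membership report.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, make_mreq(group))
    try:
        while True:
            data, sender = sock.recvfrom(65535)
            print(f"{len(data)} bytes from {sender[0]}")
    except KeyboardInterrupt:
        pass
    finally:
        # Closing the socket drops the group membership; the pod keeps running.
        sock.close()


def send(group: str = GROUP, port: int = PORT, interval: float = 1.0) -> None:
    """Send one small datagram to the group every `interval` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # A TTL above 1 is needed for traffic to leave the sender's local segment.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 8)
    try:
        while True:
            sock.sendto(b"multicast-test", (group, port))
            time.sleep(interval)
    except KeyboardInterrupt:
        pass
    finally:
        sock.close()
```

With this sketch, calling `send()` in the sender pod and `receive()` in each receiver pod corresponds to the setup above. Pressing CTRL+C in one receiver and calling `receive()` again roughly 7 minutes later corresponds to the stop and restart steps.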
Expected
The first receiver pod should be able to rejoin the multicast group and start receiving traffic again without needing external intervention (e.g., reboot, adding another receiver).
Actual behavior
After about 7 minutes, Antrea appears to remove the multicast route and cache entries for the node where the first receiver is located. Restarting the subscription process in the pod does not restore multicast traffic. The traffic only resumes if:
- Any new receiver joins the multicast group (regardless of whether it is on the same node or a different node).
- The receiver pod or the Antrea agent pod is restarted.
Versions:
- Antrea version: 2.3.0
- Kubernetes version: v1.32.3-eks-bcf3d70
- Container runtime: containerd 1.7.27-1.amzn2023.0.2
- Linux kernel version on the Kubernetes Nodes: 6.1.132-147.221.amzn2023.x86_64
Additional context
- The issue occurs only when there were initially multiple receivers.
- If only one receiver is present, multicast traffic continues without issue.
- No pod restart or network event is triggered when the subscription process is stopped.