[Multicast] Remove stale local members in the group cache#7154
/test-all
antoninbas
left a comment
The PR description reads:
The fix is to generate IGMP leave message for the stale members even if it is the last member on the current Node. It also calls checkLastMember in function clearStaleGroups when no local members are left, rather than directly adds the group into the worker queue.
This would imply that we are making 2 changes as part of this PR, but really I only see one (replacing c.queue.Add with a call to c.checkLastMember). I don't see the first change described: "generate IGMP leave message for the stale members even if it is the last member on the current Node". What am I missing?
@@ -1393,6 +1461,6 @@ func createIGMPJoinMessage(groups []net.IP, version uint8) []util.Message {
}

func TestMain(m *testing.M) {
I wonder why we have a TestMain specifically for this package
I can't recall the exact reason, maybe it is to reset igmpMaxResponseTime. Removed the function in the latest change, and introduced a dedicated function to reset the igmpMaxResponseTime instead.
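One way such a dedicated reset function could look is sketched below. This is only an illustration: the helper name `overrideIGMPMaxResponseTime` and the 10s default are assumptions, not taken from the PR, and the actual helper in the change may differ.

```go
package main

import (
	"fmt"
	"time"
)

// igmpMaxResponseTime stands in for the package-level variable discussed in
// the review. The 10s default is an assumption made for this sketch.
var igmpMaxResponseTime = 10 * time.Second

// overrideIGMPMaxResponseTime (hypothetical name) sets a new value and
// returns a function that restores the previous one.
func overrideIGMPMaxResponseTime(d time.Duration) (restore func()) {
	original := igmpMaxResponseTime
	igmpMaxResponseTime = d
	return func() { igmpMaxResponseTime = original }
}

func main() {
	restore := overrideIGMPMaxResponseTime(100 * time.Millisecond)
	fmt.Println(igmpMaxResponseTime) // 100ms
	restore()
	fmt.Println(igmpMaxResponseTime) // 10s
}
```

In a test, the returned function could be passed to `t.Cleanup` or deferred, so each test case restores the value on its own and no package-wide `TestMain` is needed.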
ce3572e to 8144d73
I added an additional check to the if condition that no local members exist: https://github.com/antrea-io/antrea/pull/7154/files#diff-8e302e747c71db4b020ddf40f1debcb088af5aaea45266888f5f153c1be1b67bR184 The issue actually occurred because groupStatus.lastIGMPReport had timed out while local members still existed in the cache. The old logic ignored the existing local members and enqueued the group directly.
0101102 to 9f98601
/test-multicast-e2e
9f98601 to e9789db
/test-multicast-e2e
I think I get it now. Before this change when there was a single member left and timing out we would always hit the first case ( |
Can one of the admins verify this patch?
antoninbas
left a comment
does the definition of the groupIsStale function need to change as well?
Thanks for catching it. Yes, we should update it. Update: thinking about it more, I would keep "len(status.localMembers) == 0" as the only condition for deciding whether a group is stale, since we don't need to wait until the multicast group times out after the last local receiver has left when calling
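The condition proposed in this reply can be sketched as follows. The struct is a simplified stand-in: the `localMembers` map layout (member name to time of last report) and the `newGroupStatus` constructor are assumptions for illustration, not the controller's actual definitions.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// groupMemberStatus is a simplified, hypothetical stand-in for the
// controller's per-group cache entry.
type groupMemberStatus struct {
	group        net.IP
	localMembers map[string]time.Time // member name -> time of last report (assumed)
}

// newGroupStatus (hypothetical helper) builds a status with the given members,
// all marked as reported just now.
func newGroupStatus(group string, members ...string) *groupMemberStatus {
	s := &groupMemberStatus{group: net.ParseIP(group), localMembers: map[string]time.Time{}}
	now := time.Now()
	for _, m := range members {
		s.localMembers[m] = now
	}
	return s
}

// groupIsStale treats a group as stale as soon as it has no local members,
// without also waiting for a group-level timeout.
func groupIsStale(status *groupMemberStatus) bool {
	return len(status.localMembers) == 0
}

func main() {
	status := newGroupStatus("225.1.2.3", "pod-a")
	fmt.Println(groupIsStale(status)) // false
	delete(status.localMembers, "pod-a")
	fmt.Println(groupIsStale(status)) // true
}
```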
ec788e3 to 0359504
/test-multicast-e2e
Any other comments on this PR, @antoninbas @tnqn?
antoninbas
left a comment
LGTM, please backport as needed
tnqn
left a comment
The fix LGTM, but I wonder if status.lastIGMPReport is still useful.
if diff > c.mcastGroupTimeout {
	// Notify worker to remove the group from groupCache if all its members are not updated before mcastGroupTimeout.
	c.queue.Add(status.group.String())
if diff > c.mcastGroupTimeout && len(status.localMembers) == 0 {
Question: is it still meaningful to track when the last IGMP report was received and keep the check here, since every member will be checked individually?
Removed the field lastIGMPReport in struct GroupMemberStatus
0359504 to 7af7928
/test-multicast-e2e
// Create a "leave" event for a local member if it is not updated before mcastGroupTimeout.
for member, lastUpdate := range status.localMembers {
	if now.Sub(lastUpdate) > c.mcastGroupTimeout {
	containerDiff := now.Sub(lastUpdate)
Could it just be named diff now, since there is no name conflict? containerDiff makes people wonder what "container" represents here.
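For reference, a minimal sketch of the per-member check with the suggested `diff` name is shown below. It is not the controller's code: the struct, the `expireStaleMembers` and `demo` names, and the direct deletion are assumptions for illustration. Per the comment in the diff, the real code creates a "leave" event for each stale member rather than deleting it in place.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// groupMemberStatus is a simplified, hypothetical stand-in for the cache entry.
type groupMemberStatus struct {
	group        string
	localMembers map[string]time.Time // member -> time of last report (assumed layout)
}

// expireStaleMembers removes every local member whose last report is older
// than timeout and returns the removed names in sorted order. In the real
// controller a "leave" event would be created for each such member instead.
func expireStaleMembers(status *groupMemberStatus, now time.Time, timeout time.Duration) []string {
	var left []string
	for member, lastUpdate := range status.localMembers {
		diff := now.Sub(lastUpdate)
		if diff > timeout {
			left = append(left, member)
			delete(status.localMembers, member) // deleting during range is safe in Go
		}
	}
	sort.Strings(left)
	return left
}

// demo runs one sweep over a group with one stale and one fresh member.
func demo() (left []string, remaining int) {
	now := time.Now()
	status := &groupMemberStatus{
		group: "225.1.2.3",
		localMembers: map[string]time.Time{
			"pod-a": now.Add(-200 * time.Second),
			"pod-b": now.Add(-10 * time.Second),
		},
	}
	left = expireStaleMembers(status, now, 150*time.Second)
	return left, len(status.localMembers)
}

func main() {
	left, remaining := demo()
	fmt.Println(left, remaining) // [pod-a] 1
}
```

Because each member is checked against its own last update, the sweep no longer depends on a group-level timestamp.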
This change resolves an issue where a receiver may fail to receive multicast packets after it rejoins a group in encap mode. The issue occurs when the last local member has left the multicast group while receivers on other Nodes in the cluster are still members.

The issue arose because the Multicast controller added the group directly to the worker queue without updating its status in the cache, so a re-join event from the same Pod was ignored.

The fix is to generate an IGMP leave message for each stale member, even if it is the last member on the current Node. It also calls checkLastMember in function `clearStaleGroups` when no local members are left, rather than adding the group directly to the worker queue. It also removes the field "lastIGMPReport" from struct GroupMemberStatus.

Signed-off-by: Wenying Dong <wenying.dong@broadcom.com>
7af7928 to b8b9277
/test-multicast-e2e
This change resolves an issue where a receiver may fail to receive multicast packets after it rejoins a group in encap mode. The issue occurs when the last local member has left the multicast group while receivers on other Nodes in the cluster are still members.
The issue arose because the Multicast controller added the group directly to the worker queue without updating its status in the cache, so a re-join event from the same Pod was ignored.
The fix is to generate an IGMP leave message for each stale member, even if it is the last member on the current Node. It also calls checkLastMember in function `clearStaleGroups` when no local members are left, rather than adding the group directly to the worker queue. It also removes the field "lastIGMPReport" from struct GroupMemberStatus.
Fixes #7140