Safer lifecycle watch api #961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Changes from all commits (6 commits):
- 846d9c2: !watch Safer lifecycle watch API (ktoso)
- 91ad117: multi node deathwatch for distributed actors (ktoso)
- 9c5ab35: +testkit improve testkit for lifecycle tests (ktoso)
- 50c97c1: !docker use 5.7 nightlies (ktoso)
- b842358: 5.7 "nightly" workarounds; beta 1 has no issues (ktoso)
- 5955887: workaround fragile closures with generic actors /and inheriting context (ktoso)
Sources/DistributedActors/Cluster/DistributedNodeDeathWatcher.swift (155 additions, 0 deletions)
```swift
//===----------------------------------------------------------------------===//
//
// This source file is part of the Swift Distributed Actors open source project
//
// Copyright (c) 2018-2022 Apple Inc. and the Swift Distributed Actors project authors
// Licensed under Apache License v2.0
//
// See LICENSE.txt for license information
// See CONTRIBUTORS.md for the list of Swift Distributed Actors project authors
//
// SPDX-License-Identifier: Apache-2.0
//
//===----------------------------------------------------------------------===//

import Distributed
import Logging

/// Implements ``LifecycleWatch`` semantics in the presence of ``Node`` failures.
///
/// Depends on a failure detector (e.g. SWIM) to actually detect a node failure; however, once detected,
/// it handles notifying all _local_ actors which have watched at least one actor on the terminating node.
///
/// ### Implementation
/// In order to avoid every actor having to subscribe to cluster events and individually handle the relationship between those
/// and individually watched actors, the watcher handles subscribing for cluster events on behalf of actors which watch
/// other actors on remote nodes, and messages them upon a node becoming down.
///
/// This actor is notified automatically when a remote actor is `context.watch()`-ed.
///
/// Allows manually mocking membership changes to trigger terminated notifications.
internal actor DistributedNodeDeathWatcher {
    // TODO(distributed): actually use this actor rather than the behavior

    typealias ActorSystem = ClusterSystem

    private let log: Logger

    private let selfNode: UniqueNode
    private var membership: Cluster.Membership = .empty

    /// Members which have been `removed`.
    // TODO: clear after a few days, or some max count of nodes; use a sorted set for this
    private var nodeTombstones: Set<UniqueNode> = []

    /// Mapping between a remote node, and actors which have watched some actors on that remote node.
    private var remoteWatchCallbacks: [UniqueNode: Set<WatcherAndCallback>] = [:]

    private var eventListenerTask: Task<Void, Error>?

    init(actorSystem: ActorSystem) async {
        self.log = actorSystem.log
        self.selfNode = actorSystem.cluster.uniqueNode

        let events = actorSystem.cluster.events
        self.eventListenerTask = Task {
            for try await event in events {
                switch event {
                case .membershipChange(let change):
                    self.membershipChanged(change)
                case .snapshot(let membership):
                    let diff = Cluster.Membership._diff(from: .empty, to: membership)
                    for change in diff.changes {
                        self.membershipChanged(change)
                    }
                case .leadershipChange, .reachabilityChange:
                    break // ignore those, they don't affect downing
                }
            }
        }
    }

    func watchActor(
        on remoteNode: UniqueNode,
        by watcher: ClusterSystem.ActorID,
        whenTerminated nodeTerminatedFn: @escaping @Sendable (UniqueNode) async -> Void
    ) {
        guard !self.nodeTombstones.contains(remoteNode) else {
            // the system the watcher is attempting to watch has terminated before the watch has been processed,
            // thus we have to immediately reply with a termination system message, as otherwise it would never receive one
            Task {
                await nodeTerminatedFn(remoteNode)
            }
            return
        }

        let record = WatcherAndCallback(watcherID: watcher, callback: nodeTerminatedFn)
        self.remoteWatchCallbacks[remoteNode, default: []].insert(record)
    }

    func removeWatcher(id: ClusterSystem.ActorID) {
        // TODO: this could be optimized further with a reverse lookup table
        let removeMe = WatcherAndCallback(watcherID: id, callback: { _ in () })
        for (node, var watcherAndCallbacks) in self.remoteWatchCallbacks {
            if watcherAndCallbacks.remove(removeMe) != nil {
                self.remoteWatchCallbacks[node] = watcherAndCallbacks
            }
        }
    }

    func cleanupTombstone(node: UniqueNode) {
        _ = self.nodeTombstones.remove(node)
    }

    func membershipChanged(_ change: Cluster.MembershipChange) {
        guard let change = self.membership.applyMembershipChange(change) else {
            return // no change, nothing to act on
        }

        // TODO: make sure we only handle this ONCE?
        if change.status >= .down {
            // can be: down, leaving, or removal;
            // on any of those we want to ensure we handle the "down"
            self.handleAddressDown(change)
        }
    }

    func handleAddressDown(_ change: Cluster.MembershipChange) {
        let terminatedNode = change.node

        if let watchers = self.remoteWatchCallbacks.removeValue(forKey: terminatedNode) {
            for watcher in watchers {
                Task {
                    await watcher.callback(terminatedNode)
                }
            }
        }

        // we need to keep a tombstone, so we can immediately reply with a terminated signal
        // in case another watch was just in the process of being made
        self.nodeTombstones.insert(terminatedNode)
    }

    func cancel() {
        self.eventListenerTask?.cancel()
        self.eventListenerTask = nil
    }
}

extension DistributedNodeDeathWatcher {
    struct WatcherAndCallback: Hashable {
        /// ID of the local watcher which issued this watch
        let watcherID: ClusterSystem.ActorID
        let callback: @Sendable (UniqueNode) async -> Void

        func hash(into hasher: inout Hasher) {
            hasher.combine(self.watcherID)
        }

        static func == (lhs: WatcherAndCallback, rhs: WatcherAndCallback) -> Bool {
            lhs.watcherID == rhs.watcherID
        }
    }
}
```
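The core mechanism above is a tombstone-guarded callback registry: a late `watch` against an already-down node still fires its termination callback, closing the race between node failure and watch registration. The following is a minimal, self-contained sketch of that pattern; `NodeID` and `SimpleDeathWatcher` are hypothetical stand-ins for `UniqueNode` and `DistributedNodeDeathWatcher`, not the library API.

```swift
// Hypothetical simplified stand-in for UniqueNode.
typealias NodeID = String

// Sketch of the tombstone-guarded watch pattern, not the real library type.
actor SimpleDeathWatcher {
    private var tombstones: Set<NodeID> = []
    private var callbacks: [NodeID: [@Sendable (NodeID) async -> Void]] = [:]

    /// Register interest in a node; fire immediately if it already died.
    func watch(node: NodeID, onTerminated callback: @escaping @Sendable (NodeID) async -> Void) {
        guard !tombstones.contains(node) else {
            // Node already terminated: reply immediately, mirroring the
            // tombstone guard in `watchActor(on:by:whenTerminated:)`.
            Task { await callback(node) }
            return
        }
        callbacks[node, default: []].append(callback)
    }

    /// Mark a node down: notify all watchers, then keep a tombstone so
    /// that watches still in flight also receive a termination signal.
    func nodeDown(_ node: NodeID) {
        if let watchers = callbacks.removeValue(forKey: node) {
            for watcher in watchers {
                Task { await watcher(node) }
            }
        }
        tombstones.insert(node)
    }
}
```

The design choice worth noting is that `nodeDown` both drains the registered callbacks and records the tombstone, so the two paths (watch-then-down and down-then-watch) converge on exactly one termination notification per watcher.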
Review comment: Workaround for 5.7 Docker images being pre-WWDC.