This issue is a more practical and immediate equivalent of the discussion in #184.
From the start of the project, a hacky, naive approach to finding the addresses of other "actors" has been this idea of an arbiter, which we implement as a "special actor" with methods for looking up (socket) addresses by name. This is of course not an ideal system, since there will always be a race during a multi-tree startup for the "arbiter" address, as well as no flexible consensus system for how that position can be transferred to another tree / root actor when the first is torn down / fails. The fragility is further emphasized in how root actors "check" for the registry (arbiter) existing, which is simply to do a fast TCP connect and drop on the supposed arbiter socket address.
Summarizing the current naive/questionable design for an address registry:
- a single socket address is allocated to some root actor designated the "arbiter" (aka a registry actor), and this address is passed to other Python programs which would like to search for actors using this same registry
- the way to "check" if the arbiter "exists" is to do a nasty TCP connect/drop, which results in us having to specially handle and remap `trio.BrokenResourceError`s to an internal `TransportClosed` error which is ignored silently
- there is no mechanism for fail-over, arbiter re-election, or transfer of the registry between trees
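
To make the connect/drop "check" concrete, here is a minimal sketch of that style of probe using only the stdlib `socket` module. It is not the project's actual code (which is built on `trio`); the names `registry_is_reachable` and `probe_demo` are hypothetical and exist only for illustration.

```python
import socket


def registry_is_reachable(host: str, port: int, timeout: float = 0.5) -> bool:
    """Hypothetical probe: open a TCP connection and drop it right away.

    A successful connect only shows that *something* is listening on the
    address; it says nothing about whether that peer is a healthy registry.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable or timed out: treat as "no arbiter".
        return False
    # Drop immediately. The listening side sees an abruptly closed
    # connection, which is the kind of error it must then special-case.
    conn.close()
    return True


def probe_demo() -> tuple[bool, bool]:
    """Probe a throwaway local listener before and after it is closed."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen()
    host, port = listener.getsockname()

    while_up = registry_is_reachable(host, port)
    listener.close()
    after_close = registry_is_reachable(host, port)
    return while_up, after_close
```

Note the startup race this leaves open: two root actors that probe at nearly the same time can both see "no arbiter" and both try to bind the registry address, and nothing in the probe decides which one should win.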
Digging into "why" this is in the code:
This "arbiter" idea was originally adopted from other "actor system" projects:
Places to start some research
- gossip protocol
- matrix "federation replication" api
- raft
WIP, will come back.