Connection Manager Overhaul
This issue is an epic that tracks the work related to the Connection Manager Overhaul. The context and initial thoughts for each milestone are described below.
Background
As we land new features like auto-relay and rendezvous to improve connectivity and discoverability in libp2p (libp2p/js-libp2p#703), the connection manager overhaul becomes an important work stream to guarantee that these protocols work as expected. This work also matters for already implemented features and protocols such as webrtc-star and bootstrap. Finally, it is needed to enable the DHT work.
This overhaul should be an initial step towards the future ConnMgr v2.
Milestones Overview
Milestone | Issue | PR | State |
---|---|---|---|
0) Documentation - Baseline | NA | #757 | WIP |
1) Watermarks Observation - Proactive Dial | TODO | TODO | TODO |
2) Keep Alive | TODO | TODO | TODO |
3) Protect Connections - Connection Tags | TODO | TODO | TODO |
4) Protect Connections - Decaying Tags | TODO | TODO | TODO |
5) Watermarks Observation - Trimming | TODO | TODO | TODO |
6) Connection Gater | TODO | #1142 | Done |
7) Dial retry | TODO | TODO | TODO |
8) Disconnect message | TODO | TODO | TODO |
These milestones do not need to be worked on in the displayed sequence. For instance, Connection Tags, Connection Gater and Keep Alive can each be implemented in isolation.
Context
The Connection manager is responsible for managing all the connections a peer has over time. It allows users to enforce an upper bound on the total number of open connections. To avoid possible service disruptions, connections can be tagged with metadata and optionally "protected" to guarantee that essential connections are kept alive.
0) Documentation - Connection flows
Create a DISCOVERABILITY_AND_CONNECTIVITY.md document as a follow-up to the GETTING_STARTED document. After getting up to speed with how to configure and start libp2p in the getting started document, readers should move on to how to set up their peer/network according to their use case/environment, so that peers can be discovered and connections with them can be established.
This will be divided into two categories:
- define a baseline of what is a desirable set of connections for each environment / use case
- improve current documentation to clarify some flows like the webrtc-star server
  - context: Unable to connect two browser tabs with a stand alone js-ipfs node ipfs/js-ipfs#3235
  - use own webrtc-star server
1) Watermarks observation
Proactive dial
The connection manager proactively dials known peers, in order to have a meaningful set of connections to enable a node to work as expected, according to each use case/environment.
We have been relying on the connection manager low watermark, so that the peer keeps a reasonable number of arbitrary connections. Once we introduce protected connections, as well as tagging important peers, the proactive dial strategy can be modified to keep trying to dial more meaningful peers.
Proactive dial strategies
The following dial strategies should exist:
- Find our closest peers on the network, and attempt to stay connected to n of them. If peers from the previous search are no longer our closest peers, we should untag those connections, or just let decaying tags handle this.
- Finding, connecting to and protecting our gossipsub peers (same topics search)
- Finding and binding to relays with AutoRelay
- Finding and binding to application protocol peers (as needed via MulticodecTopology) -- We should clarify what libp2p will handle intrinsically and what users need to do. Ideally, I think libp2p should search for multicodecs for registered topologies automatically.
- ...
The above dial strategies should have sane defaults, but users should also be able to override them.
We should periodically check whether we are connected to the most meaningful peers, and also dial proactively on events such as peer discovery or disconnect.
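The periodic check could look roughly like the sketch below. It assumes peers are identified by plain strings and that each known peer carries a single numeric value (for example the sum of its tag weights). `pickDialCandidates` and `KnownPeer` are hypothetical names, not part of js-libp2p.

```typescript
// Hypothetical types for illustration only; not the js-libp2p API.
type PeerId = string

interface KnownPeer {
  id: PeerId
  // Assumed score, e.g. the sum of tag weights; higher means more meaningful.
  value: number
}

/**
 * Returns the peers to dial so that the connection count climbs back
 * up to the low watermark, preferring the highest-valued known peers.
 */
function pickDialCandidates (
  known: KnownPeer[],
  connected: Set<PeerId>,
  lowWatermark: number
): PeerId[] {
  const missing = lowWatermark - connected.size
  if (missing <= 0) return []

  return known
    .filter(peer => !connected.has(peer.id)) // filter() copies, so sort() does not mutate `known`
    .sort((a, b) => b.value - a.value)
    .slice(0, missing)
    .map(peer => peer.id)
}
```

The same function could be called from an interval timer and from peer discovery or disconnect handlers, since it only depends on the current state.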
TODO: different strategy for Startup/Persistence?
Subsystems should be able to ask the connection manager for a slice of the connection pool. For example, a connection that belongs to a gossipsub mesh should probably be protected.
TODO: Figure out API for interaction between subsystems/topologies and connMgr
Subsystems might want to provide a selector function to choose the peers they care about. AutoRelay, for example, will want to check whether a peer has metadata with hop = true.
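One possible shape for such a selector is sketched below. The API is still a TODO, so everything here is an assumption: `PeerSelector`, `selectPeers` and the string-valued metadata map are hypothetical, and the `hop` key only mirrors the example in the text.

```typescript
// Hypothetical types for illustration only; not the js-libp2p API.
type PeerId = string

interface PeerInfo {
  id: PeerId
  // Assumed to be plain strings here to keep the sketch small.
  metadata: Map<string, string>
}

// A selector lets a subsystem state which peers it cares about.
type PeerSelector = (peer: PeerInfo) => boolean

// Assumed AutoRelay selector: keep peers whose metadata has hop = true.
const hopRelaySelector: PeerSelector = peer => peer.metadata.get('hop') === 'true'

// Apply a subsystem's selector and cap the result at its slice of the pool.
function selectPeers (peers: PeerInfo[], selector: PeerSelector, limit: number): PeerId[] {
  return peers
    .filter(selector)
    .slice(0, limit)
    .map(peer => peer.id)
}
```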
Trim Connections
The connection manager trims less useful connections to stay below a high watermark.
- New connections should be given a grace period before they are subject to trimming (short TTL decaying tags)
- Trimming runs automatically on demand
  - Verification on every peer connect event
- Attempt to keep a balance between subsystems' connections and their needs
  - If a subsystem is exceeding its agreed allocation of connections, we would look at disconnecting peers from it that no other subsystem is using.
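The last point can be sketched as follows, assuming the connection manager tracks which subsystems use each connection. `TrackedConnection` and `excessPeersFor` are hypothetical names, and how allocations are agreed is left open.

```typescript
// Hypothetical types for illustration only; not the js-libp2p API.
type PeerId = string

interface TrackedConnection {
  peer: PeerId
  // Names of the subsystems currently using this connection.
  usedBy: Set<string>
}

/**
 * For a subsystem that is over its allocation, lists peers that only it
 * uses, capped at the number of connections it is over by.
 */
function excessPeersFor (
  subsystem: string,
  allocation: number,
  connections: TrackedConnection[]
): PeerId[] {
  const mine = connections.filter(conn => conn.usedBy.has(subsystem))
  const excess = mine.length - allocation
  if (excess <= 0) return []

  return mine
    .filter(conn => conn.usedBy.size === 1) // no other subsystem depends on it
    .slice(0, excess)
    .map(conn => conn.peer)
}
```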
2) Keep Alive
Currently, if a connection does not have anything going on for a while, it will time out and close.
libp2p should guarantee that specific connections are kept alive. This matters for staying connected to the peers that are important to us, at either the infrastructure or the application layer. Remote listening (webrtc-star, relay, etc.) is particularly important in this context.
Keep Alive should be applied to peers protected via the API (Milestone 3) and to peers provided in the configuration.
In most cases, a ping on the connection should be enough, but this needs to be tested for each transport.
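A minimal sketch of the bookkeeping this implies, assuming the caller sends the actual ping and reports activity back. `KeepAliveTracker` is a hypothetical helper, and the idle threshold would need tuning per transport.

```typescript
// Hypothetical helper for illustration only; not the js-libp2p API.
type PeerId = string

class KeepAliveTracker {
  private readonly idleMs: number
  private readonly lastActivity = new Map<PeerId, number>()

  constructor (idleMs: number) {
    this.idleMs = idleMs
  }

  // Record traffic (or a successful ping) on a kept-alive peer's connection.
  touch (peer: PeerId, now: number): void {
    this.lastActivity.set(peer, now)
  }

  // Stop tracking a peer, e.g. once it is unprotected or disconnected.
  remove (peer: PeerId): void {
    this.lastActivity.delete(peer)
  }

  // Peers that have been idle for at least `idleMs`; the caller would ping these.
  duePeers (now: number): PeerId[] {
    const due: PeerId[] = []
    this.lastActivity.forEach((last, peer) => {
      if (now - last >= this.idleMs) due.push(peer)
    })
    return due
  }
}
```

Passing `now` in explicitly keeps the sketch deterministic; a real implementation would likely read the clock itself and run `duePeers` from a timer.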
3) Protect important connections
ConnManager tracks connections to peers, and allows consumers to associate metadata with each peer. This enables connections to be trimmed based on implementation-defined metadata per peer.
See: #369
Connection tags
API
(based on go interface: https://github.com/libp2p/go-libp2p-core/blob/master/connmgr/manager.go)
- Tag a peer with a string, associating a weight with the tag.
tagPeer (peerId: PeerId, tag: string, weight: number) : void
- Untag removes the tagged value from the peer.
untagPeer (peerId: PeerId, tag: string) : void
- Get the metadata associated with the peer connection.
getTagInfo (peerId: PeerId) : TagInfo
- tagInfo should be stored in the metadataBook
- Protect a peer from having its connection(s) pruned.
protect (peerId: PeerId, tag: string)
- This would need to return a boolean or throw
- Unprotect a peer from having its connection(s) pruned.
unProtect (peerId: PeerId, tag: string)
- Check if a peer connection is protected.
isProtected (peerId: PeerId, tag: string)
Data structures
/**
* TagInfo object stores metadata associated with a peer
* @typedef {Object} TagInfo
* @property {Map<string, number>} tags map with tags and their current weight
* @property {number} firstSeen timestamp of first connection establishment.
* @property {number} weight seq counter.
*/
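To make the proposed API concrete, here is an in-memory sketch. It makes several assumptions that the issue does not settle: peers are plain strings, `weight` is treated as the sum of all tag weights, `firstSeen` is set when a peer is first tagged, and `unProtect` returns whether the peer is still protected by another tag (as the Go interface does). `TagStore` is a hypothetical name, and a real implementation would persist `TagInfo` in the metadataBook.

```typescript
// Hypothetical in-memory store for illustration only; not the js-libp2p API.
type PeerId = string

interface TagInfo {
  tags: Map<string, number>
  firstSeen: number
  weight: number // assumed here to be the sum of all tag weights
}

class TagStore {
  private readonly infos = new Map<PeerId, TagInfo>()
  private readonly protections = new Map<PeerId, Set<string>>()

  tagPeer (peerId: PeerId, tag: string, weight: number): void {
    const info = this.getTagInfo(peerId)
    // Replace any previous weight for this tag instead of adding twice.
    info.weight += weight - (info.tags.get(tag) ?? 0)
    info.tags.set(tag, weight)
    this.infos.set(peerId, info)
  }

  untagPeer (peerId: PeerId, tag: string): void {
    const info = this.infos.get(peerId)
    if (info === undefined) return
    info.weight -= info.tags.get(tag) ?? 0
    info.tags.delete(tag)
  }

  getTagInfo (peerId: PeerId): TagInfo {
    return this.infos.get(peerId) ?? { tags: new Map(), firstSeen: Date.now(), weight: 0 }
  }

  protect (peerId: PeerId, tag: string): void {
    const tags = this.protections.get(peerId) ?? new Set<string>()
    tags.add(tag)
    this.protections.set(peerId, tags)
  }

  // Returns true while the peer is still protected under another tag.
  unProtect (peerId: PeerId, tag: string): boolean {
    const tags = this.protections.get(peerId)
    if (tags === undefined) return false
    tags.delete(tag)
    if (tags.size === 0) this.protections.delete(peerId)
    return tags.size > 0
  }

  isProtected (peerId: PeerId, tag: string): boolean {
    return this.protections.get(peerId)?.has(tag) ?? false
  }
}
```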
Integration with Trim connections
Connection tags will allow trimming to become more intelligent at this stage. Peers should be iterated over, with the weight of their tags used as the first criterion.
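A sketch of that ordering, assuming each peer's tag weights have already been summed and that protected peers are never trimmed. `PeerScore` and `rankForTrimming` are hypothetical names; further criteria (for example connection age) could break ties.

```typescript
// Hypothetical types for illustration only; not the js-libp2p API.
type PeerId = string

interface PeerScore {
  peer: PeerId
  weight: number // sum of the peer's tag weights
  isProtected: boolean
}

/**
 * Picks up to `excess` peers to disconnect: protected peers are skipped
 * and the remaining peers are taken lowest weight first.
 */
function rankForTrimming (peers: PeerScore[], excess: number): PeerId[] {
  return peers
    .filter(entry => !entry.isProtected)
    .sort((a, b) => a.weight - b.weight)
    .slice(0, Math.max(0, excess))
    .map(entry => entry.peer)
}
```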
4) Decaying tags
Note: Inspired by go-libp2p https://github.com/libp2p/go-libp2p-core/blob/master/connmgr/decay.go
A decaying tag is one whose value automatically decays over time. The decay behaviour is encapsulated in a user-provided decaying function (DecayFn). The function is called on every tick (determined by the interval parameter), and returns either the new value of the tag, or whether it should be erased altogether.
We do not set values on a decaying tag directly; instead, we "bump" a decaying tag by a delta value. This calls the BumpFn with the old value and the delta, to determine the new value.
While users should be able to provide their own functions, we should provide some preset functions to be used. Behaviours that are straightforward to implement include:
- Decay a tag by -1, or by half its current value, on every tick.
- Every time a value is bumped, sum it to its current value.
- Exponentially boost a score with every bump.
- Sum the incoming score, but keep it within min, max bounds.
This is particularly important for scenarios like Bootstrap discovery. When the node starts, these connections are really important for getting to know other peers. But as time passes and new connections are established, peers should disconnect from the bootstrap nodes.
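Some of the preset behaviours listed above could be sketched as small function factories. This simplifies the Go types by passing the current value as a plain number rather than a DecayingValue, and the function names are illustrative rather than an agreed API.

```typescript
// Simplified, hypothetical types: the value is a plain number here.
interface DecayResult { after: number, rm: boolean }
type DecayFn = (value: number) => DecayResult
type BumpFn = (value: number, delta: number) => number

// Subtract a fixed amount on every tick; erase the tag once it reaches zero.
const decayFixed = (minuend: number): DecayFn => value => {
  const after = value - minuend
  return { after, rm: after <= 0 }
}

// Multiply by a coefficient on every tick (0.5 halves the value).
const decayLinear = (coef: number): DecayFn => value => {
  const after = Math.floor(value * coef)
  return { after, rm: after <= 0 }
}

// Add the delta to the current value on every bump.
const bumpSumUnbounded: BumpFn = (value, delta) => value + delta

// Add the delta, but keep the result within [min, max].
const bumpSumBounded = (min: number, max: number): BumpFn =>
  (value, delta) => Math.min(max, Math.max(min, value + delta))
```

For the bootstrap case, a tag that starts high and uses `decayFixed` would gradually lose its weight, so those connections become candidates for trimming once other peers are known.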
API
setDecayingTag(tag: string, interval: time, decayFn: function, bumpFn: function)
// DecayFn applies a decay to the peer's score. The implementation must call
// DecayFn at the interval supplied when registering the tag.
//
// It receives a copy of the decaying value, and returns the score after
// applying the decay, as well as a flag to signal if the tag should be erased.
type DecayFn func(value DecayingValue) (after int, rm bool)
// BumpFn applies a delta onto an existing score, and returns the new score.
//
// Non-trivial bump functions include exponential boosting, moving averages,
// ceilings, etc.
type BumpFn func(value DecayingValue, delta int) (after int)
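The pieces above could fit together roughly as in the sketch below. It assumes a simplified TypeScript rendering of the Go types (plain numbers instead of DecayingValue), and `DecayingTag` is a hypothetical class. A real implementation would call `tick` from a timer every `intervalMs` and feed the resulting values back into the peer's tags.

```typescript
// Simplified, hypothetical types: the value is a plain number here.
type PeerId = string
type DecayFn = (value: number) => { after: number, rm: boolean }
type BumpFn = (value: number, delta: number) => number

class DecayingTag {
  readonly name: string
  readonly intervalMs: number
  private readonly decayFn: DecayFn
  private readonly bumpFn: BumpFn
  private readonly values = new Map<PeerId, number>()

  constructor (name: string, intervalMs: number, decayFn: DecayFn, bumpFn: BumpFn) {
    this.name = name
    this.intervalMs = intervalMs
    this.decayFn = decayFn
    this.bumpFn = bumpFn
  }

  // Values are never set directly; they are bumped by a delta.
  bump (peer: PeerId, delta: number): void {
    this.values.set(peer, this.bumpFn(this.values.get(peer) ?? 0, delta))
  }

  // Apply one decay step to every peer carrying this tag.
  tick (): void {
    this.values.forEach((value, peer) => {
      const { after, rm } = this.decayFn(value)
      if (rm) {
        this.values.delete(peer)
      } else {
        this.values.set(peer, after)
      }
    })
  }

  // Current value for a peer, or undefined if the tag has been erased.
  get (peer: PeerId): number | undefined {
    return this.values.get(peer)
  }
}
```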
5) Connection Gater
TODO: https://github.com/libp2p/go-libp2p-core/blob/master/connmgr/gater.go
Related: #175
6) Connection Retry
Retry a dial if it fails on a first attempt.
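A minimal sketch of a retry wrapper is shown below. The exponential backoff, the attempt limit and the name `dialWithRetry` are all assumptions; the issue only states that a failed dial should be retried. The `sleep` parameter exists so the delay can be replaced in tests.

```typescript
/**
 * Hypothetical helper, not part of js-libp2p. Calls `dial` up to
 * `maxAttempts` times, waiting baseDelayMs * 2^attempt between failures,
 * and rethrows the last error if every attempt fails.
 */
async function dialWithRetry<T> (
  dial: () => Promise<T>,
  maxAttempts: number,
  baseDelayMs: number,
  sleep: (ms: number) => Promise<void> = ms => new Promise(resolve => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown = new Error('maxAttempts must be at least 1')

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await dial()
    } catch (err) {
      lastError = err
      // Do not wait after the final attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt) // exponential backoff (assumed policy)
      }
    }
  }

  throw lastError
}
```

Whether retries should be limited to certain error types, or coordinated with the Disconnect message milestone, is left open here.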
7) Disconnect
Sometimes peer A will want to disconnect from peer B because A has a lot of connections, all of them more important than the connection with peer B, while peer B still wants to be connected to peer A. A message should be exchanged so that peer B understands that it should not retry the connection (for a given time?), and eventually a peer exchange could take place. This needs to be spec'ed. Initial discussion at libp2p/go-libp2p#238
Notes
- Subsystems, such as pubsub and auto-relay, should provide a function to rank which peers they would like to have connections with.