update: protocols overview #287

Closed
wants to merge 3 commits into from
273 changes: 88 additions & 185 deletions content/concepts/introduction/protocols/overview.md
@@ -1,206 +1,109 @@
---
title: "Overview"
description: There are protocols everywhere you look when you're writing network applications, and libp2p is especially thick with them.
title: "What is a libp2p Protocol"
description: "There are protocols everywhere you look when you're writing network applications, and libp2p is especially thick with them."
weight: 20
aliases:
- "/concepts/protocols"
- "/concepts/fundamentals/protocols"
---

There are protocols everywhere you look when you're writing network applications, and libp2p is
especially thick with them.

The kind of protocols this article is concerned with are the ones built with libp2p itself,
using the core libp2p abstractions like [transport](/concepts/transport), [peer identity](/concepts/peers#peer-id/), [addressing](/concepts/addressing/), and so on.

Throughout this article, we'll call this kind of protocol that is built with libp2p
a **libp2p protocol**, but you may also see them referred to as "wire protocols" or "application protocols".

These are the protocols that define your application and provide its core functionality.

This article will walk through some of the key [defining features of a libp2p protocol](#what-is-a-libp2p-protocol), give an overview of the [protocol negotiation process](#protocol-negotiation), and outline some of the [core libp2p protocols](#core-libp2p-protocols) that are included with libp2p and provide key functionality.

## What is a libp2p protocol?

A libp2p protocol has these key features:
libp2p is composed of various core abstractions, such as
[peer identity](../core-abstractions/peers.md#peer-id)
and [addressing](../core-abstractions/addressing.md/), and
relies on protocols to facilitate communication between peers.
These protocols are networking protocols that follow certain conventions
to allow for peer-to-peer networking.
Comment on lines +10 to +15
Member

Suggested change
libp2p is composed of various core abstractions, such as
[peer identity](../core-abstractions/peers.md#peer-id)
and [addressing](../core-abstractions/addressing.md/), and
relies on protocols to facilitate communication between peers.
These protocols are networking protocols that follow certain conventions
to allow for peer-to-peer networking.
There are protocols everywhere you look when you're writing network applications, and libp2p is
especially thick with them.
The kind of protocols this article is concerned with are the ones built with libp2p itself,
using the core libp2p abstractions like [transports](/concepts/transport), [peer identity](/concepts/peers#peer-id/), [addressing](/concepts/addressing/), and so on.
Throughout this article, we'll call this kind of protocol that is built with libp2p
a **libp2p protocol**, but you may also see them referred to as "wire", "application" or "networking" protocols.
Together these libp2p protocols enable decentralized peer-to-peer networking and a wide variety of use cases to be built.

I prefer this.


### Protocol IDs

libp2p protocols have unique string identifiers, which are used in the [protocol negotiation](#protocol-negotiation) process when connections are first opened.
Each protocol has a unique string identifier called a protocol ID,
which is used to negotiate the protocol when a new stream is
established between two peers.
Comment on lines +19 to +21
Member

Suggested change
Each protocol has a unique string identifier called a protocol ID,
which is used to negotiate the protocol when a new stream is
established between two peers.
Each protocol has a unique identifier called a protocol ID.
The protocol ID helps negotiate connection establishment between two peers.


By convention, protocol ids have a path-like structure, with a version number as the final component:
Protocol IDs typically follow a path-like structure with a version number
as the final component.

```shell
/my-app/amazing-protocol/1.0.1
```

Breaking changes to your protocol's wire format or semantics should result in a new version
number. See the [protocol negotiation section](#protocol-negotiation) for more information about
how version selection works during the dialing and listening process.

{{< alert icon="💡" context="info">}}
While libp2p will technically accept any string as a valid protocol id,
using the recommended path structure with a version component is both
developer-friendly and enables easier matching by version.
{{< /alert >}}
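
As an illustration of the path-plus-version convention, here is a minimal Python sketch that splits a protocol ID into a name and a trailing version. The names `ParsedProtocolID` and `parse_protocol_id` are hypothetical and not part of any libp2p API, and the sketch assumes the version is three dot-separated integers.

```python
from typing import NamedTuple, Optional, Tuple


class ParsedProtocolID(NamedTuple):
    name: str                                 # e.g. "/my-app/amazing-protocol"
    version: Optional[Tuple[int, int, int]]   # None if the last component is not x.y.z


def parse_protocol_id(protocol_id: str) -> ParsedProtocolID:
    """Split a path-like protocol ID into its name and trailing version."""
    name, _, last = protocol_id.rpartition("/")
    parts = last.split(".")
    if name and len(parts) == 3 and all(p.isascii() and p.isdigit() for p in parts):
        major, minor, patch = (int(p) for p in parts)
        return ParsedProtocolID(name, (major, minor, patch))
    # Any string is a valid protocol ID, so fall back to "no version found".
    return ParsedProtocolID(protocol_id, None)


parsed = parse_protocol_id("/my-app/amazing-protocol/1.0.1")
print(parsed.name, parsed.version)
```

Keeping the version as a tuple of integers makes comparisons such as "same major version" straightforward.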

#### Handler functions

To accept connections, a libp2p application will register handler functions for protocols using their protocol id with the
[switch][definition_switch] (aka "swarm"), or a higher level interface such as [go's Host interface](https://github.com/libp2p/go-libp2p-core/blob/master/host/host.go).

The handler function will be invoked when an incoming stream is tagged with the registered protocol id.
If you register your handler with a match function, you can choose whether
to accept non-exact string matches for protocol ids, for example, to match on semantic major versions.

#### Binary streams

The "medium" over which a libp2p protocol transpires is a bi-directional binary stream with the following
properties:

- Bidirectional, reliable delivery of binary data
- Each side can read and write from the stream at any time
- Data is read in the same order as it was written
- Can be "half-closed", that is, closed for writing and open for reading, or closed for reading and open for writing
- Supports backpressure
  - Readers can't be flooded by eager writers <!-- TODO(yusef) elaborate: how is backpressure implemented? is it transport-dependent? -->

Behind the scenes, libp2p will also ensure that the stream is [secure](/concepts/secure-comms/) and efficiently
[multiplexed](/concepts/stream-multiplexing/). This is transparent to the protocol handler, which reads and writes
unencrypted binary data over the stream.
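
The "half-closed" property listed above can be illustrated with an ordinary socket pair from the Python standard library. This is only a sketch of the general idea, not libp2p code: `request_response` and `read_until_eof` are hypothetical helpers, and the single-threaded flow assumes the payload is small enough to fit in the socket buffer.

```python
import socket


def read_until_eof(sock: socket.socket) -> bytes:
    """Read until the other side closes its write half."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def request_response(payload: bytes) -> bytes:
    a, b = socket.socketpair()
    with a, b:
        a.sendall(payload)
        a.shutdown(socket.SHUT_WR)       # half-close: a stops writing but can still read
        request = read_until_eof(b)      # b sees EOF once the payload has been read
        b.sendall(request[::-1])         # b can still write on the half-closed stream
        b.shutdown(socket.SHUT_WR)
        return read_until_eof(a)


print(request_response(b"libp2p"))
```

Closing the write half acts as an end-of-message signal, so the reader does not need a separate length header in this simple exchange.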

The format of the binary data and the mechanics of what to send when and by whom are all up to the protocol to determine. For inspiration, some [common patterns](#common-patterns) that are used in libp2p's internal protocols are outlined below.

## Protocol Negotiation

When dialing out to initiate a new stream, libp2p will send the protocol id of the protocol you want to use.
The listening peer on the other end will check the incoming protocol id against the registered protocol handlers.

If the listening peer does not support the requested protocol, it will end the stream, and the dialing peer can
try again with a different protocol, or possibly a fallback version of the initially requested protocol.

If the protocol is supported, the listening peer will echo back the protocol id as a signal that future data
sent over the stream will
use the agreed protocol semantics.

This process of reaching agreement about what protocol to use for a given stream or connection is called
**protocol negotiation**.

### Matching protocol ids and versions

When you register a protocol handler, there are two methods you can use.

The first takes two arguments: a protocol id, and a handler function. If an incoming stream request sends an exact
match for the protocol id, the handler function will be invoked with the new stream as an argument.

#### Using a match function

The second kind of protocol registration takes three arguments: the protocol id, a protocol match function, and the handler function.

When a stream request comes in whose protocol id doesn't have any exact matches, the protocol id will be passed through
all of the registered match functions. If any returns `true`, the associated handler function will be invoked.

This gives you a lot of flexibility to do your own "fuzzy matching" and define whatever rules for protocol matching
make sense for your application.
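
The lookup order described above (exact match first, then match functions) can be sketched as follows. `HandlerRegistry` and `same_major` are hypothetical names for illustration and do not mirror the API of any libp2p implementation; the sketch also assumes that registering with a match function registers the exact ID too.

```python
from typing import Callable, Dict, List, Optional, Tuple

Handler = Callable[[object], None]   # receives the negotiated stream
MatchFn = Callable[[str], bool]      # decides whether a protocol ID is acceptable


class HandlerRegistry:
    def __init__(self) -> None:
        self._exact: Dict[str, Handler] = {}
        self._matchers: List[Tuple[MatchFn, Handler]] = []

    def set_handler(self, protocol_id: str, handler: Handler) -> None:
        self._exact[protocol_id] = handler

    def set_handler_with_match(self, protocol_id: str, match: MatchFn, handler: Handler) -> None:
        self._exact[protocol_id] = handler
        self._matchers.append((match, handler))

    def lookup(self, protocol_id: str) -> Optional[Handler]:
        handler = self._exact.get(protocol_id)
        if handler is not None:
            return handler                      # exact matches win
        for match, candidate in self._matchers:
            if match(protocol_id):
                return candidate                # first match function returning True
        return None


def same_major(registered: str) -> MatchFn:
    """Build a match function that accepts any ID with the same name and major version."""
    name, _, version = registered.rpartition("/")
    major = version.split(".")[0]

    def match(candidate: str) -> bool:
        cand_name, _, cand_version = candidate.rpartition("/")
        return cand_name == name and cand_version.split(".")[0] == major

    return match


registry = HandlerRegistry()
pid = "/my-app/amazing-protocol/1.0.1"
registry.set_handler_with_match(pid, same_major(pid), lambda stream: None)
print(registry.lookup("/my-app/amazing-protocol/1.2.0") is not None)
```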

### Dialing a specific protocol

When dialing a remote peer to open a new stream, the initiating peer sends the protocol id that they'd like to use. The remote peer will use
the matching logic described above to accept or reject the protocol. If the protocol is rejected, the dialing peer can try again.

When dialing, you can optionally provide a list of protocol ids instead of a single id. When you provide multiple protocol ids, they will
each be tried in succession, and the first successful match will be used if at least one of the protocols is supported by the remote peer.
This can be useful if you support a range of protocol versions, since you can propose the most recent version and fallback to older versions
if the remote hasn't adopted the latest version yet.
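
A rough sketch of the "try each protocol ID in succession" behaviour, assuming the remote peer can be modelled as a predicate that accepts or rejects an ID. `dial_with_fallback` is a hypothetical helper, not a libp2p function.

```python
from typing import Callable, Iterable, Optional


def dial_with_fallback(
    proposals: Iterable[str],
    remote_accepts: Callable[[str], bool],
) -> Optional[str]:
    """Propose IDs in order and return the first one the remote accepts."""
    for protocol_id in proposals:
        if remote_accepts(protocol_id):
            return protocol_id
    return None  # the remote supports none of the proposed protocols


# The remote has not adopted 1.1.0 yet, so the dialer falls back to 1.0.1.
remote_supported = {"/my-app/amazing-protocol/1.0.1"}
chosen = dial_with_fallback(
    ["/my-app/amazing-protocol/1.1.0", "/my-app/amazing-protocol/1.0.1"],
    remote_supported.__contains__,
)
print(chosen)
```

Listing the newest version first means the most recent protocol both sides support is the one selected.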
A new stream passes through a protocol multiplexer called
[Multistream-select](multistream.md), which routes the stream to the appropriate
protocol handler based on the protocol ID.

### Handler functions

Each libp2p protocol is served by a handler function, which is responsible
for defining the protocol's behavior once the stream has been established.
The handler function is invoked when an incoming stream is received with a
registered protocol ID.

A handler can also be registered with a match function,
which allows for the acceptance of non-exact string matches for protocol IDs.

A libp2p application will define a stream handler that takes over the
stream after protocol negotiation. Everything sent and received after the
negotiation phase is defined by the application protocol.
Comment on lines +34 to +46
Member

Suggested change
### Handler functions
Each libp2p protocol is served by a handler function, which is responsible
for defining the protocol's behavior once the stream has been established.
The handler function is invoked when an incoming stream is received with a
registered protocol ID.
A handler can also be registered with a match function,
which allows for the acceptance of non-exact string matches for protocol IDs.
A libp2p application will define a stream handler that takes over the
stream after protocol negotiation. Everything sent and received after the
negotiation phase is defined by the application protocol.

This whole section should be deleted imo. This page starts at a very high level and then dives right into handler functions in a jarring transition. I don't see the usefulness of explaining this concept here anyways.


### Binary streams

The medium over which a protocol operates is a bi-directional
binary stream. This stream provides the following features:
Comment on lines +50 to +51
Member

Suggested change
The medium over which a protocol operates is a bi-directional
binary stream. This stream provides the following features:
libp2p is built on top of a stream abstraction.
A stream is a computer science concept: a sequence of data that becomes available over time between two or more computers/nodes.


- **Bidirectional, reliable delivery of binary data**: Both peers can read and write
Member

Suggested change
- **Bidirectional, reliable delivery of binary data**: Both peers can read and write
The primary benefit provided to libp2p protocols by the stream abstraction is:
**Bidirectionality and reliable delivery of binary data**: Both peers can read and write

from the stream at any time, and data is read in the same order as it was written.
The stream can also be "half-closed", meaning it can be closed for writing while
still open for reading or closed for reading while still open for writing.
- **Supports backpressure**: Eager writers cannot flood readers with data, as the
stream automatically regulates data flow to prevent overload.
Comment on lines +57 to +58
Member

Suggested change
- **Supports backpressure**: Eager writers cannot flood readers with data, as the
stream automatically regulates data flow to prevent overload.
- **Supports backpressure**: Eager writers cannot flood readers with data, as the
stream automatically regulates data flow to prevent overload.

Not by default, if TCP + mplex is used, there's no support for backpressure. I think this existed in the legacy document but it should be deleted


Behind the scenes, libp2p also ensures that the stream is securely encrypted and
efficiently multiplexed, allowing multiple logical streams to be multiplexed over
Comment on lines +60 to +61
Member

Suggested change
Behind the scenes, libp2p also ensures that the stream is securely encrypted and
efficiently multiplexed, allowing multiple logical streams to be multiplexed over
Behind the scenes, libp2p also ensures that the stream is securely encrypted and
efficiently multiplexed, allowing multiple logical streams to be multiplexed over

link to secure channel and muxer docs?

a single underlying connection. These details are transparent to the protocol handler,
which reads and writes unencrypted binary data over the stream.

## Life cycle of a stream

The life cycle of a libp2p stream involves the following steps:

- **Dialing out**: When a peer wants to initiate a new stream with another peer,
it sends the protocol ID of the protocol it wants to use over the connection.
- **Protocol negotiation**: The listening peer on the other end checks the incoming
  protocol ID against its list of registered protocol handlers. If it does not
  support the requested protocol, it sends "na" (not available) on the stream, and
  the dialing peer can try again with a different protocol or a fallback
  version of the initially requested protocol. If the protocol is supported, the
  listening peer echoes the protocol ID as a signal that future data sent over
  the stream will use the agreed-upon protocol semantics.
- **Stream establishment**: Once peers agree on the protocol ID, the stream is
  established, and the designated handler function is invoked to take over the stream.
  Everything sent and received over the stream from this point on is defined by the
  application-level protocol.
- **Stream closure**: When either peer finishes using the stream, it can be closed
by either side. If the stream is half-closed, the other side can continue to read
or write until it is closed.
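
The steps above can be sketched end to end with a local socket pair. This is a deliberately simplified model and not the real multistream-select wire format: messages here are plain newline-terminated lines, and `open_stream`, `listener_step`, and `echo_upper` are hypothetical names used only for illustration.

```python
import socket
from typing import Callable, Dict, List, Optional, Tuple

NOT_AVAILABLE = "na"
Handler = Callable[[socket.socket], None]


def send_line(sock: socket.socket, text: str) -> None:
    sock.sendall(text.encode("utf-8") + b"\n")


def recv_line(sock: socket.socket) -> str:
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("stream closed mid-line")
        buf += chunk
    return buf[:-1].decode("utf-8")


def listener_step(sock: socket.socket, handlers: Dict[str, Handler]) -> None:
    """Echo the proposed ID if a handler is registered, otherwise answer "na"."""
    proposed = recv_line(sock)
    send_line(sock, proposed if proposed in handlers else NOT_AVAILABLE)


def echo_upper(sock: socket.socket) -> None:
    """Example application protocol: reply with the upper-cased request line."""
    send_line(sock, recv_line(sock).upper())


def open_stream(
    proposals: List[str], handlers: Dict[str, Handler], message: str
) -> Tuple[Optional[str], Optional[str]]:
    dialer, listener = socket.socketpair()
    with dialer, listener:                        # stream closure on exit
        for protocol_id in proposals:
            send_line(dialer, protocol_id)        # dialing out
            listener_step(listener, handlers)     # protocol negotiation
            if recv_line(dialer) != protocol_id:
                continue                          # got "na", try the next proposal
            send_line(dialer, message)            # application-level data
            handlers[protocol_id](listener)       # stream establishment: handler takes over
            return protocol_id, recv_line(dialer)
    return None, None


print(open_stream(
    ["/my-app/amazing-protocol/2.0.0", "/my-app/amazing-protocol/1.0.1"],
    {"/my-app/amazing-protocol/1.0.1": echo_upper},
    "hello",
))
```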
Comment on lines +65 to +84
Member

Suggested change
## Life cycle of a stream
The life cycle of a libp2p stream involves the following steps:
- **Dialing out**: When a peer wants to initiate a new stream with another peer,
it sends the protocol ID of the protocol it wants to use over the connection.
- **Protocol negotiation**: The listening peer on the other end checks the incoming
protocol ID against its list of registered protocol handlers. If it does not
support the requested protocol, it sends "na" (not available) on the stream, and
the dialing peer can try again with a different protocol or a fallback
version of the initially requested protocol. If the protocol is supported, the
listening peer echoes the protocol ID as a signal that future data sent over
the stream will use the agreed-upon protocol semantics.
- **Stream establishment**: Once peers agree on the protocol ID, the stream is
established, and the designated handler function is invoked to take over the stream.
Everything sent and received over the stream from this point on is defined by the
application-level protocol.
- **Stream closure**: When either peer finishes using the stream, it can be closed
by either side. If the stream is half-closed, the other side can continue to read
or write until it is closed.

I would move this to the stream multiplexing document in a stream overview subsection
Is the protocols overview a good place for this?


## Core libp2p protocols

In addition to the protocols that you write when developing a libp2p application, libp2p itself defines several foundational protocols that are used for core features.

### Common patterns

The protocols described below all use [protocol buffers](https://developers.google.com/protocol-buffers/) (aka protobuf) to define message schemas.

Messages are exchanged over the wire using a very simple convention which prefixes binary
message payloads with an integer that represents the length of the payload in bytes. The
length is encoded as a [protobuf varint](https://developers.google.com/protocol-buffers/docs/encoding#varints) (variable-length integer).
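
The length-prefix convention can be sketched in a few lines of Python. The helper names (`encode_uvarint`, `decode_uvarint`, `frame`, `unframe`) are hypothetical; the varint encoding follows the unsigned base-128 scheme used by protobuf, and the payloads here are arbitrary bytes rather than real protobuf messages.

```python
from typing import List, Tuple


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned base-128 varint."""
    if n < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, offset of the first byte after the varint)."""
    value = 0
    shift = 0
    for i in range(offset, len(buf)):
        byte = buf[i]
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated varint")


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length in bytes."""
    return encode_uvarint(len(payload)) + payload


def unframe(buf: bytes) -> List[bytes]:
    """Split a buffer of back-to-back framed messages into payloads."""
    messages = []
    offset = 0
    while offset < len(buf):
        length, offset = decode_uvarint(buf, offset)
        end = offset + length
        if end > len(buf):
            raise ValueError("truncated payload")
        messages.append(buf[offset:end])
        offset = end
    return messages


print(unframe(frame(b"a") + frame(b"bc")))
```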

### Ping

| **Protocol id** | spec | | | implementations |
|--------------------|------|---------------|---------------|-------------------|
| `/ipfs/ping/1.0.0` | N/A | [go][ping_go] | [js][ping_js] | [rust][ping_rust] |

The ping protocol is a simple liveness check that peers can use to quickly see if another peer is online.

After the initial protocol negotiation, the dialing peer sends 32 bytes of random binary data. The listening
peer echoes the data back, and the dialing peer will verify the response and measure
the latency between request and response.
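
The exchange described above (send 32 random bytes, echo them back, verify and time the round trip) can be sketched with a local socket pair standing in for the negotiated stream. This is not the libp2p ping implementation; `ping`, `serve_one_ping`, and `ping_over_socketpair` are hypothetical names.

```python
import os
import socket
import threading
import time

PING_SIZE = 32


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the stream ends early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed before the full payload arrived")
        buf += chunk
    return bytes(buf)


def serve_one_ping(sock: socket.socket) -> None:
    """Listening side: echo the 32-byte payload back unchanged."""
    sock.sendall(recv_exact(sock, PING_SIZE))


def ping(sock: socket.socket) -> float:
    """Dialing side: return the round-trip time in seconds."""
    payload = os.urandom(PING_SIZE)
    start = time.perf_counter()
    sock.sendall(payload)
    echoed = recv_exact(sock, PING_SIZE)
    elapsed = time.perf_counter() - start
    if echoed != payload:
        raise ValueError("ping response does not match the request")
    return elapsed


def ping_over_socketpair() -> float:
    dialer, listener = socket.socketpair()
    with dialer, listener:
        responder = threading.Thread(target=serve_one_ping, args=(listener,))
        responder.start()
        try:
            return ping(dialer)
        finally:
            responder.join()


print(f"round trip took {ping_over_socketpair():.6f} s")
```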

### Identify

| **Protocol id** | spec | | | implementations |
|------------------|--------------------------------|-------------------|-------------------|-----------------------|
| `/ipfs/id/1.0.0` | [identify spec][spec_identify] | [go][identify_go] | [js][identify_js] | [rust][identify_rust] |

The `identify` protocol allows peers to exchange information about each other, most notably their public keys
and known network addresses.
In addition to the protocols written when developing a libp2p application, libp2p defines
several foundational protocols used for core features.

The basic identify protocol works by establishing a new stream to a peer using the identify protocol id
shown in the table above.

When the remote peer opens the new stream, they will fill out an [`Identify` protobuf message][identify_proto] containing
information about themselves, such as their public key, which is used to derive their [`PeerId`](/concepts/peers/).

Importantly, the `Identify` message includes an `observedAddr` field that contains the [multiaddr][definition_multiaddr] that
the peer observed the request coming in on. This helps peers determine their NAT status, since it allows them to
see what other peers observe as their public address and compare it to their own view of the network.

#### identify/push

| **Protocol id** | spec & implementations |
|-----------------------|-------------------------------------|
| `/ipfs/id/push/1.0.0` | same as [identify above](#identify) |

A slight variation on `identify`, the `identify/push` protocol sends the same `Identify` message, but it does so proactively
instead of in response to a request.

This is useful if a peer starts listening on a new address, establishes a new [relay circuit](/concepts/circuit-relay/), or
learns of its public address from other peers using the standard `identify` protocol. Upon creating or learning of a new address,
the peer can push the new address to all peers it's currently aware of. This keeps everyone's routing tables up to date and
makes it more likely that other peers will discover the new address.

### kad-dht

`kad-dht` is a [Distributed Hash Table][wiki_dht] based on the [Kademlia][wiki_kad] routing algorithm, with some modifications.

libp2p uses the DHT as the foundation of its [peer routing](/concepts/peer-routing/) and [content routing](/concepts/content-routing/) functionality. To learn more about DHT and the Kademlia algorithm,
check out the [Distributed Hash Tables guide][dht] on the IPFS documentation site. In addition, check out the [libp2p implementations page](https://libp2p.io/implementations/) for updates on all the kad-libp2p implementations.

### Circuit Relay

| **Protocol id** | spec | | implementations |
|-------------------------------|----------------------------------|----------------|-----------------|
| `/libp2p/circuit/relay/0.1.0` | [circuit relay spec][spec_relay] | [go][relay_go] | [js][relay_js] |

As described in the [Circuit Relay article](/concepts/circuit-relay/), libp2p provides a protocol
for tunneling traffic through relay peers when two peers are unable to connect to each other
directly. See the article for more information on working with relays, including notes on relay
addresses and how to enable automatic relay connection when behind an intractable NAT.
{{< alert icon="" context="note">}}
Check out the [libp2p implementations page](https://libp2p.io/implementations/) for
updates on all the libp2p implementations.
{{< /alert >}}

[ping_go]: https://github.com/libp2p/go-libp2p/tree/master/p2p/protocol/ping
[ping_js]: https://github.com/libp2p/js-libp2p-ping
[ping_rust]: https://github.com/libp2p/rust-libp2p/blob/master/protocols/ping/src/lib.rs
[spec_identify]: https://github.com/libp2p/specs/pull/97/files
[identify_go]: https://github.com/libp2p/go-libp2p/tree/master/p2p/protocol/identify
[identify_js]: https://github.com/libp2p/js-libp2p-identify
[identify_rust]: https://github.com/libp2p/rust-libp2p/tree/master/protocols/identify/src
[identify_proto]: https://github.com/libp2p/go-libp2p/blob/master/p2p/protocol/identify/pb/identify.proto
[spec_relay]: https://github.com/libp2p/specs/tree/master/relay
[relay_js]: https://github.com/libp2p/js-libp2p-circuit
[relay_go]: https://github.com/libp2p/go-libp2p-circuit
[definition_switch]: /reference/glossary/#switch
[definition_multiaddr]: /reference/glossary/#multiaddr
[wiki_dht]: https://en.wikipedia.org/wiki/Distributed_hash_table
[wiki_kad]: https://en.wikipedia.org/wiki/Kademlia
[dht]: https://docs.ipfs.tech/concepts/dht/
| **Specification** | **Protocol ID** |
|--------------------------------------------------------------------------------------------|------------------------------------|
| [AutoNAT](https://github.com/libp2p/specs/blob/master/autonat/README.md#autonat-protocol) | `/libp2p/autonat/1.0.0` |
| [Circuit Relay v2 (hop)](https://github.com/libp2p/specs/blob/master/relay/circuit-v2.md) | `/libp2p/circuit/relay/0.2.0/hop` |
| [Circuit Relay v2 (stop)](https://github.com/libp2p/specs/blob/master/relay/circuit-v2.md) | `/libp2p/circuit/relay/0.2.0/stop` |
| [DCUtR](https://github.com/libp2p/specs/blob/master/relay/DCUtR.md)                        | `/libp2p/dcutr`                    |
| [Fetch](https://github.com/libp2p/specs/tree/master/fetch) | `/libp2p/fetch/0.0.1` |
| [GossipSub v1.0](https://github.com/libp2p/specs/tree/master/pubsub/gossipsub)             | `/meshsub/1.0.0`                   |
| [GossipSub v1.1](https://github.com/libp2p/specs/tree/master/pubsub/gossipsub)             | `/meshsub/1.1.0`                   |
| [Identify](https://github.com/libp2p/specs/blob/master/identify/README.md) | `/ipfs/id/1.0.0` |
| [Identify (push)](https://github.com/libp2p/specs/blob/master/identify/README.md) | `/ipfs/id/push/1.0.0` |
| [Kademlia DHT](https://github.com/libp2p/specs/blob/master/kad-dht/README.md) | `/ipfs/kad/1.0.0` |
| [Ping](https://github.com/libp2p/specs/blob/master/ping/ping.md) | `/ipfs/ping/1.0.0` |
| [Rendezvous](https://github.com/libp2p/specs/blob/master/rendezvous/README.md)             | `/rendezvous/1.0.0`                |