Skip to content

Commit bb41e39

Browse files
Merge pull request #1031 from ipfs/nodes
Adds definitions for node terminology and links
2 parents cf987ac + 84799e3 commit bb41e39

File tree

6 files changed

+123
-65
lines changed

6 files changed

+123
-65
lines changed

docs/concepts/faq.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,9 @@ The quickest way to get IPFS up and running on your machine is by installing [IP
2626

2727
For installing and initializing IPFS from the command line, check out the [command-line quick start](../how-to/command-line-quick-start.md) guide.
2828

29+
### Why doesn't my SHA hash match my CID?
30+
When you add a file to IPFS, IPFS splits it into smaller blocks. IPFS hashes each of these pieces individually, building a [Merkle Directed Acyclic Graphs (DAGs)](../concepts/merkle-dag.md) and resulting in an overall different hash.
31+
2932
## Contributing to IPFS
3033

3134
### How do I start contributing to IPFS?
@@ -40,7 +43,7 @@ Filecoin and IPFS are two separate, complementary protocols, both created by Pro
4043

4144
In short: IPFS addresses and moves content, while Filecoin is an incentive layer to persist data.
4245

43-
These components are separable - you can use one without the other, and IPFS already supports more self-organized or altruistic forms of data persistence via tools like [IPFS Cluster](https://cluster.ipfs.io/). Compatibility between IPFS and Filecoin is intended to be as seamless as possible, but we expect it to evolve over time. You can view the [draft spec for IPFS-Filecoin Interoperability](https://github.com/filecoin-project/specs/issues/143) and [ideas for future improvements](https://github.com/filecoin-project/specs/issues/144) to learn more.
46+
These components are separable - you can use one without the other, and IPFS already supports more self-organized or altruistic forms of data persistence via tools like [IPFS Cluster](https://cluster.ipfs.io/). Compatibility between IPFS and Filecoin is intended to be as seamless as possible, but we expect it to evolve. You can view the [draft spec for IPFS-Filecoin Interoperability](https://github.com/filecoin-project/specs/issues/143) and [ideas for future improvements](https://github.com/filecoin-project/specs/issues/144) to learn more.
4447

4548
## IPFS and Protocol Labs
4649

docs/concepts/glossary.md

Lines changed: 24 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -66,7 +66,7 @@ A Block is a binary blob of data identified by a [CID](#cid). It could be raw by
6666

6767
### Bootstrap node
6868

69-
A Bootstrap Node is a trusted peer on the IPFS network through which an IPFS node learns about other peers on the network. [More about Bootstrapping](../how-to/modify-bootstrap-list.md)
69+
A Bootstrap Node is a trusted peer on the IPFS network through which an IPFS node learns about other peers on the network. Both go-ipfs and js-ipfs use bootstrap nodes to enter the Distributed Hash Table (DHT). See [Bootstrap](../concepts/nodes/#bootstrap)
7070

7171
## C
7272

@@ -148,6 +148,10 @@ The Datastore is the on-disk storage system used by an IPFS node. Configuration
148148

149149
Direct Connection Upgrade through Relay (DCUtR) protocol enables [hole punching](#hole-punching) for NAT traversal when port forwarding is not possible. A peer will coordinate with the counterparty using a [relayed connection](#circuit-relay-v2), to upgrade to a direct connection through a NAT/firewall whenever possible. [More about DCUtR](https://github.com/libp2p/specs/blob/master/relay/DCUtR.md)
150150

151+
### Delegate routing node
152+
153+
GO-IPFS nodes with their API ports exposed and some HTTP API commands accessible. JS-IPFS nodes use them to query the DHT and also publish content without having to actually run DHT logic on their own. See [Delegate routing](../concepts/nodes/#types)
154+
151155
### DHT
152156

153157
A _Distributed Hash Table_ (DHT) is a distributed key-value store where keys are cryptographic hashes. In IPFS, each peer is responsible for a subset of the IPFS DHT. [More about DHT](dht.md)
@@ -186,6 +190,10 @@ An IPFS Gateway acts as a bridge between traditional web browsers and IPFS. Thro
186190

187191
Garbage Collection (GC) is the process within each IPFS node of clearing out cached files and blocks. Nodes need to clear out previously cached resources to make room for new resources. [Pinned resources](#pinning) are never deleted.
188192

193+
### GO-IPFS node
194+
195+
The primary IPFS reference implementation, i.e., implements all requirements from the corresponding IPFS specification. It runs on servers and user machines with full IPFS capabilities, enabling experimentation. See [Nodes > GO-IPFS](../concepts/nodes/#go-ipfs).
196+
189197
### Graph
190198

191199
In computer science, a Graph is an abstract data type from the field of graph theory within mathematics. The [Merkle-DAG](#merkledag) used in IPFS is a specialized graph.
@@ -224,6 +232,10 @@ The InterPlanetary Name System (IPNS) is a system for creating and updating muta
224232

225233
## J
226234

235+
### JS-IPFS node
236+
237+
* Runs in the browser with a limited set of capabilities. See [Nodes > JS-IPFS](../concepts/nodes/#implementations).
238+
227239
### JSON
228240

229241
JavaScript Object Notation (JSON) is a lightweight data-interchange format. JSON is a text format that is completely language independent, human-readable, and easy to parse and generate. [More about JSON](https://www.json.org/)
@@ -298,7 +310,7 @@ Network Address Translation (NAT) enables communication between two networks by
298310

299311
### Node
300312

301-
In IPFS, a node or [peer](#peer) is the IPFS program that you run on your local computer to store files and then connect to the IPFS network. [More about IPFS Node](../how-to/command-line-quick-start.md#take-your-node-online).
313+
In IPFS, a node or [peer](#peer) is the IPFS program that you run on your local computer to store files and then connect to the IPFS network. See [Nodes](../concepts/nodes/#nodes).
302314

303315
### Node (in graphs)
304316

@@ -330,6 +342,10 @@ Pinning is the method of telling an IPFS node that particular data is important
330342

331343
A vendor-agnostic [API specification](https://ipfs.github.io/pinning-services-api-spec/) that anyone can implement to provide a service for [remote pinning](#remote-pinning).
332344

345+
### Preload node
346+
347+
Part of the process of making a UnixFS DAG publicly available via the preload node's `wantlist`, causing it to fetch data. Other nodes requesting the content can then resolve it from the preload node using Bitswap, as the data is now present in the preload node’s blockstore. See [Nodes > Preload](https://docs.ipfs.io/concepts/nodes/#preload).
348+
333349
### Protobuf
334350

335351
Protocol Buffers (Protobuf) is a free and open-source cross-platform data format used to serialize structured data. IPFS uses it in [DAG-PB](#dag-pb). [More about Protocol Buffers](https://en.wikipedia.org/wiki/Protocol_Buffers)
@@ -342,21 +358,21 @@ Publish-subscribe (Pubsub) is an experimental feature in IPFS. Publishers send m
342358

343359
## R
344360

345-
### Remote Pinning
361+
### Relay node
346362

347-
A variant of [pinning](#pinning) that uses a third-party service to ensure that data persists on IPFS, even when your local node goes offline or your local copy of data is deleted during garbage collection. [More about working with remote pinning services](../how-to/work-with-pinning-services.md).
363+
A means to establish connectivity between libp2p nodes (e.g., IPFS nodes) that wouldn't otherwise be able to establish a direct connection to each other. This may be due to nodes that are behind NAT (Network Address Translation), reverse proxies, firewalls, etc. See [Nodes > Relay](../concepts/nodes/#relay)
348364

349-
### Relay
365+
### Remote Pinning
350366

351-
The Relay is a means to establish connectivity between libp2p nodes (e.g., IPFS nodes) that wouldn't otherwise be able to establish a direct connection to each other. This may be due to nodes that are behind NAT, reverse proxies, firewalls, etc. [More about Relay](https://github.com/libp2p/specs/tree/master/relay)
367+
A variant of [pinning](#pinning) that uses a third-party service to ensure that data persists on IPFS, even when your local node goes offline or your local copy of data is deleted during garbage collection. [More about working with remote pinning services](../how-to/work-with-pinning-services.md).
352368

353369
### Repo
354370

355371
The Repository (Repo) is a directory where IPFS stores all its settings and internal data. It is created with the `ipfs init` command. [More about Repo](../how-to/command-line-quick-start.md#install-ipfs)
356372

357373
### Root
358374

359-
A root is a [node](#node) in a [graph](#graph) that links to at least one other node. In an IPLD graph, roots are used to aggregate multiple chunks of a file together.
375+
A root is a [node](#node) in a [graph](#graph) that links to at least one other node. In an IPLD graph, roots are used to aggregate multiple chunks of a file together.
360376

361377
If you have a 600MiB file `A`, it can be split into 3 chunks `B`, `C`, and `D` since the block size of IPFS is 256MiB. The node `A` that links to each of these three chunks is the root. The CID of this root is what IPFS shows you as the CID of the file.
362378

@@ -384,7 +400,7 @@ A Self-certifying File System (SFS) is a distributed file system that doesn't re
384400

385401
### Sharding
386402

387-
An introduction of horizontal partition of data in a database or a data structure. The main purpose is to spread load and improve performance. An example of sharding in IPFS is [HAMT-sharding](#hamt-sharding) of big [UnixFS](#unixfs) directories.
403+
An introduction of horizontal partition of data in a database or a data structure. The main purpose is to spread load and improve performance. An example of sharding in IPFS is [HAMT-sharding](#hamt-sharding) of big [UnixFS](#unixfs) directories.
388404

389405
### Signing (Cryptographic)
390406

docs/concepts/hashing.md

Lines changed: 17 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -6,10 +6,6 @@ description: Learn about cryptographic hashes and why they're critical to how IP
66

77
# Hashing
88

9-
::: tip
10-
If you're interested in how cryptographic hashes fit into how IPFS works with files in general, check out this video from IPFS Camp 2019! [Core Course: How IPFS Deals With Files](https://www.youtube.com/watch?v=Z5zNPwMDYGg)
11-
:::
12-
139
Cryptographic hashes are functions that take some arbitrary input and return a fixed-length value. The particular value depends on the given hash algorithm in use, such as [SHA-1](https://en.wikipedia.org/wiki/SHA-1) (used by git), [SHA-256](https://en.wikipedia.org/wiki/SHA-2), or [BLAKE2](<https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE2>), but a given hash algorithm always returns the same value for a given input. Have a look at Wikipedia's [full list of hash functions](https://en.wikipedia.org/wiki/List_of_hash_functions) for more.
1410

1511
As an example, the input:
@@ -32,32 +28,37 @@ However, the exact same input generates the following output using **SHA-256**:
3228

3329
Notice that the second hash is longer than the first one. This is because SHA-1 creates a 160-bit hash, while SHA-256 creates a 256-bit hash. The prepended `0x` indicates that the following hash is represented as a hexadecimal number.
3430

35-
Hashes can be represented in different bases (`base2`, `base16`, `base32`, etc.). In fact, IPFS makes use of that as part of its [content identifiers](content-addressing.md) and supports multiple base representations at the same time, using the [Multibase](https://github.com/multiformats/multibase) protocol.
31+
Hashes can be represented in different bases (`base2`, `base16`, `base32`, etc.). In fact, IPFS uses that as part of its [content identifiers](content-addressing.md) and supports multiple base representations at the same time, using the [Multibase](https://github.com/multiformats/multibase) protocol.
3632

3733
For example, the SHA-256 hash of "Hello world" from above can be represented as base 32 as:
3834

3935
```
4036
mtwirsqawjuoloq2gvtyug2tc3jbf5htm2zeo4rsknfiv3fdp46a
4137
```
38+
::: tip
39+
If you're interested in how cryptographic hashes fit into how IPFS works with files in general, check out this video from IPFS Camp 2019! [Core Course: How IPFS Deals With Files](https://www.youtube.com/watch?v=Z5zNPwMDYGg)
40+
:::
4241

43-
## Hashes are important
42+
## Important hash characteristics
4443

45-
Cryptographic hashes come with a couple of very important characteristics:
44+
Cryptographic hashes come with several important characteristics:
4645

4746
- **deterministic** - the same input message always returns exactly the same output hash
4847
- **uncorrelated** - a small change in the message should generate a completely different hash
4948
- **unique** - it's infeasible to generate the same hash from two different messages
5049
- **one-way** - it's infeasible to guess or calculate the input message from its hash
5150

52-
These features also mean we can use a cryptographic hash to identify any piece of data: the hash is unique to the data we calculated it from and it's not too long so sending it around the network doesn't take up a lot of resource. A hash is a fixed length, so the SHA-256 hash of a one-gigabyte video file is still only 32 bytes.
51+
These features also mean we can use a cryptographic hash to identify any piece of data: the hash is unique to the data we calculated it from and it's not too long so sending it around the network doesn't take up a lot of resource. A hash is a fixed length, so the SHA-256 hash of a one-gigabyte video file is still only 32 bytes.
5352

54-
That's critical for a distributed system like IPFS, where we want to be able to store and retrieve data from many places. A computer running IPFS can ask all the peers it's connected to whether they have a file with a particular hash and, if one of them does, they send back the whole file. Without a short, unique identifier like a cryptographic hash, that wouldn't be possible. This technique is called [content addressing](content-addressing.md) — because the content itself is used to form an address, rather than information about the computer and disk location it's stored at.
53+
That's critical for a distributed system like IPFS, where we want to be able to store and retrieve data from many places. A computer running IPFS can ask all the peers it's connected to whether they have a file with a particular hash and, if one of them does, they send back the whole file. Without a short, unique identifier like a cryptographic hash, [content addressing](content-addressing.md) wouldn't be possible.
5554

56-
## Content identifiers are not file hashes
55+
## Example: Content Identifiers are not file hashes
5756

58-
Hash functions are widely used as to check for file integrity. A download provider may publish the output of a hash function for a file, often called a _checksum_. The checksum enables users to verify that a file has not been altered since it was published. This check is done by performing the same hash function against the downloaded file that was used to generate the checksum. If that checksum that the user receives from the downloaded file exactly matches the checksum on the website, then the user knows that the file was not altered and can be trusted.
57+
Hash functions are widely used to check for file integrity. Because IPFS splits content into blocks and verifies them through [directed acyclic graphs (DAGs)](../concepts/merkle-dag.md), SHA file hashes won't match CIDs. Here's an example of what will happen if you try to do that.
5958

60-
Let us look at a concrete example. When you download an image file for [Ubuntu Linux](https://ubuntu.com/) you might see the following `SHA-256` checksum on the Ubuntu website listed for verification purposes:
59+
A download provider may publish the output of a hash function for a file, often called a _checksum_. The checksum enables users to verify that a file has not been altered since it was published. This check is done by performing the same hash function against the downloaded file that was used to generate the checksum. If that checksum that the user receives from the downloaded file exactly matches the checksum on the website, then the user knows that the file was not altered and can be trusted.
60+
61+
For example, when you download an image file for [Ubuntu Linux](https://ubuntu.com/) you might see the following `SHA-256` checksum on the Ubuntu website listed for verification purposes:
6162

6263
```
6364
0xB45165ED3CD437B9FFAD02A2AAD22A4DDC69162470E2622982889CE5826F6E3D ubuntu-20.04.1-desktop-amd64.iso
@@ -80,7 +81,7 @@ added QmPK1s3pNYLi9ERiq3BDxKa4XosgWwFRQUydHUtz4YgpqB ubuntu-20.04.1-desktop-amd6
8081
2.59 GiB / 2.59 GiB [==========================================================================================] 100.00%
8182
```
8283

83-
The string `QmPK1s3pNYLi9ERiq3BDxKa4XosgWwFRQUydHUtz4YgpqB` returned by the `ipfs add` command is the content identifier (CID) of the file `ubuntu-20.04.1-desktop-amd64.iso`. We can utilize the [CID Inspector](https://cid.ipfs.io/) to see what the CID includes. The actual hash is listed under `DIGEST (HEX)`:
84+
The string `QmPK1s3pNYLi9ERiq3BDxKa4XosgWwFRQUydHUtz4YgpqB` returned by the `ipfs add` command is the content identifier (CID) of the file `ubuntu-20.04.1-desktop-amd64.iso`. We can use the [CID Inspector](https://cid.ipfs.io/) to see what the CID includes. The actual hash is listed under `DIGEST (HEX)`:
8485

8586
```
8687
NAME: sha2-256
@@ -101,4 +102,6 @@ ubuntu-20.04.1-desktop-amd64.iso: FAILED
101102
shasum: WARNING: 1 computed checksum did NOT match
102103
```
103104

104-
As we can see, the hash included in the CID does NOT match the hash of the input file `ubuntu-20.04.1-desktop-amd64.iso`. To understand what the hash contained in the CID is, we must understand how IPFS stores files. IPFS uses a [directed acyclic graph (DAG)](merkle-dag.md) to keep track of all the data stored in IPFS. A CID identifies one specific node in this graph. This identifier is the result of hashing the node's contents using a cryptographic hash function like `SHA256`.
105+
As we can see, the hash included in the CID does NOT match the hash of the input file `ubuntu-20.04.1-desktop-amd64.iso`.
106+
107+
To understand what the hash contained in the CID is, we must understand how IPFS stores files. IPFS uses a directed acyclic graph (DAG) to keep track of all the data stored in IPFS. A CID identifies one specific node in this graph. This identifier is the result of hashing the node's contents using a cryptographic hash function like SHA256.

0 commit comments

Comments
 (0)