Hey guys, hope you're all doing fine in the current situation & as a long-time mapper I'd like to thank you for all the work you put into this project! :)
Interplanetary Filesystem
IPFS is a network protocol that allows exchanging data efficiently in a worldwide mesh network. Content is addressed by a Content-ID (CID) - by default a SHA256 hash - which ensures that the content wasn't altered.
All interconnections are dynamically established and terminated based on the requests to the daemon and the queries in the global Distributed Hash Table - which is used to resolve Content-IDs to peers, and peers to IPs+ports.
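For illustration, here is a minimal sketch (Python, using the requests library) of fetching content by its CID through a public HTTP gateway; the CID below is a placeholder example, not part of this proposal:

```python
# Minimal sketch: fetching content by CID through a public HTTP gateway.
import requests

cid = "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"  # placeholder CID
gateway = "https://ipfs.io"  # any public gateway works

# The gateway node resolves the CID via the DHT, fetches the blocks and
# verifies their hashes before serving, so the response matches the CID.
resp = requests.get(f"{gateway}/ipfs/{cid}", timeout=60)
resp.raise_for_status()
print(len(resp.content), "bytes received")
```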
Storage concept
There are multiple data types, but the most interesting for you is UnixFS (files and folders). A Content-ID of a folder is immutable and thus ensures that all data inside a folder can be verified after receiving it.
IPFS has a built-in 'name system' that allows assigning a static ID (the public key of an RSA or ed25519 key) and pointing it to changing content. This way you can switch a link from one folder version to a different folder version atomically. The static IDs are accessed through /ipns/ and the content IDs are accessed through /ipfs/ on a web gateway.
An example page via a CID-Link on a Gateway
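To make the two path forms concrete, here is a rough sketch against the HTTP API of a local IPFS daemon (default port 5001); the folder CID is a placeholder:

```python
# Sketch of the /ipfs/ vs. /ipns/ path forms, assuming a local IPFS daemon
# with the default HTTP API on port 5001; the folder CID is a placeholder.
import requests

API = "http://127.0.0.1:5001/api/v0"
folder_cid = "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"  # placeholder

# Immutable path: always refers to exactly this folder version.
print(f"/ipfs/{folder_cid}")

# Publish the folder under the node's IPNS name (the key's peer ID), so that
# /ipns/<peer-id> can later be repointed to a newer folder CID atomically.
r = requests.post(f"{API}/name/publish", params={"arg": f"/ipfs/{folder_cid}"})
print("/ipns/" + r.json()["Name"])
```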
Software for end-users
But you don't need a gateway to access such URLs: there are Browser Plugins (for Firefox and Chrome) which can resolve and access them directly, Desktop Clients (for Windows / Linux / MacOS), and a wget replacement which uses IPFS directly to access the URL.
Backwards compatibility
You can offer a webpage which is accessible via HTTP(s) and IPFS at the same time. The browser plugins automatically detect if a webpage has a DNSLink entry and will switch to IPFS. All IPFS project pages, for example, are stored on an IPFS cluster, served by a regular web server, and can also be fetched by the browser plugins.
On the website itself, you can link a URL to one of the web gateways, to allow users with regular browsers to access the data without having to install anything.
If the link points to a folder, it looks like this dataset.
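What the browser plugins roughly do for that detection is look up a DNSLink TXT record; a hedged sketch using the dnspython package (the domain is just an example):

```python
# Rough sketch of DNSLink detection, using the dnspython package
# (pip install dnspython); the domain here is just an example.
import dns.resolver

domain = "ipfs.io"
answers = dns.resolver.resolve(f"_dnslink.{domain}", "TXT")
for record in answers:
    txt = record.to_text().strip('"')
    if txt.startswith("dnslink="):
        # e.g. dnslink=/ipfs/<folder-CID> or dnslink=/ipns/<key>
        print(domain, "resolves to", txt[len("dnslink="):])
```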
Cluster
IPFS alone does not guarantee data replication; everything is just stored locally for other clients to access. To achieve data replication, you need the cluster daemon. It creates a set of elements and lets you add or remove them. Each element can be tagged with an expiry time (after which it will be removed automatically), a minimum, and a maximum replication amount.
The maximum sets the number of copies created on add, while a drop below the minimum automatically triggers additional replication of the data.
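As a sketch of what such a tagged element could look like, here is a pin added via ipfs-cluster-ctl from Python; the CID is a placeholder and the exact flag names should be double-checked against your ipfs-cluster-ctl version:

```python
# Hedged sketch: pinning a CID to the cluster with replication bounds and an
# expiry by shelling out to ipfs-cluster-ctl (flag names per recent releases;
# verify with `ipfs-cluster-ctl pin add --help`).
import subprocess

cid = "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"  # placeholder
subprocess.run(
    [
        "ipfs-cluster-ctl", "pin", "add",
        "--replication-min", "2",   # below this, re-replication is triggered
        "--replication-max", "5",   # copies allocated when the pin is added
        "--expire-in", "720h",      # remove the pin automatically after ~30 days
        cid,
    ],
    check=True,
)
```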
Altering the cluster-configuration
A cluster can dynamically grow or shrink without any configuration needed, and new data is preferably allocated to the peers with the most free space. This way every new peer in the cluster extends the available storage of the cluster.
Write access on the cluster is defined in the cluster configuration file (a JSON file), which lists the public keys that are allowed to alter the set of elements.
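For a CRDT-based cluster, the relevant fragment of that configuration roughly looks like the following (shown as a Python dict purely for illustration; the peer IDs are placeholders and the exact layout should be verified against a freshly generated service.json):

```python
# Approximate shape of the write-access fragment of a cluster's service.json
# (CRDT-based clusters); peer IDs below are placeholders.
trusted_fragment = {
    "consensus": {
        "crdt": {
            # Only these peer IDs (i.e. the holders of these key pairs)
            # may add or remove pins in the cluster's pinset.
            "trusted_peers": [
                "12D3KooW...writerA",  # placeholder peer ID
                "12D3KooW...writerB",  # placeholder peer ID
            ]
        }
    }
}
```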
Adding cluster members
Following a cluster is very simple: anyone with a locally running IPFS daemon can start a cluster follower, which reads the cluster configuration file and communicates with the local IPFS daemon to do the necessary replications.
These public collaboration clusters have been available since the last release of IPFS Cluster, and some of them are listed here:
https://collab.ipfscluster.io/
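Joining one of those clusters as a follower boils down to two commands, sketched here from Python with placeholder names (the per-cluster configuration URL would be taken from the listing above):

```python
# Minimal sketch of joining a collaborative cluster as a follower, assuming a
# local IPFS daemon is already running; name and config URL are placeholders.
import subprocess

name = "example-cluster"                                 # placeholder follower name
config_url = "https://example.org/cluster/service.json"  # placeholder per-cluster config URL

# Fetch the cluster configuration once...
subprocess.run(["ipfs-cluster-follow", name, "init", config_url], check=True)
# ...then run the follower; it talks to the local IPFS daemon and
# replicates whatever pins get allocated to it.
subprocess.run(["ipfs-cluster-follow", name, "run"], check=True)
```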
Server Outages
Server outages are not an issue. The cluster has no 'master' that is required for operation. Nodes with write access can go completely offline while the data remains available.
Outages of third-party servers might trigger additional copies of the data, if necessary, to guarantee availability inside the cluster.
If a server of the cluster comes back online, it will receive the full delta of the cluster metadata, catch up and continue the operation automatically.
Data integrity
All data is checked for integrity block by block (default maximum block size 256 KiB) via its SHA256 sum according to the CID (and its metadata).
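Conceptually, the per-block check looks like this (illustrative only; real CIDs additionally wrap the digest in a multihash and link the blocks into a DAG, which this sketch omits):

```python
# Illustrative only: hashing data in 256 KiB blocks with SHA-256, mirroring
# the per-block verification IPFS performs against the CID's digests.
import hashlib

BLOCK_SIZE = 256 * 1024  # default maximum block size

def block_digests(path):
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            # Each block can be verified independently against its digest.
            yield hashlib.sha256(block).hexdigest()
```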
Tamper resistance
The data held on the mirrors cannot be tampered with, since IPFS would simply reject it because of the wrong checksum. Nobody without your keys can write to the cluster, and nobody without your keys can alter the IPFS name system entry.
Community aspect
IPFS allows easy read access to the files on the mirrors, but also allows everyone in the community to set up a cluster follower without having to list an additional URL on a wiki page that needs to be cleaned up when some of the servers are no longer available, etc.
Disaster recovery
Private key for Cluster-Write-Access lost
If the write key of a cluster is lost, a new cluster has to be created. This requires a daemon restart with a new configuration file and a refetch of the cluster metadata by all cluster followers. Data integrity is unaffected, since the data stays online and remains the same on a reimport.
This can be mitigated by an alternative write key which is securely stored in a backup location.
Complete data loss on all (project) servers
Since there are third-party servers, data integrity won't be affected. Regarding write access, see above.
Data-integrity issues on a cluster server
On this server, the data store needs to be verified. All data with errors will be removed and refetched by the cluster-follower.
If the databases are affected too, both IPFS and the ipfs-cluster-follower can be wiped (their private keys don't need to be maintained).
The follower and IPFS can then be restarted and will pull the full metadata history again, then receive any newly written data.
If the follower identity is maintained (the private key isn't wiped), the cluster follower will fetch its part of the replication again.
Data loss on the whole cluster
If some data is completely lost on the cluster, it can be restored by adding the same data again on any IPFS node. An offline backup, for example, can thus restore the data on the cluster.
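A hedged sketch of such a restore against the HTTP API of a local IPFS daemon (the path is a placeholder; with the same chunking settings, identical bytes produce the identical CID):

```python
# Sketch of restoring lost cluster data from an offline backup, assuming a
# local IPFS daemon with the default HTTP API; the file path is a placeholder.
import requests

API = "http://127.0.0.1:5001/api/v0"

# Re-adding the identical bytes (with the same chunking settings) yields the
# identical CID, so existing links and cluster pins become resolvable again.
with open("/backups/dataset.tar", "rb") as f:   # placeholder backup file
    r = requests.post(f"{API}/add", files={"file": f})
print("restored as", r.json()["Hash"])
```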
Data transfer speed
Netflix did a great job improving the IPFS component that organizes the data transfers - Bitswap. There's a blog post about that. It will be part of the next major release, expected within the next month.
Archiving via IPFS-cluster
Since IPFS allows everyone to replicate the data easily and thereby offers redundancy, it might be an interesting solution for your backups as well - in a second cluster installation.
A third party outside of the main team could hold the write access to this archiving/backup cluster. The main team adds all backup files to IPFS, and the third party pins the CID of the backup folder to the cluster.
If files from the mirror cluster should be archived, they can simply be added to the backup cluster by their Content-ID and are transferred automatically.