The Main Epic: Replicate 350 TB of Data Between 3 Peers (and the World)
People (hypothetical):
Jack (Stanford)
Michelle (U Toronto/EDGI)
Amy (a university in the Midwest)
IPFS team
Anyone out there following along
Technical Considerations:
If we can roll out filestore in time (see #95 and #91), we can update this plan to have Jack tell ipfs to "track" the data rather than "add" it. This would allow him to serve his original copy of the dataset directly to the network without creating a second copy on his local machines. In the meantime, we can start the experiment using ipfs add with smaller volumes of data (i.e. 5-10 TB); see the sketch after the list below. This will allow us to start surfacing and addressing issues around:
Providers UX
Blockstore Performance
Delegated Content Routing
Memory Usage
Deployment/Ops Experience
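As a rough sketch of the two ingestion paths (the filestore interface shown here, the --nocopy flag and the Experimental.FilestoreEnabled config key, is an assumption about the feature tracked in #95/#91, and the dataset path is hypothetical):

```sh
# Current path: ipfs add copies the data into the local blockstore,
# so the dataset ends up on disk twice (original files + IPFS blocks).
ipfs add -r --progress /data/datagov

# Filestore path (assumed interface): enable the experimental filestore,
# then "track" the files in place so blocks are served from the originals
# without a second copy.
ipfs config --json Experimental.FilestoreEnabled true
ipfs add -r --nocopy /data/datagov
```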
Advance Prep: Downloading the Data & Setting up the Network
Test-run: 5 TB
Test-runs: 50 TB, 100 TB, 300 TB
Jack gradually adds more of the dataset to ipfs, giving the new root hashes to Michelle and Amy. They replicate the data.
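A minimal sketch of that handoff, assuming the peers pin by whatever root hash Jack passes along (the path and the hash placeholder are illustrative only):

```sh
# On Jack's node: add the next slice of the dataset; the root hash is
# printed on the last line of output.
ipfs add -r --progress /data/datagov/batch-0001

# On Michelle's and Amy's nodes: replicate by recursively pinning that root,
# which fetches every block under it.
ipfs pin add <root-hash-from-jack>
```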
Move to the Public Network
After testing is complete, switch the nodes to the public/default IPFS network, provide the blocks on the DHT, and publish the root hashes for the general public to pin.
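One plausible way to make that switch, assuming the test network was isolated with a private swarm key and a custom bootstrap list (both details of the test setup are assumptions):

```sh
# Remove the private-network key so the node will peer with the public network.
rm ~/.ipfs/swarm.key

# Restore the default public bootstrap peers, then restart the daemon.
ipfs bootstrap add --default
ipfs daemon &

# Re-announce the pinned root to the public DHT so anyone can find and pin it.
ipfs dht provide -r <datagov-root-hash>
```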
Follow-up
At the end of the sprint, we will need to follow up on a lot of things. See #103
UPDATE: Based on initial crawls of the first 3000 datasets, @mejackreed has revised his estimates of the total size of data.gov. The entire corpus of data.gov might only be between 1 TB and 10 TB. We have identified at least one other large climate dataset that we will try to download in addition to data.gov.
How this impacts the experiment
If it does turn out that the entire data.gov corpus is under 10 TB, it will impact this experiment in a couple of ways:
More people will be able to participate in the network, pinning the entire corpus on their IPFS nodes
The additional datasets, like this 30 TB NOAA dataset, will be included in the experiment and replicated to institutional collaborators in order to test the system and temporarily back up those datasets, but it will be easy to pin or skip them independently of the main data.gov corpus (see the sketch after this list). At the very least, it will be much easier to find new homes for those datasets and move them to those new homes over IPFS.
The IPFS team will have to find an even bigger dataset to test our systems at loads over 100TB. 😄
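A small sketch of pinning the corpora independently (both hashes are placeholders, not real CIDs):

```sh
# Pin the main data.gov corpus.
ipfs pin add <datagov-root-hash>

# Pin the 30 TB NOAA dataset as well, or skip it entirely; it can also be
# unpinned later without touching the data.gov pin.
ipfs pin add <noaa-root-hash>
ipfs pin rm <noaa-root-hash>
```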