Commit f550f1d

Use sentence casing consistently and fix links
In the rest of the docs we mostly use sentence casing.
1 parent 6ca1fdd commit f550f1d

3 files changed: +47 -47 lines changed
docs/how-to/scientific-data/landscape-guide.md

Lines changed: 20 additions & 20 deletions
Original file line number | Diff line number | Diff line change
@@ -1,23 +1,23 @@
11
---
2-
title: Scientific Data and IPFS Landscape Guide
2+
title: Scientific data and IPFS landscape guide
33
description: an overview of the problem space, available tools, and architectural patterns for publishing and working with scientific data using IPFS.
44
---
55

6-
# Scientific Data and IPFS Landscape Guide
6+
# Scientific data and IPFS landscape guide
77

88
Scientific data and IPFS are naturally aligned: research teams need to share large datasets across institutions, verify data integrity, and ensure resilient access. From sensor networks to global climate modeling efforts, scientific communities are using IPFS content addressing and peer-to-peer distribution to solve problems traditional infrastructure can't.
99

1010
In this guide, you'll find an overview of the problem space, available tools, and architectural patterns for publishing and working with scientific data using IPFS.
1111

12-
## A Landscape in Flux
12+
## A landscape in flux
1313

1414
Science advances through collaboration, yet the infrastructure for sharing scientific data has historically developed in silos. Different fields adopted different formats, metadata conventions, and distribution mechanisms.
1515

1616
This fragmentation means there is no single "right way" to publish and share scientific data. Instead, this is an area of active innovation, with new tools and conventions emerging as communities identify common needs. Standards like [Zarr](https://zarr.dev) represent convergence points where different fields have found common ground.
1717

1818
This guide surveys the landscape and available tooling, but the right approach for your project depends on your specific constraints: the size and structure of your data, your collaboration patterns, your existing infrastructure, and your community's conventions. The goal is to help you understand the options so you can make informed choices.
1919

20-
## The Nature of Scientific Data
20+
## The nature of scientific data
2121

2222
Scientific data originates from a variety of sources. In the geospatial field, data is collected by sensors, measuring instruments, camera systems, and satellites. This data is commonly structured as multidimensional arrays (tensors), representing measurements across dimensions like time, latitude, longitude, and altitude.
2323

@@ -28,7 +28,7 @@ Key characteristics of scientific data include:
2828
- **Metadata-rich**: Extensive contextual information accompanies the raw measurements
2929
- **Collaborative**: Research often involves multiple institutions and scientists sharing and building upon datasets
3030

31-
## The Importance of Open Data Access
31+
## The importance of open data access
3232

3333
As hinted above, open access to scientific data accelerates research, enables reproducibility, and maximizes the return on public investment in science. Organizations worldwide have recognized this, leading to mandates for open data sharing in publicly funded research.
3434

@@ -42,7 +42,7 @@ These criteria are by no means exhaustive, for example initiatives like [FAIR](h
4242

4343
With that in mind, the next section will look at how these ideas come together with IPFS.
4444

45-
## The Benefits of IPFS for Scientific Data
45+
## The benefits of IPFS for scientific data
4646

4747
IPFS addresses several pain points in scientific data distribution:
4848

@@ -53,7 +53,7 @@ IPFS addresses several pain points in scientific data distribution:
5353

5454
To get a better sense of how these ideas, which are central to IPFS's design, are applied by the scientific community, it's worth looking at the [ORCESTRA Campaign Case Study](../../case-studies/orcestra.md), which covers a campaign that uses IPFS to reap these benefits.
5555

56-
## Architectural Patterns
56+
## Architectural patterns
5757

5858
### CID-centric verifiable data management
5959

@@ -72,34 +72,34 @@ Ultimately the choice between these approaches for content-addressed data manage
7272
- How important is it to maintain a copy of the data in a content-addressed format? If no public publishing is expected and you only need integrity checks, you may choose not to store a full content-addressed replica and instead compute hashes on demand.
7373
- What libraries and which programming languages will you use to interact with the data? For example, Python’s xarray library, via fsspec, can read directly from a local IPFS gateway using [`ipfsspec`](https://github.com/fsspec/ipfsspec).
7474

75-
### Single Publisher
75+
### Single publisher
7676

7777
A single institution runs Kubo nodes to publish and provide data. Users retrieve via gateways or their own nodes.
7878

79-
### Collaborative Publishing
79+
### Collaborative publishing
8080

8181
Multiple institutions coordinate to provide the same datasets:
8282

8383
- Permissionless: single writer multiple follower providers
8484
- Coordination can happen out of band, for example via a shared pinset on GitHub. The original publisher must ensure their data is provided, but once it's added to the pinset, others can replicate it.
8585

86-
### Connecting to Existing Infrastructure
86+
### Connecting to existing infrastructure
8787

8888
IPFS can complement existing data infrastructure:
8989

9090
- STAC catalogs can include IPFS CIDs alongside traditional URLs
9191
- Data portals can offer IPFS as an alternative retrieval method
9292
- CI/CD pipelines can automatically add new data to IPFS nodes
9393

94-
## Geospatial Format Evolution: From NetCDF to Zarr
94+
## Geospatial format evolution: from NetCDF to Zarr
9595

9696
The scientific community has long relied on formats like NetCDF, HDF5, and GeoTIFF for storing multidimensional array data (also referred to as tensors). While these formats served research well, they were designed for local filesystems and face challenges in the cloud and distributed environments that have become the norm over the last decades. This trend has been driven both by growing dataset sizes and by the advent of cloud and distributed systems that enable storing and processing larger volumes of data.
9797

98-
### Limitations of Traditional Formats
98+
### Limitations of traditional formats
9999

100100
NetCDF and HDF5 interleave metadata with data, requiring large sequential reads to access metadata before reaching the data itself. This creates performance bottlenecks when accessing data over networks, whether that's cloud storage or a peer-to-peer network.
101101

102-
### The Rise of Zarr
102+
### The rise of Zarr
103103

104104
[Zarr](https://zarr.dev/) has emerged as a cloud-native format optimized for distributed storage:
105105

@@ -146,11 +146,11 @@ Metadata in scientific datasets serves to make the data self-describing, like wh
146146

147147
[**GeoZarr**](https://github.com/zarr-developers/geozarr-spec) is a specification for storing geospatial raster/grid data in the Zarr format. It defines conventions for how to encode coordinate reference systems, spatial dimensions, and other geospatial metadata within Zarr stores. It's conceptually downstream of the ideas in CF CDM (from the [netCDF ecosystem](https://docs.unidata.ucar.edu/netcdf-java/5.2/userguide/common_data_model_overview.html)), but designed for the Zarr ecosystem.
148148

149-
## Ecosystem Tooling
149+
## Ecosystem tooling
150150

151-
### Organizing Content-Addressed Data
151+
### Organizing content-addressed data
152152

153-
#### UnixFS and CAR Files
153+
#### UnixFS and CAR files
154154

155155
UnixFS is the default format for representing files and directories in IPFS. It chunks large files for incremental verification and parallel retrieval.
156156

@@ -181,7 +181,7 @@ To learn more about how to use MFS to organize your data, check out the guide on
181181
[IPFS Cluster](https://ipfscluster.io/) is a cluster solution built on top of Kubo for multi-node deployments. IPFS Cluster coordinates pinning across a set of Kubo nodes, ensuring data redundancy and availability.
182182
IPFS Cluster also supports the [Pinning API spec](https://ipfs.github.io/pinning-services-api-spec/).
183183

184-
#### Pinning Services
184+
#### Pinning services
185185

186186
Third-party pinning services provide managed infrastructure for persistent storage, useful when you don't want to run your own nodes.
187187
TODO: link to pinning services list in docs
@@ -201,7 +201,7 @@ ds = xr.open_dataset(
201201
)
202202
```
203203

204-
### Discovery, Metadata, and Data Portals: From discovery all the way to retrieval
204+
### Discovery, metadata, and data portals: from discovery all the way to retrieval
205205

206206
TODO: add an intro in the form of a user journey of a scientist looking for data, all the way to retrieving it.
207207

@@ -212,7 +212,7 @@ Content Discovery is an loaded term that can mean related, albeit distinct conce
212212
- Human-centric
213213
- **Content discovery:** also commonly known as **content routing**, refers to finding providers (nodes serving the data) for a given CID, including their network addresses. By default, IPFS supports a number of content routing systems: the Amino DHT, IPNI and Delegated Routing over HTTP as a common interface for interoperability.
214214

215-
### CID Discovery
215+
### CID discovery
216216

217217
When using content-addressed systems like IPFS, a new challenge emerges: how do users discover the Content Identifiers (CIDs) for datasets they want to access?
218218

@@ -259,7 +259,7 @@ STAC has a web browser, making navigation discovery https://github.com/radiantea
259259
260260
-->
261261

262-
## Next Steps
262+
## Next steps
263263

264264
- [Publishing Zarr Datasets with IPFS](./publish-geospatial-zarr-data.md) - A hands-on guide to publishing your first dataset
265265
- [Kubo Configuration Reference](https://github.com/ipfs/kubo/blob/master/docs/config.md)

docs/how-to/scientific-data/publish-geospatial-zarr-data.md

Lines changed: 26 additions & 26 deletions
@@ -1,9 +1,9 @@
11
---
2-
title: Publish Geospatial Zarr Data with IPFS
2+
title: Publish geospatial Zarr data with IPFS
33
description: Learn how to publish geospatial datasets using IPFS and Zarr for decentralized distribution, data integrity, and open access.
44
---
55

6-
# Publish Geospatial Zarr Data with IPFS
6+
# Publish geospatial Zarr data with IPFS
77

88
In this guide, you will learn how to publish public geospatial data sets using IPFS, with a focus on the [Zarr](https://zarr.dev/) format. You'll learn how to leverage decentralized distribution with IPFS for better collaboration, data integrity, and open access.
99

@@ -15,19 +15,19 @@ If you are interested in a real-world example following the patterns in this gui
1515

1616
- [Why IPFS for Geospatial Data?](#why-ipfs-for-geospatial-data)
1717
- [Prerequisites](#prerequisites)
18-
- [Step 1: Prepare Your Zarr Data Set](#step-1-prepare-your-zarr-data-set)
19-
- [Step 2: Add Your Data Set to IPFS](#step-2-add-your-data-set-to-ipfs)
20-
- [Step 3: Organizing Your Data](#step-3-organizing-your-data)
21-
- [Step 4: Verify Providing Status](#step-4-verify-providing-status)
22-
- [Step 5: Content Discovery](#step-5-content-discovery)
23-
- [Option A: Share the CID Directly](#option-a-share-the-cid-directly)
24-
- [Option B: Use IPNS for Updatable References](#option-b-use-ipns-for-updatable-references)
25-
- [Option C: Use DNSLink for Human-Readable URLs](#option-c-use-dnslink-for-human-readable-urls)
26-
- [Accessing Published Data](#accessing-published-data)
27-
- [Choosing Your Approach](#choosing-your-approach)
18+
- [Step 1: Prepare your Zarr data set](#step-1-prepare-your-zarr-data-set)
19+
- [Step 2: Add your data set to IPFS](#step-2-add-your-data-set-to-ipfs)
20+
- [Step 3: Organizing your data](#step-3-organizing-your-data)
21+
- [Step 4: Verify providing status](#step-4-verify-providing-status)
22+
- [Step 5: Content discovery](#step-5-content-discovery)
23+
- [Option A: Share the CID directly](#option-a-share-the-cid-directly)
24+
- [Option B: Use IPNS for updatable references](#option-b-use-ipns-for-updatable-references)
25+
- [Option C: Use DNSLink for human-readable URLs](#option-c-use-dnslink-for-human-readable-urls)
26+
- [Accessing published data](#accessing-published-data)
27+
- [Choosing your approach](#choosing-your-approach)
2828
- [Reference](#reference)
2929

30-
## Why IPFS for Geospatial Data?
30+
## Why IPFS for geospatial data?
3131

3232
Geospatial data sets, such as weather observations, satellite imagery, and sensor readings, are typically stored as multidimensional arrays, also commonly known as tensors.
3333

@@ -58,14 +58,14 @@ Before starting, ensure you have:
5858

5959
- A Zarr data set ready for publishing
6060
- Basic familiarity with the command line
61-
- [Kubo](/install/command-line/) or [IPFS Desktop](/install/ipfs-desktop/) installed on a machine.
61+
- [Kubo](../../install/command-line.md) or [IPFS Desktop](../../install/ipfs-desktop.md) installed on a machine.
6262

6363
:::callout
6464
See the [NAT and port forwarding guide](../nat-configuration.md) for more information on how to configure port forwarding so that your IPFS node is publicly reachable, thus allowing reliable retrievability of data by other nodes.
6565

6666
:::
6767

68-
## Step 1: Prepare Your Zarr Data Set
68+
## Step 1: Prepare your Zarr data set
6969

7070
When preparing your Zarr data set for IPFS, aim for approximately 1 MiB chunks to align with IPFS's 1 MiB maximum block size. While this is not a strict requirement, using larger Zarr chunks will cause IPFS to split them into multiple blocks, potentially increasing retrieval latency.
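
As a rough illustration of the 1 MiB guideline above, the following sketch estimates the uncompressed size of a single Zarr chunk from its shape and element size. The helper name `chunk_bytes`, the chunk shape, and the `float32` element size are assumptions chosen for the example; they are not taken from the guide.

```python
import math

MIB = 1024 * 1024  # 1 MiB in bytes


def chunk_bytes(shape, itemsize):
    """Return the uncompressed size in bytes of one chunk with the given shape."""
    return math.prod(shape) * itemsize


# Hypothetical chunk shape for a float32 (4-byte) array:
# 1 time step x 512 x 512 grid cells.
shape = (1, 512, 512)
size = chunk_bytes(shape, itemsize=4)

# Report whether the uncompressed chunk stays within 1 MiB.
print(f"{size} bytes, within 1 MiB: {size <= MIB}")
```

Zarr chunks are usually compressed before they are stored, so the stored size is typically smaller than this estimate. The uncompressed size is only a conservative starting point when choosing a chunk shape.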
7171

@@ -93,7 +93,7 @@ Chunking in Zarr is a nuanced topic beyond the scope of this guide. For more inf
9393

9494
:::
9595

96-
## Step 2: Add Your Data Set to IPFS
96+
## Step 2: Add your data set to IPFS
9797

9898
Add your Zarr folder to IPFS using the `ipfs add` command:
9999

@@ -117,9 +117,9 @@ This command:
117117

118118
The `--quieter` flag outputs only the root CID, which identifies the complete dataset.
119119

120-
> **Note:** Check out the [lifecycle of data in IPFS](../../../concepts/lifecycle.md) to learn more about how merkleizing, pinning, and providing work under the hood.
120+
> **Note:** Check out the [lifecycle of data in IPFS](../../concepts/lifecycle.md) to learn more about how merkleizing, pinning, and providing work under the hood.
121121
122-
## Step 3: Organizing Your Data
122+
## Step 3: Organizing your data
123123

124124
Two options help manage multiple datasets on your node:
125125

@@ -186,7 +186,7 @@ ipfs files stat --hash /datasets/halo
186186

187187
`bafybeihqixf5ew7mfr74bzb74qiw2mgtnytabnpzjnf5xeejzq4p2ocygu` is a new CID representing the combined dataset containing all three HALO flight datasets. The original CIDs are referenced, not copied, so no data is duplicated.
188188

189-
## Step 4: Verify Providing Status
189+
## Step 4: Verify providing status
190190

191191
After adding, Kubo continuously announces your content to the network. Check the status:
192192

@@ -196,19 +196,19 @@ ipfs provide stat
196196

197197
For detailed diagnostics, see the [provide system documentation](https://github.com/ipfs/kubo/blob/master/docs/provide-stats.md).
198198

199-
## Step 5: Content Discovery
199+
## Step 5: Content discovery
200200

201201
Now that your data is available on the public network, the next step is making it discoverable to others. Choose a sharing approach based on your needs:
202202

203-
### Option A: Share the CID Directly
203+
### Option A: Share the CID directly
204204

205205
For one-off sharing, provide the CID directly:
206206

207207
```
208208
ipfs://bafybeif52irmuurpb27cujwpqhtbg5w6maw4d7zppg2lqgpew25gs5eczm
209209
```
210210

211-
### Option B: Use IPNS for Updatable References
211+
### Option B: Use IPNS for updatable references
212212

213213
If you want to share a stable identifier but be able to update the underlying dataset, create an [IPNS](https://docs.ipfs.tech/concepts/ipns/) identifier and share that instead. This is useful for datasets that get updated regularly — users can bookmark your IPNS name and always retrieve the latest version.
214214

@@ -222,7 +222,7 @@ ipfs name publish /ipfs/<new-dataset-cid>
222222

223223
IPNS is supported by all the retrieval methods in the [Accessing Published Data](#accessing-published-data) section below. Keep in mind that IPNS name resolution adds latency to the retrieval process.
224224

225-
### Option C: Use DNSLink for Human-Readable URLs
225+
### Option C: Use DNSLink for human-readable URLs
226226

227227
Link a DNS name to your CID by adding a TXT record:
228228

@@ -236,11 +236,11 @@ Users can then access your data using one of the following methods:
236236
- With Kubo: `ipfs cat /ipns/data.example.org/zarr.json`
237237
- Using ipfsspec in Python as detailed below in [Python with ipfsspec](#python-with-ipfsspec), which also supports IPNS names, so you can use `ipns://data.example.org/zarr.json` directly.
238238

239-
## Accessing Published Data
239+
## Accessing published data
240240

241241
Once published, users can access your Zarr datasets through multiple methods:
242242

243-
### IPFS HTTP Gateways
243+
### IPFS HTTP gateways
244244

245245
See the [retrieval guide](../../quickstart/retrieve.md).
246246

@@ -266,7 +266,7 @@ import { verifiedFetch } from '@helia/verified-fetch'
266266
const response = await verifiedFetch('ipfs://<cid>/zarr.json')
267267
```
268268

269-
## Choosing Your Approach
269+
## Choosing your approach
270270

271271
Consider these factors when planning your publishing strategy:
272272

docs/quickstart/retrieve.md

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ To fetch the CID using an IPFS gateway is as simple as loading one of the follow
139139

140140
In this quickstart guide, you learned the different approaches to retrieving CIDs from the IPFS network and how to pick the most appropriate method for your specific needs.
141141

142-
You then fetched the image that was pinned in the [publishing with a pinning service quickstart guide](./publish.md) using an IPFS Kubo node and an IPFS Gateway.
142+
You then fetched the image that was pinned in the [publishing with a pinning service quickstart guide](./pin.md) using an IPFS Kubo node and an IPFS Gateway.
143143

144144
Possible next steps include:
145145
