+  "description": "JSON Schema for STAC GeoParquet metadata stored in Parquet file metadata",
+  "type": "object",
+  "properties": {
+    "version": {
+      "type": "string",
+      "const": "1.0.0",
+      "description": "The stac-geoparquet metadata version."
+    },
+    "collection": {
+      "type": "object",
+      "description": "This object represents a Collection in a SpatioTemporal Asset Catalog. Note that this object is not validated against the STAC Collection schema. You'll need to validate it separately from stac-geoparquet."
spec/stac-geoparquet-spec.md (41 additions & 8 deletions)
@@ -31,11 +31,11 @@ most of the fields should be the same in STAC and in GeoParquet.
 |_property columns_|_varies_| - | Each property should use the relevant Parquet type, and be pulled out of the properties object to be a top-level Parquet field |

 - Must be valid GeoParquet, with proper metadata. Ideally the geometry types are defined and as narrow as possible.
-- Strongly recommend to only have one GeoParquet per STAC 'Collection'. Not doing this will lead to an expanded GeoParquet schema (the union of all the schemas of the collection) with lots of empty data
+- Strongly recommend storing items that are mostly homogeneous (i.e. have the same fields). Parquet is a columnar format; storing items with many different fields will lead to an expanded parquet Schema with lots of empty data. In practice, this means storing a single collection or only collections with very similar item properties in a single stac-geoparquet dataset.
 - Any field in 'properties' of the STAC item should be moved up to be a top-level field in the GeoParquet.
 - STAC GeoParquet does not support properties that are named such that they collide with a top-level key.
 - datetime columns should be stored as a [native timestamp][timestamp], not as a string
-- The Collection JSON should be included in the Parquet metadata. See [Collection JSON](#including-a-stac-collection-json-in-a-stac-geoparquet-collection) below.
+- The Collection JSON objects should be included in the Parquet metadata. See [Collection JSON](#stac-collection-objects) below.
 - Any other properties that would be stored as GeoJSON in a STAC JSON Item (e.g. `proj:geometry`) should be stored as a binary column with WKB encoding. This simplifies the handling of collections with multiple geometry types.

 ### Link Struct
@@ -69,17 +69,48 @@ To take advantage of Parquet's columnar nature and compression, the assets shoul

 See [Asset Object][asset] for more.

-## Including a STAC Collection JSON in a STAC Geoparquet Collection
+### Parquet Metadata
+
+stac-geoparquet uses Parquet [File Metadata](https://parquet.apache.org/docs/file-format/metadata/) to store metadata about the dataset.
+All stac-geoparquet metadata is stored under the key `stac-geoparquet` in the parquet file metadata.
+
+See [`example-metadata.json`](https://github.com/stac-utils/stac-geoparquet/blob/main/spec/example-metadata.json) for an example.
+
+A [jsonschema schema file][schema] is provided for tools to validate against.
+Note that the json-schema for stac-geoparquet does *not* validate the
+`collection` object against the STAC json-schema. You'll need to validate that
+Note that this metadata is distinct from the file metadata required by
+[geoparquet].
+
+#### Geoparquet Version
+
+The field `version` stores the version of the stac-geoparquet
+specification the data complies with. Readers can use this field to understand what
+features and fields are available.
+
+Currently, the only allowed value is the string `"1.0.0"`.
+
+Note: early versions of this specification didn't include a `version` field. Readers
+aiming for maximum compatibility may attempt to read files without this key present,
+despite it being required from 1.0.0 onwards.
+
+#### STAC Collection Object

 To make a stac-geoparquet file a fully self-contained representation, you can
-include the Collection JSON in the Parquet metadata. If present in the [Parquet
-file metadata][parquet-metadata], the key must be `stac:collection` and the
-value must be a JSON string with the Collection JSON.
+include the Collection JSON document in the Parquet metadata under the
+`collection` key. This should contain a STAC [Collection].

 ## Referencing a STAC Geoparquet Collections in a STAC Collection JSON

-A common use case of stac-geoparquet is to create a mirror of a STAC collection. To refer to this mirror in the original collection, use an [Asset Object](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#asset-object) at the collection level of the STAC JSON that includes the `application/vnd.apache.parquet` Media type and `collection-mirror` Role type to describe the function of the Geoparquet STAC Collection Asset.
-
+A common use case of stac-geoparquet is to create a mirror of a STAC collection. To refer to this mirror in the original collection, use an [Asset Object](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#asset-object) at the collection level of the STAC JSON that includes the `application/vnd.apache.parquet` Media type and `collection-mirror` Role type to describe the function of the Geoparquet STAC Co
 For example:

 | Field Name | Type | Value |
@@ -105,3 +136,5 @@ The principles here can likely be used to map into other geospatial data formats
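For readers of this change, the following is a rough sketch of how a consumer might handle the new `stac-geoparquet` metadata key. It is not taken from the PR. It assumes the value stored under the key is a UTF-8 JSON document (the spec text only points to `example-metadata.json`), and that the Parquet key/value metadata arrives as a mapping of bytes to bytes, which is what `pyarrow.parquet.read_metadata(path).metadata` returns. The function names `parse_stac_geoparquet_metadata` and `check_version_and_collection` are hypothetical. Only the standard library is used, so the sketch runs without a Parquet file.

```python
import json
from typing import Mapping, Optional, Tuple

# Key named in the spec text of this PR.
METADATA_KEY = b"stac-geoparquet"
# The spec currently allows only this value for `version`.
KNOWN_VERSIONS = {"1.0.0"}


def parse_stac_geoparquet_metadata(
    key_value_metadata: Optional[Mapping[bytes, bytes]],
) -> Optional[dict]:
    """Decode the stac-geoparquet entry, or return None if it is absent.

    Assumption: the value is a JSON object encoded as UTF-8.
    """
    if not key_value_metadata:
        return None
    raw = key_value_metadata.get(METADATA_KEY)
    if raw is None:
        return None
    meta = json.loads(raw)  # json.loads accepts UTF-8 bytes
    if not isinstance(meta, dict):
        raise ValueError("stac-geoparquet metadata must be a JSON object")
    return meta


def check_version_and_collection(meta: dict) -> Tuple[Optional[str], Optional[dict]]:
    """Return (version, collection), tolerating a missing `version`.

    Files written before the `version` field existed have no such key,
    so a missing version is reported as None rather than rejected.
    """
    version = meta.get("version")
    if version is not None and version not in KNOWN_VERSIONS:
        raise ValueError(f"unsupported stac-geoparquet version: {version!r}")
    collection = meta.get("collection")
    if collection is not None and not isinstance(collection, dict):
        raise ValueError("'collection' must be a JSON object")
    return version, collection


# Illustrative input only; the collection content here is made up.
example = {
    METADATA_KEY: json.dumps(
        {"version": "1.0.0", "collection": {"type": "Collection", "id": "demo"}}
    ).encode("utf-8")
}
meta = parse_stac_geoparquet_metadata(example)
version, collection = check_version_and_collection(meta)
print(version, collection["id"])
```

As the spec notes, this does not validate the `collection` object against the STAC Collection schema; that check would need to be done separately.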