@@ -3,31 +3,31 @@
 Zarr storage specification version 2
 ====================================
 
-This document provides a technical specification of the protocol and format
-used for storing Zarr arrays. The key words "MUST", "MUST NOT", "REQUIRED",
-"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
-"OPTIONAL" in this document are to be interpreted as described in `RFC 2119
+This document provides a technical specification of the protocol and format
+used for storing Zarr arrays. The key words "MUST", "MUST NOT", "REQUIRED",
+"SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+"OPTIONAL" in this document are to be interpreted as described in `RFC 2119
 <https://www.ietf.org/rfc/rfc2119.txt>`_.
 
 Status
 ------
 
-This specification is the latest version. See :ref:`spec` for previous
+This specification is the latest version. See :ref:`spec` for previous
 versions.
 
 Storage
 -------
 
-A Zarr array can be stored in any storage system that provides a key/value
-interface, where a key is an ASCII string and a value is an arbitrary sequence
-of bytes, and the supported operations are read (get the sequence of bytes
-associated with a given key), write (set the sequence of bytes associated with
+A Zarr array can be stored in any storage system that provides a key/value
+interface, where a key is an ASCII string and a value is an arbitrary sequence
+of bytes, and the supported operations are read (get the sequence of bytes
+associated with a given key), write (set the sequence of bytes associated with
 a given key) and delete (remove a key/value pair).
 
-For example, a directory in a file system can provide this interface, where
-keys are file names, values are file contents, and files can be read, written
-or deleted via the operating system. Equally, an S3 bucket can provide this
-interface, where keys are resource names, values are resource contents, and
+For example, a directory in a file system can provide this interface, where
+keys are file names, values are file contents, and files can be read, written
+or deleted via the operating system. Equally, an S3 bucket can provide this
+interface, where keys are resource names, values are resource contents, and
 resources can be read, written or deleted via HTTP.
 
 Below an "array store" refers to any system implementing this interface.
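
The key/value interface described in the Storage section above (read, write, delete) can be sketched in a few lines of Python. This is only an illustration: the class name ``MemoryStore`` and its method names are hypothetical and are not defined by the specification, which only requires that the three operations exist.

```python
class MemoryStore:
    """Hypothetical in-memory array store: ASCII string keys, bytes values."""

    def __init__(self):
        self._items = {}

    def write(self, key: str, value: bytes) -> None:
        # Keys are ASCII strings; encode() raises UnicodeEncodeError otherwise.
        key.encode("ascii")
        self._items[key] = bytes(value)

    def read(self, key: str) -> bytes:
        # Raises KeyError if nothing is stored under this key.
        return self._items[key]

    def delete(self, key: str) -> None:
        # Removes the key/value pair; raises KeyError if the key is absent.
        del self._items[key]


store = MemoryStore()
store.write(".zarray", b"{}")
```

A directory on a file system or an S3 bucket would expose the same three operations, with file or resource names playing the role of keys.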
@@ -38,11 +38,11 @@ Arrays
 Metadata
 ~~~~~~~~
 
-Each array requires essential configuration metadata to be stored, enabling
-correct interpretation of the stored data. This metadata is encoded using JSON
+Each array requires essential configuration metadata to be stored, enabling
+correct interpretation of the stored data. This metadata is encoded using JSON
 and stored as the value of the ".zarray" key within an array store.
 
-The metadata resource is a JSON object. The following keys MUST be present
+The metadata resource is a JSON object. The following keys MUST be present
 within the object:
 
 zarr_format
@@ -57,8 +57,8 @@
     A string or list defining a valid data type for the array. See also
     the subsection below on data type encoding.
 compressor
-    A JSON object identifying the primary compression codec and providing
-    configuration parameters, or ``null`` if no compressor is to be used.
+    A JSON object identifying the primary compression codec and providing
+    configuration parameters, or ``null`` if no compressor is to be used.
     The object MUST contain an ``"id"`` key identifying the codec to be used.
 fill_value
     A scalar value providing the default value to use for uninitialized
@@ -74,10 +74,10 @@ filters
 
 Other keys MUST NOT be present within the metadata object.
 
-For example, the JSON object below defines a 2-dimensional array of 64-bit
-little-endian floating point numbers with 10000 rows and 10000 columns, divided
-into chunks of 1000 rows and 1000 columns (so there will be 100 chunks in total
-arranged in a 10 by 10 grid). Within each chunk the data are laid out in C
+For example, the JSON object below defines a 2-dimensional array of 64-bit
+little-endian floating point numbers with 10000 rows and 10000 columns, divided
+into chunks of 1000 rows and 1000 columns (so there will be 100 chunks in total
+arranged in a 10 by 10 grid). Within each chunk the data are laid out in C
 contiguous order. Each chunk is encoded using a delta filter and compressed
 using the Blosc compression library prior to storage::
 
@@ -109,8 +109,8 @@ Data type encoding
 ~~~~~~~~~~~~~~~~~~
 
 Simple data types are encoded within the array metadata as a string,
-following the `NumPy array protocol type string (typestr) format
-<http://docs.scipy.org/doc/numpy/reference/arrays.interface.html>`_. The format
+following the `NumPy array protocol type string (typestr) format
+<http://docs.scipy.org/doc/numpy/reference/arrays.interface.html>`_. The format
 consists of 3 parts:
 
 * One character describing the byteorder of the data (``"<"``: little-endian;
@@ -127,9 +127,9 @@ The byte order MUST be specified. E.g., ``"<f8"``, ``">i4"``, ``"|b1"`` and
 ``"|S12"`` are valid data type encodings.
 
 Structured data types (i.e., with multiple named fields) are encoded as a list
-of two-element lists, following `NumPy array protocol type descriptions (descr)
-<http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#>`_. For
-example, the JSON list ``[["r", "|u1"], ["g", "|u1"], ["b", "|u1"]]`` defines a
+of two-element lists, following `NumPy array protocol type descriptions (descr)
+<http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#>`_. For
+example, the JSON list ``[["r", "|u1"], ["g", "|u1"], ["b", "|u1"]]`` defines a
 data type composed of three single-byte unsigned integers labelled "r", "g" and
 "b".
 
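
As a rough illustration of the data type encoding in the hunks above, the sketch below splits a simple type string into a byte-order character, a type character and a size, and applies the same split to each field of a structured type. The function names ``parse_typestr`` and ``parse_descr`` are hypothetical, and the sketch assumes only the simple forms quoted in the text (such as ``"<f8"`` or ``"|S12"``); it does not attempt to cover every form NumPy accepts.

```python
def parse_typestr(typestr):
    """Split a simple type string such as "<f8" into (byteorder, kind, size)."""
    if len(typestr) < 3 or typestr[0] not in "<>|":
        # The byte order MUST be specified, so reject strings without it.
        raise ValueError("invalid type string: %r" % (typestr,))
    byteorder, kind, size = typestr[0], typestr[1], typestr[2:]
    return byteorder, kind, int(size)


def parse_descr(descr):
    """Parse a structured type given as a list of [name, typestr] pairs."""
    return [(name, parse_typestr(typestr)) for name, typestr in descr]


rgb = parse_descr([["r", "|u1"], ["g", "|u1"], ["b", "|u1"]])
```

Here ``rgb`` holds one entry per named field, each carrying the parsed parts of a single-byte unsigned integer.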
@@ -147,37 +147,41 @@ Positive Infinity ``"Infinity"``
 Negative Infinity ``"-Infinity"``
 ================= ===============
 
+If an array has a fixed length byte string data type (e.g., ``"|S12"``), or a
+structured data type, and if the fill value is not null, then the fill value
+MUST be encoded as an ASCII string using the standard Base64 alphabet.
+
 Chunks
 ~~~~~~
 
-Each chunk of the array is compressed by passing the raw bytes for the chunk
-through the primary compression library to obtain a new sequence of bytes
-comprising the compressed chunk data. No header is added to the compressed
-bytes or any other modification made. The internal structure of the compressed
-bytes will depend on which primary compressor was used. For example, the `Blosc
-compressor <https://github.com/Blosc/c-blosc/blob/master/README_HEADER.rst>`_
-produces a sequence of bytes that begins with a 16-byte header followed by
+Each chunk of the array is compressed by passing the raw bytes for the chunk
+through the primary compression library to obtain a new sequence of bytes
+comprising the compressed chunk data. No header is added to the compressed
+bytes or any other modification made. The internal structure of the compressed
+bytes will depend on which primary compressor was used. For example, the `Blosc
+compressor <https://github.com/Blosc/c-blosc/blob/master/README_HEADER.rst>`_
+produces a sequence of bytes that begins with a 16-byte header followed by
 compressed data.
 
-The compressed sequence of bytes for each chunk is stored under a key formed
-from the index of the chunk within the grid of chunks representing the array.
-To form a string key for a chunk, the indices are converted to strings and
+The compressed sequence of bytes for each chunk is stored under a key formed
+from the index of the chunk within the grid of chunks representing the array.
+To form a string key for a chunk, the indices are converted to strings and
 concatenated with the period character (".") separating each index. For
-example, given an array with shape (10000, 10000) and chunk shape (1000, 1000)
-there will be 100 chunks laid out in a 10 by 10 grid. The chunk with indices
-(0, 0) provides data for rows 0-1000 and columns 0-1000 and is stored under the
+example, given an array with shape (10000, 10000) and chunk shape (1000, 1000)
+there will be 100 chunks laid out in a 10 by 10 grid. The chunk with indices
+(0, 0) provides data for rows 0-1000 and columns 0-1000 and is stored under the
 key "0.0"; the chunk with indices (2, 4) provides data for rows 2000-3000 and
 columns 4000-5000 and is stored under the key "2.4"; etc.
 
-There is no need for all chunks to be present within an array store. If a chunk
-is not present then it is considered to be in an uninitialized state. An
-unitialized chunk MUST be treated as if it was uniformly filled with the value
+There is no need for all chunks to be present within an array store. If a chunk
+is not present then it is considered to be in an uninitialized state. An
+unitialized chunk MUST be treated as if it was uniformly filled with the value
 of the "fill_value" field in the array metadata. If the "fill_value" field is
 ``null`` then the contents of the chunk are undefined.
 
-Note that all chunks in an array have the same shape. If the length of any
-array dimension is not exactly divisible by the length of the corresponding
-chunk dimension then some chunks will overhang the edge of the array. The
+Note that all chunks in an array have the same shape. If the length of any
+array dimension is not exactly divisible by the length of the corresponding
+chunk dimension then some chunks will overhang the edge of the array. The
 contents of any chunk region falling outside the array are undefined.
 
 Filters
@@ -196,15 +200,15 @@ Hierarchies
 Logical storage paths
 ~~~~~~~~~~~~~~~~~~~~~
 
-Multiple arrays can be stored in the same array store by associating each array
-with a different logical path. A logical path is simply an ASCII string. The
-logical path is used to form a prefix for keys used by the array. For example,
+Multiple arrays can be stored in the same array store by associating each array
+with a different logical path. A logical path is simply an ASCII string. The
+logical path is used to form a prefix for keys used by the array. For example,
 if an array is stored at logical path "foo/bar" then the array metadata will be
 stored under the key "foo/bar/.zarray", the user-defined attributes will be
 stored under the key "foo/bar/.zattrs", and the chunks will be stored under
 keys like "foo/bar/0.0", "foo/bar/0.1", etc.
 
-To ensure consistent behaviour across different storage systems, logical paths
+To ensure consistent behaviour across different storage systems, logical paths
 MUST be normalized as follows:
 
 * Replace all backward slash characters ("\\") with forward slash characters
@@ -221,24 +225,24 @@ After normalization, if splitting a logical path by the "/" character results
 in any path segment equal to the string "." or the string ".." then an error
 MUST be raised.
 
-N.B., how the underlying array store processes requests to store values under
+N.B., how the underlying array store processes requests to store values under
 keys containing the "/" character is entirely up to the store implementation
-and is not constrained by this specification. E.g., an array store could simply
-treat all keys as opaque ASCII strings; equally, an array store could map
-logical paths onto some kind of hierarchical storage (e.g., directories on a
+and is not constrained by this specification. E.g., an array store could simply
+treat all keys as opaque ASCII strings; equally, an array store could map
+logical paths onto some kind of hierarchical storage (e.g., directories on a
 file system).
 
 Groups
 ~~~~~~
 
 Arrays can be organized into groups which can also contain other groups. A
 group is created by storing group metadata under the ".zgroup" key under some
-logical path. E.g., a group exists at the root of an array store if the
+logical path. E.g., a group exists at the root of an array store if the
 ".zgroup" key exists in the store, and a group exists at logical path "foo/bar"
 if the "foo/bar/.zgroup" key exists in the store.
 
-If the user requests a group to be created under some logical path, then groups
-MUST also be created at all ancestor paths. E.g., if the user requests group
+If the user requests a group to be created under some logical path, then groups
+MUST also be created at all ancestor paths. E.g., if the user requests group
 creation at path "foo/bar" then groups MUST be created at path "foo" and the
 root of the store, if they don't already exist.
 
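
The ancestor rule in the Groups text above can be sketched as follows. This is a minimal illustration under stated assumptions: the store is modelled as a plain ``dict``, the path is assumed to be already normalized, the helper names (``ancestor_paths``, ``group_key``, ``create_group``) are hypothetical, and the group metadata value ``{"zarr_format": 2}`` is an assumed minimal payload.

```python
def ancestor_paths(path):
    """Return every ancestor of a normalized logical path; "" is the root."""
    segments = [s for s in path.split("/") if s]
    return ["/".join(segments[:i]) for i in range(len(segments))]


def group_key(path):
    """Key under which the group metadata for a logical path is stored."""
    return path + "/.zgroup" if path else ".zgroup"


def create_group(store, path):
    """Create a group at path and at all ancestor paths that lack one."""
    for p in ancestor_paths(path) + [path]:
        # setdefault leaves any existing group metadata untouched.
        store.setdefault(group_key(p), b'{"zarr_format": 2}')


store = {}
create_group(store, "foo/bar")
```

After the call, the store holds group metadata under ".zgroup", "foo/.zgroup" and "foo/bar/.zgroup".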
@@ -256,7 +260,7 @@ zarr_format
 
 Other keys MUST NOT be present within the metadata object.
 
-The members of a group are arrays and groups stored under logical paths that
+The members of a group are arrays and groups stored under logical paths that
 are direct children of the parent group's logical path. E.g., if groups exist
 under the logical paths "foo" and "foo/bar" and an array exists at logical path
 "foo/baz" then the members of the group at path "foo" are the group at path
@@ -265,8 +269,8 @@ under the logical paths "foo" and "foo/bar" and an array exists at logical path
 Attributes
 ----------
 
-An array or group can be associated with custom attributes, which are simple
-key/value items with application-specific meaning. Custom attributes are
+An array or group can be associated with custom attributes, which are simple
+key/value items with application-specific meaning. Custom attributes are
 encoded as a JSON object and stored under the ".zattrs" key within an array
 store.
 
@@ -377,7 +381,7 @@ Modify the array attributes::
 Storing multiple arrays in a hierarchy
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Below is an example of storing multiple Zarr arrays organized into a group
+Below is an example of storing multiple Zarr arrays organized into a group
 hierarchy, using a directory on the local file system as storage. This storage
 implementation maps logical paths onto directory paths on the file system,
 however this is an implementation choice and is not required.