Hi All!
I spent some time looking through the work surrounding the v3.0 core protocol over in #16. My goal for this issue is to summarize the current status of this work and help spur conversation in the community. Any feedback can then be used to guide and prioritize future work on the core protocol and protocol extensions.
cc @alimanfoo @jakirkham @joshmoore @ryan-williams
Specification development process document (current status)
- Defines concept of a core protocol, protocol extensions, stores, and codecs
- Will define the process for minor/major changes to the core protocol and how decisions are made
- Could use feedback from the community
Core protocol (current status)
- Core concepts and terminology
  - E.g. arrays, groups, chunks, etc.
  - These all seem to be well defined and in good shape overall
- Node names
  - Restrictions on which characters node names may contain, and on some possible names
  - Case-insensitive uniqueness of sibling names
  - Question: Are these restrictions on node names too strict?
- Data types
  - Core data types are boolean, integer, and floating point
  - Complex and datetime dtypes can be implemented as protocol extensions
  - Question: What about languages that don't easily support the full list of core data types? (xref community#25, "Best practices when reading zarr arrays in languages with limited support for data types")
- Chunking
  - The core protocol defines a regular chunk grid. Other grid types, e.g. non-uniform chunking or unknown chunk sizes, can be defined via protocol extensions
  - The core protocol uses C- and F-order for the memory layout of each chunk. Other layouts, e.g. sparse memory layouts, are possible via protocol extensions
  - Chunk encoding consists of a compressor codec. Note this does not include filters, which can be supported via protocol extensions
- Metadata
  - Three types of metadata documents: bootstrap metadata, array metadata, and group metadata
  - The bootstrap metadata doc must be encoded in JSON, while the array and group metadata docs can use other encodings
  - The bootstrap metadata document contains the protocol specification version used (e.g. v3.0, v3.1, etc.), how the array and group metadata documents are encoded (default is JSON), and a list of protocol extensions used
  - The array metadata document contains the array shape, data type, user-defined attributes, etc.
  - Protocol extension points include: data type, chunk grid type, and chunk memory layout
  - The `extensions` metadata value needs to be defined in the protocol spec
  - Question: It is still unclear how to specify the `fill_value` for dtypes other than bool and int
  - The group metadata document contains protocol extensions and user-defined attributes
- Stores
  - Defines an abstract store interface which can be implemented on top of different storage technology backends
  - Abstract interface methods for operating on keys and values in a store include `get`, `set`, `delete`, etc.
  - Not all abstract methods need to be implemented (e.g. can have a read-only store)
  - The core protocol does not define any store implementations, but gives examples of possible implementations
  - Some protocol operations still need to be filled out
- Protocol extensions
  - This section needs to be completed
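To make the Chunking item above more concrete, here is a minimal sketch of how a regular chunk grid and the C/F memory layouts could map an array index to a chunk and to a position inside that chunk. The function names (`grid_shape`, `locate`, `flat_offset`) are hypothetical and not taken from the spec, and the sketch assumes every chunk buffer has the full chunk shape, including chunks at the array edge.

```python
from math import ceil


def grid_shape(array_shape, chunk_shape):
    """Number of chunks along each dimension of a regular grid."""
    return tuple(ceil(n / c) for n, c in zip(array_shape, chunk_shape))


def locate(index, chunk_shape):
    """Split an array index into (chunk coordinates, index inside that chunk)."""
    chunk_coords = tuple(i // c for i, c in zip(index, chunk_shape))
    inner = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk_coords, inner


def flat_offset(inner, chunk_shape, order="C"):
    """Element position in the chunk's flat buffer for C or F order.

    Assumption: the buffer always has the full chunk shape.
    """
    if order not in ("C", "F"):
        raise ValueError("order must be 'C' or 'F'")
    dims = list(zip(inner, chunk_shape))
    if order == "C":
        dims.reverse()  # C order: the last dimension varies fastest
    offset, stride = 0, 1
    for i, c in dims:
        offset += i * stride
        stride *= c
    return offset


# A 10 x 7 array split into 4 x 3 chunks.
chunk_coords, inner = locate((9, 5), (4, 3))
print(grid_shape((10, 7), (4, 3)), chunk_coords, inner)
print(flat_offset(inner, (4, 3), "C"), flat_offset(inner, (4, 3), "F"))
```

Non-uniform grids or sparse layouts would need different logic here, which fits with leaving them to protocol extensions.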
Protocol extensions (current status)
Three protocol extensions are currently in progress:
- Datetime data types - looks relatively filled out
- Complex data types - currently a scaffolding
- Filters - currently a scaffolding
Several other possible extensions are outlined in #49
Stores (current status)
- Currently one store spec in progress, the file system store
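For discussion, the abstract store interface summarized under Core protocol could look roughly like the sketch below. Only the method names `get`, `set`, and `delete` come from the summary above; the class names, signatures, and the in-memory backend are assumptions made for illustration and are not part of any store spec.

```python
from abc import ABC, abstractmethod


class Store(ABC):
    """Hypothetical abstract key/value store interface."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the value stored under key."""

    # Write operations are optional, so a read-only store can skip them.
    def set(self, key: str, value: bytes) -> None:
        raise NotImplementedError("this store does not support set")

    def delete(self, key: str) -> None:
        raise NotImplementedError("this store does not support delete")


class MemoryStore(Store):
    """Illustrative backend that keeps everything in a dict."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data[key]

    def set(self, key, value):
        self._data[key] = bytes(value)

    def delete(self, key):
        del self._data[key]


class ReadOnlyStore(Store):
    """Wraps another store and exposes only the read operation."""

    def __init__(self, inner):
        self._inner = inner

    def get(self, key):
        return self._inner.get(key)


store = MemoryStore()
store.set("example/key", b"chunk bytes")
read_only = ReadOnlyStore(store)
print(read_only.get("example/key"))
```

A file system store would implement the same methods against paths instead of a dict.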