Core protocol v3.0 status #53

@jrbourbeau

Hi All!

I spent some time looking through the work surrounding the v3.0 core protocol over in #16. My goal for this issue is to summarize the current status of this work and help spur conversation in the community. Any feedback can then be used to guide and prioritize future work on the core protocol and protocol extensions.

cc @alimanfoo @jakirkham @joshmoore @ryan-williams

Specification development process document (current status)

  • Defines concept of a core protocol, protocol extensions, stores, and codecs
  • Will define the process for minor/major changes to the core protocol and how decisions are made
  • Could use feedback from the community

Core protocol (current status)

  • Core concepts and terminology

    • E.g. arrays, groups, chunks, etc.
    • These all seem to be well defined and in good shape overall
  • Node names

    • Restrictions on the characters allowed in node names, and on some specific names
    • Case-insensitive uniqueness among sibling nodes
    • Question: Are the restrictions on node names too restrictive?
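To make the sibling uniqueness rule concrete, here is a minimal sketch of how an implementation might detect colliding names. The function name is hypothetical, and using `str.casefold()` as the definition of "case insensitive" is an assumption; the spec may define the comparison differently.

```python
def find_name_collisions(sibling_names):
    """Group sibling node names that collide when case is ignored.

    Assumption: "case insensitive" is approximated with str.casefold().
    """
    seen = {}
    for name in sibling_names:
        # Names that fold to the same key are treated as the same node name.
        seen.setdefault(name.casefold(), []).append(name)
    return [names for names in seen.values() if len(names) > 1]


collisions = find_name_collisions(["foo", "Foo", "bar"])
print(collisions)
```

Under this sketch, `foo` and `Foo` could not both exist as children of the same group, while `bar` is unaffected.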
  • Data types

  • Chunking

    • The core protocol covers only a regular chunk grid. Other grid types, e.g. non-uniform chunking or unknown chunk sizes, can be defined via protocol extensions
    • Core protocol uses C- and F-order for the memory layout of each chunk. Other layouts, e.g. sparse memory layouts, are possible via protocol extensions
    • Chunk encoding consists of a compressor codec. Note this does not include filters, which can be supported via protocol extensions
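As an illustration of what a regular grid implies, the sketch below maps an array index to the chunk that holds it and to the position inside that chunk. The helper names are hypothetical and this is only one way to express the arithmetic, not text from the spec.

```python
def chunk_grid_shape(array_shape, chunk_shape):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-s // c) for s, c in zip(array_shape, chunk_shape))


def locate(index, chunk_shape):
    """Return (chunk coordinates, offset within that chunk) for an array index."""
    chunk_coords = tuple(i // c for i, c in zip(index, chunk_shape))
    offset = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk_coords, offset


# A 10 x 10 array split into 4 x 5 chunks; edge chunks are only partly filled.
grid = chunk_grid_shape((10, 10), (4, 5))
coords, offset = locate((9, 7), (4, 5))
print(grid, coords, offset)
```

Because every chunk has the same shape, locating an element needs only integer division and remainder. Non-uniform grids would need a lookup structure instead, which is part of why they are left to protocol extensions.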
  • Metadata

    • Three types of metadata documents: bootstrap metadata, array metadata, and group metadata
    • The bootstrap metadata doc must be encoded in JSON, while the array and group metadata docs can use other encodings
    • Bootstrap metadata document contains the protocol specification used (e.g. v3.0, v3.1, etc.), how the array and group metadata documents are encoded (default is JSON), and a list of protocol extensions used
    • Array metadata document contains the array shape, data type, user-defined attributes, etc.
      • Protocol extension points include: data type, chunk grid type, and chunk memory layout
      • The extensions metadata value still needs to be defined in the protocol spec
      • Question: How should the fill_value be specified for dtypes other than bool and int?
    • Group metadata document contains protocol extensions and user-defined attributes
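For discussion purposes, a bootstrap metadata document carrying the three pieces of information listed above might look roughly like the following. All key names here (`protocol_version`, `metadata_encoding`, `extensions`) are hypothetical placeholders chosen for this sketch, not the names used in the draft spec.

```python
import json

# Hypothetical key names; the draft spec may spell these differently.
bootstrap = {
    "protocol_version": "3.0",    # which protocol specification is in use
    "metadata_encoding": "json",  # how array/group metadata docs are encoded
    "extensions": [],             # protocol extensions used, if any
}

# The bootstrap document itself must be JSON, so it can always be parsed
# before the encoding of the other metadata documents is known.
encoded = json.dumps(bootstrap).encode("utf-8")
decoded = json.loads(encoded)
print(decoded["metadata_encoding"])
```

A reader would parse this document first and then use the declared encoding to decode the array and group metadata documents.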
  • Stores

    • Defines abstract store interface which can be implemented on top of different storage technology backends
    • Abstract interface methods for operating on keys and values in a store include get, set, delete, etc.
    • Not all abstract methods need to be implemented (e.g. can have a read-only store)
    • Core protocol does not define any store implementations, but gives examples of possible implementations
    • Some protocol operations still need to be filled out
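A rough sketch of how the abstract store interface could be expressed, including a store that only supports reads. The class names and the choice to raise `NotImplementedError` for unsupported operations are assumptions made for illustration; the spec only describes the operations abstractly.

```python
from abc import ABC, abstractmethod


class Store(ABC):
    """Abstract key/value interface; only `get` is mandatory in this sketch."""

    @abstractmethod
    def get(self, key):
        """Return the bytes stored under `key`."""

    def set(self, key, value):
        raise NotImplementedError("this store does not support writes")

    def delete(self, key):
        raise NotImplementedError("this store does not support deletes")


class MemoryStore(Store):
    """Dict-backed store implementing every operation."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data[key]

    def set(self, key, value):
        self._data[key] = bytes(value)

    def delete(self, key):
        del self._data[key]


class ReadOnlyStore(Store):
    """Wraps another store and exposes only `get`."""

    def __init__(self, inner):
        self._inner = inner

    def get(self, key):
        return self._inner.get(key)


backing = MemoryStore()
backing.set("meta/root.array", b"{}")  # hypothetical key, for illustration only
readonly = ReadOnlyStore(backing)
print(readonly.get("meta/root.array"))
```

Keeping `set` and `delete` optional lets the same reading code work against backends that cannot be modified, such as a static archive.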
  • Protocol extensions

    • This section needs to be completed

Protocol extensions (current status)

Three protocol extensions are currently in progress:

  • Datetime data types - looks relatively filled out
  • Complex data types - currently just scaffolding
  • Filters - currently just scaffolding

Several other possible extensions are outlined in #49

Stores (current status)

  • Currently one store spec in progress, the file system store
