diff --git a/docs/user-guide/arrays.rst b/docs/user-guide/arrays.rst
index 76fb8e6910..4d1ad12abd 100644
--- a/docs/user-guide/arrays.rst
+++ b/docs/user-guide/arrays.rst
@@ -196,7 +196,7 @@ algorithm (compression level 3) internally within Blosc, and with the
 bit-shuffle filter applied.
 
 When using a compressor, it can be useful to get some diagnostics on the
-compression ratio. Zarr arrays provide the :property:`zarr.Array.info` property
+compression ratio. Zarr arrays provide the :attr:`zarr.Array.info` property
 which can be used to print useful diagnostics, e.g.:
 
 .. ipython:: python
@@ -212,7 +212,7 @@ prints additional diagnostics, e.g.:
 
 .. note::
    :func:`zarr.Array.info_complete` will inspect the underlying store and may
-   be slow for large arrays. Use :property:`zarr.Array.info` if detailed storage
+   be slow for large arrays. Use :attr:`zarr.Array.info` if detailed storage
    statistics are not needed.
 
 If you don't specify a compressor, by default Zarr uses the Blosc
diff --git a/docs/user-guide/extending.rst b/docs/user-guide/extending.rst
new file mode 100644
index 0000000000..405dcb92c0
--- /dev/null
+++ b/docs/user-guide/extending.rst
@@ -0,0 +1,91 @@
+
+Extending Zarr
+==============
+
+Zarr-Python 3 was designed to be extensible. This means that you can extend
+the library by writing custom classes and plugins. Currently, Zarr can be extended
+in the following ways:
+
+Custom codecs
+-------------
+
+.. note::
+    This section explains how custom codecs can be created for Zarr version 3 data. For Zarr
+    version 2, codecs should subclass the
+    `numcodecs.abc.Codec `_
+    base class and register through
+    `numcodecs.registry.register_codec `_.
+
+There are three types of codecs in Zarr:
+- array-to-array
+- array-to-bytes
+- bytes-to-bytes
+
+Array-to-array codecs are used to transform the array data before serializing
+to bytes. Examples include delta encoding or scaling codecs. Array-to-bytes codecs are used
+for serializing the array data to bytes.
In Zarr, the main codec to use for numeric arrays
+is the :class:`zarr.codecs.BytesCodec`. Bytes-to-bytes codecs transform the serialized bytestreams
+of the array data. Examples include compression codecs, such as
+:class:`zarr.codecs.GzipCodec`, :class:`zarr.codecs.BloscCodec` or
+:class:`zarr.codecs.ZstdCodec`, and codecs that add a checksum to the bytestream, such as
+:class:`zarr.codecs.Crc32cCodec`.
+
+Custom codecs for Zarr are implemented by subclassing the relevant base class, see
+:class:`zarr.abc.codec.ArrayArrayCodec`, :class:`zarr.abc.codec.ArrayBytesCodec` and
+:class:`zarr.abc.codec.BytesBytesCodec`. Most custom codecs should implement the
+``_encode_single`` and ``_decode_single`` methods. These methods operate on single chunks
+of the array data. Alternatively, custom codecs can implement the ``encode`` and ``decode``
+methods, which operate on batches of chunks, in case the codec is intended to implement
+its own batch processing.
+
+Custom codecs should also implement the following methods:
+
+- ``compute_encoded_size``, which returns the byte size of the encoded data given the byte
+  size of the original data. It should raise ``NotImplementedError`` for codecs with
+  variable-sized outputs, such as compression codecs.
+- ``validate`` (optional), which can be used to check that the codec metadata is compatible with the
+  array metadata. It should raise errors if not.
+- ``resolve_metadata`` (optional), which is important for codecs that change the shape,
+  dtype or fill value of a chunk.
+- ``evolve_from_array_spec`` (optional), which can be useful for automatically filling in
+  codec configuration metadata from the array metadata.
+
+To use custom codecs in Zarr, they need to be registered using the
+`entrypoint mechanism `_.
+Commonly, entrypoints are declared in the ``pyproject.toml`` of your package under the
+``[project.entry-points."zarr.codecs"]`` section.
Zarr will automatically discover and
+load all codecs registered with the entrypoint mechanism from imported modules.
+
+.. code-block:: toml
+
+    [project.entry-points."zarr.codecs"]
+    "custompackage.fancy_codec" = "custompackage:FancyCodec"
+
+New codecs need to have their own unique identifier. To avoid naming collisions, it is
+strongly recommended to prefix the codec identifier with a unique name. For example,
+the codecs from ``numcodecs`` are prefixed with ``numcodecs.``, e.g. ``numcodecs.delta``.
+
+.. note::
+    Note that the extension mechanism for Zarr version 3 is still under development.
+    Requirements for custom codecs including the choice of codec identifiers might
+    change in the future.
+
+It is also possible to register codecs as replacements for existing codecs. This might be
+useful for providing specialized implementations, such as GPU-based codecs. In case of
+multiple codecs, the :mod:`zarr.core.config` mechanism can be used to select the preferred
+implementation.
+
+Custom stores
+-------------
+
+Coming soon.
+
+Custom array buffers
+--------------------
+
+Coming soon.
+
+Other extensions
+----------------
+
+In the future, Zarr will support writing custom data types and chunk grids.
diff --git a/docs/user-guide/index.rst b/docs/user-guide/index.rst
index 85c2e36a84..8647eeb3e6 100644
--- a/docs/user-guide/index.rst
+++ b/docs/user-guide/index.rst
@@ -24,7 +24,8 @@ Advanced Topics
    performance
    consolidated_metadata
+   extending
+
 ..
Coming soon
    async
-   extending
diff --git a/pyproject.toml b/pyproject.toml
index 1e164225e9..36842ba927 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -139,8 +139,8 @@ numpy = ["1.25", "2.1"]
 features = ["gpu"]
 
 [tool.hatch.envs.test.scripts]
-run-coverage = "pytest --cov-config=pyproject.toml --cov=pkg --cov=tests"
-run-coverage-gpu = "pip install cupy-cuda12x && pytest -m gpu --cov-config=pyproject.toml --cov=pkg --cov=tests"
+run-coverage = "pytest --cov-config=pyproject.toml --cov=pkg --cov=src"
+run-coverage-gpu = "pip install cupy-cuda12x && pytest -m gpu --cov-config=pyproject.toml --cov=pkg --cov=src"
 run = "run-coverage --no-cov"
 run-verbose = "run-coverage --verbose"
 run-mypy = "mypy src"
@@ -160,7 +160,7 @@ numpy = ["1.25", "2.1"]
 version = ["minimal"]
 
 [tool.hatch.envs.gputest.scripts]
-run-coverage = "pytest -m gpu --cov-config=pyproject.toml --cov=pkg --cov=tests"
+run-coverage = "pytest -m gpu --cov-config=pyproject.toml --cov=pkg --cov=src"
 run = "run-coverage --no-cov"
 run-verbose = "run-coverage --verbose"
 run-mypy = "mypy src"
diff --git a/src/zarr/storage/local.py b/src/zarr/storage/local.py
index f9b1747c31..f4226792cb 100644
--- a/src/zarr/storage/local.py
+++ b/src/zarr/storage/local.py
@@ -189,6 +189,18 @@ async def set_partial_values(
         await concurrent_map(args, asyncio.to_thread, limit=None)  # TODO: fix limit
 
     async def delete(self, key: str) -> None:
+        """
+        Remove a key from the store.
+
+        Parameters
+        ----------
+        key : str
+
+        Notes
+        -----
+        If ``key`` is a directory within this store, the entire directory
+        at ``store.root / key`` is deleted.
+        """
         # docstring inherited
         self._check_writable()
         path = self.root / key
diff --git a/test.py b/test.py
deleted file mode 100644
index 29dac92c8b..0000000000
--- a/test.py
+++ /dev/null
@@ -1,7 +0,0 @@
-import zarr
-
-store = zarr.DirectoryStore("data")
-r = zarr.open_group(store=store)
-z = r.full("myArray", 42, shape=(), dtype="i4", compressor=None)
-
-print(z.oindex[...])