d-v-b (Contributor) commented Aug 4, 2025

This PR gives each codec v2 and v3 JSON serialization and deserialization routines.

depends on #3318

@github-actions github-actions bot removed the "needs release notes" label (automatically applied to PRs which haven't added release notes) Sep 24, 2025
@d-v-b d-v-b marked this pull request as ready for review September 24, 2025 10:46
d-v-b (Contributor, Author) commented Sep 24, 2025

This is now in a phase where I would really appreciate eyes from @zarr-developers/python-core-devs.

The goal of this PR is twofold:

  1. To use the exact same codec classes to create Zarr V2 and Zarr V3 arrays. This will resolve a major problem for v2 -> v3 conversion. Today people have to switch out their codec classes, even when going from numcodecs.GZip to zarr.codecs.GzipCodec. This PR handles that translation internally.

  2. To gracefully handle numcodecs codecs. These range from codecs like numcodecs.GZip, which is basically unchanged in Zarr V3, to numcodecs.Blosc, which received breaking changes in Zarr V3, all the way to codecs defined in imagecodecs like the jpeg codec, which isn't even standardized yet. This PR handles all three of those cases when creating Zarr V2 or V3 arrays (because of goal 1).

In this PR, when a user shows up with compressor={"id": "gzip", "level": 1} or compressor=numcodecs.GZip(level=1), we will resolve the compressor to an instance of zarr.codecs.GzipCodec. When a user shows up with a completely unknown codec that adheres to the numcodecs API, we will wrap the codec in a special wrapper class and make a best-effort attempt to create a valid codec pipeline around that codec.
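
For illustration only, the resolution behaviour described above could be sketched roughly as follows. This is not the code in this PR: `GzipCodec` here is a stand-in dataclass for the Zarr gzip codec class, and `WrappedCodec`, `KNOWN_CODECS`, and `resolve_codec` are hypothetical names. The only thing assumed about the numcodecs API is that a codec exposes `get_config()` returning a dict with an "id" key.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class GzipCodec:
    """Stand-in for the native Zarr gzip codec class (hypothetical)."""
    level: int = 5


@dataclass(frozen=True)
class WrappedCodec:
    """Hypothetical wrapper around a codec the registry does not know."""
    codec: Any


# Hypothetical registry mapping a codec id to a native codec class.
KNOWN_CODECS = {"gzip": GzipCodec}


def resolve_codec(spec: Any) -> Any:
    """Resolve a v2-style dict or a numcodecs-like object to a codec."""
    if isinstance(spec, dict):
        config = dict(spec)
    elif hasattr(spec, "get_config"):
        # numcodecs-style codecs describe themselves as {"id": ..., **params}
        config = dict(spec.get_config())
    else:
        raise TypeError(f"cannot interpret {spec!r} as a codec")

    codec_id = config.pop("id", None)
    native_cls = KNOWN_CODECS.get(codec_id)
    if native_cls is not None:
        # Known codec: build the native class from the remaining parameters.
        return native_cls(**config)
    if isinstance(spec, dict):
        # A bare dict with an unknown id cannot be wrapped: there is no
        # object that knows how to encode or decode.
        raise ValueError(f"unknown codec id: {codec_id!r}")
    # Unknown codec object: wrap it and let the pipeline try its best.
    return WrappedCodec(spec)
```

In this sketch both `resolve_codec({"id": "gzip", "level": 1})` and a numcodecs-like gzip object with the same config resolve to `GzipCodec(level=1)`, while an unrecognized codec object comes back inside `WrappedCodec`.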

This PR also adds TypedDict classes for the v2 and v3 form of each codec, which was laborious but IMO worth it for type safety.
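
As a rough sketch of what such typed JSON forms can look like for gzip: the v2 form is the flat numcodecs-style config, and the v3 form nests the parameters under "configuration" next to a "name". The class and function names below (`GzipJSON_V2`, `GzipJSON_V3`, `gzip_v2_to_v3`, `gzip_v3_to_v2`) are hypothetical and not necessarily the names used in this PR.

```python
from typing import Literal, TypedDict


class GzipJSON_V2(TypedDict):
    """Zarr V2 / numcodecs-style metadata: flat dict keyed by "id"."""
    id: Literal["gzip"]
    level: int


class GzipConfig_V3(TypedDict):
    """Parameters nested under "configuration" in the V3 form."""
    level: int


class GzipJSON_V3(TypedDict):
    """Zarr V3 metadata: "name" plus a nested "configuration" dict."""
    name: Literal["gzip"]
    configuration: GzipConfig_V3


def gzip_v2_to_v3(data: GzipJSON_V2) -> GzipJSON_V3:
    # Move the flat parameters into the nested configuration.
    return {"name": "gzip", "configuration": {"level": data["level"]}}


def gzip_v3_to_v2(data: GzipJSON_V3) -> GzipJSON_V2:
    # Flatten the configuration back into the numcodecs-style dict.
    return {"id": "gzip", "level": data["configuration"]["level"]}
```

With types like these, a static type checker can flag a v2 dict passed where a v3 dict is expected, which is the kind of mistake that is easy to make during v2 -> v3 conversion.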

If you have time, please look this over and/or test this on your v2 -> v3 workloads. That would be extremely helpful. I think these changes are on the same scale as the data type changes, so this requires a lot of finesse and potentially follow-up PRs.
