Skip to content

don't serialize empty tuples for v2 filters, and warn when reading such metadata #2847

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 20, 2025

Conversation

d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Feb 18, 2025

We were violating the v2 spec by serializing "no filters" as the empty array. We created arrays in the wild with this metadata, so any fix should be backwards compatible.

To fix this bug, this PR changes the filters parsing function so that it no longer returns empty tuples, and instead return None when no filters are specified. For backwards compatibility with datasets created with earlier versions of zarr-python 3.x, the from_dict method will replace empty lists / tuples with the value None and emit a warning.

After a reasonable amount of time (a few minor releases?) we should remove this special-casing and treat filters = [] as an error. Happy to chat about the timeline.

closes #2842

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Feb 18, 2025
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Feb 18, 2025
@d-v-b d-v-b requested review from dcherian and dstansby February 18, 2025 19:39
@d-v-b
Copy link
Contributor Author

d-v-b commented Feb 18, 2025

i don't think the coverage report is accurate

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems OK to me.

@@ -113,6 +113,13 @@ async def test_roundtrip_array_metadata(
assert actual == expected


@given(store=stores, meta=array_metadata()) # type: ignore[misc]
def test_array_metadata_meets_spec(store: Store, meta: ArrayV2Metadata | ArrayV3Metadata) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this test does nothing for v3 metadata, would it be possible to only run it on ArrayV2Metadata?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here I made it do something ;) . really we should fill it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was more thinking that it would make sense to test the v2 and v3 specs in separate tests, since they are logically independent, but we can do that refactor later

@d-v-b
Copy link
Contributor Author

d-v-b commented Feb 18, 2025

i will look over this again tomorrow. I think we should definitely try to fix the underlying bug this week, e.g. via this PR

@d-v-b d-v-b merged commit 96c9677 into zarr-developers:main Feb 20, 2025
30 checks passed
dcherian added a commit to moradology/zarr-python that referenced this pull request Feb 20, 2025
* main:
  don't serialize empty tuples for v2 filters, and warn when reading such metadata (zarr-developers#2847)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Only use None for "no filters"
2 participants