Skip to content

FSStore not handling .zmetadata correctly when dimension_separator is set to / at store level. #1121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tasansal opened this issue Sep 4, 2022 · 4 comments
Labels
bug Potential issues with the zarr-python library

Comments

@tasansal
Copy link
Contributor

tasansal commented Sep 4, 2022

Zarr version

v2.12.0

Numcodecs version

0.10.2

Python Version

3.10

Operating System

Linux

Installation

using pip into virtual environment

Description

When creating an FSStore with dimension separator / the .zmetadata file also gets renamed to /zmetadata.
I have tested this on Google Cloud Storage.

This causes inconsistent behavior in local vs. cloud stores. The local files get structured like this:

/root
|_ my_array
|_ .zgroup
|_ zmetadata

Not the missing . from .zmetadata.

And on the cloud storage, this is what happens:

/root
|_ my_array
|_ .zgroup
|_ /
  |_ zmetadata

Note that extra prefix / before .zmetadata and the missing ..

It appears that the local FSStore (or path handler) removes one of the slashes from path_to_root//zmetadata, whereas, on the cloud stores, it doesn't.

When we use zarr.open_consolidated, in some edge cases this fails. Such as: creating a Zarr on-prem and copying it to Google Cloud doesn't work because file structures are different.

The ideal behavior would be

/root
|_ my_array
|_ .zgroup
|_ .zmetadata

as usual, my_array has a dimension separator set to /, and Zarr should parse that properly.

One possible workaround is NOT to use the dimension_separator at the store level but use it when creating arrays.
However, this is prone to error since we would have to specify it every time we create an array, or else there could be inconsistent . or / arrays within the store.

This works as expected, on both cloud and local:

import zarr


store = zarr.storage.FSStore("test.zarr", mode="w")
root = zarr.open_group(store)
ds = root.create_dataset("my_dataset", shape=(5, 5), chunks=(1, 1), dtype='float32', overwrite=True, dimension_separator='/')
zarr.consolidate_metadata(store)

Steps to reproduce

import zarr


store = zarr.storage.FSStore("test.zarr", mode="w", dimension_separator="/")
root = zarr.open_group(store)
root.create_dataset("my_dataset", shape=(5, 5), chunks=(1, 1), dtype='float32', overwrite=True)
zarr.consolidate_metadata(store)

Additional output

No response

@tasansal tasansal added the bug Potential issues with the zarr-python library label Sep 4, 2022
@LucaMarconato
Copy link

I have also encountered this bug, I am passing zmetadata to open_consolidated() as a workaround, but a fix upstream would be better please.

@joshmoore
Copy link
Member

joshmoore commented Jul 16, 2023

(Pdb) store['.zmetadata']
*** KeyError: '/zmetadata'

points to the key lookup itself. Still investigating, but one thought: add a call to zarr.consolidate_metadata(store) to the base test(s) so that all stores call it.

@joshmoore
Copy link
Member

In FSStore, adding this hard-coding:

    def _normalize_key(self, key):
        key = normalize_storage_path(key).lstrip('/')
        if key:
            *bits, end = key.split('/')

            if end not in (
                self._array_meta_key,
                self._group_meta_key,
                self._attrs_key,
                ".zmetadata",  # <-- added this
            ):
                end = end.replace('.', self.key_separator)
                key = '/'.join(bits + [end])

corrects the problem, though it has very, very strong code smells.

joshmoore added a commit to joshmoore/zarr-python that referenced this issue Jul 17, 2023
The normalization code in FSStore uses a list of well-known
files to prevent incorrectly re-writing keys when dimension
separator is "/" rather than ".". This was initialized only
with the names specified in the spec, which left out
".zmetadata" from consolidated metadata.

see: zarr-developers#1121
joshmoore added a commit to joshmoore/zarr-python that referenced this issue Jul 17, 2023
The normalization code in FSStore uses a list of well-known
files to prevent incorrectly re-writing keys when dimension
separator is "/" rather than ".". This was initialized only
with the names specified in the spec, which left out
".zmetadata" from consolidated metadata.

see: zarr-developers#1121
@jhamman
Copy link
Member

jhamman commented Oct 17, 2024

Now that dimension separator is not part of the store api, this is no longer an issue.

@jhamman jhamman closed this as completed Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Potential issues with the zarr-python library
Projects
None yet
Development

No branches or pull requests

4 participants