Skip to content

Fixed '/' prepend bug in ABSStore #526

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 12, 2019

Conversation

shikharsg
Copy link
Contributor

@shikharsg shikharsg commented Dec 11, 2019

Fixes '/' prepend bug in ABSStore, issue: #525

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • AppVeyor and Travis CI passes
  • Test coverage is 100% (Coveralls passes)

@shikharsg shikharsg changed the title Closes https://github.com/zarr-developers/zarr-python/issues/525 Fixed #525 Dec 11, 2019
@shikharsg shikharsg changed the title Fixed #525 Fixed '/' prepend bug in ABSStore Dec 11, 2019
@shikharsg shikharsg requested a review from jrbourbeau December 11, 2019 12:01
Copy link
Member

@alimanfoo alimanfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @shikharsg, these are questions really more than comments, generally LGTM.

def _append_path_to_prefix(path, prefix):
return '/'.join([normalize_storage_path(prefix),
normalize_storage_path(path)])
def _append_path_to_prefix(self, path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering, what happens here if self.prefix is None (default)?

Copy link
Contributor Author

@shikharsg shikharsg Dec 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During instantiation in the __init__ function, any prefix supplied is passed to the zarr.util.normalize_storage_path function before storing it in self.prefix, so if prefix is None it will be stored as '' in self.prefix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah cool, thanks. In which case maybe the default value for the prefix parameter could the empty string rather than None, might be a little more obvious. Also if prefix is normalised in the constructor then probably could remove calls to normalize self.prefix in the methods.

account_key='bar', blob_service_kwargs={'is_emulated': True})
store.rmdir()
return store

def test_iterators_with_prefix(self):
store = self.create_store(prefix='test_prefix')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this makes sense as haven't looked super closely at the implementation, but maybe worth testing multiple prefixes here, with and without leading or trailing slashes, just to confirm everything behaves as you expect? E.g.:

for prefix in 'test_prefix', '/test_prefix', 'test_prefix/', 'test/prefix', '', None:
    store = self.create_store(prefix=prefix)
    # ... tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since prefix is passed to normalize_storage_path before storing in self.prefix any leading or trailing slashes will be removed.

Putting in some print statements in the tests:

    def test_iterators_with_prefix(self):
        for prefix in ['test_prefix', '/test_prefix', 'test_prefix/', 'test/prefix', '', None]:
            print(prefix)
            store = self.create_store(prefix=prefix)

            # test iterator methods on empty store
            assert 0 == len(store)
            assert set() == set(store)
            assert set() == set(store.keys())
            assert set() == set(store.values())
            assert set() == set(store.items())

            # setup some values
            store['a'] = b'aaa'
            store['b'] = b'bbb'
            store['c/d'] = b'ddd'
            store['c/e/f'] = b'fff'
            print([blob for blob in store.client.list_blob_names('test')])

            # test iterators on store with data
            assert 4 == len(store)
            assert {'a', 'b', 'c/d', 'c/e/f'} == set(store)
            assert {'a', 'b', 'c/d', 'c/e/f'} == set(store.keys())
            assert {b'aaa', b'bbb', b'ddd', b'fff'} == set(store.values())
            assert ({('a', b'aaa'), ('b', b'bbb'), ('c/d', b'ddd'), ('c/e/f', b'fff')} ==
                    set(store.items()))
 

Output is this:

test_prefix
['test_prefix/a', 'test_prefix/b', 'test_prefix/c/d', 'test_prefix/c/e/f']
/test_prefix
['test_prefix/a', 'test_prefix/b', 'test_prefix/c/d', 'test_prefix/c/e/f']
test_prefix/
['test_prefix/a', 'test_prefix/b', 'test_prefix/c/d', 'test_prefix/c/e/f']
test/prefix
['test/prefix/a', 'test/prefix/b', 'test/prefix/c/d', 'test/prefix/c/e/f']

['a', 'b', 'c/d', 'c/e/f']
None
['a', 'b', 'c/d', 'c/e/f']

Which is what you would expect

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks.

@tjcrone
Copy link
Member

tjcrone commented Dec 11, 2019

Sorry if I am a little confused by this. But are you trying to put a Zarr group inside of a blob storage account not inside a blob container?

@shikharsg
Copy link
Contributor Author

shikharsg commented Dec 11, 2019

@tjcrone sorry if that wasn't clear. All metadata (.zgroup, .zarray...) and data(chunks) will be blobs inside a particular container. Nothing is at the storage account level.

The issue was that ABSStore prepends an extra '/' to all blob names. So if you opened a group in the root of the container the blobs would be named as ['/.zgroup', '/foo/.zgroup'], instead of ['.zgroup', 'foo/.zgroup']. The former is also awkward to look at in the storage explorer(see pictures in issue: #525).

@tjcrone
Copy link
Member

tjcrone commented Dec 11, 2019

I don't think we ever contemplated the idea of turning a blob container into a Zarr group, which is what you are doing here by passing prefix=None. I guess we always thought that Zarr groups (sometimes referred to as Zarr "files") would have names, and live inside blob containers, which seems like the right thing to do. Maybe it is fine to do this, but at the moment I remain skeptical. @alimanfoo, do you have an opinion here? Would you recommend treating a blob container like a Zarr group? More info on Azure blob containers here: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction.

@shikharsg
Copy link
Contributor Author

Ah i understand now. Yes, this PR will allow a zarr container to become a group(also array for that matter).

Before this PR we would probably have a structure like this:
['group1/.zgroup', 'group1/group2/.zgroup', 'group1/array1/.zarray']
where everything is inside of group1. I'm not sure how that would be "conceptually different" from making the container itself a group where we have in addition to the above three also a .zgroup in the container:
['.zgroup', 'group1/.zgroup', 'group1/group2/.zgroup', 'group1/array1/.zarray']

@shikharsg
Copy link
Contributor Author

@martindurant does this functionality exist in s3fs.S3Map, i.e. are we allowed to treat an S3 bucket as a zarr group?

@tjcrone
Copy link
Member

tjcrone commented Dec 11, 2019

In your example group1 is the name of the Zarr "file", which is typically given a .zarr extension, but that's not required, and it is inside of a blob container. I think the idea of treating a blob container as a Zarr "file" might be a bad idea, primarily because a container is not a blob. They are created and destroyed differently. I think if you tried to create a Zarr file that was a container rather than a blob inside a container, it would not work. Have you tried doing this?

@alimanfoo
Copy link
Member

FWIW I don't see any problem in putting a zarr group (or array) at the root of a bucket, if that's what you want to do, i.e., you don't want to use the bucket for anything else. Should work fine on S3 and GCS.

@martindurant
Copy link
Member

@shikharsg , I think so? Should try!

@tjcrone
Copy link
Member

tjcrone commented Dec 11, 2019

I think we should maintain as much parity with the S3/GCS functionality as possible. So if it is possible to treat an S3 bucket as a Zarr group or array, and it has the functionality of creating a bucket just like it would create a file when the Zarr object needs creating, I'm fine with this.

@@ -1994,8 +1994,7 @@ def _append_path_to_prefix(self, path):
if self.prefix == '':
return normalize_storage_path(path)
else:
return '/'.join([normalize_storage_path(self.prefix),
normalize_storage_path(path)])
return '/'.join([self.prefix, normalize_storage_path(path)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what happens here if prefix='/'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the prefix is normalized in the constructor, if you pass in '/' for the prefix it will still be stored as '' in self.prefix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay great. Thanks.

@tjcrone
Copy link
Member

tjcrone commented Dec 11, 2019

Have we confirmed that the S3 and GCS will allow placing a Zarr group or array at the root of a bucket? If they do, we should merge this. I really think that treating an entire bucket or container as a Zarr file is an odd thing to do, perhaps because it removes a layer of abstraction that I think is useful. But maybe it's me that's odd. I will obviously defer to @alimanfoo, on this.

@tjcrone
Copy link
Member

tjcrone commented Dec 11, 2019

I guess it is sortof like putting the contents of a file at the root of a file system (not as a file in /, but actually at the root, because you don't plan to use the file system anyway. I don't think most file systems will let you do this, because the root of the file system is a container for storing files and other containers, so doing this would be weird. But honestly if y'all think that storing Zarr collections at containers not in them is a good idea, then let's try it and see how it goes.

@shikharsg
Copy link
Contributor Author

With S3 it works:

>>> s3 = s3fs.S3FileSystem(secret='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY', key='AKIAIOSFODNN7EXAMPLE', anon=False, client_kwargs={'endpoint_url':'http://localhost:9000'})
>>> store = s3fs.S3Map(root='test', s3=s3, check=False)
>>> root = zarr.group(store=store)

image

Couldn't try GCS as I don't have an account and there doesn't seem to be an emulator.

@shikharsg
Copy link
Contributor Author

Thanks for the helpful comments @alimanfoo @tjcrone @martindurant . If there are no further reviews/objections can I go ahead and merge?

Copy link
Member

@alimanfoo alimanfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@Carreau Carreau added this to the v2.4 milestone Sep 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants