
Bug: zarr.open behaves different than zarr.open_group with mode w- on gs:// URIs #712

Closed
adrianloy opened this issue Mar 25, 2021 · 14 comments

Comments

@adrianloy

The convenience function zarr.open fails to create a Zarr group in a GCP bucket with mode w-:

Minimal, reproducible code sample:

import zarr

an_empty_gs_uri = "gs://my-bucket/test.zarr"
zarr.open(an_empty_gs_uri, mode="w-")

Problem description

With mode w-, the code above fails even though no group exists at the given URI:

---------------------------------------------------------------------------
ReadOnlyError                             Traceback (most recent call last)
<ipython-input-11-d0b46f9c7e3f> in <module>
----> 1 zarr.open(t3, mode='w-')

~/merantix/mxlabs-chameleon/venv/lib/python3.8/site-packages/zarr/convenience.py in open(store, mode, **kwargs)
     84             return open_array(store, mode=mode, **kwargs)
     85         else:
---> 86             return open_group(store, mode=mode, **kwargs)
     87 
     88     elif mode == "a":

~/merantix/mxlabs-chameleon/venv/lib/python3.8/site-packages/zarr/hierarchy.py in open_group(store, mode, cache_attrs, synchronizer, path, chunk_store, storage_options)
   1181             raise ContainsGroupError(path)
   1182         else:
-> 1183             init_group(store, path=path, chunk_store=chunk_store)
   1184 
   1185     # determine read only status

~/merantix/mxlabs-chameleon/venv/lib/python3.8/site-packages/zarr/storage.py in init_group(store, overwrite, path, chunk_store)
    470 
    471     # initialise metadata
--> 472     _init_group_metadata(store=store, overwrite=overwrite, path=path,
    473                          chunk_store=chunk_store)
    474 

~/merantix/mxlabs-chameleon/venv/lib/python3.8/site-packages/zarr/storage.py in _init_group_metadata(store, overwrite, path, chunk_store)
    497     meta = dict()  # type: ignore
    498     key = _path_to_prefix(path) + group_meta_key
--> 499     store[key] = encode_group_metadata(meta)
    500 
    501 

~/merantix/mxlabs-chameleon/venv/lib/python3.8/site-packages/zarr/storage.py in __setitem__(self, key, value)
   1058     def __setitem__(self, key, value):
   1059         if self.mode == 'r':
-> 1060             raise ReadOnlyError()
   1061         key = self._normalize_key(key)
   1062         path = self.dir_path(key)

ReadOnlyError: object is read-only

With mode w it works. zarr.open_group also works with the same URI and mode w-.
From a quick look, the underlying FSStore seems to be configured as read-only when we call zarr.open(gs_uri, mode='w-').
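
A possible workaround on affected versions, based only on the observation above that zarr.open_group accepts mode w- on the same URI, is to call it directly. This is a minimal sketch: the helper name create_group_exclusive is hypothetical, and it assumes an fsspec backend for gs:// (such as gcsfs) is installed and has credentials.

```python
def create_group_exclusive(uri, storage_options=None):
    """Create a new group at `uri`; mode 'w-' means fail if something already exists."""
    # Imported lazily so that this module loads even where zarr is not installed.
    import zarr

    # open_group is reported in this issue to work with mode="w-" on gs:// URIs,
    # whereas zarr.open raised ReadOnlyError for the same arguments.
    return zarr.open_group(uri, mode="w-", storage_options=storage_options)
```

Usage would be root = create_group_exclusive("gs://my-bucket/test.zarr"), with the bucket name taken from the sample above.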

Version and installation information

  • zarr.__version__: 2.6.1
  • numcodecs.__version__: 0.7.3
  • fsspec.__version__: 0.8.7
  • Python interpreter: 3.8.5
  • Operating system: Linux
  • How Zarr was installed: pip

@arogozhnikov

arogozhnikov commented Dec 15, 2021

I believe my issue is related:

I created a dataset in S3 (this step works fine), but when I reopen it in any writable mode ('a', 'r+', 'w-'), zarr refuses to write new datasets and raises the same error:

import zarr
import numpy as np

with zarr.open('s3://mybucket/zarr_experiment.zarr', mode='w') as g:
    g['x'] = np.arange(200 * 200).reshape(200, 200).astype('uint16')
    print(g['x'][:].sum())

with zarr.open('s3://mybucket/zarr_experiment.zarr', mode='r') as g:
    print(list(g))
    print(g['x'][:].sum())    

with zarr.open('s3://mybucket/zarr_experiment.zarr', mode='a') as g:
    print(list(g))
    print(g['x'][:].sum())
    g['y'] = np.arange(200 * 200).reshape(200, 200).astype('uint16')   # <--- this line always fails

All printed sums coincide and are correct.

Error I get:

---------------------------------------------------------------------------
ReadOnlyError                             Traceback (most recent call last)
/var/folders/m7/d0p8pv5s6x5g09z3rwkgp24m0000gn/T/ipykernel_19780/2907830151.py in <module>
      2     print(list(g))
      3     print(g['x'][:].sum())
----> 4     g3['y'] = np.arange(200 * 200).reshape(200, 200).astype('uint16')

~/envs/pipeline/lib/python3.9/site-packages/zarr/hierarchy.py in __setitem__(self, item, value)
    350 
    351     def __setitem__(self, item, value):
--> 352         self.array(item, value, overwrite=True)
    353 
    354     def __delitem__(self, item):

~/envs/pipeline/lib/python3.9/site-packages/zarr/hierarchy.py in array(self, name, data, **kwargs)
    948         """Create an array. Keyword arguments as per
    949         :func:`zarr.creation.array`."""
--> 950         return self._write_op(self._array_nosync, name, data, **kwargs)
    951 
    952     def _array_nosync(self, name, data, **kwargs):

~/envs/pipeline/lib/python3.9/site-packages/zarr/hierarchy.py in _write_op(self, f, *args, **kwargs)
    659 
    660         with lock:
--> 661             return f(*args, **kwargs)
    662 
    663     def create_group(self, name, overwrite=False):

~/envs/pipeline/lib/python3.9/site-packages/zarr/hierarchy.py in _array_nosync(self, name, data, **kwargs)
    954         kwargs.setdefault('synchronizer', self._synchronizer)
    955         kwargs.setdefault('cache_attrs', self.attrs.cache)
--> 956         return array(data, store=self._store, path=path, chunk_store=self._chunk_store,
    957                      **kwargs)
    958 

~/envs/pipeline/lib/python3.9/site-packages/zarr/creation.py in array(data, **kwargs)
    372 
    373     # instantiate array
--> 374     z = create(**kwargs)
    375 
    376     # fill with data

~/envs/pipeline/lib/python3.9/site-packages/zarr/creation.py in create(shape, chunks, dtype, compressor, fill_value, order, store, synchronizer, overwrite, path, chunk_store, filters, cache_metadata, cache_attrs, read_only, object_codec, dimension_separator, **kwargs)
    136 
    137     # initialize array metadata
--> 138     init_array(store, shape=shape, chunks=chunks, dtype=dtype, compressor=compressor,
    139                fill_value=fill_value, order=order, overwrite=overwrite, path=path,
    140                chunk_store=chunk_store, filters=filters, object_codec=object_codec,

~/envs/pipeline/lib/python3.9/site-packages/zarr/storage.py in init_array(store, shape, chunks, dtype, compressor, fill_value, order, overwrite, path, chunk_store, filters, object_codec, dimension_separator)
    351     _require_parent_group(path, store=store, chunk_store=chunk_store, overwrite=overwrite)
    352 
--> 353     _init_array_metadata(store, shape=shape, chunks=chunks, dtype=dtype,
    354                          compressor=compressor, fill_value=fill_value,
    355                          order=order, overwrite=overwrite, path=path,

~/envs/pipeline/lib/python3.9/site-packages/zarr/storage.py in _init_array_metadata(store, shape, chunks, dtype, compressor, fill_value, order, overwrite, path, chunk_store, filters, object_codec, dimension_separator)
    378     if overwrite:
    379         # attempt to delete any pre-existing items in store
--> 380         rmdir(store, path)
    381         if chunk_store is not None:
    382             rmdir(chunk_store, path)

~/envs/pipeline/lib/python3.9/site-packages/zarr/storage.py in rmdir(store, path)
    121     if hasattr(store, 'rmdir'):
    122         # pass through
--> 123         store.rmdir(path)
    124     else:
    125         # slow version, delete one key at a time

~/envs/pipeline/lib/python3.9/site-packages/zarr/storage.py in rmdir(self, path)
   1207     def rmdir(self, path=None):
   1208         if self.mode == 'r':
-> 1209             raise ReadOnlyError()
   1210         store_path = self.dir_path(path)
   1211         if self.fs.isdir(store_path):

ReadOnlyError: object is read-only

Everything is freshly installed: fsspec '2021.10.1', zarr '2.10.3', numcodecs '0.9.1', macOS, pip, Python 3.9.

@joshmoore
Member

@adrianloy's 2.6 version likely rules out #696 (2.9). #660 was added in 2.6.0. Does either of you know the last version that worked for you?

@arogozhnikov

@joshmoore I'm just exploring/testing the format and have not used it previously.

@joshmoore
Member

This issue may be fixed by #916

@arogozhnikov

@joshmoore nice, I can recheck after the PR is merged. Though as of now, no one is assigned to review #916.

@joshmoore
Member

@arogozhnikov, now merged. Let me know how it works for you.

@arogozhnikov

@joshmoore I am on '2.11.0a3.dev41' and nothing has changed. I still get

ReadOnlyError: object is read-only

with the same example code as above.

@joshmoore
Member

Thanks, @arogozhnikov. @martindurant, do you know off hand if

https://github.com/zarr-developers/zarr-python/blob/master/zarr/convenience.py#L80

should actually be clobber = mode in ('w', 'a')? It's feeling a bit like all the usages of normalize_store_arg need reviewing.

@martindurant
Member

I am not certain in this context, but "clobber" ought to mean that any existing dataset is removed before write, right?

(not clobber) is not necessarily the same as (read only), so perhaps it's the naming convention that's wrong?

@arogozhnikov

@joshmoore this change does not help.

As the log above shows, the store's mode somehow ends up as 'r':

   1208         if self.mode == 'r':
-> 1209             raise ReadOnlyError()
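
One way to check this from user code is to compare the group's own read-only flag with the mode held by its store. The sketch below is illustrative only: describe_write_state is a hypothetical helper, it assumes the store exposes a mode attribute (as the self.mode check in the traceback suggests) and that the group exposes store and read_only, and the stand-in object merely imitates a group whose store is in mode 'r'.

```python
from types import SimpleNamespace


def describe_write_state(group):
    """Return the group's read-only flag and the underlying store's mode, if any."""
    store = group.store
    return {
        "group_read_only": group.read_only,
        # The traceback shows the store checking `self.mode == 'r'`;
        # other store classes may lack this attribute, hence the default.
        "store_mode": getattr(store, "mode", None),
    }


# Stand-in object, not a real zarr group: a writable-looking group
# sitting on top of a store that was opened in mode 'r' (an assumption).
fake_group = SimpleNamespace(read_only=False, store=SimpleNamespace(mode="r"))
print(describe_write_state(fake_group))
```

On a real group, describe_write_state(g) inside the with block would show whether the store mode matches the mode that was requested.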

@joshmoore
Member

@martindurant : hmmm..... that's a good question. From

mode = mode if clobber else "r"

I read clobber to mean:

"use the mode argument (which defaults to "w") rather than just "r""

@arogozhnikov : thanks!
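
The effect of the quoted line can be sketched in isolation. This is a toy model, not the library's code: effective_store_mode is a hypothetical name, and the rule that clobber is true only for mode "w" is an assumption that the thread does not confirm.

```python
def effective_store_mode(mode, clobber):
    """Mirror the quoted line: keep `mode` only when clobber is true, else use 'r'."""
    return mode if clobber else "r"


# Assumption for illustration: clobber is derived as `mode == "w"`.
for requested in ("w", "w-", "a", "r+"):
    clobber = requested == "w"
    print(requested, "->", effective_store_mode(requested, clobber))
```

Under that assumption every writable mode other than "w" collapses to "r", which would match the ReadOnlyError reports for 'w-', 'a' and 'r+' above.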

@joshmoore
Member

@arogozhnikov et al.: Another attempted fix was just released with 2.11.1.

@arogozhnikov

@joshmoore
I confirm that with 2.11.1 the issue is solved! 🎉
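
For code that has to run against several zarr versions, a simple guard on the version string can tell whether the release reported here as fixed is installed. This is a rough sketch with hypothetical helper names; it only compares the leading X.Y.Z part, so a pre-release of 2.11.1 would also count as fixed.

```python
import re

FIXED_IN = (2, 11, 1)  # release reported in this thread as resolving the issue


def release_tuple(version):
    """Extract the leading 'X.Y.Z' of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_reported_fix(version):
    """True if `version` is at least the release reported as fixed."""
    return release_tuple(version) >= FIXED_IN


# Versions mentioned in this thread.
for v in ("2.6.1", "2.10.3", "2.11.0a3.dev41", "2.11.1"):
    print(v, has_reported_fix(v))
```

In practice this would be called as has_reported_fix(zarr.__version__).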

@joshmoore
Member

Whew! Thanks @d70-t and @martindurant! (#976)
