trouble saving zarr store directly to google cloud  #741

@naomi-henderson

This issue is related to #681, which is now closed. I have been trying to get this to work reliably for a while; hopefully someone can tell me what I am missing.

I open a perfectly normal zarr dataset (my credentials are already cached):

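import gcsfs
import xarray as xr
# Reuse cached credentials and request read/write access to the bucket
gcs = gcsfs.GCSFileSystem(token='cache', access='read_write')
zstore = 'gs://cmip6/AerChemMIP/BCC/BCC-ESM1/ssp370/r1i1p1f1/Amon/pr/gn/'
# Open the existing store using its consolidated metadata
ds = xr.open_zarr(zstore, consolidated=True)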

and ds looks perfectly normal:

<xarray.Dataset>
Dimensions:    (bnds: 2, lat: 64, lon: 128, time: 492)
Coordinates:
  * lat        (lat) float64 -87.86 -85.1 -82.31 -79.53 ... 82.31 85.1 87.86
    lat_bnds   (lat, bnds) float64 dask.array<chunksize=(64, 2), meta=np.ndarray>
  * lon        (lon) float64 0.0 2.812 5.625 8.438 ... 348.8 351.6 354.4 357.2
    lon_bnds   (lon, bnds) float64 dask.array<chunksize=(128, 2), meta=np.ndarray>
  * time       (time) object 2015-01-16 12:00:00 ... 2055-12-16 12:00:00
    time_bnds  (time, bnds) object dask.array<chunksize=(492, 2), meta=np.ndarray>
...etc

But now I try to write this zarr store directly to GCS:

ds.to_zarr(gcs.get_mapper('gs://cmip6/ztemp9/pr/gn/'), consolidated=True)

Unfortunately, the zarr store it writes is corrupt. For example:

xr.open_zarr(gcs.get_mapper('gs://cmip6/ztemp9/pr/gn/'), consolidated=True)

returns a dataset with no variables or coordinates (although the attributes are all intact):

<xarray.Dataset>
Dimensions:  ()
Data variables:
    *empty*
Attributes:
    Conventions:            CF-1.7 CMIP-6.2
 ...etc

Oddly, the zarr store that was written is still about the same size as the original, roughly 13.8M.
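Since the size looks right, it may help to see where those bytes actually live. The following is only a rough sketch: it assumes the store can be treated as a plain key-to-bytes mapping (which is how a gcsfs mapper behaves), and `size_by_variable` and the toy store contents are hypothetical names made up for illustration. Note that it reads every object, so it is only sensible for a small store like this one.

```python
from collections import defaultdict


def size_by_variable(store):
    """Sum stored bytes per top-level name in a zarr-style key/value store."""
    totals = defaultdict(int)
    for key in store:
        # Keys look like "pr/0.0.0" or ".zattrs"; group on the first path part.
        top = key.split("/", 1)[0] if "/" in key else "<root>"
        totals[top] += len(store[key])
    return dict(totals)


# Toy in-memory store standing in for gcs.get_mapper(...) (hypothetical contents).
toy = {
    ".zgroup": b"{}",
    ".zattrs": b"{}",
    "pr/.zarray": b"{}",
    "pr/0.0.0": b"x" * 100,
}
sizes = size_by_variable(toy)
print(sizes)
```

If the chunk data turns out to be present under each variable name, the problem is more likely in the metadata than in the data itself.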

When writing to local disk, to_zarr works fine on this example and does not drop the variable and coordinate info.

I have tried many permutations of this, for example with and without consolidated=True. Without consolidated=True the store contains only .zarray and .zgroup, with no subdirectories, so it is even worse.
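One way to narrow this down might be to compare the arrays listed in the consolidated metadata with the arrays that actually have a .zarray key in the store. This is a minimal sketch, assuming the zarr v2 layout in which .zmetadata is a JSON document with a "metadata" dictionary keyed by paths such as "pr/.zarray". The helper name `compare_consolidated` and the toy store are hypothetical; in practice the mapper returned by gcs.get_mapper(...) would be passed instead of the dict.

```python
import json

SUFFIX = "/.zarray"


def compare_consolidated(store):
    """Compare arrays listed in .zmetadata with arrays present in the store."""
    listed = set()
    if ".zmetadata" in store:
        doc = json.loads(store[".zmetadata"])
        listed = {
            key[: -len(SUFFIX)]
            for key in doc.get("metadata", {})
            if key.endswith(SUFFIX)
        }
    # Arrays that physically have a .zarray object in the store.
    present = {key[: -len(SUFFIX)] for key in store if key.endswith(SUFFIX)}
    return {
        "listed_only": sorted(listed - present),
        "present_only": sorted(present - listed),
        "both": sorted(listed & present),
    }


# Toy store: consolidated metadata knows only the root group, but two
# arrays exist on "disk" (hypothetical contents for illustration).
toy = {
    ".zmetadata": json.dumps(
        {"zarr_consolidated_format": 1, "metadata": {".zgroup": {}, ".zattrs": {}}}
    ).encode(),
    "pr/.zarray": b"{}",
    "lat/.zarray": b"{}",
}
report = compare_consolidated(toy)
print(report)
```

Arrays that show up under "present_only" would be consistent with the symptom above, where the attributes load but no variables or coordinates appear.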

What am I doing wrong?
