Allow appending non-numerical types to zarr arrays. #3480

Closed
@amatsukawa

Description

MCVE Code Sample

Zarr itself allows appending np.datetime64 and np.bool data:

>>> import numpy as np
>>> import zarr
>>> path = 'tmp/test.zarr'
>>> z1 = zarr.open(path, mode='w', shape=(10,), chunks=(10,), dtype='M8[D]')
>>> z1[:] = '1990-01-01'
>>> z2 = zarr.open(path, mode='a')
>>> a = np.array(['1992-01-01'] * 10, dtype='datetime64[D]')
>>> z2.append(a)
(20,)
>>> z2
<zarr.core.Array (20,) datetime64[D]>

But the equivalent operation in xarray raises an error:

>>> import xarray as xr
>>> ds = xr.Dataset(
...     {'y': (('x',), np.array(['1991-01-01'] * 10, dtype='datetime64[D]'))}
... )
>>> ds.to_zarr('tmp/test_xr.zarr', mode='w')
<xarray.backends.zarr.ZarrStore object at 0x31f403170>
>>> ds2 = xr.Dataset(
...     {'y': (('x',), np.array(['1992-01-01'] * 10, dtype='datetime64[D]'))}
... )
>>> ds2.to_zarr('tmp/test_xr.zarr', mode='a', append_dim='x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/core/dataset.py", line 1616, in to_zarr
    append_dim=append_dim,
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1304, in to_zarr
    _validate_datatypes_for_zarr_append(dataset)
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1249, in _validate_datatypes_for_zarr_append
    check_dtype(k)
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1245, in check_dtype
    "unicode string or an object".format(var)
ValueError: Invalid dtype for data variable: <xarray.DataArray 'y' (x: 10)>
array(['1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
       '1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000'],
      dtype='datetime64[ns]')
Dimensions without coordinates: x dtype must be a subtype of number, a fixed sized string, a fixed size unicode string or an object

Expected Output

The append should succeed.

Problem Description

This function in xarray/backends/api.py is too strict about dtypes:

def _validate_datatypes_for_zarr_append(dataset):
    """DataArray.name and Dataset keys must be a string or None"""

    def check_dtype(var):
        if (
            not np.issubdtype(var.dtype, np.number)
            and not coding.strings.is_unicode_dtype(var.dtype)
            and not var.dtype == object
        ):
            # and not re.match('^bytes[1-9]+$', var.dtype.name)):
            raise ValueError(
                "Invalid dtype for data variable: {} "
                "dtype must be a subtype of number, "
                "a fixed sized string, a fixed size "
                "unicode string or an object".format(var)
            )

    for k in dataset.data_vars.values():
        check_dtype(k)

Neither datetime64 dtypes nor np.bool are subtypes of np.number, so the check rejects them:

>>> np.issubdtype(np.dtype('datetime64[D]'), np.number)
False
>>> np.issubdtype(np.dtype('bool'), np.number)
False
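
For illustration, here is a minimal sketch of what a relaxed check could look like. It is not xarray's actual fix: the function name `is_appendable_dtype` is hypothetical, and `dtype.kind == "U"` is used as a numpy-only stand-in for `coding.strings.is_unicode_dtype`, which may behave differently in edge cases.

```python
import numpy as np


def is_appendable_dtype(dtype):
    """Hypothetical relaxed dtype check (illustration only, not xarray code).

    Accepts everything the existing check accepts (numbers, unicode
    strings, object) and additionally datetime64 and bool.
    """
    dtype = np.dtype(dtype)
    return (
        np.issubdtype(dtype, np.number)
        or np.issubdtype(dtype, np.datetime64)  # newly allowed
        or np.issubdtype(dtype, np.bool_)       # newly allowed
        or dtype.kind == "U"                    # assumed stand-in for is_unicode_dtype
        or dtype.kind == "O"                    # object arrays
    )


# Print the verdict for a few representative dtypes.
for name in ["float64", "datetime64[D]", "bool", "U10", "O"]:
    print(name, is_appendable_dtype(name))
```

Whether every newly accepted dtype actually round-trips through xarray's encoding on append would still need to be tested against zarr itself.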

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 15:17:50) [Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.7.12
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0
