MCVE Code Sample
Zarr itself allows appending np.datetime64 and np.bool types.
>>> path = 'tmp/test.zarr'
>>> z1 = zarr.open(path, mode='w', shape=(10,), chunks=(10,), dtype='M8[D]')
>>> z1[:] = '1990-01-01'
>>> z2 = zarr.open(path, mode='a')
>>> a = np.array(['1992-01-01'] * 10, dtype='datetime64[D]')
>>> z2.append(a)
(20,)
>>> z2
<zarr.core.Array (20,) datetime64[D]>
But its equivalent in xarray raises an error:
>>> ds = xr.Dataset(
... {'y': (('x',), np.array(['1991-01-01'] * 10, dtype='datetime64[D]'))}
... )
>>> ds.to_zarr('tmp/test_xr.zarr', mode='w')
<xarray.backends.zarr.ZarrStore object at 0x31f403170>
>>> ds2 = xr.Dataset(
... {'y': (('x',), np.array(['1992-01-01'] * 10, dtype='datetime64[D]'))}
... )
>>> ds2.to_zarr('tmp/test_xr.zarr', mode='a', append_dim='x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/core/dataset.py", line 1616, in to_zarr
    append_dim=append_dim,
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1304, in to_zarr
    _validate_datatypes_for_zarr_append(dataset)
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1249, in _validate_datatypes_for_zarr_append
    check_dtype(k)
  File "/Users/personal/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 1245, in check_dtype
    "unicode string or an object".format(var)
ValueError: Invalid dtype for data variable: <xarray.DataArray 'y' (x: 10)>
array(['1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
'1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
'1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
'1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000',
'1992-01-01T00:00:00.000000000', '1992-01-01T00:00:00.000000000'],
dtype='datetime64[ns]')
Dimensions without coordinates: x dtype must be a subtype of number, a fixed sized string, a fixed size unicode string or an object
Expected Output
The append should succeed.
Problem Description
This function in xarray/backends/api.py is too strict on types:
def _validate_datatypes_for_zarr_append(dataset):
    """DataArray.name and Dataset keys must be a string or None"""

    def check_dtype(var):
        if (
            not np.issubdtype(var.dtype, np.number)
            and not coding.strings.is_unicode_dtype(var.dtype)
            and not var.dtype == object
        ):
            # and not re.match('^bytes[1-9]+$', var.dtype.name)):
            raise ValueError(
                "Invalid dtype for data variable: {} "
                "dtype must be a subtype of number, "
                "a fixed sized string, a fixed size "
                "unicode string or an object".format(var)
            )

    for k in dataset.data_vars.values():
        check_dtype(k)
np.datetime64 (with any unit) and np.bool are not subtypes of np.number, so the check rejects them:
>>> np.issubdtype(np.dtype('datetime64[D]'), np.number)
False
>>> np.issubdtype(np.dtype('bool'), np.number)
False
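One way the check could be relaxed is sketched below. This is only an illustration, not xarray's actual code: the helper name is_appendable_dtype is hypothetical, and dtype.kind == "U" is used as a stand-in for the unicode part of coding.strings.is_unicode_dtype so that the snippet depends on NumPy alone.

```python
import numpy as np


def is_appendable_dtype(dtype):
    """Hypothetical relaxed dtype check (a sketch, not xarray's implementation).

    Accepts everything the existing check accepts, plus datetime64 and bool.
    """
    dtype = np.dtype(dtype)
    return bool(
        np.issubdtype(dtype, np.number)
        or np.issubdtype(dtype, np.datetime64)  # newly allowed
        or np.issubdtype(dtype, np.bool_)  # newly allowed
        or dtype.kind == "U"  # assumption: stand-in for is_unicode_dtype
        or dtype == object
    )


# Print the verdict for a few representative dtypes.
for name in ["datetime64[D]", "datetime64[ns]", "bool", "float64", "U10", "O", "S10"]:
    print(name, is_appendable_dtype(name))
```

With a predicate like this, check_dtype would only raise when the predicate returns False for var.dtype. Byte strings ("S10") stay rejected here, mirroring the commented-out bytes clause in the original function.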
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 15:17:50)
[Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None
xarray: 0.14.0
pandas: 0.25.1
numpy: 1.17.2
scipy: 1.3.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: 2.3.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.5.2
distributed: 2.5.2
matplotlib: 3.1.1
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.2.3
conda: 4.7.12
pytest: 5.2.1
IPython: 7.8.0
sphinx: 2.2.0