to_zarr raises ValueError: Invalid dtype with mode='a' (but not with mode='w') #6345
Comments
The relevant code is here (lines 1405 to 1406 in d293f50) and here (lines 1280 to 1298 in d293f50).
What I don't understand is why the append scenario needs different validation than the write scenario. @shoyer worked on this in #5252, so maybe he has some ideas.
I just ran into this today as well. I am trying to add a dimensionless variable to an existing Zarr store, to help with CF-compliance (if exporting to netCDF), and I ran into this issue. The dtype of my variable is '|S1', and the error message is printed below:
Thanks for reporting this @kmsampson. My feeling is that it is a bug... which we can hopefully fix pretty easily!
The data type restriction here seems to date back to the original PR adding support for appending. I turned up this comment that seems to summarize the motivation for this check:
I think the original issue was that appending a fixed-width string could be a problem if the fixed width does not match the width of the existing string dtype stored in Zarr. This obviously doesn't apply in this case, because you are adding an entirely new variable. So I guess the check could be removed in that case.
It seems like what we really want to do is verify that the datatype of the appended data matches the data type on disk.
So it looks like changing lines 1280 to 1301 in d293f50 to

    def _validate_datatypes_for_zarr_append(store, dataset):
        """Check that dtypes of the dataset to append match those in the store."""

        def check_dtype(vname):
            store_dtype = store.get_variables()[vname].dtype
            dataset_dtype = dataset[vname].dtype
            if store_dtype != dataset_dtype:
                raise ValueError(
                    f"Mismatched dtypes for variable {vname} between Zarr store on disk "
                    f"and dataset to append. Store has dtype {store_dtype} but dataset to "
                    f"append has dtype {dataset_dtype}."
                )

        for vname in dataset.data_vars:
            check_dtype(vname)

could work?
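One case the thread raises is appending an entirely new variable, which has no dtype on disk to compare against. The following is a minimal, standalone sketch of that idea, not xarray's implementation: the function name `check_append_dtypes` is hypothetical, and dtypes are modeled as plain strings (for example `"|S35"`) in ordinary dicts rather than read from a real Zarr store.

```python
from typing import Mapping


def check_append_dtypes(
    store_dtypes: Mapping[str, str],
    new_dtypes: Mapping[str, str],
) -> None:
    """Raise ValueError if a variable present in both mappings changes dtype.

    Hypothetical helper: both arguments map variable names to dtype strings.
    """
    for vname, new_dtype in new_dtypes.items():
        if vname not in store_dtypes:
            # Entirely new variable: nothing on disk to conflict with.
            continue
        store_dtype = store_dtypes[vname]
        if store_dtype != new_dtype:
            raise ValueError(
                f"Mismatched dtypes for variable {vname}: store has dtype "
                f"{store_dtype} but data to append has dtype {new_dtype}."
            )


# Dtypes already in the (pretend) store.
existing = {"title": "|S35"}

# Same dtype for 'title', plus a brand-new '|S1' variable: no error is raised.
check_append_dtypes(existing, {"title": "|S35", "crs": "|S1"})
```

With this shape, a width mismatch such as `"|S35"` on disk versus `"|S10"` in the appended data is rejected, while new variables of any dtype pass through to whatever validation a plain write would apply.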
What happened?
A dataset in which a data variable has dtype='|S35' can be written to zarr without error as follows:
Changing the value of mode from 'w' to 'a' raises ValueError: Invalid dtype for data variable:
Full Traceback
What did you expect to happen?
I would expect the behavior of mode='w' and mode='a' to be consistent as regards dtypes of data variables.
Minimal Complete Verifiable Example
See What Happened? section above
Relevant log output
See What Happened? section above
Anything else we need to know?
No response
Environment
cc @rabernat