I have an HDF5 dataset with a scalar variable called 'name' that is actually a 0-D NumPy array with dtype '|S8'. (Not my choice; this is what I get from someone else...) Occasionally, loading it fails.
MCVE Code Sample
# Set up the file
import h5py
import numpy as np

with h5py.File("error_demo.h5", mode="w") as f:
    # Scalar (0-D) fixed-length byte string holding exactly 8 characters
    f.create_dataset("name", shape=(), dtype="|S8", data=np.array(b"f(Pt,TE)", dtype="|S8"))

# Produce the error -- you may need to adjust the number of times you run the loop
import xarray as xr

for i in range(10):
    xr.load_dataset("error_demo.h5")
Expected Output
<xarray.Dataset>
Dimensions: ()
Data variables:
name <U8 'f(Pt,TE)'
Problem Description
The resulting error message
Traceback (most recent call last):
File "<ipython-input-3-b8e48f28a262>", line 1, in <module>
mcout62 = xr.load_dataset("57062/mcout000011.h5",group=r"part/ions/dE(r,z,D)")
File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 261, in load_dataset
return ds.load()
File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/core/dataset.py", line 659, in load
v.load()
File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/core/variable.py", line 375, in load
self._data = np.asarray(self._data)
File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py", line 85, in asarray
return array(a, dtype, copy=False, order=order)
File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/core/indexing.py", line 677, in __array__
self._ensure_cached()
File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/core/indexing.py", line 674, in _ensure_cached
self.array = NumpyIndexingAdapter(np.asarray(self.array))
File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py", line 85, in asarray
return array(a, dtype, copy=False, order=order)
File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/core/indexing.py", line 653, in __array__
return np.asarray(self.array, dtype=dtype)
File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/numpy/core/_asarray.py", line 85, in asarray
return array(a, dtype, copy=False, order=order)
File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/core/indexing.py", line 557, in __array__
return np.asarray(array[self.key], dtype=None)
File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 73, in __getitem__
key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/core/indexing.py", line 837, in explicit_indexing_adapter
result = raw_indexing_method(raw_key.tuple)
File "/Users/lmorton/opt/anaconda3/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 85, in _getitem
array = getitem(original_array, key)
File "netCDF4/_netCDF4.pyx", line 4408, in netCDF4._netCDF4.Variable.__getitem__
File "netCDF4/_netCDF4.pyx", line 5384, in netCDF4._netCDF4.Variable._get
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 9: invalid start byte
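Note that the error points at position 9, although the stored value is only 8 bytes long. One plausible explanation, consistent with the zero-termination point raised in the reply, is that the reader treats the fixed-length field as a NUL-terminated C string and keeps reading past its end into whatever bytes happen to follow in memory. Those bytes can differ between runs, which would also explain why the failure is intermittent. The sketch below is a pure-Python model of that hypothesis, not netCDF4's actual code: `read_c_string` and the trailing `garbage` bytes are hypothetical, and the garbage was chosen so that the decode error matches the traceback.

```python
def read_c_string(buffer: bytes, offset: int) -> bytes:
    """Hypothetical C-style reader: collect bytes from `offset` up to the
    first NUL byte, or to the end of the buffer if there is none."""
    end = buffer.find(b"\x00", offset)
    if end == -1:
        end = len(buffer)
    return buffer[offset:end]


# '|S8' field: 8 characters, no room for a terminating NUL.
field_s8 = b"f(Pt,TE)"
# '|S9' field: the same 8 characters followed by a NUL terminator.
field_s9 = b"f(Pt,TE)\x00"

# Hypothetical bytes that happen to sit in memory right after the field.
garbage = b"\x01\xc1\x00"

raw_s8 = read_c_string(field_s8 + garbage, 0)  # overruns into the garbage
raw_s9 = read_c_string(field_s9 + garbage, 0)  # stops at the terminator

print(raw_s9.decode("utf-8"))  # f(Pt,TE)
try:
    raw_s8.decode("utf-8")
except UnicodeDecodeError as err:
    # 'utf-8' codec can't decode byte 0xc1 in position 9: invalid start byte
    print(err)
```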
@lamorton You can look at this in two different ways.
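First, try using dtype="|S9" in both places; the extra byte leaves room for the zero termination that netCDF needs to interpret the string correctly (IIRC). Second, you can read the file with the h5netcdf backend, xr.load_dataset("error_demo.h5", engine="h5netcdf"), which returns: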
<xarray.Dataset>
Dimensions: ()
Data variables:
name |S8 b'f(Pt,TE)'
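To see what the extra byte in '|S9' buys, the NumPy-only sketch below compares the raw bytes of the same value stored as '|S8' and '|S9'. It does not touch HDF5 or netCDF; it only shows that '|S9' leaves a trailing NUL after the eight characters, which is the terminator a C-style string reader would rely on.

```python
import numpy as np

value = b"f(Pt,TE)"  # exactly 8 characters

as_s8 = np.array(value, dtype="|S8")
as_s9 = np.array(value, dtype="|S9")

# '|S8' stores the 8 characters with no room left for a terminator.
print(as_s8.dtype.itemsize, as_s8.tobytes())  # 8 b'f(Pt,TE)'
# '|S9' pads the value with one NUL byte, so the stored string is NUL-terminated.
print(as_s9.dtype.itemsize, as_s9.tobytes())  # 9 b'f(Pt,TE)\x00'

# NumPy strips trailing NULs when handing the value back, so both look identical.
assert as_s8.item() == as_s9.item() == value
```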
Judging from your expected output, the zero padding seems to be the culprit, since the output of the 'h5netcdf' backend doesn't quite match it.
Unfortunately, I don't have links at hand with further discussion of the string-handling differences between netCDF and HDF5; there are quite a few of them...
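If the h5netcdf route is usable, the remaining mismatch with the expected output (|S8 bytes instead of <U8 text) can be closed by decoding after loading. A minimal NumPy sketch, assuming the stored bytes are UTF-8 (plain ASCII here); `raw` stands in for what `ds["name"].values` would hold after loading with the h5netcdf backend.

```python
import numpy as np

# Stand-in for ds["name"].values from the h5netcdf backend:
# a 0-d array of fixed-length bytes.
raw = np.array(b"f(Pt,TE)", dtype="|S8")

# Decode the bytes into a fixed-width unicode string (assumes UTF-8 content).
text = np.char.decode(raw, "utf-8")

print(raw.dtype, repr(raw.item()))    # |S8 b'f(Pt,TE)'
print(text.dtype, repr(text.item()))  # <U8 'f(Pt,TE)'
```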
Thanks, I'll close this, since it looks like an issue of bad input. I can't use h5netcdf due to conda env nonsense, but I've worked around it by just dropping the 'name' variable during loading.
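The "drop the variable during loading" workaround can be expressed directly in the load call: `xr.load_dataset` forwards keyword arguments such as `drop_variables` to `xr.open_dataset`. A minimal sketch; the helper name `load_without_name` is hypothetical, and the backend is left to xarray's default (netCDF4 in the setup above) unless `engine=` is passed.

```python
def load_without_name(path, **kwargs):
    """Hypothetical helper: load a file with xarray while skipping the
    fixed-length string variable 'name' that triggers the decode error."""
    # Imported inside the function so the helper can be defined even where
    # xarray is not installed (e.g. in a module with optional dependencies).
    import xarray as xr

    # drop_variables is passed through to xr.open_dataset, so the data of
    # 'name' should never be read or decoded.
    return xr.load_dataset(path, drop_variables=["name"], **kwargs)
```

Usage would be `ds = load_without_name("error_demo.h5")`, which returns the dataset without the 'name' variable.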
Versions
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.6 (default, Jan 8 2020, 13:42:34)
[Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 19.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 1.0.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.11.0
distributed: 2.11.0
matplotlib: 3.1.3
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 46.0.0.post20200309
pip: 20.0.2
conda: 4.8.3
pytest: 5.3.5
IPython: 7.12.0
sphinx: 2.4.0