Skip to content

xr.DataArray.where(drop=True) crashes if the result is False everywhere (h5netcdf engine) #4524

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
RondeauG opened this issue Oct 19, 2020 · 4 comments

Comments

@RondeauG
Copy link

RondeauG commented Oct 19, 2020

Description of the issue:

For the bug to occur, it looks like the following requirements need to be met :

  1. Crashes occured with both xarray v0.16.1 and the current master. The bug does not seem to exist with v0.15.1.
  2. The file must be opened using the h5netcdf engine.
  3. The DataArray, before the call to where(), must contain auxiliary coordinates that will get dropped (in the example below, lon_rho, lat_rho and h). Edit: By "auxiliary coordinate", I mean coordinates that depend on a dimension, without being dimensions themselves.
  4. The call to .where() must give False results everywhere.
  5. The argument drop=True must be used.

Minimal Complete Verifiable Example:

import xarray as xr
ds = xr.tutorial.open_dataset('ROMS_example', engine='h5netcdf')
da = ds["zeta"]
print(da.where(da>1000, drop=True))

This is what da looks like before the call to .where()

print(da)
<xarray.DataArray 'zeta' (ocean_time: 2, eta_rho: 191, xi_rho: 371)>
[141722 values with dtype=float32]
Coordinates:
    lon_rho     (eta_rho, xi_rho) float64 ...
    hc          float64 ...
    h           (eta_rho, xi_rho) float64 ...
    lat_rho     (eta_rho, xi_rho) float64 ...
    Vtransform  int32 ...
  * ocean_time  (ocean_time) datetime64[ns] 2001-08-01 2001-08-08
Dimensions without coordinates: eta_rho, xi_rho
Attributes:
    long_name:  free-surface
    units:      meter
    time:       ocean_time
    field:      free-surface, scalar, series

What happened:

This is the result obtained from the example above with xarray v0.16.1, as well as the current master.

Traceback (most recent call last):
  File "/usr/local/pycharm-community-2017.3.4/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<string>", line 1, in <module>
  File "/home/rondeau/python/github/xarray/xarray/core/common.py", line 1268, in where
    return ops.where_method(self, cond, other)
  File "/home/rondeau/python/github/xarray/xarray/core/ops.py", line 193, in where_method
    return apply_ufunc(
  File "/home/rondeau/python/github/xarray/xarray/core/computation.py", line 1126, in apply_ufunc
    return apply_dataarray_vfunc(
  File "/home/rondeau/python/github/xarray/xarray/core/computation.py", line 268, in apply_dataarray_vfunc
    result_coords = build_output_coords(args, signature, exclude_dims)
  File "/home/rondeau/python/github/xarray/xarray/core/computation.py", line 232, in build_output_coords
    merged_vars, unused_indexes = merge_coordinates_without_align(
  File "/home/rondeau/python/github/xarray/xarray/core/merge.py", line 329, in merge_coordinates_without_align
    return merge_collected(filtered, prioritized)
  File "/home/rondeau/python/github/xarray/xarray/core/merge.py", line 229, in merge_collected
    merged_vars[name] = unique_variable(name, variables, compat)
  File "/home/rondeau/python/github/xarray/xarray/core/merge.py", line 120, in unique_variable
    out = out.set_dims(dim_lengths)
  File "/home/rondeau/python/github/xarray/xarray/core/variable.py", line 1442, in set_dims
    expanded_data = self.data
  File "/home/rondeau/python/github/xarray/xarray/core/variable.py", line 361, in data
    return self.values
  File "/home/rondeau/python/github/xarray/xarray/core/variable.py", line 512, in values
    return _as_array_or_item(self._data)
  File "/home/rondeau/python/github/xarray/xarray/core/variable.py", line 274, in _as_array_or_item
    data = np.asarray(data)
  File "/home/rondeau/.conda/envs/pyscen/lib/python3.8/site-packages/numpy/core/_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/home/rondeau/python/github/xarray/xarray/core/indexing.py", line 685, in __array__
    self._ensure_cached()
  File "/home/rondeau/python/github/xarray/xarray/core/indexing.py", line 682, in _ensure_cached
    self.array = NumpyIndexingAdapter(np.asarray(self.array))
  File "/home/rondeau/.conda/envs/pyscen/lib/python3.8/site-packages/numpy/core/_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/home/rondeau/python/github/xarray/xarray/core/indexing.py", line 655, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "/home/rondeau/.conda/envs/pyscen/lib/python3.8/site-packages/numpy/core/_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/home/rondeau/python/github/xarray/xarray/core/indexing.py", line 560, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/home/rondeau/python/github/xarray/xarray/backends/h5netcdf_.py", line 28, in __getitem__
    return indexing.explicit_indexing_adapter(
  File "/home/rondeau/python/github/xarray/xarray/core/indexing.py", line 844, in explicit_indexing_adapter
    raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
  File "/home/rondeau/python/github/xarray/xarray/core/indexing.py", line 858, in decompose_indexer
    return _decompose_outer_indexer(indexer, shape, indexing_support)
  File "/home/rondeau/python/github/xarray/xarray/core/indexing.py", line 1021, in _decompose_outer_indexer
    gains = [
  File "/home/rondeau/python/github/xarray/xarray/core/indexing.py", line 1022, in <listcomp>
    (np.max(k) - np.min(k) + 1.0) / len(np.unique(k))
  File "<__array_function__ internals>", line 5, in amax
  File "/home/rondeau/.conda/envs/pyscen/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2667, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/home/rondeau/.conda/envs/pyscen/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 90, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity

What you expected to happen:

This is the result obtained for the example above with xarray v0.15.1

<xarray.DataArray 'zeta' (ocean_time: 0, eta_rho: 0, xi_rho: 0)>
array([], shape=(0, 0, 0), dtype=float32)
Coordinates:
    lon_rho     (eta_rho, xi_rho) float64 
    hc          float64 20.0
    h           (eta_rho, xi_rho) float64 
    lat_rho     (eta_rho, xi_rho) float64 
    Vtransform  int32 2
  * ocean_time  (ocean_time) datetime64[ns] 
Dimensions without coordinates: eta_rho, xi_rho
Attributes:
    long_name:  free-surface
    units:      meter
    time:       ocean_time
    field:      free-surface, scalar, series

Anything else we need to know?:

This following example works as intended. This seems to occur because isel() eliminates auxiliary coordinates.

import xarray as xr
ds = xr.tutorial.open_dataset('ROMS_example', engine='h5netcdf')
da = ds["zeta"].isel(eta_rho=0, xi_rho=0)
print(da.where(da>1000, drop=True))
<xarray.DataArray 'zeta' (ocean_time: 0)>
array([], dtype=float32)
Coordinates:
    lon_rho     float64 -93.6
    hc          float64 20.0
    h           float64 612.1
    lat_rho     float64 27.45
    Vtransform  int32 2
  * ocean_time  (ocean_time) datetime64[ns] 
Attributes:
    long_name:  free-surface
    units:      meter
    time:       ocean_time
    field:      free-surface, scalar, series

Environment:

Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 08:20:52) 
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.2.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.4
xarray: 0.15.2.dev58+g0e43ba9.d20200521
pandas: 1.1.3
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.30.0
distributed: 2.30.0
matplotlib: 3.3.1
cartopy: 0.18.0
seaborn: 0.11.0
numbagg: None
pint: 0.11
setuptools: 50.3.0.post20201006
pip: 20.2.3
conda: None
pytest: None
IPython: None
sphinx: None
@mathause
Copy link
Collaborator

This also fails for engine='netcdf4'

ds = xr.tutorial.open_dataset('ROMS_example', engine='netcdf4')
da = ds["zeta"].isel(eta_rho=0, xi_rho=0)
print(da.where(da>1000, drop=True))

with a different error: ValueError: Cannot apply_along_axis when any iteration dimensions are 0

Traceback

ValueError                                Traceback (most recent call last)
<ipython-input-14-9df3a1c8f9e1> in <module>
----> 1 da.where(da>1000, drop=True)

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/common.py in where(self, cond, other, drop)
   1266             cond = cond.isel(**indexers)
   1267 
-> 1268         return ops.where_method(self, cond, other)
   1269 
   1270     def close(self: Any) -> None:

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/ops.py in where_method(self, cond, other)
    191     # alignment for three arguments is complicated, so don't support it yet
    192     join = "inner" if other is dtypes.NA else "exact"
--> 193     return apply_ufunc(
    194         duck_array_ops.where_method,
    195         self,

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/computation.py in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, *args)
   1102     # feed DataArray apply_variable_ufunc through apply_dataarray_vfunc
   1103     elif any(isinstance(a, DataArray) for a in args):
-> 1104         return apply_dataarray_vfunc(
   1105             variables_vfunc,
   1106             *args,

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/computation.py in apply_dataarray_vfunc(func, signature, join, exclude_dims, keep_attrs, *args)
    257     else:
    258         name = result_name(args)
--> 259     result_coords = build_output_coords(args, signature, exclude_dims)
    260 
    261     data_vars = [getattr(a, "variable", a) for a in args]

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/computation.py in build_output_coords(args, signature, exclude_dims)
    222     else:
    223         # TODO: save these merged indexes, instead of re-computing them later
--> 224         merged_vars, unused_indexes = merge_coordinates_without_align(
    225             coords_list, exclude_dims=exclude_dims
    226         )

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/merge.py in merge_coordinates_without_align(objects, prioritized, exclude_dims)
    327         filtered = collected
    328 
--> 329     return merge_collected(filtered, prioritized)
    330 
    331 

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/merge.py in merge_collected(grouped, prioritized, compat)
    227                 variables = [variable for variable, _ in elements_list]
    228                 try:
--> 229                     merged_vars[name] = unique_variable(name, variables, compat)
    230                 except MergeError:
    231                     if compat != "minimal":

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/merge.py in unique_variable(name, variables, compat, equals)
    118     if compat == "broadcast_equals":
    119         dim_lengths = broadcast_dimension_size(variables)
--> 120         out = out.set_dims(dim_lengths)
    121 
    122     if compat == "no_conflicts":

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/variable.py in set_dims(self, dims, shape)
   1438             # don't use broadcast_to unless necessary so the result remains
   1439             # writeable if possible
-> 1440             expanded_data = self.data
   1441         elif shape is not None:
   1442             dims_map = dict(zip(dims, shape))

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/variable.py in data(self)
    357             return self._data
    358         else:
--> 359             return self.values
    360 
    361     @data.setter

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/variable.py in values(self)
    508     def values(self):
    509         """The variable's data as a numpy.ndarray"""
--> 510         return _as_array_or_item(self._data)
    511 
    512     @values.setter

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/variable.py in _as_array_or_item(data)
    270         data = data.get()
    271     else:
--> 272         data = np.asarray(data)
    273     if data.ndim == 0:
    274         if data.dtype.kind == "M":

~/conda/envs/xarray_dev/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    683 
    684     def __array__(self, dtype=None):
--> 685         self._ensure_cached()
    686         return np.asarray(self.array, dtype=dtype)
    687 

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/indexing.py in _ensure_cached(self)
    680     def _ensure_cached(self):
    681         if not isinstance(self.array, NumpyIndexingAdapter):
--> 682             self.array = NumpyIndexingAdapter(np.asarray(self.array))
    683 
    684     def __array__(self, dtype=None):

~/conda/envs/xarray_dev/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    653 
    654     def __array__(self, dtype=None):
--> 655         return np.asarray(self.array, dtype=dtype)
    656 
    657     def __getitem__(self, key):

~/conda/envs/xarray_dev/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     81 
     82     """
---> 83     return array(a, dtype, copy=False, order=order)
     84 
     85 

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    558     def __array__(self, dtype=None):
    559         array = as_indexable(self.array)
--> 560         return np.asarray(array[self.key], dtype=None)
    561 
    562     def transpose(self, order):

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/backends/netCDF4_.py in __getitem__(self, key)
     70 
     71     def __getitem__(self, key):
---> 72         return indexing.explicit_indexing_adapter(
     73             key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
     74         )

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/core/indexing.py in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
    843     """
    844     raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
--> 845     result = raw_indexing_method(raw_key.tuple)
    846     if numpy_indices.tuple:
    847         # index the loaded np.ndarray

~/conda/envs/xarray_dev/lib/python3.8/site-packages/xarray/backends/netCDF4_.py in _getitem(self, key)
     83             with self.datastore.lock:
     84                 original_array = self.get_array(needs_lock=False)
---> 85                 array = getitem(original_array, key)
     86         except IndexError:
     87             # Catch IndexError in netCDF4 and return a more informative

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__getitem__()

~/conda/envs/xarray_dev/lib/python3.8/site-packages/netCDF4/utils.py in _StartCountStride(elem, shape, dimensions, grp, datashape, put, use_get_vars)
    435         #    ITERABLE    #
    436         elif np.iterable(e) and np.array(e).dtype.kind in 'i':  # Sequence of integers
--> 437             start[...,i] = np.apply_along_axis(lambda x: e*x, i, np.ones(sdim[:-1]))
    438             indices[...,i] = np.apply_along_axis(lambda x: np.arange(sdim[i])*x, i, np.ones(sdim[:-1], int))
    439 

<__array_function__ internals> in apply_along_axis(*args, **kwargs)

~/conda/envs/xarray_dev/lib/python3.8/site-packages/numpy/lib/shape_base.py in apply_along_axis(func1d, axis, arr, *args, **kwargs)
    374         ind0 = next(inds)
    375     except StopIteration as e:
--> 376         raise ValueError(
    377             'Cannot apply_along_axis when any iteration dimensions are 0'
    378         ) from None

ValueError: Cannot apply_along_axis when any iteration dimensions are 0

It does not fail when you load the data beforehand:

import xarray as xr
ds = xr.tutorial.open_dataset('ROMS_example', engine='h5netcdf')
da = ds["zeta"]
da.load()
print(da.where(da>1000, drop=True))

@mathause
Copy link
Collaborator

mathause commented Oct 19, 2020

git bisect tells me #4379 is the offending PR. So this likely has the same underlying cause as #4449 & there is a draft for a fix (#4453) which may also fix this issue

@mathause
Copy link
Collaborator

mathause commented Jan 3, 2021

@alexamici could this be a similar issue as #4737? (we decided against #4453)

@alexamici
Copy link
Collaborator

@mathause I think you are right, #4733 (the issue that #4737 aimed to fix) is probably not related to cfgrib, I didn't notice that no backend code was mentioned in the traceback.

I was fooled by the fact that we had a know bug in cfgrib ecmwf/cfgrib#157 and the error message reported in #4733 was the same.

So the PR #4737 is needed anyhow, but it doesn't really solve the issue it refers to :/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants