Indexing not properly working with object dtype element ? #2414

Closed
davidtrem opened this issue Sep 12, 2018 · 4 comments · Fixed by #2625 or #2415

@davidtrem
Contributor

davidtrem commented Sep 12, 2018

Small "working" demo of the observed issue:

import xarray as xr
import numpy as np
er = xr.DataArray(np.array((np.arange(3), np.arange(6)))) # dtype=object because two different vector size
print(er.data[0]) # Does work
print(er[0]) # Does not work (ValueError)

#I'm a bit puzzled...
@shoyer
Member

shoyer commented Sep 12, 2018

For reference, here is the error/traceback with xarray 0.10.8:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-a722ff347d1b> in <module>()
      3 er = xr.DataArray(np.array((np.arange(3), np.arange(6)))) # dtype=object because two different vector size
      4 print(er.data[0]) # Does work
----> 5 print(er[0]) # Does not work (ValueError)

/usr/local/lib/python3.6/dist-packages/xarray/core/dataarray.py in __getitem__(self, key)
    472         else:
    473             # xarray-style array indexing
--> 474             return self.isel(indexers=self._item_key_to_dict(key))
    475 
    476     def __setitem__(self, key, value):

/usr/local/lib/python3.6/dist-packages/xarray/core/dataarray.py in isel(self, indexers, drop, **indexers_kwargs)
    754         """
    755         indexers = either_dict_or_kwargs(indexers, indexers_kwargs, 'isel')
--> 756         ds = self._to_temp_dataset().isel(drop=drop, indexers=indexers)
    757         return self._from_temp_dataset(ds)
    758 

/usr/local/lib/python3.6/dist-packages/xarray/core/dataset.py in isel(self, indexers, drop, **indexers_kwargs)
   1425         for name, var in iteritems(self._variables):
   1426             var_indexers = {k: v for k, v in indexers_list if k in var.dims}
-> 1427             new_var = var.isel(indexers=var_indexers)
   1428             if not (drop and name in var_indexers):
   1429                 variables[name] = new_var

/usr/local/lib/python3.6/dist-packages/xarray/core/variable.py in isel(self, indexers, drop, **indexers_kwargs)
    852             if dim in indexers:
    853                 key[i] = indexers[dim]
--> 854         return self[tuple(key)]
    855 
    856     def squeeze(self, dim=None):

/usr/local/lib/python3.6/dist-packages/xarray/core/variable.py in __getitem__(self, key)
    622         if new_order:
    623             data = np.moveaxis(data, range(len(new_order)), new_order)
--> 624         return self._finalize_indexing_result(dims, data)
    625 
    626     def _finalize_indexing_result(self, dims, data):

/usr/local/lib/python3.6/dist-packages/xarray/core/variable.py in _finalize_indexing_result(self, dims, data)
    628         """
    629         return type(self)(dims, data, self._attrs, self._encoding,
--> 630                           fastpath=True)
    631 
    632     def _getitem_with_mask(self, key, fill_value=dtypes.NA):

/usr/local/lib/python3.6/dist-packages/xarray/core/variable.py in __init__(self, dims, data, attrs, encoding, fastpath)
    261         """
    262         self._data = as_compatible_data(data, fastpath=fastpath)
--> 263         self._dims = self._parse_dimensions(dims)
    264         self._attrs = None
    265         self._encoding = None

/usr/local/lib/python3.6/dist-packages/xarray/core/variable.py in _parse_dimensions(self, dims)
    422             raise ValueError('dimensions %s must have the same length as the '
    423                              'number of data dimensions, ndim=%s'
--> 424                              % (dims, self.ndim))
    425         return dims
    426 

ValueError: dimensions () must have the same length as the number of data dimensions, ndim=1

The bottom line issue is that indexing the 1D object array returns a scalar value:

>>> er.data
array([array([0, 1, 2]), array([0, 1, 2, 3, 4, 5])], dtype=object)
>>> er.data[0]
array([0, 1, 2])

In this case, the scalar value is another 1D array, which in turn triggers the error about inconsistent dimensions.
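The behaviour described here can be reproduced with NumPy alone. The following is a minimal sketch, not xarray's own code: the names `ragged`, `element`, and `wrapped` are illustrative, and the object array is filled element by element so that the example does not depend on how a given NumPy version treats ragged input.

```python
import numpy as np

# Build a 1-D object array explicitly, one slot per inner vector.
ragged = np.empty(2, dtype=object)
ragged[0] = np.arange(3)
ragged[1] = np.arange(6)

# Integer indexing hands back the stored object itself. Here that
# "scalar" is a 1-D ndarray, so the result has ndim == 1 even though
# the indexing operation consumed the only dimension.
element = ragged[0]
print(type(element).__name__, element.ndim)

# Wrapping the element in a 0-d object array keeps it opaque, so the
# result has zero dimensions, matching dims == ().
wrapped = np.empty((), dtype=object)
wrapped[()] = element
print(wrapped.ndim, wrapped.shape)
```

The mismatch between `element.ndim` and the empty tuple of dimension names is what surfaces as the ValueError in the traceback above.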

I agree that this is a bug. We actually currently have some logic to fix another manifestation of this issue:

xarray/xarray/core/indexing.py

Lines 1145 to 1175 in 4de8dbc

    def _ensure_ndarray(self, value):
        # We always want the result of indexing to be a NumPy array. If it's
        # not, then it really should be a 0d array. Doing the coercion here
        # instead of inside variable.as_compatible_data makes it less error
        # prone.
        if not isinstance(value, np.ndarray):
            value = utils.to_0d_array(value)
        return value

    def _indexing_array_and_key(self, key):
        if isinstance(key, OuterIndexer):
            array = self.array
            key = _outer_to_numpy_indexer(key, self.array.shape)
        elif isinstance(key, VectorizedIndexer):
            array = nputils.NumpyVIndexAdapter(self.array)
            key = key.tuple
        elif isinstance(key, BasicIndexer):
            array = self.array
            key = key.tuple
        else:
            raise TypeError('unexpected key type: {}'.format(type(key)))

        return array, key

    def transpose(self, order):
        return self.array.transpose(order)

    def __getitem__(self, key):
        array, key = self._indexing_array_and_key(key)
        return self._ensure_ndarray(array[key])

This seems to be the wrong check. Instead of checking not isinstance(value, np.ndarray), we should be checking to see if the array has object dtype and the indexing operation would result in a scalar (which I think can only happen if the unwrapped key is a tuple of integers).
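One way the suggested check might look, as a rough sketch only: the helper name `index_keeping_ndarray` is hypothetical, it works on plain NumPy arrays rather than xarray's indexer classes, and it is not the change that was eventually merged. It treats a key made entirely of integers, one per axis, as the case where indexing yields a single element.

```python
import numpy as np


def index_keeping_ndarray(array, key):
    """Hypothetical helper: index `array` and always return an ndarray."""
    if not isinstance(key, tuple):
        key = (key,)
    result = array[key]

    # A key with one integer per axis selects exactly one element.
    # (bool counts as int in Python; that edge case is ignored here.)
    selects_one_element = len(key) == array.ndim and all(
        isinstance(k, (int, np.integer)) for k in key
    )

    if array.dtype == object and selects_one_element:
        # The element may itself be an ndarray, so isinstance() cannot
        # tell us whether wrapping is needed. Always wrap it in 0-d.
        wrapped = np.empty((), dtype=object)
        wrapped[()] = result
        return wrapped

    if not isinstance(result, np.ndarray):
        # Non-object dtypes return NumPy scalars; promote them to 0-d.
        result = np.asarray(result)
    return result


ragged = np.empty(2, dtype=object)
ragged[0] = np.arange(3)
ragged[1] = np.arange(6)

first = index_keeping_ndarray(ragged, 0)
print(first.ndim, first.dtype)
```

Slices and array keys fall through unchanged, since NumPy already returns an ndarray of the right dimensionality for them.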

Any interest in putting together a pull request? :)

@stale

stale bot commented Aug 15, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Aug 15, 2020
@stale stale bot closed this as completed Sep 14, 2020
@dcherian dcherian reopened this Sep 14, 2020
@stale stale bot removed the stale label Sep 14, 2020
@andersy005
Member

Unless I am missing something, this appears to have been addressed:

In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: er = xr.DataArray(np.array((np.arange(3), np.arange(6))))
<ipython-input-3-64075f0108e0>:3: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  er = xr.DataArray(np.array((np.arange(3), np.arange(6))))

In [4]: er
Out[4]: 
<xarray.DataArray (dim_0: 2)>
array([array([0, 1, 2]), array([0, 1, 2, 3, 4, 5])], dtype=object)
Dimensions without coordinates: dim_0
In [6]: er.data[0]
Out[6]: array([0, 1, 2])

In [7]: er[0]
Out[7]: 
<xarray.DataArray ()>
array(array([0, 1, 2]), dtype=object)

In [8]: er[1]
Out[8]: 
<xarray.DataArray ()>
array(array([0, 1, 2, 3, 4, 5]), dtype=object)

In [9]: xr.__version__
Out[9]: '0.17.0'

In [10]: np.__version__
Out[10]: '1.20.1'

@keewis
Collaborator

keewis commented Mar 10, 2021

I agree, this has either been fixed a long time ago (before we removed the references to pandas.Panel) or by a dependency (numpy, most likely, but the fix is included in the env generated by bare-minimum). #2625 was the fix.

We still want a test like the one suggested in #2415 to make sure we don't regress, though.
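A regression test along these lines could look roughly like the sketch below. It is not the test from #2415: the function name `test_index_object_array_of_arrays` is made up for illustration, and the object array is built slot by slot so that the test does not rely on NumPy accepting ragged input without `dtype=object`.

```python
import numpy as np
import xarray as xr


def test_index_object_array_of_arrays():
    # 1-D object array whose elements are vectors of different lengths.
    data = np.empty(2, dtype=object)
    data[0] = np.arange(3)
    data[1] = np.arange(6)
    er = xr.DataArray(data)

    first = er[0]
    # Integer indexing should drop the only dimension and leave a 0-d
    # object array that holds the inner vector, not raise ValueError.
    assert first.dims == ()
    assert first.dtype == object
    np.testing.assert_array_equal(first.values.item(), np.arange(3))

    second = er[1]
    assert second.dims == ()
    np.testing.assert_array_equal(second.values.item(), np.arange(6))


test_index_object_array_of_arrays()
```

The assertions mirror the outputs shown in the session above for xarray 0.17.0; under pytest the final call would be dropped and the function collected automatically.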

@keewis keewis linked a pull request Mar 10, 2021 that will close this issue