Unnecessary copy when indexing to obtain a 0d array #2622

danielwe · 2018-12-19T12:13:35Z

Code Sample

>>> import numpy as np
>>> import xarray as xr
>>> da = xr.DataArray(np.arange(3))
>>> da
<xarray.DataArray (dim_0: 3)>
array([0, 1, 2])
Dimensions without coordinates: dim_0
>>> da[0].values.fill(99)
>>> da
<xarray.DataArray (dim_0: 3)>
array([0, 1, 2])
Dimensions without coordinates: dim_0

Problem description

Indexing into xarray objects creates a view of the underlying data if possible. A surprising exception is when all dimensions are indexed out and the resulting object is 0d. Xarray insists on returning a 0d array rather than a scalar, which suggests (at least to me) that this is also a view whenever possible; however, it is always a copy, and modifying it will never affect the original array.

(The example above is a little contrived, since one could always call da[0] = 99. In my actual use case I am indexing into a Dataset in a way that creates views for all variables except the one that happens to collapse to 0d, and thus I'm unable to use the indexed Dataset to modify that variable in the original Dataset.)

The copy happens because, internally, the 0d array is created by retrieving a scalar from the underlying numpy array and then wrapping a new array around it. However, in numpy a 0d view can be created directly by indexing with Ellipsis/..., as follows:

>>> import numpy as np
>>> arr = np.arange(3)
>>> arr[0, ...]
array(0)

Thus, a fix that solves my immediate issues and passes all current tests is to modify the following method:

xarray/xarray/core/indexing.py

Lines 1154 to 1163 in 778ffc4

    
           def _indexing_array_and_key(self, key): 
        
               if isinstance(key, OuterIndexer): 
        
                   array = self.array 
        
                   key = _outer_to_numpy_indexer(key, self.array.shape) 
        
               elif isinstance(key, VectorizedIndexer): 
        
                   array = nputils.NumpyVIndexAdapter(self.array) 
        
                   key = key.tuple 
        
               elif isinstance(key, BasicIndexer): 
        
                   array = self.array 
        
                   key = key.tuple

to always append an ellipsis for basic and outer indexing:

    def _indexing_array_and_key(self, key):
        if isinstance(key, OuterIndexer):
            array = self.array
>           key = _outer_to_numpy_indexer(key, self.array.shape) + (Ellipsis,)
        elif isinstance(key, VectorizedIndexer):
            array = nputils.NumpyVIndexAdapter(self.array)
            key = key.tuple
        elif isinstance(key, BasicIndexer):
            array = self.array
>           key = key.tuple + (Ellipsis,)

I'm not familiar enough with all the indexing variants in xarray to know if this covers all cases of 0d arrays that are currently copies but could be views. If someone wants to share some insight (e.g., some more advanced test cases), I could try and put together a pull request.

Expected Output

>>> da[0].values.fill(99)
>>> da
<xarray.DataArray (dim_0: 3)>
array([99, 1, 2])
Dimensions without coordinates: dim_0

Output of `xr.show_versions()`

/home/daniel/local/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`. from ._conv import register_converters as _register_converters

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-42-lowlatency
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.11.0
pandas: 0.23.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: 0.6.2
h5py: 2.7.1
Nio: None
zarr: None
cftime: 1.0.0b1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.2.2
cartopy: None
seaborn: 0.8.1
setuptools: 39.1.0
pip: 10.0.1
conda: 4.5.12
pytest: 3.5.1
IPython: 6.4.0
sphinx: 1.7.4

The text was updated successfully, but these errors were encountered:

fujiisoup · 2018-12-19T13:08:21Z

It sounds a reasonable improvement, as far as I think of.

Is any possible use case in your mind that will be broken if we return a view for 0 dimensional array rather than a copy?

I'm not familiar enough with all the indexing variants in xarray to know if this covers all cases of 0d arrays that are currently copies but could be views.

I think it would be sufficient to change the basic indexing part.

danielwe · 2018-12-19T18:43:27Z

Is any possible use case in your mind that will be broken if we return a view for 0 dimensional array rather than a copy?

Not that I can think of. The chance of accidentally modifying data due to the change is very slim; the change will only affect statements that convey an explicit intent to do an in-place modification, but currently fail to do so. For example, say you have a dataset ds with a variable "foo" that becomes 0d when using a basic index index; then, the following statements will work the same as before:

ds[index]["foo"] = 99 will leave the original ds["foo"] unchanged (it only replaces "foo" in the temporary ds[index]),
ds["foo"][index] = 99 will modify ds["foo"] in-place.

The change affects statements such as the following:

ds[index]["foo"][...] = 99,
ds[index]["foo"].values[...] = 99,
ds[index]["foo"].values.fill(99),

These statements currently modify ds["foo"] in all cases except when ds[index]["foo"] is 0d. After the change they will succeed also in the 0d case.

However, keep in mind that I'm not a seasoned xarray user. Perhaps I'm overlooking some subtleties.

I think it would be sufficient to change the basic indexing part.

You're right, I just realized outer indexing never returns a view after all, and vectorized indexing certainly doesn't. My comment was mainly about the different array wrappers for lazy indexing, copy on write, etc., but upon closer inspection of the code, it looks like modifying the NumpyIndexingAdapter class is sufficient to change the behavior in all the relevant cases.

shoyer · 2018-12-20T17:01:37Z

upon closer inspection of the code, it looks like modifying the NumpyIndexingAdapter class is sufficient to change the behavior in all the relevant cases.

+1 for modifying NumpyIndexingAdapter to fix this. It would absolutely be cleaner to always return NumPy arrays directly from indexing rather than rewrapping scalars. I actually didn't know that this "append ellipsis" trick worked, but I see now that's documented in the detailed notes in numpy's indexing docs.

fujiisoup · 2018-12-20T22:09:49Z

@danielwe, do you mind to send a fix of it?
Let's just leave the other lazy indexing classes later for the future improvement.

danielwe mentioned this issue Dec 21, 2018

Get 0d slices of ndarrays directly from indexing #2625

Merged

3 tasks

shoyer closed this as completed in #2625 Dec 22, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unnecessary copy when indexing to obtain a 0d array #2622

Unnecessary copy when indexing to obtain a 0d array #2622

danielwe commented Dec 19, 2018

INSTALLED VERSIONS

fujiisoup commented Dec 19, 2018

danielwe commented Dec 19, 2018

shoyer commented Dec 20, 2018

fujiisoup commented Dec 20, 2018

Unnecessary copy when indexing to obtain a 0d array #2622

Unnecessary copy when indexing to obtain a 0d array #2622

Comments

danielwe commented Dec 19, 2018

Code Sample

Problem description

Expected Output

Output of xr.show_versions()

INSTALLED VERSIONS

fujiisoup commented Dec 19, 2018

danielwe commented Dec 19, 2018

shoyer commented Dec 20, 2018

fujiisoup commented Dec 20, 2018

Output of `xr.show_versions()`