Attributes of Dataset coordinates are dropped/replaced when adding a DataArray #2245

kbg · 2018-06-22T10:47:53Z

Problem description

Attributes of Dataset coordinates are dropped or replaced when adding a DataArray with dimensions or coordinates that already exist in the Dataset. In addition, adding a DataArray can change the order of the Dataset's coordinates.

Expected Behaviour

Attributes of Dataset coordinates should not be altered by adding a DataArray to the Dataset, and the order of existing coordinates should be preserved.

More details and code examples

The following code shows the behaviour when adding new data variables to a Dataset in three ways: from a tuple, from a DataArray (dimension without coordinates), and from a Variable.

import numpy as np
import xarray as xr

ds = xr.Dataset(
    coords={
        'x': ('x', np.arange(10, 20), {'meta': 'foo'}),
        'y': ('y', np.arange(20, 30), {'meta': 'bar'}),
        'z': ('z', np.arange(30, 40), {'meta': 'baz'})})

print(ds, end='\n\n')
ds.info()

print('\n\n====\n')

ds['a'] = 'x', np.arange(10)
ds['b'] = xr.DataArray(np.arange(10), dims='y')
ds['c'] = xr.Variable('z', np.arange(10))

print(ds, end='\n\n')
ds.info()

Output

<xarray.Dataset>
Dimensions:  (x: 10, y: 10, z: 10)
Coordinates:
  * x        (x) int64 10 11 12 13 14 15 16 17 18 19
  * y        (y) int64 20 21 22 23 24 25 26 27 28 29
  * z        (z) int64 30 31 32 33 34 35 36 37 38 39
Data variables:
    *empty*

xarray.Dataset {
dimensions:
        x = 10 ;
        y = 10 ;
        z = 10 ;

variables:
        int64 x(x) ;
                x:meta = foo ;
        int64 y(y) ;
                y:meta = bar ;
        int64 z(z) ;
                z:meta = baz ;

// global attributes:
}

====

<xarray.Dataset>
Dimensions:  (x: 10, y: 10, z: 10)
Coordinates:
  * y        (y) int64 20 21 22 23 24 25 26 27 28 29
  * x        (x) int64 10 11 12 13 14 15 16 17 18 19
  * z        (z) int64 30 31 32 33 34 35 36 37 38 39
Data variables:
    a        (x) int64 0 1 2 3 4 5 6 7 8 9
    b        (y) int64 0 1 2 3 4 5 6 7 8 9
    c        (z) int64 0 1 2 3 4 5 6 7 8 9

xarray.Dataset {
dimensions:
        x = 10 ;
        y = 10 ;
        z = 10 ;

variables:
        int64 y(y) ;
        int64 x(x) ;
                x:meta = foo ;
        int64 z(z) ;
                z:meta = baz ;
        int64 a(x) ;
        int64 b(y) ;
        int64 c(z) ;

// global attributes:
}

The output shows that the attributes and the order of the Dataset's coordinates are preserved (as expected) when adding data variables using a tuple or a Variable. When a DataArray is used instead, the attributes of the related coordinates are dropped and the ordering of the Dataset's coordinates is changed.
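Since assigning a Variable keeps the coordinate attributes and order in the example above, one possible workaround is to assign the DataArray's underlying Variable (via the `DataArray.variable` property) instead of the DataArray itself. This is only a sketch of that idea, not a fix for the underlying behaviour, and it assumes the DataArray carries no coordinates that need to be aligned with the Dataset:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    coords={
        'x': ('x', np.arange(10, 20), {'meta': 'foo'}),
        'y': ('y', np.arange(20, 30), {'meta': 'bar'})})

b = xr.DataArray(np.arange(10), dims='y')

# Assign the underlying Variable rather than the DataArray, so that no
# coordinates belonging to `b` take part in the assignment.
ds['b'] = b.variable

print(ds['y'].attrs)
print(list(ds.coords))
```

Note that this skips any alignment on coordinate values, so it is only appropriate when the DataArray has no coordinates of its own or they are known to match the Dataset's.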

When adding DataArrays with coordinates to the Dataset, the attributes of the affected Dataset coordinates are replaced with the attributes of the DataArray's coordinates:

d = xr.DataArray(
    np.arange(10),
    coords=[('x', np.arange(10, 20), {'breakfast': 'eggs'})])

e = xr.DataArray(
    np.arange(10),
    coords=[('z', np.arange(40, 50), {'breakfast': 'spam'})])

print('d.x =', d.x, end='\n\n')
print('e.z =', e.z, end='\n\n')

ds['d'] = d
ds['e'] = e

print(ds, end='\n\n')
ds.info()

Output

d.x = <xarray.DataArray 'x' (x: 10)>
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
Coordinates:
  * x        (x) int64 10 11 12 13 14 15 16 17 18 19
Attributes:
    breakfast:  eggs

e.z = <xarray.DataArray 'z' (z: 10)>
array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
Coordinates:
  * z        (z) int64 40 41 42 43 44 45 46 47 48 49
Attributes:
    breakfast:  spam

<xarray.Dataset>
Dimensions:  (x: 10, y: 10, z: 10)
Coordinates:
  * z        (z) int64 30 31 32 33 34 35 36 37 38 39
  * y        (y) int64 20 21 22 23 24 25 26 27 28 29
  * x        (x) int64 10 11 12 13 14 15 16 17 18 19
Data variables:
    a        (x) int64 0 1 2 3 4 5 6 7 8 9
    b        (y) int64 0 1 2 3 4 5 6 7 8 9
    c        (z) int64 0 1 2 3 4 5 6 7 8 9
    d        (x) int64 0 1 2 3 4 5 6 7 8 9
    e        (z) float64 nan nan nan nan nan nan nan nan nan nan

xarray.Dataset {
dimensions:
        x = 10 ;
        y = 10 ;
        z = 10 ;

variables:
        int64 z(z) ;
                z:breakfast = spam ;
        int64 y(y) ;
        int64 x(x) ;
                x:breakfast = eggs ;
        int64 a(x) ;
        int64 b(y) ;
        int64 c(z) ;
        int64 d(x) ;
        float64 e(z) ;

// global attributes:
}

This even happens for the DataArray e in the example above, which shares the dimension 'z' with the Dataset ds but has different coordinate values. In this case the data and coordinate values are handled as one would expect: the ds.e array is filled with NaNs (because the coordinate values do not match), and the ds.z coordinate values are not replaced by the DataArray's e.z coordinate values. But the attributes of the Dataset's coordinate (ds.z.attrs) are still replaced by the attributes of the DataArray's coordinate (e.z.attrs).
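When the DataArray does carry coordinates and the usual alignment is wanted, another possible workaround is to remember the coordinate attributes before the assignment and put them back afterwards. The helper name `set_preserving_coord_attrs` below is hypothetical (it is not part of xarray), and the sketch only restores attributes; it does not restore coordinate order:

```python
import numpy as np
import xarray as xr


def set_preserving_coord_attrs(ds, name, value):
    """Hypothetical helper: run ds[name] = value, then restore the
    attributes of the coordinates that existed before the assignment."""
    saved = {k: dict(v.attrs) for k, v in ds.coords.items()}
    ds[name] = value
    for k, attrs in saved.items():
        if k in ds.coords:
            ds[k].attrs = attrs
    return ds


ds = xr.Dataset(
    coords={'x': ('x', np.arange(10, 20), {'meta': 'foo'})})

d = xr.DataArray(
    np.arange(10),
    coords=[('x', np.arange(10, 20), {'breakfast': 'eggs'})])

set_preserving_coord_attrs(ds, 'd', d)
print(ds['x'].attrs)
```

The helper modifies the Dataset in place, in the same way as a plain `ds[name] = value` would.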

Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.17.2-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.7
pandas: 0.23.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.1.0
pip: 10.0.1
conda: None
pytest: 3.5.1
IPython: 6.4.0
sphinx: 1.7.4


shoyer · 2018-07-10T19:11:12Z

This looks like the same issue as #2276.

I agree that this is probably a bug. This might be related to a recent internal refactor of Dataset.__setitem__ in #2162 (see the changes in xarray/core/merge.py).

dcherian · 2018-07-23T07:01:07Z

This is because priority_arg=1 in xarray/xarray/core/merge.py (line 579 in b8a342a):

return merge_core([dataset, other], priority_arg=1,

So the old co-ordinate (with attrs) is replaced by the new co-ordinate (without attrs).

Example: ds['b'] = xr.DataArray(np.arange(10), dims='y') in the above creates a new dimension y with no attrs that is given priority when merging. This seems like intended behaviour because changing priority_arg to 0 makes a lot of tests fail.
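The effect of the priority argument described above can be illustrated with a toy model in plain Python. This is not xarray's `merge_core`; the function `merge_with_priority` and the `(values, attrs)` tuples are assumptions made purely for illustration. Following the comment, the new object is modelled as bringing its own `y` entry without attrs:

```python
def merge_with_priority(objects, priority_arg=None):
    """Toy model: merge dicts of name -> (values, attrs).

    Without a priority, the first object that defines a name wins.
    With priority_arg set, every variable of objects[priority_arg]
    replaces the merged one wholesale, attrs included.
    """
    merged = {}
    for obj in objects:
        for name, var in obj.items():
            merged.setdefault(name, var)
    if priority_arg is not None:
        merged.update(objects[priority_arg])
    return merged


# Existing Dataset: coordinate y with attrs.
dataset = {'y': (list(range(20, 30)), {'meta': 'bar'})}
# New object: data variable b plus a y entry that has no attrs.
other = {'y': (list(range(20, 30)), {}), 'b': (list(range(10)), {})}

new_wins = merge_with_priority([dataset, other], priority_arg=1)
old_wins = merge_with_priority([dataset, other], priority_arg=0)

print(new_wins['y'][1])  # attrs of y when the new object has priority
print(old_wins['y'][1])  # attrs of y when the Dataset has priority
```

In this model, giving priority to the new object replaces the whole `y` entry, so the attrs are lost even though the values are identical. Giving priority to the existing Dataset keeps the attrs, but it would also stop a new object from overriding anything else, which is consistent with the observation that changing priority_arg to 0 breaks many tests.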

wtgee · 2019-08-28T01:48:14Z

~~Was there ever a solution here? I'm opening multiple netCDF files via open_mfdataset but the attrs get clobbered since they all have the same keys.~~

Edit: I was thinking about it a little wrong although I can still see a use case.

leonfoks · 2022-02-03T16:18:14Z

Encountered this issue this week, is there anything in the works to address this?

shoyer mentioned this issue Jul 10, 2018

xarray: dimension attributes gone after adding variable? #2276

Closed

shoyer added the bug label Jul 10, 2018

sfinkens mentioned this issue Sep 22, 2022

xr.where overrides coordinate attributes with global attributes #7068

Closed
