Attributes of Dataset coordinates are dropped/replaced when adding a DataArray #2245

Open
kbg opened this issue Jun 22, 2018 · 4 comments

kbg commented Jun 22, 2018

Problem description

Attributes of Dataset coordinates are dropped or replaced when adding a DataArray with dimensions or coordinates that already exist in the Dataset. In addition, the order of the Dataset's coordinates can change when a DataArray is added.

Expected Behaviour

Attributes of Dataset coordinates should not be altered by adding a DataArray to the Dataset, and the order of existing coordinates should be preserved.

More details and code examples

The following code demonstrates the behaviour by adding new data variables to a Dataset in three ways: as a tuple, as a DataArray (dimension without coordinates), and as a Variable.

import numpy as np
import xarray as xr

ds = xr.Dataset(
    coords={
        'x': ('x', np.arange(10, 20), {'meta': 'foo'}),
        'y': ('y', np.arange(20, 30), {'meta': 'bar'}),
        'z': ('z', np.arange(30, 40), {'meta': 'baz'})})

print(ds, end='\n\n')
ds.info()

print('\n\n====\n')

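# Add data variables in three different ways:
ds['a'] = 'x', np.arange(10)                     # (dims, data) tuple
ds['b'] = xr.DataArray(np.arange(10), dims='y')  # DataArray, dimension without coordinates
ds['c'] = xr.Variable('z', np.arange(10))        # Variable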

print(ds, end='\n\n')
ds.info()
Output
<xarray.Dataset>
Dimensions:  (x: 10, y: 10, z: 10)
Coordinates:
  * x        (x) int64 10 11 12 13 14 15 16 17 18 19
  * y        (y) int64 20 21 22 23 24 25 26 27 28 29
  * z        (z) int64 30 31 32 33 34 35 36 37 38 39
Data variables:
    *empty*

xarray.Dataset {
dimensions:
        x = 10 ;
        y = 10 ;
        z = 10 ;

variables:
        int64 x(x) ;
                x:meta = foo ;
        int64 y(y) ;
                y:meta = bar ;
        int64 z(z) ;
                z:meta = baz ;

// global attributes:
}

====

<xarray.Dataset>
Dimensions:  (x: 10, y: 10, z: 10)
Coordinates:
  * y        (y) int64 20 21 22 23 24 25 26 27 28 29
  * x        (x) int64 10 11 12 13 14 15 16 17 18 19
  * z        (z) int64 30 31 32 33 34 35 36 37 38 39
Data variables:
    a        (x) int64 0 1 2 3 4 5 6 7 8 9
    b        (y) int64 0 1 2 3 4 5 6 7 8 9
    c        (z) int64 0 1 2 3 4 5 6 7 8 9

xarray.Dataset {
dimensions:
        x = 10 ;
        y = 10 ;
        z = 10 ;

variables:
        int64 y(y) ;
        int64 x(x) ;
                x:meta = foo ;
        int64 z(z) ;
                z:meta = baz ;
        int64 a(x) ;
        int64 b(y) ;
        int64 c(z) ;

// global attributes:
}

The output shows that the attributes and the order of the Dataset's coordinates are preserved (as expected) when data variables are added using a tuple or a Variable. When a DataArray is used instead, the attributes of the related coordinate are dropped (y loses its meta attribute), and the ordering of the Dataset's coordinates changes (y now comes before x).
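Going by the output above, one way to sidestep the DataArray path for a dimension without coordinates is to hand the Dataset the wrapped Variable instead of the DataArray itself. This is only a sketch of a possible workaround, not a fix; it assumes the DataArray carries no coordinates that need to be kept, and it reuses a reduced version of the Dataset from the example.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    coords={
        'x': ('x', np.arange(10, 20), {'meta': 'foo'}),
        'y': ('y', np.arange(20, 30), {'meta': 'bar'})})

b = xr.DataArray(np.arange(10), dims='y')

# Assign the wrapped Variable rather than the DataArray, so that no
# coordinates from `b` take part in the assignment.
ds['b'] = b.variable

# The attributes of the existing 'y' coordinate should be untouched,
# matching the Variable case in the output above.
print(ds.y.attrs)
```

The same effect can be had with the tuple form, ds['b'] = (b.dims, b.data), which also matches a case shown as well-behaved in the output above.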

When adding DataArrays with coordinates to the Dataset, the attributes of the affected Dataset coordinates are replaced with the attributes of the DataArray's coordinates:

d = xr.DataArray(
    np.arange(10),
    coords=[('x', np.arange(10, 20), {'breakfast': 'eggs'})])

e = xr.DataArray(
    np.arange(10),
    coords=[('z', np.arange(40, 50), {'breakfast': 'spam'})])

print('d.x =', d.x, end='\n\n')
print('e.z =', e.z, end='\n\n')

ds['d'] = d
ds['e'] = e

print(ds, end='\n\n')
ds.info()
Output
d.x = <xarray.DataArray 'x' (x: 10)>
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
Coordinates:
  * x        (x) int64 10 11 12 13 14 15 16 17 18 19
Attributes:
    breakfast:  eggs

e.z = <xarray.DataArray 'z' (z: 10)>
array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
Coordinates:
  * z        (z) int64 40 41 42 43 44 45 46 47 48 49
Attributes:
    breakfast:  spam

<xarray.Dataset>
Dimensions:  (x: 10, y: 10, z: 10)
Coordinates:
  * z        (z) int64 30 31 32 33 34 35 36 37 38 39
  * y        (y) int64 20 21 22 23 24 25 26 27 28 29
  * x        (x) int64 10 11 12 13 14 15 16 17 18 19
Data variables:
    a        (x) int64 0 1 2 3 4 5 6 7 8 9
    b        (y) int64 0 1 2 3 4 5 6 7 8 9
    c        (z) int64 0 1 2 3 4 5 6 7 8 9
    d        (x) int64 0 1 2 3 4 5 6 7 8 9
    e        (z) float64 nan nan nan nan nan nan nan nan nan nan

xarray.Dataset {
dimensions:
        x = 10 ;
        y = 10 ;
        z = 10 ;

variables:
        int64 z(z) ;
                z:breakfast = spam ;
        int64 y(y) ;
        int64 x(x) ;
                x:breakfast = eggs ;
        int64 a(x) ;
        int64 b(y) ;
        int64 c(z) ;
        int64 d(x) ;
        float64 e(z) ;

// global attributes:
}

This even happens for the DataArray e in the example above, which shares the dimension 'z' with the Dataset ds but has different coordinate values. In this case the data and coordinate values are handled as one would expect: the ds.e array is filled with NaNs (because the coordinate values do not match), and the ds.z coordinate values are not replaced by the DataArray's e.z coordinate values. The attributes of the Dataset's coordinate (ds.z.attrs), however, are still replaced by the attributes of the DataArray's coordinate (e.z.attrs).
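When the DataArray does carry coordinates that should still be aligned against the Dataset, one possible stopgap is to snapshot the coordinate attributes before the assignment and write them back afterwards. The helper name assign_keeping_coord_attrs below is hypothetical (it is not part of xarray), and the sketch restores attributes only, not the original coordinate order.

```python
import numpy as np
import xarray as xr


def assign_keeping_coord_attrs(ds, name, value):
    """Hypothetical helper: set ds[name] = value, then restore coord attrs."""
    # Copy the attrs of every coordinate before the assignment.
    saved = {key: dict(ds[key].attrs) for key in ds.coords}
    ds[name] = value
    # Write the saved attrs back onto the coordinates that still exist.
    for key, attrs in saved.items():
        if key in ds.coords:
            ds[key].attrs = attrs
    return ds


ds = xr.Dataset(
    coords={'x': ('x', np.arange(10, 20), {'meta': 'foo'})})
d = xr.DataArray(
    np.arange(10),
    coords=[('x', np.arange(10, 20), {'breakfast': 'eggs'})])

assign_keeping_coord_attrs(ds, 'd', d)
print(ds.x.attrs)
```

Coordinates that the assignment newly introduces are left alone, since they have no saved attributes to restore.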

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.17.2-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.7
pandas: 0.23.0
numpy: 1.14.3
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: None
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.5
distributed: 1.21.8
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 39.1.0
pip: 10.0.1
conda: None
pytest: 3.5.1
IPython: 6.4.0
sphinx: 1.7.4

shoyer (Member) commented Jul 10, 2018

This looks like the same issue as #2276.

I agree that this is probably a bug. This might be related to a recent internal refactor of Dataset.__setitem__ in #2162 (see the changes in xarray/core/merge.py).

shoyer added the bug label Jul 10, 2018

dcherian (Contributor) commented Jul 23, 2018

This is because priority_arg=1 in

return merge_core([dataset, other], priority_arg=1,

So the old co-ordinate (with attrs) is replaced by the new co-ordinate (without attrs).

Example: ds['b'] = xr.DataArray(np.arange(10), dims='y') in the above creates a new dimension y with no attrs that is given priority when merging. This seems like intended behaviour because changing priority_arg to 0 makes a lot of tests fail.
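The effect described here can be illustrated with a toy model in plain Python. This is not xarray's merge_core; the function merge_with_priority and the dictionary layout are invented for illustration, under the assumption that the prioritised object's variable replaces the other one wholesale, attributes included.

```python
def merge_with_priority(objects, priority_arg=None):
    """Toy model, not xarray's merge_core: on a name clash, the object at
    index `priority_arg` wins wholesale (values and attrs together)."""
    merged = {}
    # Without a priority, the first object that defines a name keeps it.
    for obj in objects:
        for name, var in obj.items():
            merged.setdefault(name, var)
    # With a priority, that object's variables overwrite everything else.
    if priority_arg is not None:
        merged.update(objects[priority_arg])
    return merged


# Stand-ins for the Dataset and for the object built from the new DataArray.
dataset = {'y': {'values': list(range(20, 30)), 'attrs': {'meta': 'bar'}}}
other = {'y': {'values': list(range(20, 30)), 'attrs': {}},
         'b': {'values': list(range(10)), 'attrs': {}}}

print(merge_with_priority([dataset, other], priority_arg=1)['y']['attrs'])  # {}
print(merge_with_priority([dataset, other], priority_arg=0)['y']['attrs'])  # {'meta': 'bar'}
```

In this model the attrs travel with whichever variable wins, so preserving them would need either a different priority or a separate step that merges attributes.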

wtgee commented Aug 28, 2019

Was there ever a solution here? I'm opening multiple netCDF files via open_mfdataset but the attrs get clobbered since they all have the same keys.

Edit: I was thinking about it a little wrong although I can still see a use case.

leonfoks commented Feb 3, 2022

Encountered this issue this week. Is there anything in the works to address this?
