Skip to content

Problem with creating coords by dict #1197

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
SpghttCd opened this issue Jan 10, 2017 · 7 comments
Closed

Problem with creating coords by dict #1197

SpghttCd opened this issue Jan 10, 2017 · 7 comments

Comments

@SpghttCd
Copy link

IMO there's a problem with using dicts as coords definition for DataArrays.
E.g. an array defined by
import xarray as xr
arr = xr.DataArray(np.zeros([1, 2, 3]), {'x': [1], 'y': [1, 2], 'z': [1, 2, 3]})
will lead to an exception when tried to be indexed like
arr[0, 0, 0]

The error message is
ValueError: conflicting sizes for dimension 'z': length 2 on <this-array> and length 3 on 'z'

As far as I can imagine this is because dicts don't preserve order in python.
At least printing the coords shows a different order than used in the definition of arr:
arr.coords
Coordinates:
* x (x) int32 1
* z (z) int32 1 2 3
* y (y) int32 1 2

Using an ordered dict turns out to work:
import collections
od = collections.OrderedDict([('x', [1]), ('y', [1, 2]), ('z', [1, 2, 3])])
arr = xr.DataArray(np.zeros([1, 2, 3]), od)
And so does using a list of tuples:
arr = xr.DataArray(np.zeros([1, 2, 3]), [('x', [1]), ('y', [1, 2]), ('z', [1, 2, 3])])

Both lead to
arr[0, 0, 0]
<xarray.DataArray ()>
array(0.0)
Coordinates:
x int32 1
y int32 1
z int32 1
as expected.

Detected in WinPython-64bit-3.4.4.1Qt5 with xarray version 0.7.0

@shoyer
Copy link
Member

shoyer commented Jan 10, 2017

You shouldn't be able to make a DataArray like this -- you have three dimensions on coordinates, but only one dimension on the array itself.

With xarray 0.8.2, you get an error message with your example:

import xarray as xr
arr = xr.DataArray(np.zeros([1, 2, 3]), {'x': [1], 'y': [1, 2], 'z': [1, 2, 3]})
ValueError: conflicting sizes for dimension 'z': length 1 on the data but length 3 on coordinate 'z'

However, this error message is not quite accurate. You need to explicitly provide dims to get a sensible error:

arr = xr.DataArray(np.zeros([1, 2, 3]), {'x': [1], 'y': [1, 2], 'z': [1, 2, 3]}, dims=['x'])
ValueError: coordinate z has dimensions ('z',), but these are not a subset of the DataArray dimensions ['x']

I still need to test this on the dev version, but for now I will make this as an error reporting bug.

@SpghttCd
Copy link
Author

Hello, thanks for the immediate reply - and to seize the opportunity: thanks for this really great library, which I think will be the enabler for me to - finally - use netCDF4 for my measurement facility like I wished for years now.
However, I have to disagree in your point regarding the not fitting dimensions in the DataArray definition. np.zeros() will for sure return an n-dimensional array when called with a length-n list as argument, which I did with n=3 and the list [1,2,3].
I.e. calling np.zeros([1, 2, 3])
will return array([ [ [ 0., 0., 0.], [ 0., 0., 0.] ] ])

Besides, the working alternatives with tuples or the OrderedDict use the very same dummy array for initializing in the example above.

Further I estimate the 0.8.2-error-message completely consistent: it (silently) acknowledges that there are three dimensions in both the zeros-array and the coords by reporting a problem about not fitting lengths of the third one of each, 'z'.

Though by adding a one-dimensional dims-definition, you turn the following error messages away from the original problem, now indeed introducing an inconsistency in dimensions between dims and data.

@SpghttCd SpghttCd reopened this Jan 11, 2017
@fmaussion
Copy link
Member

fmaussion commented Jan 11, 2017

Note that the recommended to build data arrays is to specify the dimensions names explicitly:

In [11]: xr.DataArray(np.zeros([1, 2, 3]), {'x': [1], 'y': [1, 2], 'z': [1, 2, 3]}, 
                                 dims=['x', 'y', 'z'])
Out[11]: 
<xarray.DataArray (x: 1, y: 2, z: 3)>
array([[[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]]])
Coordinates:
  * y        (y) int64 1 2
  * z        (z) int64 1 2 3
  * x        (x) int64 1

on master, failing to do so results in a FutureWarning (which is the important message, but gets lost in the traceback) and then in a ValueError (which might be confusing):

In [12]: xr.DataArray(np.zeros([1, 2, 3]), {'x': [1], 'y': [1, 2], 'z': [1, 2, 3]})
/home/mowglie/.pyvirtualenvs/py3/bin/ipython:1: FutureWarning: inferring DataArray dimensions from dictionary like ``coords`` has been deprecated. Use an explicit list of ``dims`` instead.
  #!/home/mowglie/.pyvirtualenvs/py3/bin/python3
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-501c20b8b785> in <module>()
----> 1 xr.DataArray(np.zeros([1, 2, 3]), {'x': [1], 'y': [1, 2], 'z': [1, 2, 3]})

/home/mowglie/Documents/git/xarray/xarray/core/dataarray.py in __init__(self, data, coords, dims, name, attrs, encoding, fastpath)
    219 
    220             data = as_compatible_data(data)
--> 221             coords, dims = _infer_coords_and_dims(data.shape, coords, dims)
    222             variable = Variable(dims, data, attrs, encoding, fastpath=True)
    223 

/home/mowglie/Documents/git/xarray/xarray/core/dataarray.py in _infer_coords_and_dims(shape, coords, dims)
     83                 raise ValueError('conflicting sizes for dimension %r: '
     84                                  'length %s on the data but length %s on '
---> 85                                  'coordinate %r' % (d, sizes[d], s, k))
     86 
     87     assert_unique_multiindex_level_names(new_coords)

ValueError: conflicting sizes for dimension 'y': length 1 on the data but length 2 on coordinate 'y'

@fmaussion
Copy link
Member

@shoyer maybe changing the warning into an error would be the way to go now?

@shoyer
Copy link
Member

shoyer commented Jan 11, 2017

@SpghttCd Yes, I was confused. I missed that you were writing np.zeros, so reply doesn't make much sense!

On master (and in the next release of xarray), we do indeed issue a warning here as @fmaussion points out. Unfortunately, we did support this behavior in prior releases of xarray (especially with an OrderedDict), so I think we should keep the deprecation warning around for a while before making this behavior an error (probably in v0.10).

@jhamman
Copy link
Member

jhamman commented Aug 30, 2017

o I think we should keep the deprecation warning around for a while before making this behavior an error (probably in v0.10)

@shoyer - Do we still want to deprecate this behavior in v0.10?

@shoyer
Copy link
Member

shoyer commented Aug 30, 2017

Do we still want to deprecate this behavior in v0.10?

Sure, that seems reasonable to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants
@shoyer @jhamman @fmaussion @SpghttCd and others