Implement tensordot for xarray with dask support #723

deanpospisil · 2016-01-26T00:57:23Z

I've started using xray to store responses from convolutional neural nets over different transformations of images (translation (x, y), rotation (radians), etc.). So far it's been very intuitive for storing and transforming results. Unfortunately, much of my analysis requires tensor dot products, where I can choose arbitrary dimensions over which to make a projection or compute a correlation. While dask implements np.tensordot, xray does not.

One can implement a dot product manually by multiplying data arrays then summing over dimensions.

fitm = (da_response*da_model).sum('imageID').sum('x_translation').max('models')

but this ends up being very slow. I imagine that the dot products implemented by numpy or dask are considerably more optimized.
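To make the comparison concrete, here is a minimal NumPy-only sketch (the shapes and variable names are made up for illustration, not taken from the real data). It shows that multiplying and then summing gives the same numbers as np.tensordot. The manual form first builds the full broadcast product as an intermediate array, whereas tensordot contracts the shared axis directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: response is (x, img), model is (models, img).
response = rng.standard_normal((6, 4))
model = rng.standard_normal((20, 4))

# Manual form: broadcast-multiply, then sum over the shared axis.
# This materializes a (6, 20, 4) intermediate before reducing it.
manual = (response[:, None, :] * model[None, :, :]).sum(axis=-1)

# tensordot contracts axis 1 of each operand without that intermediate.
contracted = np.tensordot(response, model, axes=(1, 1))

print(manual.shape, contracted.shape)   # (6, 20) (6, 20)
print(np.allclose(manual, contracted))  # True
```

The size of the intermediate grows with every extra non-contracted dimension, which is one plausible reason the manual version feels slow on real data.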

I am relatively new to GitHub and to this project. Would you have any advice on the best way to contribute this functionality? What I have in mind is a tensordot that takes a list of dimension names shared by two DataArrays and computes a sum product over them, using dask's implementation.


shoyer · 2016-01-26T05:27:36Z

Yes, this would be a nice addition!

I spent a little bit of time futzing around with this to see if there is an elegant way to plug this into our existing dispatching system. The short of it is that the answer appears to be no -- we don't have any elegant equivalent to dask.array's generic atop method.

So, for now I would simply write a function specialized to DataArray objects. Something like the following (barely tested) is a starting point:

from xarray import align, DataArray

# note: using private imports (e.g., from xarray.core) is definitely discouraged!
# this is not guaranteed to work in future versions of xarray
from xarray.core.ops import _dask_or_eager_func

def tensordot(a, b, dims):
    if not (isinstance(a, DataArray) and isinstance(b, DataArray)):
        raise ValueError

    a, b = align(a, b, join='inner', copy=False)

    axes = (a.get_axis_num(dims), b.get_axis_num(dims))
    f = _dask_or_eager_func('tensordot', n_array_args=2)
    new_data = f(a.data, b.data, axes=axes)

    if isinstance(dims, basestring):
        dims = [dims]

    new_coords = a.coords.merge(b.coords).drop(dims)

    new_dims = ([d for d in a.dims if d not in dims] +
                [d for d in b.dims if d not in dims])

    return DataArray(new_data, new_coords, new_dims)

This would be worth cleaning up so we could add it to the codebase (mostly documentation & tests).
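The dimension bookkeeping in the function above can be looked at on its own, without xarray. The following is only an illustrative sketch: `contracted_dims` is a hypothetical helper name, not part of xarray, and it works on plain tuples of dimension names. It maps the names to positional axes (what `get_axis_num` does in the function above) and builds the dimension list of the result.

```python
def contracted_dims(a_dims, b_dims, dims):
    """Return (axes, new_dims) for contracting `dims` between two named arrays.

    Hypothetical helper for illustration; not an xarray API.
    """
    if isinstance(dims, str):
        dims = [dims]

    missing = [d for d in dims if d not in a_dims or d not in b_dims]
    if missing:
        raise ValueError(f"dimensions not present in both arrays: {missing}")

    # Positional axes to pass to tensordot, one list per operand.
    axes = ([a_dims.index(d) for d in dims],
            [b_dims.index(d) for d in dims])

    # Remaining dimensions of `a`, followed by remaining dimensions of `b`.
    new_dims = ([d for d in a_dims if d not in dims] +
                [d for d in b_dims if d not in dims])
    return axes, new_dims


axes, new_dims = contracted_dims(('x_trans', 'y_trans', 'imgID'),
                                 ('models', 'y_trans', 'imgID'),
                                 ['y_trans', 'imgID'])
print(axes)      # ([1, 2], [1, 2])
print(new_dims)  # ['x_trans', 'models']
```

Note that if a shared dimension is not contracted, its name appears twice in `new_dims`, which is something a cleaned-up version would probably need to handle or reject.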

shoyer · 2016-01-26T05:34:40Z

@MaximilianR I do like einsum, but I'm not sure the API would be a good fit for xarray (we already have dimension names), and it also does not exist yet for dask (dask/dask#732).

That said, I suppose you could make an xarray version of einsum with syntax that looks more like tensordot with *args, e.g., einsum(a, b, c, dims=('x', 'y')).
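As a rough sketch of what such a dimension-name-based einsum could look like, the example below builds an ordinary np.einsum subscript string from dimension names. Everything here is an assumption for illustration: `named_einsum` is a hypothetical name, the operands are passed as (array, dimension names) pairs rather than DataArray objects, and it is limited to 26 distinct dimensions.

```python
import string

import numpy as np


def named_einsum(operands, dims):
    """Sum-product over the named `dims`.

    operands: list of (ndarray, tuple of dimension names) pairs.
    Hypothetical helper for illustration; not an xarray or NumPy API.
    """
    if isinstance(dims, str):
        dims = [dims]

    # Assign one subscript letter per distinct dimension name, first seen first.
    letters = {}
    for _, names in operands:
        for name in names:
            if name not in letters:
                letters[name] = string.ascii_lowercase[len(letters)]

    inputs = [''.join(letters[n] for n in names) for _, names in operands]
    out_names = [n for n in letters if n not in dims]
    subscripts = ','.join(inputs) + '->' + ''.join(letters[n] for n in out_names)

    arrays = [arr for arr, _ in operands]
    return np.einsum(subscripts, *arrays), out_names


a = np.ones((6, 5, 4))
b = np.ones((20, 5, 4))
result, out_names = named_einsum(
    [(a, ('x_trans', 'y_trans', 'imgID')),
     (b, ('models', 'y_trans', 'imgID'))],
    dims=('y_trans', 'imgID'))
print(out_names)     # ['x_trans', 'models']
print(result.shape)  # (6, 20)
```

One design difference from tensordot: a dimension that is shared by the operands but not listed in `dims` is kept once in the output and multiplied elementwise, rather than appearing twice.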

max-sixty · 2016-01-26T05:37:47Z

@shoyer - I thought your answer dominated mine, so I left yours as the only response.
But yup, that form of einsum would be pretty nice...

deanpospisil · 2016-01-26T18:53:02Z

It looks like this can perform a tensor dot for both dask-backed and plain xarray objects! But apparently dask has not implemented tensordot with multiple axes arguments, and it also does not work for a tensor dot between a dask-backed DataArray and an in-memory one. Neither of these cases worries me too much; hopefully they don't worry you.

from xarray import align, DataArray

#note: using private imports (e.g., from xarray.core) is definitely discouraged!
#this is not guaranteed to work in future versions of xarray

from xarray.core.ops import _dask_or_eager_func

def tensordot(a, b, dims):
    if not (isinstance(a, DataArray) and isinstance(b, DataArray)):
        raise ValueError

    a, b = align(a, b, join='inner', copy=False)

    axes = (a.get_axis_num(dims), b.get_axis_num(dims))
    f = _dask_or_eager_func('tensordot', n_array_args=2)
    new_data = f(a.data, b.data, axes=axes)

    if isinstance(dims, str):
        dims = [dims]

    new_coords = a.coords.merge(b.coords).drop(dims)

    #drop the dims you are performing the sum product over
    new_dims = ([d for d in a.dims if d not in dims] +
                [d for d in b.dims if d not in dims])

    return DataArray(new_data, new_coords, new_dims)

import xarray as xr
import numpy as np

x_trans = np.linspace(-3, 3, 6)
y_trans = np.linspace(-3, 3, 5)
imgID = range(4)
da = xr.DataArray(np.ones((6, 5, 4)),
                  coords=[x_trans, y_trans, imgID],
                  dims=['x_trans', 'y_trans', 'imgID'])

models = range(20)
dm = xr.DataArray(np.ones((20, 5, 4)),
                  coords=[models, y_trans, imgID],
                  dims=['models', 'y_trans', 'imgID'])

# xarray tensordot
proj_a = tensordot(da, dm, 'imgID')

# dask xarray tensordot
da = da.chunk()
dm = dm.chunk()
proj_b = tensordot(da, dm, 'imgID')

# the cases below raise errors
# multiple dims
proj_c = tensordot(da, dm, ['imgID', 'y_trans'])

# mixed types (dask-backed and in-memory)
da = da.chunk()
dm = dm.load()
proj_d = tensordot(da, dm, 'imgID')

deanpospisil · 2016-01-27T03:58:31Z

I wasn't sure where the best place to put the def would be. Currently I have been running it as a method on the DataArray class:
t = da1.tensordot(da2, 'shapes')
Let me know if that seems alright; then I'll write some simple tests in test_dataarray for tensordot.
Maybe I'll make my first pull request!

deanpospisil · 2016-01-27T03:59:46Z

Also, that einsum does seem pretty ideal. I'll see if I can get it running in dask, so we can port it over here.

shoyer · 2016-01-27T04:24:42Z

I'm split on whether a function or method makes more sense (a.tensordot(b, dim='x') vs xr.tensordot(a, b, dim='x')). I would be OK with either, so yes, please do go ahead!

shoyer · 2016-03-05T00:47:49Z

Fixed by #731.

shoyer changed the title ~~Implementing dask tensordot~~ Implement tensordot for xarray with dask support Jan 26, 2016

This was referenced Jan 27, 2016

Add tensordot to dataarray class also add its test to test_dataarray #731

Merged

Implement vnorm for xarray with dask support #735

Closed

shoyer closed this as completed Mar 5, 2016
