I wanted to create a symbolic NumPy array to help with some scalability issues in Dask. The goal is to provide a target for high-level array expression optimization.
This leans heavily on logic that is currently in Dask, and that approach seems to be going well. Here is a tiny example:
In [1]: from symbolic_array import Leaf, compute
In [2]: x = Leaf(shape=(10, 10), dtype='float32', name='x')
In [3]: import numpy as np
In [4]: y = np.sin(x)[::2, :5].max(axis=0)
In [5]: y.shape
Out[5]: (5,)
In [6]: y.dtype
Out[6]: dtype('float32')
In [7]: data = np.ones((10, 10), dtype='float32')
In [8]: compute([y], {x: data})
Out[8]:
(array([0.84147096, 0.84147096, 0.84147096, 0.84147096, 0.84147096],
dtype=float32),)
In [9]: x
Out[9]: x
In [10]: y
Out[10]: amax(sin(x)[slice(None, None, 2), slice(None, 5, None)], axis=0, keepdims=False)
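To give a rough idea of how an object like `y` can carry a shape and dtype without touching any data, here is a minimal sketch. It is not the code from the prototype: the class names `Expr`, `Symbol`, `Ufunc`, `GetItem`, and `Max` are hypothetical, it only handles ufunc calls whose inputs are all symbolic, basic indexing, and `max` over a single integer axis, and `np.broadcast_shapes` assumes NumPy 1.20 or newer. It relies on the `__array_ufunc__` protocol so that `np.sin(x)` returns a tree node instead of computing anything.

```python
import numpy as np


class Expr:
    """Hypothetical node in a lazy array expression tree."""

    def __init__(self, shape, dtype):
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Intercept plain calls such as np.sin(x); decline everything else.
        if method != "__call__" or kwargs:
            return NotImplemented
        return Ufunc(ufunc, inputs)

    def __getitem__(self, index):
        return GetItem(self, index)

    def max(self, axis):
        return Max(self, axis)


class Symbol(Expr):
    """A named placeholder for an array that is supplied later."""

    def __init__(self, shape, dtype, name):
        super().__init__(shape, dtype)
        self.name = name

    def __repr__(self):
        return self.name

    def evaluate(self, env):
        return env[self]


class Ufunc(Expr):
    def __init__(self, ufunc, args):
        # Assumption of this sketch: every input is itself an Expr.
        self.ufunc, self.args = ufunc, args
        # Infer the output dtype by calling the ufunc on zero-length arrays.
        dummies = [np.empty(0, dtype=a.dtype) for a in args]
        shape = np.broadcast_shapes(*(a.shape for a in args))
        super().__init__(shape, ufunc(*dummies).dtype)

    def evaluate(self, env):
        return self.ufunc(*(a.evaluate(env) for a in self.args))


class GetItem(Expr):
    def __init__(self, arg, index):
        self.arg, self.index = arg, index
        # Index a zero-stride view of one scalar to learn the output shape
        # without allocating an array of the full size.
        stand_in = np.broadcast_to(np.zeros((), dtype=arg.dtype), arg.shape)
        super().__init__(stand_in[index].shape, arg.dtype)

    def evaluate(self, env):
        return self.arg.evaluate(env)[self.index]


class Max(Expr):
    def __init__(self, arg, axis):
        self.arg, self.axis = arg, axis % len(arg.shape)
        shape = arg.shape[:self.axis] + arg.shape[self.axis + 1:]
        super().__init__(shape, arg.dtype)

    def evaluate(self, env):
        return self.arg.evaluate(env).max(axis=self.axis)


x = Symbol((10, 10), "float32", "x")
y = np.sin(x)[::2, :5].max(axis=0)
print(y.shape, y.dtype)  # (5,) float32
result = y.evaluate({x: np.ones((10, 10), dtype="float32")})
```

The point of keeping `shape` and `dtype` on every node is that downstream code can inspect them before any computation is scheduled, which is what an optimizer or a wrapper such as XArray would need.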
There are a few things that we would do next if we think this is worth doing:
- Build out enough of the API to satisfy a commonly used subset of XArray, or some other use case (though I'm currently being paid to think about XArray)
- Align that API to optimizations we would want to make in dask array, particularly atop fusion and slice-reordering
- Build those optimizations (non-trivial, but probably doable once we have the expression trees in place)
- Figure out how to back an XArray by symbolic arrays (possibly hard without NEP-0018)
- Find people interested in this problem that we can cajole into evolving and maintaining it.
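As an illustration of the slice-reordering idea in the list above, here is a small sketch of one rewrite rule: moving an index operation beneath an elementwise function so that less data is processed. The tuple encoding of the tree and the names `push_slices_down` and `evaluate` are assumptions made for this example, not the prototype's representation or Dask's optimizer, and the rule as written is only valid for unary elementwise ufuncs.

```python
import numpy as np

# Hypothetical tuple encoding of an expression tree:
#   ("leaf", name) | ("ufunc", np_ufunc, child) | ("getitem", child, index)


def push_slices_down(expr):
    """Rewrite getitem(ufunc(a), idx) into ufunc(getitem(a, idx)), recursively."""
    kind = expr[0]
    if kind == "leaf":
        return expr
    if kind == "ufunc":
        _, func, child = expr
        return ("ufunc", func, push_slices_down(child))
    # kind == "getitem": normalize the child first, then try to swap.
    _, child, index = expr
    child = push_slices_down(child)
    if child[0] == "ufunc":
        _, func, grandchild = child
        # Safe because a unary ufunc treats each element independently.
        return ("ufunc", func, push_slices_down(("getitem", grandchild, index)))
    return ("getitem", child, index)


def evaluate(expr, env):
    """Evaluate a tuple-encoded tree against a dict of named arrays."""
    kind = expr[0]
    if kind == "leaf":
        return env[expr[1]]
    if kind == "ufunc":
        return expr[1](evaluate(expr[2], env))
    return evaluate(expr[1], env)[expr[2]]


index = (slice(None, None, 2), slice(None, 5))
before = ("getitem", ("ufunc", np.sin, ("leaf", "x")), index)
after = push_slices_down(before)

data = np.arange(100, dtype="float32").reshape(10, 10)
# Both trees give the same values, but `after` applies sin to 25 elements
# instead of 100.
assert np.allclose(evaluate(before, {"x": data}), evaluate(after, {"x": data}))
```

A real optimizer would need more rules (reductions, broadcasting binary operations, fancy indexing), which is part of why the expression trees need to be in place first.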
cc @jcrist @shoyer @rabernat @saulshanabrook @jhamman @scopatz
I apologize for the lack of docs, but the code here is only 150 lines, and I think it should be pretty digestible to this crowd.