Draft of symbolic arrays #1

Open
@mrocklin

I wanted to create a symbolic NumPy array to help with some scalability issues in Dask. The goal is to provide a target for high-level array expression optimization.

This leans heavily on logic that currently lives in Dask, and that approach seems to be going well. Here is a tiny example:

In [1]: from symbolic_array import Leaf, compute

In [2]: x = Leaf(shape=(10, 10), dtype='float32', name='x')

In [3]: import numpy as np

In [4]: y = np.sin(x)[::2, :5].max(axis=0)

In [5]: y.shape
Out[5]: (5,)

In [6]: y.dtype
Out[6]: dtype('float32')

In [7]: data = np.ones((10, 10), dtype='float32')

In [8]: compute([y], {x: data})
Out[8]: 
(array([0.84147096, 0.84147096, 0.84147096, 0.84147096, 0.84147096],
       dtype=float32),)

In [9]: x
Out[9]: x

In [10]: y
Out[10]: amax(sin(x)[slice(None, None, 2), slice(None, 5, None)], axis=0, keepdims=False)
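
For readers who have not seen this pattern, here is a minimal sketch of how an object can record `np.sin(x)` instead of computing it. It is not the code in this repository: the `Node` class and `evaluate` function are hypothetical names made up for illustration, and the sketch only handles ufuncs through NumPy's `__array_ufunc__` hook (slicing and reductions such as `max` are left out). It assumes NumPy 1.20 or later for `np.broadcast_shapes`.

```python
import numpy as np


class Node:
    """Hypothetical symbolic array: records an operation instead of running it."""

    def __init__(self, shape, dtype, name, func=None, args=()):
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)
        self.name = name
        self.func = func  # None for a leaf, otherwise the recorded ufunc
        self.args = args  # inputs to the recorded ufunc

    def __repr__(self):
        return self.name

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # NumPy routes calls such as np.sin(node) here instead of computing.
        if method != "__call__" or kwargs:
            return NotImplemented
        # Infer the output dtype by applying the ufunc to zero-length stand-ins.
        metas = [np.empty(0, dtype=i.dtype) if isinstance(i, Node) else i
                 for i in inputs]
        dtype = ufunc(*metas).dtype
        # Infer the output shape from ordinary broadcasting rules.
        shapes = [i.shape if isinstance(i, Node) else np.shape(i) for i in inputs]
        shape = np.broadcast_shapes(*shapes)
        name = f"{ufunc.__name__}({', '.join(map(repr, inputs))})"
        return Node(shape, dtype, name, func=ufunc, args=inputs)


def evaluate(node, env):
    """Substitute concrete arrays for leaves, then run the recorded ufuncs."""
    if node.func is None:
        return env[node]
    args = [evaluate(a, env) if isinstance(a, Node) else a for a in node.args]
    return node.func(*args)


x = Node(shape=(10, 10), dtype="float32", name="x")
y = np.add(np.sin(x), 1)  # builds a tree; nothing is computed yet
result = evaluate(y, {x: np.ones((10, 10), dtype="float32")})
```

The dtype trick (calling the ufunc on an empty array) keeps metadata inference cheap and consistent with what NumPy would return at compute time.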

There are a few things that we would do next if we think this is worth doing:

  1. Build out enough of the API to satisfy a commonly used subset of XArray, or some other use case (though I'm currently being paid to think about XArray)
  2. Align that API to optimizations we would want to make in dask array, particularly atop fusion and slice-reordering
  3. Build those optimizations (non-trivial, but probably doable once we have the expression trees in place)
  4. Figure out how to back an XArray by symbolic arrays (possibly hard without NEP-0018)
  5. Find people interested in this problem that we can cajole into evolving and maintaining it.
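
To make the slice-reordering idea in step 2 concrete, here is a rough sketch of one such rewrite: pushing a slice below an elementwise function so that the function only runs on the data that is actually kept. The tuple-based tree encoding and the names `push_slices_down` and `evaluate` are assumptions made up for this illustration; they are not the representation used in this repository or in Dask.

```python
import numpy as np

# Hypothetical expression encoding:
#   ("leaf", name)
#   ("elemwise", ufunc, child)
#   ("getitem", child, index)


def push_slices_down(expr):
    """Rewrite getitem(elemwise(f, a), idx) as elemwise(f, getitem(a, idx))."""
    kind = expr[0]
    if kind == "leaf":
        return expr
    if kind == "elemwise":
        _, func, child = expr
        return ("elemwise", func, push_slices_down(child))
    if kind == "getitem":
        _, child, index = expr
        child = push_slices_down(child)
        if child[0] == "elemwise":
            # Elementwise functions commute with indexing, so swap the two.
            _, func, grandchild = child
            inner = push_slices_down(("getitem", grandchild, index))
            return ("elemwise", func, inner)
        return ("getitem", child, index)
    raise ValueError(f"unknown node kind: {kind!r}")


def evaluate(expr, env):
    """Evaluate a tree against concrete arrays stored by leaf name in env."""
    kind = expr[0]
    if kind == "leaf":
        return env[expr[1]]
    if kind == "elemwise":
        return expr[1](evaluate(expr[2], env))
    if kind == "getitem":
        return evaluate(expr[1], env)[expr[2]]
    raise ValueError(f"unknown node kind: {kind!r}")


index = (slice(None, None, 2), slice(None, 5))
original = ("getitem", ("elemwise", np.sin, ("leaf", "x")), index)
optimized = push_slices_down(original)  # sin(x[::2, :5]) instead of sin(x)[::2, :5]

env = {"x": np.arange(100, dtype="float32").reshape(10, 10)}
before = evaluate(original, env)
after = evaluate(optimized, env)
```

Both trees give the same values, but the optimized one applies `sin` to 25 elements rather than 100. A real optimizer would also need rules for reductions, broadcasting, and fused blockwise operations, which this sketch ignores.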

cc @jcrist @shoyer @rabernat @saulshanabrook @jhamman @scopatz

I apologize for the lack of docs, but the code here is only 150 lines, and I think it should be pretty digestible for this crowd.
