Lazy arrays for asymptotically better performance #326

hameerabbasi · 2020-02-17T10:11:35Z

I've already brought this up in another thread (dask/dask#5879), but I was planning to make sparse collections within this library lazy for asymptotically better performance in certain situations. See the following research papers for details:

And the following talks:

These research papers define a method to generate efficient kernels for a broad range of storage formats. The method handles computations composed of element-wise operations (with broadcasting) and reductions, but it can't handle operations such as eigendecompositions (which we intend to support through SciPy wrappers for LAPACK et al.).
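
To illustrate where the asymptotic gain can come from, here is a minimal sketch. It is not the algorithm from the papers, only a hand-written example of fusing a broadcast product with a reduction. The dict-of-coordinates storage and the function names `eager_sum` and `fused_sum` are assumptions made for this illustration.

```python
# Hypothetical illustration: x and y are dense vectors (lists) and s is a
# sparse matrix stored as {(row, col): value}. We want
# sum over i, j of x[i] * y[j] * s[i, j].

def eager_sum(x, y, s):
    """Evaluate step by step, materializing the broadcast outer product."""
    # Dense intermediate: O(len(x) * len(y)) time and memory.
    outer = [[xi * yj for yj in y] for xi in x]
    # Multiplying by s keeps only the stored entries of s.
    masked = {(i, j): outer[i][j] * v for (i, j), v in s.items()}
    return sum(masked.values())

def fused_sum(x, y, s):
    """Evaluate the whole expression in one pass over the stored entries."""
    # No intermediate: O(number of stored entries in s) time.
    return sum(x[i] * y[j] * v for (i, j), v in s.items())

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0]
s = {(0, 1): 2.0, (2, 0): 0.5}
result_eager = eager_sum(x, y, s)
result_fused = fused_sum(x, y, s)
```

Both functions return the same value, but only the eager one pays for the dense outer product. An eager library has to build that intermediate because it cannot see that the next operation discards most of it. A lazy one sees the whole expression before running anything.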

With this in mind, would it make sense to make sparse collections lazy, with the caveat of an API break? These would have an API similar to Dask, having to do arr.compute() for the final result. As discussed in dask/dask#5879, it would also follow the protocols for dask custom collections.
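
As a rough sketch of what such an API could feel like: operations build up an expression, and nothing runs until `compute()` is called. The `LazyArray` class below is hypothetical and works on plain Python lists. It does not implement the Dask custom collection protocol (methods such as `__dask_graph__` and `__dask_keys__`), which a real implementation would add.

```python
import operator

class LazyArray:
    """Hypothetical lazy node: records an operation instead of running it."""

    def __init__(self, op, *args):
        self.op = op      # callable applied at compute() time; None for a leaf
        self.args = args  # child LazyArray nodes, or the raw values for a leaf

    @classmethod
    def from_values(cls, values):
        """Wrap concrete values as a leaf node."""
        return cls(None, list(values))

    def _binary(self, other, fn):
        # Defer an element-wise binary operation on two lazy operands.
        return LazyArray(lambda a, b: [fn(p, q) for p, q in zip(a, b)], self, other)

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    def sum(self):
        # Defer a full reduction.
        return LazyArray(sum, self)

    def compute(self):
        """Evaluate the recorded expression tree and return a concrete result."""
        if self.op is None:
            return self.args[0]
        return self.op(*(arg.compute() for arg in self.args))

a = LazyArray.from_values([1, 2, 3])
b = LazyArray.from_values([4, 5, 6])
expr = (a * b + a).sum()   # still a LazyArray; nothing has been evaluated
total = expr.compute()     # evaluation happens here
```

This sketch evaluates the tree node by node, so it gains nothing in speed. The point is only the shape of the API: a real implementation would inspect the whole tree in `compute()` and generate a fused kernel for it.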

If we manage to do this right, adding GPU support shouldn't be difficult either. But the question arises: is it worth breaking API compatibility to do this?

hameerabbasi · 2020-02-17T10:13:34Z

Here's my proposal: we declare 0.x a backwards-compatible API, and consider 1.x to be based on TACO. We will emit a FutureWarning in 0.x about the API change.
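
A minimal sketch of how such a warning could be emitted, using only the standard `warnings` module. The helper name `warn_api_change` and the message text are placeholders invented for this example, not existing code in the library.

```python
import warnings

def warn_api_change():
    """Hypothetical helper: tell users that the API will change in 1.x."""
    warnings.warn(
        "Arrays are planned to become lazy in 1.x; results will then "
        "require an explicit .compute() call.",
        FutureWarning,
        stacklevel=2,  # attribute the warning to the caller, not this helper
    )

# Capture the warning here so the example can be inspected when run.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_api_change()
```

`FutureWarning` is shown to end users by default, unlike `DeprecationWarning`, which is why it suits an announcement aimed at everyone who uses the library.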

mrocklin · 2020-02-17T15:39:26Z

I think that there is value in having a fully immediate library for multi-dimensional sparse arrays. Not everyone is going to want laziness, even given the performance that it brings. I would like to suggest that some version of this library remains non-lazy and publicly targetable without a version modifier. Perhaps it would make sense to have a different sparse_lazy library that depended strongly on this one?

hameerabbasi · 2020-02-17T18:01:49Z

> Perhaps it would make sense to have a different sparse_lazy library that depended strongly on this one?

I think if we were to do this, it'd be the other way around: immediate operations would just do .compute() at the end. How about I leave the main namespace API as-is and move the lazy part to sparse.perf?

mrocklin · 2020-02-17T18:57:04Z

That sounds nicer to me from an integration perspective. Also, to be clear, I'm very excited about seeing what this can do in terms of performance. My original hesitation is mostly around thinking about other libraries depending on pydata/sparse. My guess is that folks will be more hesitant to depend on anything that has any sort of non-traditional behavior.

rgommers · 2020-02-17T20:11:46Z

I think there's also a good amount of evidence that eager is better as a default; see e.g. PyTorch modes and TensorFlow moving to eager by default.

So +1 for eager by default, and advertising lazy mode prominently as a potential speedup method.

hameerabbasi · 2020-02-26T14:20:16Z

So I think the final decision is:

Build the eager method on top of the lazy.
Expose the lazy method as well for potential speedups, as a separate submodule.
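
A minimal sketch of that layering, under the assumption that every eager function is a thin wrapper that builds the lazy expression and computes it immediately. The names `Lazy`, `lazy_add`, and `add` are hypothetical and work on plain lists. In the real library the lazy functions would live in the separate submodule (sparse.perf was the name floated above).

```python
class Lazy:
    """Hypothetical deferred computation: stores a thunk, runs it on compute()."""

    def __init__(self, thunk):
        self._thunk = thunk

    def compute(self):
        return self._thunk()

def _resolve(value):
    # Evaluate lazy operands; pass concrete lists through unchanged.
    return value.compute() if isinstance(value, Lazy) else value

def lazy_add(a, b):
    """Lazy element-wise add; accepts Lazy nodes or plain lists."""
    return Lazy(lambda: [p + q for p, q in zip(_resolve(a), _resolve(b))])

def add(a, b):
    """Eager wrapper: build the lazy expression, then compute it right away."""
    return lazy_add(a, b).compute()

eager_result = add([1, 2], [3, 4])                       # concrete list
chained = lazy_add(lazy_add([1, 2], [3, 4]), [10, 10])   # nothing evaluated yet
chained_result = chained.compute()
```

With this layout the default namespace keeps its immediate semantics, and users who want the potential speedup opt in by chaining the lazy functions and calling `compute()` once at the end.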

dhirschfeld · 2020-02-26T19:52:52Z

Potentially OT, but I'm curious how this relates to uarray - isn't that lazily evaluated? And if so, couldn't you just create a sparse backend?

hameerabbasi · 2020-02-26T19:54:41Z

@dhirschfeld An early version of uarray that was renamed to metadsl was lazy. The current iteration is not lazy.

saulshanabrook · 2020-03-11T15:27:37Z

> Build the eager method on top of the lazy.

That makes sense!

hameerabbasi added the discussion label Feb 17, 2020

hameerabbasi mentioned this issue Feb 23, 2020

Single dispatch issparse scipy/scipy#11565

Open

hameerabbasi added help wanted and removed discussion labels Feb 26, 2020

This was referenced Mar 3, 2020

COO.transpose() is not cheap #329

Open

Performance bencmarks comparison with scipy.sparse #331

Closed

hameerabbasi mentioned this issue Jun 3, 2020

PyData/Sparse support scikit-learn/scikit-learn#17364

Open

hameerabbasi added discussion and removed help wanted labels Jul 7, 2020

hameerabbasi closed this as completed May 20, 2021

pydata locked and limited conversation to collaborators May 20, 2021

This issue was moved to a discussion.
