Skip to content

numbagg & flox #8233

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
max-sixty opened this issue Sep 26, 2023 · 13 comments
Closed

numbagg & flox #8233

max-sixty opened this issue Sep 26, 2023 · 13 comments

Comments

@max-sixty
Copy link
Collaborator

What is your issue?

I've been doing some work recently on our old friend numbagg, improving the ewm routines & adding some more.

I'm keen to get numbagg back in shape, doing the things that it does best, and trimming anything it doesn't. I notice that it has grouped calcs. Am I correct to think that flox does this better? I haven't been up with the latest. flox looks like it's particularly focused on dask arrays, whereas numpy_groupies, one of the inspirations for this, was applicable to numpy arrays too.

At least from the xarray perspective, are we OK to deprecate these numbagg functions, and direct folks to flox?

@dcherian
Copy link
Contributor

Supporting numbagg as a backend for flox would be a major win (xarray-contrib/flox#72) particularly because it supports the "group nD array by 1D array" use-case.

The pure numpy approach in numpy_groupies requires "group 1D array by other 1D array"; this transformation is a decent chunk of overhead (30-40%).

flox does implement a ufunc.reduceat solution (originally by Stephan), but that requires an argsort of the array, which again is not cheap.

I stalled out on numbagg because of issues around fill_value specification. It requires some changes on the numbagg end, but nothing too hard

@max-sixty
Copy link
Collaborator Author

particularly because it supports the "group nD array by 1D array" use-case.

The pure numpy approach in numpy_groupies requires "group 1D array by other 1D array"; this transformation is a decent chunk of overhead (30-40%).

Ah interesting. I'm still at 80% of understanding — but to confirm "Form 3" here doesn't do what we need? i.e. the other dimensions are flattened into columns of a which represents the "nD array", and the the group_idx is the 1D array?

image

The pure numpy approach

And if flox supports the numba (i.e. not yet numbagg) implementation in numpy_groupies, that doesn't help?

(not to be too down on numbagg — I had high hopes for it, and still think it's great, just don't want it to be the 3rd best implementation if someone else does it better...)

@dcherian
Copy link
Contributor

but to confirm "Form 3" here doesn't do what we need?

It does do what we need, but numbagg would do it substantially faster (I don't remember numbers off the top of my head, but at least 2X faster?).

And if flox supports the numba (i.e. not yet numbagg) implementation in numpy_groupies, that doesn't help?

I think of groupby with numpy_groupies as 3 steps, a useful rule is these steps take approximately equal amounts of time, though for small problems step 3 can be small-ish (20% of total time).

  1. Factorize to integer codes -> using pandas
  2. transform problem to purely 1D problem
  3. actually run the reduction.

numpy_groupies uses numba for (3) but a lot of the overhead is in (2), which uses np.ravel_multi_index.

(side note: One of the overheads with flox is that for mean we run step (2) twice, once for sum and once for count. This could be fixed.).

I had high hopes for it, and still think it's great, just don't want it to be the 3rd best implementation if someone else does it better

I think it is the absolute best for typical Xarray user, just needs some effort around integration.

@max-sixty
Copy link
Collaborator Author

max-sixty commented Sep 26, 2023

  • Factorize to integer codes -> using pandas

  • transform problem to purely 1D problem

  • actually run the reduction.

Yes very much agree, nicely conceptualized.

numpy_groupies uses numba for (3) but a lot of the overhead is in (2), which uses np.ravel_multi_index.

Ah interesting, I need to think through this more. I hadn't actually thought (2) was expensive; that it could be done by indexing into an array of N dimensions with a set of 1D indices, like A[indices], i.e. A[[1,4,7,8]]. And then (3) called an aggregation func on those values. So the broadcasting to the other dimensions would "just work".

But maybe the data access there isn't great — by aggregating over each group in turn, it's jumping around; and much less efficient relative running through the data linearly, assigning to relevant group.

Or maybe I'm completely misunderstanding, thanks for you patience if so...

@dcherian
Copy link
Contributor

indexing into an array of N dimensions with a set of 1D indices, like A[indices], i.e. A[[1,4,7,8]]`. And then (3) called an aggregation func on those values.

This is xarray's "for loop over groups" approach, it can get quite slow. If you have some clever ideas for vectorizing this, that would be amazing!

It's also not a simple broadcasting because commonly you're only reducing along one of n dimensions. I did improve numpy_groupies for this (ml31415/numpy-groupies#46) but I remember thinking that numbagg would still be a major win.

@max-sixty
Copy link
Collaborator Author

This is xarray's "for loop over groups" approach, it can get quite slow. If you have some clever ideas for vectorizing this, that would be amazing!

Yes I imagine it's slow for a few reasons:

  • The loop is in python
  • It's not parallelized
  • It's bad for data locality (it's going by group order, rather than by data order)

I had thought that numpy_groupies would solve the first two, but sounds like that's not doing quite enough.

I'll try and come back to this and have a look (though to set expectations, I've probably gone too deep into numbagg for the moment, and so will try and spend a bit less time on it, and predominantly kicking this issue off to see if we could reduce its scope!)

@mathause
Copy link
Collaborator

@dcherian is there an issue describing what is missing from numbagg to use it in flox?

@dcherian
Copy link
Contributor

dcherian commented Sep 28, 2023

Here's a few: numbagg/numbagg#120, numbagg/numbagg#121, numbagg/numbagg#122

and an updated test run from flox: https://github.com/xarray-contrib/flox/actions/runs/6344423208/job/17234488009?pr=72

EDIT: Well apparently I fixed the fill_value issues by just handling it on my own. So really we just need numbagg to support dtype and address a few edge cases.

@max-sixty
Copy link
Collaborator Author

This may well be complete, at least for the moment, WDYT @dcherian ?

@dcherian
Copy link
Contributor

dcherian commented Oct 9, 2023

Yes many thanks Max. It'll be possible to use xarray_obj.groupby(...).mean(..., engine="numbagg") after the next flox release.

In my tests it's approx 2X improvement (because pd.factorize adds quite a bit of overhead to the reduction)

@dcherian dcherian closed this as completed Oct 9, 2023
@dcherian
Copy link
Contributor

Just released flox v0.8.0. Here are some benchmarks for grouping a 2D array by a 1D array. Great work, Max!

func engine
nansum flox 70.3±0.2ms
numpy 122±0.2ms
numbagg 18.4±0.04ms
nanmean flox 144±0.4ms
numpy 196±0.5ms
numbagg 23.7±0.2ms
nanmax flox 93.4±0.8ms
numpy 953±2ms
numbagg 20.3±0.2ms
count flox 59.8±1ms
numpy 114±0.2ms
numbagg 29.3±0.1ms

@max-sixty
Copy link
Collaborator Author

That's awesome!

Thanks for doing the hard work — my part was relatively easy...

And I'm excited that numbagg has the resurgence it deserves after some hibernation — thanks to @shoyer for the hard work there a few years ago...

@shoyer
Copy link
Member

shoyer commented Oct 15, 2023

Wow, nice work both of you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants