numbagg & flox #8233

max-sixty · 2023-09-26T17:33:32Z

What is your issue?

I've been doing some work recently on our old friend numbagg, improving the ewm routines & adding some more.

I'm keen to get numbagg back in shape, doing the things that it does best, and trimming anything it doesn't. I notice that it has grouped calcs. Am I correct to think that flox does this better? I haven't been up with the latest. flox looks like it's particularly focused on dask arrays, whereas numpy_groupies, one of the inspirations for this, was applicable to numpy arrays too.

At least from the xarray perspective, are we OK to deprecate these numbagg functions, and direct folks to flox?

dcherian · 2023-09-26T18:00:18Z

Supporting numbagg as a backend for flox would be a major win (xarray-contrib/flox#72) particularly because it supports the "group nD array by 1D array" use-case.

The pure numpy approach in numpy_groupies requires "group 1D array by other 1D array"; this transformation is a decent chunk of overhead (30-40%).

flox does implement a ufunc.reduceat solution (originally by Stephan), but that requires an argsort of the array, which again is not cheap.

I stalled out on numbagg because of issues around fill_value specification. It requires some changes on the numbagg end, but nothing too hard

max-sixty · 2023-09-26T18:54:42Z

particularly because it supports the "group nD array by 1D array" use-case.

The pure numpy approach in numpy_groupies requires "group 1D array by other 1D array"; this transformation is a decent chunk of overhead (30-40%).

Ah interesting. I'm still at 80% of understanding — but to confirm "Form 3" here doesn't do what we need? i.e. the other dimensions are flattened into columns of a which represents the "nD array", and the the group_idx is the 1D array?

The pure numpy approach

And if flox supports the numba (i.e. not yet numbagg) implementation in numpy_groupies, that doesn't help?

(not to be too down on numbagg — I had high hopes for it, and still think it's great, just don't want it to be the 3rd best implementation if someone else does it better...)

dcherian · 2023-09-26T19:36:15Z

but to confirm "Form 3" here doesn't do what we need?

It does do what we need, but numbagg would do it substantially faster (I don't remember numbers off the top of my head, but at least 2X faster?).

And if flox supports the numba (i.e. not yet numbagg) implementation in numpy_groupies, that doesn't help?

I think of groupby with numpy_groupies as 3 steps, a useful rule is these steps take approximately equal amounts of time, though for small problems step 3 can be small-ish (20% of total time).

Factorize to integer codes -> using pandas
transform problem to purely 1D problem
actually run the reduction.

numpy_groupies uses numba for (3) but a lot of the overhead is in (2), which uses np.ravel_multi_index.

(side note: One of the overheads with flox is that for mean we run step (2) twice, once for sum and once for count. This could be fixed.).

I had high hopes for it, and still think it's great, just don't want it to be the 3rd best implementation if someone else does it better

I think it is the absolute best for typical Xarray user, just needs some effort around integration.

max-sixty · 2023-09-26T20:02:30Z

Factorize to integer codes -> using pandas

transform problem to purely 1D problem

actually run the reduction.

Yes very much agree, nicely conceptualized.

numpy_groupies uses numba for (3) but a lot of the overhead is in (2), which uses np.ravel_multi_index.

Ah interesting, I need to think through this more. I hadn't actually thought (2) was expensive; that it could be done by indexing into an array of N dimensions with a set of 1D indices, like A[indices], i.e. A[[1,4,7,8]]. And then (3) called an aggregation func on those values. So the broadcasting to the other dimensions would "just work".

But maybe the data access there isn't great — by aggregating over each group in turn, it's jumping around; and much less efficient relative running through the data linearly, assigning to relevant group.

Or maybe I'm completely misunderstanding, thanks for you patience if so...

dcherian · 2023-09-26T20:13:16Z

indexing into an array of N dimensions with a set of 1D indices, like A[indices], i.e. A[[1,4,7,8]]`. And then (3) called an aggregation func on those values.

This is xarray's "for loop over groups" approach, it can get quite slow. If you have some clever ideas for vectorizing this, that would be amazing!

It's also not a simple broadcasting because commonly you're only reducing along one of n dimensions. I did improve numpy_groupies for this (ml31415/numpy-groupies#46) but I remember thinking that numbagg would still be a major win.

max-sixty · 2023-09-26T21:10:15Z

This is xarray's "for loop over groups" approach, it can get quite slow. If you have some clever ideas for vectorizing this, that would be amazing!

Yes I imagine it's slow for a few reasons:

The loop is in python
It's not parallelized
It's bad for data locality (it's going by group order, rather than by data order)

I had thought that numpy_groupies would solve the first two, but sounds like that's not doing quite enough.

I'll try and come back to this and have a look (though to set expectations, I've probably gone too deep into numbagg for the moment, and so will try and spend a bit less time on it, and predominantly kicking this issue off to see if we could reduce its scope!)

mathause · 2023-09-27T11:47:42Z

@dcherian is there an issue describing what is missing from numbagg to use it in flox?

dcherian · 2023-09-28T21:56:13Z

Here's a few: numbagg/numbagg#120, numbagg/numbagg#121, numbagg/numbagg#122

and an updated test run from flox: https://github.com/xarray-contrib/flox/actions/runs/6344423208/job/17234488009?pr=72

EDIT: Well apparently I fixed the fill_value issues by just handling it on my own. So really we just need numbagg to support dtype and address a few edge cases.

max-sixty · 2023-10-07T06:59:08Z

This may well be complete, at least for the moment, WDYT @dcherian ?

dcherian · 2023-10-09T15:40:29Z

Yes many thanks Max. It'll be possible to use xarray_obj.groupby(...).mean(..., engine="numbagg") after the next flox release.

In my tests it's approx 2X improvement (because pd.factorize adds quite a bit of overhead to the reduction)

dcherian · 2023-10-15T04:53:15Z

Just released flox v0.8.0. Here are some benchmarks for grouping a 2D array by a 1D array. Great work, Max!

func	engine
nansum	flox	70.3±0.2ms
	numpy	122±0.2ms
	numbagg	18.4±0.04ms
nanmean	flox	144±0.4ms
	numpy	196±0.5ms
	numbagg	23.7±0.2ms
nanmax	flox	93.4±0.8ms
	numpy	953±2ms
	numbagg	20.3±0.2ms
count	flox	59.8±1ms
	numpy	114±0.2ms
	numbagg	29.3±0.1ms

max-sixty · 2023-10-15T06:47:32Z

That's awesome!

Thanks for doing the hard work — my part was relatively easy...

And I'm excited that numbagg has the resurgence it deserves after some hibernation — thanks to @shoyer for the hard work there a few years ago...

shoyer · 2023-10-15T07:48:56Z

Wow, nice work both of you!

max-sixty added the upstream issue label Sep 26, 2023

dcherian added the topic-groupby label Sep 26, 2023

dcherian mentioned this issue Oct 7, 2023

Add engine="numbagg" xarray-contrib/flox#72

Merged

4 tasks

dcherian closed this as completed Oct 9, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

numbagg & flox #8233

numbagg & flox #8233

max-sixty commented Sep 26, 2023

dcherian commented Sep 26, 2023

Uh oh!

max-sixty commented Sep 26, 2023

Uh oh!

dcherian commented Sep 26, 2023

Uh oh!

max-sixty commented Sep 26, 2023 •

edited

Loading

Uh oh!

dcherian commented Sep 26, 2023

Uh oh!

max-sixty commented Sep 26, 2023

Uh oh!

mathause commented Sep 27, 2023

Uh oh!

dcherian commented Sep 28, 2023 •

edited

Loading

Uh oh!

max-sixty commented Oct 7, 2023

Uh oh!

dcherian commented Oct 9, 2023 •

edited

Loading

Uh oh!

dcherian commented Oct 15, 2023

Uh oh!

max-sixty commented Oct 15, 2023

Uh oh!

shoyer commented Oct 15, 2023

Uh oh!

Uh oh!

numbagg & flox #8233

numbagg & flox #8233

Comments

max-sixty commented Sep 26, 2023

What is your issue?

dcherian commented Sep 26, 2023

Uh oh!

max-sixty commented Sep 26, 2023

Uh oh!

dcherian commented Sep 26, 2023

Uh oh!

max-sixty commented Sep 26, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dcherian commented Sep 26, 2023

Uh oh!

max-sixty commented Sep 26, 2023

Uh oh!

mathause commented Sep 27, 2023

Uh oh!

dcherian commented Sep 28, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

max-sixty commented Oct 7, 2023

Uh oh!

dcherian commented Oct 9, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dcherian commented Oct 15, 2023

Uh oh!

max-sixty commented Oct 15, 2023

Uh oh!

shoyer commented Oct 15, 2023

Uh oh!

max-sixty commented Sep 26, 2023 •

edited

Loading

dcherian commented Sep 28, 2023 •

edited

Loading

dcherian commented Oct 9, 2023 •

edited

Loading