
Commit 4b72c92

More updates
1 parent ace7d92

5 files changed: 38 additions, 38 deletions


docs/source/api.rst

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ Visualization
    :toctree: generated/

    visualize.draw_mesh
-   visualize.visualize_groups
+   visualize.visualize_groups_1d
    visualize.visualize_cohorts_2d

 Aggregation Objects

docs/source/custom.md

Lines changed: 0 additions & 26 deletions
This file was deleted.

docs/source/engines.md

Lines changed: 24 additions & 1 deletion
@@ -1 +1,24 @@
-# Engines
+(engines)=
+# Engines & Duck Arrays
+
+`flox` provides multiple options, via the `engine` kwarg, for computing the core GroupBy reduction on numpy or other non-dask array types.
+
+1. `engine="numpy"` wraps `numpy_groupies.aggregate_numpy`. This uses indexing tricks and functions like `np.bincount`, or the ufunc `.at` methods
+   (e.g. `np.maximum.at`), to provide reasonably performant aggregations.
+1. `engine="numba"` wraps `numpy_groupies.aggregate_numba`. This uses `numba` kernels for the core aggregation.
+1. `engine="flox"` uses the `ufunc.reduceat` method after first argsorting the array so that all group members occur sequentially. This was copied from
+   a [gist by Stephan Hoyer](https://gist.github.com/shoyer/f538ac78ae904c936844).
+
+There are some tradeoffs here. For the common case of reducing an nD array by a 1D array of group labels (e.g. `groupby("time.month")`), `engine="flox"` *can* be faster.
+The reason is that `numpy_groupies` converts every groupby problem to a 1D problem, which can involve [some overhead](https://github.com/ml31415/numpy-groupies/pull/46).
+It is possible to optimize this a bit in `flox` or `numpy_groupies` (though the latter is harder).
+The advantage of `engine="numpy"` is that it tends to work for more array types, since it appears to be more common for array libraries to implement `np.bincount` than `np.add.reduceat`.
+
+```{tip}
+Other potential engines we could add are [`numbagg`](https://github.com/numbagg/numbagg) ([stalled PR here](https://github.com/xarray-contrib/flox/pull/72)) and [`datashader`](https://github.com/xarray-contrib/flox/issues/142).
+Both use numba for high-performance aggregations. Contributions or discussion are very welcome!
+```
+
+## Duck Array Support
+
+Aggregating over other array types will work if the array type supports the following methods:
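As a rough, hedged illustration of the `engine` kwarg described in this new page (not part of the commit itself; the array, labels, and chosen `func` are invented for demonstration):

```python
# Minimal sketch: the same grouped reduction computed with two different engines.
# The values and labels below are invented; flox.groupby_reduce returns the
# reduced array along with the unique group labels.
import numpy as np
import flox

array = np.arange(12.0)
labels = np.array([0, 0, 1, 1, 2, 2] * 2)  # one label per element of `array`

for engine in ("numpy", "flox"):  # "numba" also works if numba is installed
    result, groups = flox.groupby_reduce(array, labels, func="sum", engine=engine)
    print(engine, groups, result)
```

Both engines should return identical results; only the underlying kernel differs.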

docs/source/implementation.md

Lines changed: 9 additions & 6 deletions
@@ -1,15 +1,13 @@
 (algorithms)=
 # Parallel Algorithms

-`flox` outsources the core GroupBy operation to the vectorized implementations in
-[numpy_groupies](https://github.com/ml31415/numpy-groupies).
-
-Running an efficient groupby reduction in parallel is hard, and strongly depends on how the
-groups are distributed amongst the blocks of an array.
+`flox` outsources the core GroupBy operation to the vectorized implementations controlled by the
+[`engine` kwarg](engines). Applying these implementations to a parallel array type like dask
+can be hard. Performance strongly depends on how the groups are distributed amongst the blocks of an array.

 `flox` implements 4 strategies for grouped reductions, each appropriate for a particular distribution of groups
 among the blocks of a dask array. Switch between the various strategies by passing `method`
-and/or `reindex` to either {py:func}`flox.core.groupby_reduce` or `xarray_reduce`.
+and/or `reindex` to either {py:func}`flox.groupby_reduce` or {py:func}`flox.xarray.xarray_reduce`.

 Your options are:
 1. `method="map-reduce"` with `reindex=False`
@@ -20,6 +18,11 @@ Your options are:
 The most appropriate strategy for your problem will depend on the chunking of your dataset,
 and the distribution of group labels across those chunks.

+```{tip}
+Currently these strategies are implemented for dask. We would like to generalize to other parallel array types
+as appropriate (e.g. Ramba, cubed, arkouda). Please open an issue to discuss if you are interested.
+```
+
 (xarray-split)=
 ## Background: Xarray's current GroupBy strategy
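To make the `method`/`reindex` switching mentioned in this diff concrete, here is a hedged sketch (not part of the commit; the dask array, chunking, and labels are invented):

```python
# Minimal sketch: choosing a grouped-reduction strategy for a dask-backed array
# via the `method` and `reindex` kwargs. Sizes, chunks, and labels are invented.
import dask.array as da
import numpy as np
import flox

array = da.random.random((4, 12), chunks=(4, 3))  # values, chunked along the last axis
labels = np.tile([0, 1, 2], 4)                    # group labels along that same axis

# Default strategy: tree-reduce per-block intermediate results.
result, groups = flox.groupby_reduce(
    array, labels, func="mean", method="map-reduce", reindex=False
)

# "cohorts" instead reduces groups that tend to share blocks together.
result, groups = flox.groupby_reduce(array, labels, func="mean", method="cohorts")
print(groups, result.compute())
```

Which `method` wins depends on how the labels are distributed across chunks, as the page itself explains.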

docs/source/index.md

Lines changed: 4 additions & 4 deletions
@@ -30,8 +30,8 @@ See a presentation ([video](https://discourse.pangeo.io/t/november-17-2021-flox-
 1. {py:func}`flox.xarray.xarray_reduce` extends Xarray's GroupBy operations, allowing lazy grouping by dask arrays, grouping by multiple arrays,
    as well as combining categorical grouping and histogram-style binning operations using multiple variables.
 1. `flox` also provides utility functions for rechunking both dask arrays and Xarray objects along a single dimension using the group labels as a guide:
-   1. To rechunk for blockwise operations: {py:func}`flox.rechunk_for_blockwise`, {py:func}`flox.xarray.rechunk_for_blockwise`.
-   1. To rechunk so that "cohorts", or groups of labels, tend to occur in the same chunks: {py:func}`flox.rechunk_for_cohorts`, {py:func}`flox.xarray.rechunk_for_cohorts`.
+   1. To rechunk for blockwise operations: {py:func}`flox.rechunk_for_blockwise`, {py:func}`flox.xarray.rechunk_for_blockwise`.
+   1. To rechunk so that "cohorts", or groups of labels, tend to occur in the same chunks: {py:func}`flox.rechunk_for_cohorts`, {py:func}`flox.xarray.rechunk_for_cohorts`.

 ## Installing

@@ -59,9 +59,9 @@ It was motivated by many discussions in the [Pangeo](https://pangeo.io) community.
 .. toctree::
    :maxdepth: 1

-   implementation.md
+   aggregations.md
    engines.md
-   custom.md
+   implementation.md
    xarray.md
    api.rst
    user-stories.md
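As a final hedged sketch (not part of the commit) of the multi-variable grouping highlighted in the first bullet of the diff above, assuming an invented dataset:

```python
# Minimal sketch: grouping an xarray Dataset by two label coordinates at once
# with flox.xarray.xarray_reduce. The dataset below is invented for illustration.
import numpy as np
import xarray as xr
import flox.xarray

ds = xr.Dataset(
    {"temp": ("time", np.random.rand(12))},
    coords={
        "labels1": ("time", np.tile(["a", "b", "c"], 4)),
        "labels2": ("time", np.repeat([0, 1], 6)),
    },
)

# Reduces over "time", yielding one mean per (labels1, labels2) combination.
result = flox.xarray.xarray_reduce(ds, "labels1", "labels2", func="mean")
print(result)
```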
