Commit ace7d92 ("Updates")
Parent: affb6ac

8 files changed: +7508 -797 lines

docs/diagrams/new-blockwise-annotated.svg: 1185 additions & 0 deletions
docs/diagrams/new-blockwise.svg: 138 additions & 138 deletions
docs/diagrams/new-cohorts-annotated.svg: 1845 additions & 0 deletions
docs/diagrams/new-map-reduce-reindex-False-annotated.svg: 1887 additions & 0 deletions
docs/diagrams/new-map-reduce-reindex-True-annotated.svg: 1794 additions & 0 deletions
docs/diagrams/new-map-reduce-reindex-True.svg: 651 additions & 651 deletions

docs/source/implementation.md: 7 additions & 7 deletions

@@ -7,14 +7,14 @@
 Running an efficient groupby reduction in parallel is hard, and strongly depends on how the
 groups are distributed amongst the blocks of an array.
 
-`flox` implements 4 strategies for
-grouped reductions, each is appropriate for a particular distribution of groups
+`flox` implements 4 strategies for grouped reductions, each is appropriate for a particular distribution of groups
 among the blocks of a dask array. Switch between the various strategies by passing `method`
 and/or `reindex` to either {py:func}`flox.core.groupby_reduce` or `xarray_reduce`.
+
 Your options are:
-1. `method="blockwise"`
 1. `method="map-reduce"` with `reindex=False`
 1. `method="map-reduce"` with `reindex=True`
+1. `method="blockwise"`
 1. `method="cohorts"`
 
 The most appropriate strategy for your problem will depend on the chunking of your dataset,
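
For orientation, here is a minimal sketch of how these options are passed to {py:func}`flox.core.groupby_reduce`. The array, chunking, and labels below are made-up placeholders; only `func`, `method`, and `reindex` correspond to the knobs discussed in the changed text.

```python
import dask.array as da
import numpy as np
from flox.core import groupby_reduce

# Placeholder data: a (10, 12) dask array chunked along the grouped axis,
# with 4 integer group labels repeating along the last axis.
array = da.random.random((10, 12), chunks=(10, 4))
labels = np.repeat(np.arange(4), 3)  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

# Pick a strategy with `method`; `reindex` further tunes "map-reduce".
result, groups = groupby_reduce(array, labels, func="mean", method="map-reduce")
```
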
@@ -61,7 +61,7 @@ If we know all the group labels, we can do so right at the blockwise step (`rein
 `xhistogram`, where the bin edges, or group labels oof the output, are known. The downside is the potential of large memory use
 if number of output groups is much larger than number of groups in a block.
 
-```{image} ../diagrams/new-map-reduce-reindex-True.svg
+```{image} ../diagrams/new-map-reduce-reindex-True-annotated.svg
 :alt: map-reduce-reindex-True-strategy-schematic
 :width: 100%
 ```
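
A sketch of the known-output-groups case described above, using flox's `expected_groups` argument to supply the full set of output labels; the data setup is again a placeholder.

```python
import dask.array as da
import numpy as np
from flox.core import groupby_reduce

array = da.random.random((10, 12), chunks=(10, 4))  # placeholder data
labels = np.repeat(np.arange(4), 3)

# The output groups are known up front (histogram-like), so each block's
# intermediate result is reindexed to all of them at the blockwise step.
result, groups = groupby_reduce(
    array,
    labels,
    func="sum",
    expected_groups=np.arange(4),
    method="map-reduce",
    reindex=True,
)
```
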
@@ -70,7 +70,7 @@ if number of output groups is much larger than number of groups in a block.
 We can `reindex` at the combine stage to groups present in the blocks being combined (`reindex=False`). This can limit memory use at the cost
 of a performance reduction due to extra copies of the intermediate data during reindexing.
 
-```{image} ../diagrams/new-map-reduce-reindex-False.svg
+```{image} ../diagrams/new-map-reduce-reindex-False-annotated.svg
 :alt: map-reduce-reindex-True-strategy-schematic
 :width: 100%
 ```
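
The `reindex=False` variant sketched the same way; only `method` and `reindex` come from the text above, everything else is placeholder setup.

```python
import dask.array as da
import numpy as np
from flox.core import groupby_reduce

array = da.random.random((10, 12), chunks=(10, 4))  # placeholder data
labels = np.repeat(np.arange(4), 3)

# Each block's intermediate result keeps only the groups actually present;
# reindexing is deferred to the combine stage.
result, groups = groupby_reduce(array, labels, func="mean", method="map-reduce", reindex=False)
```
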
@@ -99,7 +99,7 @@ For resampling type reductions,
 In this case, it makes sense to use `dask.dataframe` resample strategy which is to rechunk using {py:func}`flox.rechunk_for_blockwise`
 so that all members of a group are in a single block. Then, the groupby operation can be applied blockwise.
 
-```{image} ../diagrams/new-blockwise.svg
+```{image} ../diagrams/new-blockwise-annotated.svg
 :alt: blockwise-strategy-schematic
 :width: 100%
 ```
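
A sketch of that rechunk-then-blockwise recipe, assuming {py:func}`flox.rechunk_for_blockwise` takes `(array, axis, labels)`; the shapes and chunk sizes are placeholders chosen so that one group initially straddles a chunk boundary.

```python
import dask.array as da
import numpy as np
import flox
from flox.core import groupby_reduce

# Placeholder resampling-style labels: 12 groups of 10 consecutive elements,
# with a chunk boundary (at 45) that initially splits one group.
array = da.random.random((4, 120), chunks=(4, 45))
labels = np.repeat(np.arange(12), 10)

# Rechunk so every group lives in exactly one block, then reduce blockwise.
rechunked = flox.rechunk_for_blockwise(array, 1, labels)
result, groups = groupby_reduce(rechunked, labels, func="mean", method="blockwise")
```
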
@@ -158,7 +158,7 @@ We first apply the groupby-reduction blockwise, then split and reindex blocks to
 using `map-reduce`. Because the split or shuffle step occurs after the blockwise reduction, we *sometimes* communicate a significantly smaller amount of data
 than if we split or shuffled the input array.
 
-```{image} /../diagrams/new-cohorts.svg
+```{image} /../diagrams/new-cohorts-annotated.svg
 :alt: cohorts-strategy-schematic
 :width: 100%
 ```
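
A sketch of a label pattern where `method="cohorts"` is meant to help: each group recurs in a distinct, repeating subset of blocks. The sizes below are placeholders.

```python
import dask.array as da
import numpy as np
from flox.core import groupby_reduce

# Placeholder periodic labels: with chunks of 4, group 0 appears only in
# blocks 0, 3, 6, ..., group 1 in blocks 1, 4, 7, ..., and so on.
array = da.random.random((60,), chunks=(4,))
labels = np.tile(np.repeat(np.arange(3), 4), 5)

result, groups = groupby_reduce(array, labels, func="mean", method="cohorts")
```
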

docs/source/xarray.md: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 # Xarray
 
-Xarray will use flox by default for numpy and dask array backed Xarray objects if it is installed. By default, it will use `method="cohorts"` which generalizes
+Xarray will use flox by default (if installed) for DataArrays containing numpy and dask arrays. The default choice is `method="cohorts"` which generalizes
 the best. Pass flox-specific kwargs to the specific reduction method:
 ```python
 ds.groupby("time.month").mean(method="map-reduce", engine="flox")
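
For completeness, a sketch of the same reduction driven from xarray, plus a roughly equivalent direct call through `flox.xarray.xarray_reduce`. The synthetic dataset and chunk size are placeholders, and passing the month labels as a DataArray is an assumption about `xarray_reduce`'s accepted inputs.

```python
import numpy as np
import pandas as pd
import xarray as xr
import flox.xarray

# Placeholder daily dataset, chunked along time.
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
ds = xr.Dataset(
    {"air": ("time", np.random.rand(time.size))},
    coords={"time": time},
).chunk({"time": 31})

# The xarray route from the diff, with an explicit strategy choice:
monthly = ds.groupby("time.month").mean(method="map-reduce", engine="flox")

# Roughly equivalent direct call, passing the month labels as a DataArray:
monthly_flox = flox.xarray.xarray_reduce(ds, ds.time.dt.month, func="mean", method="map-reduce")
```
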
