Commit 09cd725

Update markdown, notebook linting (#204)
* Update markdown, notebook linting
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Update markdown, notebook linting
* Fix climatology notebook

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 8d7ccbc commit 09cd725

13 files changed: +214 −1422 lines

.pre-commit-config.yaml

Lines changed: 20 additions & 10 deletions
@@ -25,14 +25,24 @@ repos:
     hooks:
       - id: isort

-  - repo: https://github.com/deathbeds/prenotebook
-    rev: f5bdb72a400f1a56fe88109936c83aa12cc349fa
+  - repo: https://github.com/executablebooks/mdformat
+    rev: 0.7.16
     hooks:
-      - id: prenotebook
-        args:
-          [
-            '--keep-output',
-            '--keep-metadata',
-            '--keep-execution-count',
-            '--keep-empty',
-          ]
+      - id: mdformat
+        additional_dependencies:
+          - mdformat-black
+          - mdformat-myst
+
+  - repo: https://github.com/nbQA-dev/nbQA
+    rev: 1.6.1
+    hooks:
+      - id: nbqa-black
+      - id: nbqa-pyupgrade
+        args: [--py37-plus]
+      - id: nbqa-isort
+
+  - repo: https://github.com/kynan/nbstripout
+    rev: 0.6.1
+    hooks:
+      - id: nbstripout
+        args: [--extra-keys=metadata.kernelspec metadata.language_info.version]

README.md

Lines changed: 30 additions & 29 deletions
@@ -14,10 +14,10 @@
 This project explores strategies for fast GroupBy reductions with dask.array. It used to be called `dask_groupby`
 It was motivated by

-1. Dask Dataframe GroupBy
-    [blogpost](https://blog.dask.org/2019/10/08/df-groupby)
-2. [numpy_groupies](https://github.com/ml31415/numpy-groupies) in Xarray
-    [issue](https://github.com/pydata/xarray/issues/4473)
+1. Dask Dataframe GroupBy
+   [blogpost](https://blog.dask.org/2019/10/08/df-groupby)
+1. [numpy_groupies](https://github.com/ml31415/numpy-groupies) in Xarray
+   [issue](https://github.com/pydata/xarray/issues/4473)

 (See a
 [presentation](https://docs.google.com/presentation/d/1YubKrwu9zPHC_CzVBhvORuQBW-z148BvX3Ne8XcvWsQ/edit?usp=sharing)
@@ -26,22 +26,23 @@ about this package, from the Pangeo Showcase).
 ## Acknowledgements

 This work was funded in part by
+
 1. NASA-ACCESS 80NSSC18M0156 "Community tools for analysis of NASA Earth Observing System
-    Data in the Cloud" (PI J. Hamman, NCAR),
-2. NASA-OSTFL 80NSSC22K0345 "Enhancing analysis of NASA data with the open-source Python Xarray Library" (PIs Scott Henderson, University of Washington; Deepak Cherian, NCAR; Jessica Scheick, University of New Hampshire), and
-3. [NCAR's Earth System Data Science Initiative](https://ncar.github.io/esds/).
+   Data in the Cloud" (PI J. Hamman, NCAR),
+1. NASA-OSTFL 80NSSC22K0345 "Enhancing analysis of NASA data with the open-source Python Xarray Library" (PIs Scott Henderson, University of Washington; Deepak Cherian, NCAR; Jessica Scheick, University of New Hampshire), and
+1. [NCAR's Earth System Data Science Initiative](https://ncar.github.io/esds/).

 It was motivated by [very](https://github.com/pangeo-data/pangeo/issues/266) [very](https://github.com/pangeo-data/pangeo/issues/271) [many](https://github.com/dask/distributed/issues/2602) [discussions](https://github.com/pydata/xarray/issues/2237) in the [Pangeo](https://pangeo.io) community.

 ## API

 There are two main functions
-1. `flox.groupby_reduce(dask_array, by_dask_array, "mean")`
-    "pure" dask array interface
-1. `flox.xarray.xarray_reduce(xarray_object, by_dataarray, "mean")`
-    "pure" xarray interface; though [work is ongoing](https://github.com/pydata/xarray/pull/5734) to integrate this
-    package in xarray.

+1. `flox.groupby_reduce(dask_array, by_dask_array, "mean")`
+   "pure" dask array interface
+1. `flox.xarray.xarray_reduce(xarray_object, by_dataarray, "mean")`
+   "pure" xarray interface; though [work is ongoing](https://github.com/pydata/xarray/pull/5734) to integrate this
+   package in xarray.

 ## Implementation

@@ -53,21 +54,21 @@ See [the documentation](https://flox.readthedocs.io/en/latest/implementation.htm
 It also allows you to specify a custom Aggregation (again inspired by dask.dataframe),
 though this might not be fully functional at the moment. See `aggregations.py` for examples.

-``` python
-mean = Aggregation(
-    # name used for dask tasks
-    name="mean",
-    # operation to use for pure-numpy inputs
-    numpy="mean",
-    # blockwise reduction
-    chunk=("sum", "count"),
-    # combine intermediate results: sum the sums, sum the counts
-    combine=("sum", "sum"),
-    # generate final result as sum / count
-    finalize=lambda sum_, count: sum_ / count,
-    # Used when "reindexing" at combine-time
-    fill_value=0,
-    # Used when any member of `expected_groups` is not found
-    final_fill_value=np.nan,
-)
+```python
+mean = Aggregation(
+    # name used for dask tasks
+    name="mean",
+    # operation to use for pure-numpy inputs
+    numpy="mean",
+    # blockwise reduction
+    chunk=("sum", "count"),
+    # combine intermediate results: sum the sums, sum the counts
+    combine=("sum", "sum"),
+    # generate final result as sum / count
+    finalize=lambda sum_, count: sum_ / count,
+    # Used when "reindexing" at combine-time
+    fill_value=0,
+    # Used when any member of `expected_groups` is not found
+    final_fill_value=np.nan,
+)
 ```
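For quick reference, here is a minimal sketch of how the two entry points listed in this README are typically called. The sample arrays, group labels, and variable names below are illustrative only and are not part of this commit:

```python
import numpy as np
import xarray as xr

import flox
import flox.xarray

# "pure" array interface: reduce 120 values by their month label
values = np.random.default_rng(0).random(120)
months = np.tile(np.arange(1, 13), 10)  # integer group labels, same length as values
result, groups = flox.groupby_reduce(values, months, func="mean")

# xarray interface: the same reduction expressed with DataArrays
da = xr.DataArray(values, dims="time", name="data")
by = xr.DataArray(months, dims="time", name="month")
monthly_mean = flox.xarray.xarray_reduce(da, by, func="mean")
```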

asv_bench/benchmarks/README_CI.md

Lines changed: 20 additions & 16 deletions
@@ -1,7 +1,9 @@
 # Benchmark CI

 <!-- Author: @jaimergp -->
+
 <!-- Last updated: 2021.07.06 -->
+
 <!-- Describes the work done as part of https://github.com/scikit-image/scikit-image/pull/5424 -->

 ## How it works
@@ -10,39 +12,39 @@ The `asv` suite can be run for any PR on GitHub Actions (check workflow `.github

 We use `asv continuous` to run the job, which runs a relative performance measurement. This means that there's no state to be saved and that regressions are only caught in terms of performance ratio (absolute numbers are available but they are not useful since we do not use stable hardware over time). `asv continuous` will:

-* Compile `scikit-image` for _both_ commits. We use `ccache` to speed up the process, and `mamba` is used to create the build environments.
-* Run the benchmark suite for both commits, _twice_ (since `processes=2` by default).
-* Generate a report table with performance ratios:
-    * `ratio=1.0` -> performance didn't change.
-    * `ratio<1.0` -> PR made it slower.
-    * `ratio>1.0` -> PR made it faster.
+- Compile `scikit-image` for _both_ commits. We use `ccache` to speed up the process, and `mamba` is used to create the build environments.
+- Run the benchmark suite for both commits, _twice_ (since `processes=2` by default).
+- Generate a report table with performance ratios:
+  - `ratio=1.0` -> performance didn't change.
+  - `ratio<1.0` -> PR made it slower.
+  - `ratio>1.0` -> PR made it faster.

 Due to the sensitivity of the test, we cannot guarantee that false positives are not produced. In practice, values between `(0.7, 1.5)` are to be considered part of the measurement noise. When in doubt, running the benchmark suite one more time will provide more information about the test being a false positive or not.

 ## Running the benchmarks on GitHub Actions

 1. On a PR, add the label `run-benchmark`.
-2. The CI job will be started. Checks will appear in the usual dashboard panel above the comment box.
-3. If more commits are added, the label checks will be grouped with the last commit checks _before_ you added the label.
-4. Alternatively, you can always go to the `Actions` tab in the repo and [filter for `workflow:Benchmark`](https://github.com/scikit-image/scikit-image/actions?query=workflow%3ABenchmark). Your username will be assigned to the `actor` field, so you can also filter the results with that if you need it.
+1. The CI job will be started. Checks will appear in the usual dashboard panel above the comment box.
+1. If more commits are added, the label checks will be grouped with the last commit checks _before_ you added the label.
+1. Alternatively, you can always go to the `Actions` tab in the repo and [filter for `workflow:Benchmark`](https://github.com/scikit-image/scikit-image/actions?query=workflow%3ABenchmark). Your username will be assigned to the `actor` field, so you can also filter the results with that if you need it.

 ## The artifacts

 The CI job will also generate an artifact. This is the `.asv/results` directory compressed in a zip file. Its contents include:

-* `fv-xxxxx-xx/`. A directory for the machine that ran the suite. It contains three files:
-    * `<baseline>.json`, `<contender>.json`: the benchmark results for each commit, with stats.
-    * `machine.json`: details about the hardware.
-* `benchmarks.json`: metadata about the current benchmark suite.
-* `benchmarks.log`: the CI logs for this run.
-* This README.
+- `fv-xxxxx-xx/`. A directory for the machine that ran the suite. It contains three files:
+  - `<baseline>.json`, `<contender>.json`: the benchmark results for each commit, with stats.
+  - `machine.json`: details about the hardware.
+- `benchmarks.json`: metadata about the current benchmark suite.
+- `benchmarks.log`: the CI logs for this run.
+- This README.

 ## Re-running the analysis

 Although the CI logs should be enough to get an idea of what happened (check the table at the end), one can use `asv` to run the analysis routines again.

 1. Uncompress the artifact contents in the repo, under `.asv/results`. This is, you should see `.asv/results/benchmarks.log`, not `.asv/results/something_else/benchmarks.log`. Write down the machine directory name for later.
-2. Run `asv show` to see your available results. You will see something like this:
+1. Run `asv show` to see your available results. You will see something like this:

 ```
 $> asv show
@@ -115,8 +117,10 @@ To minimize the time required to run the full suite, we trimmed the parameter ma
 ```python
 from . import _skip_slow  # this function is defined in benchmarks.__init__

+
 def time_something_slow():
     pass

+
 time_something.setup = _skip_slow
 ```
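The `_skip_slow` helper referenced above is defined elsewhere in `benchmarks/__init__.py` and is not part of this diff. As a rough sketch of the pattern (the environment-variable name here is hypothetical), it relies on asv's convention that raising `NotImplementedError` inside `setup` skips the benchmark:

```python
import os


def _skip_slow():
    # asv skips a benchmark when its setup() raises NotImplementedError,
    # so assigning this function to `<benchmark>.setup` opts it out of quick runs.
    if os.environ.get("ASV_SKIP_SLOW", "1") == "1":  # hypothetical switch
        raise NotImplementedError("Skipping slow benchmark")
```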

docs/source/aggregations.md

Lines changed: 16 additions & 17 deletions
@@ -14,7 +14,6 @@ the `func` kwarg:
 - `"first"`
 - `"last"`

-
 ```{tip}
 We would like to add support for `cumsum`, `cumprod` ([issue](https://github.com/xarray-contrib/flox/issues/91)). Contributions are welcome!
 ```
@@ -27,20 +26,20 @@ though this might not be fully functional at the moment. See `aggregations.py` f
 See the ["Custom Aggregations"](user-stories/custom-aggregations.ipynb) user story for a more user-friendly example.

 ```python
-    mean = Aggregation(
-        # name used for dask tasks
-        name="mean",
-        # operation to use for pure-numpy inputs
-        numpy="mean",
-        # blockwise reduction
-        chunk=("sum", "count"),
-        # combine intermediate results: sum the sums, sum the counts
-        combine=("sum", "sum"),
-        # generate final result as sum / count
-        finalize=lambda sum_, count: sum_ / count,
-        # Used when "reindexing" at combine-time
-        fill_value=0,
-        # Used when any member of `expected_groups` is not found
-        final_fill_value=np.nan,
-    )
+mean = Aggregation(
+    # name used for dask tasks
+    name="mean",
+    # operation to use for pure-numpy inputs
+    numpy="mean",
+    # blockwise reduction
+    chunk=("sum", "count"),
+    # combine intermediate results: sum the sums, sum the counts
+    combine=("sum", "sum"),
+    # generate final result as sum / count
+    finalize=lambda sum_, count: sum_ / count,
+    # Used when "reindexing" at combine-time
+    fill_value=0,
+    # Used when any member of `expected_groups` is not found
+    final_fill_value=np.nan,
+)
 ```
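Once defined, such an `Aggregation` can be passed in place of a string reduction name. A small, self-contained sketch assuming in-memory numpy inputs and that `Aggregation` and `groupby_reduce` are importable from the top-level `flox` namespace (the sample data is illustrative):

```python
import numpy as np

from flox import Aggregation, groupby_reduce

mean = Aggregation(
    name="mean",
    numpy="mean",
    chunk=("sum", "count"),
    combine=("sum", "sum"),
    finalize=lambda sum_, count: sum_ / count,
    fill_value=0,
    final_fill_value=np.nan,
)

values = np.array([1.0, 2.0, 3.0, 4.0])
labels = np.array([0, 0, 1, 1])

# pass the Aggregation instance instead of a string like "mean"
result, groups = groupby_reduce(values, labels, func=mean)
```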

docs/source/arrays.md

Lines changed: 1 addition & 2 deletions
@@ -2,9 +2,8 @@

 Aggregating over other array types will work if the array types supports the following methods, [ufunc.reduceat](https://numpy.org/doc/stable/reference/generated/numpy.ufunc.reduceat.html) or [ufunc.at](https://numpy.org/doc/stable/reference/generated/numpy.ufunc.at.html)

-
 | Reduction                      | `method="numpy"` | `method="flox"`   |
-|--------------------------------|------------------|-------------------|
+| ------------------------------ | ---------------- | ----------------- |
 | sum, nansum                    | bincount         | add.reduceat      |
 | mean, nanmean                  | bincount         | add.reduceat      |
 | var, nanvar                    | bincount         | add.reduceat      |
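As context for the table, the `method="numpy"` column essentially corresponds to weighted `np.bincount` calls. A rough illustration of a grouped sum and mean done that way (this is a sketch, not flox's actual code path):

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
labels = np.array([0, 1, 0, 2, 1])  # integer group codes

# grouped sum: bincount with the values as weights
group_sums = np.bincount(labels, weights=values)  # [4., 7., 4.]

# grouped mean: divide by per-group counts
group_counts = np.bincount(labels)                # [2, 2, 1]
group_means = group_sums / group_counts           # [2. , 3.5, 4. ]
```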

docs/source/engines.md

Lines changed: 4 additions & 2 deletions
@@ -1,4 +1,5 @@
 (engines)=
+
 # Engines

 `flox` provides multiple options, using the `engine` kwarg, for computing the core GroupBy reduction on numpy or other array types other than dask.
@@ -7,13 +8,14 @@
 (.e.g `np.maximum.at`) to provided reasonably performant aggregations.
 1. `engine="numba"` wraps `numpy_groupies.aggregate_numba`. This uses `numba` kernels for the core aggregation.
 1. `engine="flox"` uses the `ufunc.reduceat` method after first argsorting the array so that all group members occur sequentially. This was copied from
-    a [gist by Stephan Hoyer](https://gist.github.com/shoyer/f538ac78ae904c936844)
+   a [gist by Stephan Hoyer](https://gist.github.com/shoyer/f538ac78ae904c936844)

 See [](arrays) for more details.

 ## Tradeoffs

-For the common case of reducing a nD array by a 1D array of group labels (e.g. `groupby("time.month")`), `engine="flox"` *can* be faster.
+For the common case of reducing a nD array by a 1D array of group labels (e.g. `groupby("time.month")`), `engine="flox"` *can* be faster.
+
 The reason is that `numpy_groupies` converts all groupby problems to a 1D problem, this can involve [some overhead](https://github.com/ml31415/numpy-groupies/pull/46).
 It is possible to optimize this a bit in `flox` or `numpy_groupies`, but the work has not been done yet.
 The advantage of `engine="numpy"` is that it tends to work for more array types, since it appears to be more common to implement `np.bincount`, and not `np.add.reduceat`.
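To make the `engine="flox"` strategy above concrete — argsort so that group members become contiguous, then a single `ufunc.reduceat` call — here is a rough standalone sketch (not flox's actual implementation):

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
labels = np.array([2, 0, 1, 0, 2])

# sort so that members of each group are adjacent
order = np.argsort(labels, kind="stable")
sorted_labels = labels[order]
sorted_values = values[order]

# offsets of the first element of each group in the sorted array
offsets = np.concatenate(([0], np.flatnonzero(np.diff(sorted_labels)) + 1))

# one reduceat call computes all per-group sums: [60., 30., 60.]
group_sums = np.add.reduceat(sorted_values, offsets)
```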
