Commit b55c783

dcherian, Illviljan, keewis, and headtr1ck authored
Grouper, Resampler as public api (#8840)
* Grouper, Resampler as public API
* Add test
* Add docs
* Fix test
* fix types.
* bugfix
* Better binning API
* docs.fixes
* Apply suggestions from code review
  Co-authored-by: Illviljan <[email protected]>
* Fix typing
* clean up reprs
* Allow passing dicts
* Apply suggestions from code review
  Co-authored-by: Justus Magin <[email protected]>
* Update xarray/core/common.py
  Co-authored-by: Justus Magin <[email protected]>
* Review comments
* Fix docstring
* Try to fix typing
* Nicer error
* Try fixing types
* fix
* Apply suggestions from code review
  Co-authored-by: Michael Niklas <[email protected]>
* Review comments
* Add whats-new note
* Fix
* Add more types
* Fix link

---------

Co-authored-by: Illviljan <[email protected]>
Co-authored-by: Justus Magin <[email protected]>
Co-authored-by: Michael Niklas <[email protected]>
1 parent c2aebd8 commit b55c783

13 files changed: +477 -160 lines changed

doc/api-hidden.rst

Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -693,3 +693,7 @@
693693

694694
coding.times.CFTimedeltaCoder
695695
coding.times.CFDatetimeCoder
696+
697+
core.groupers.Grouper
698+
core.groupers.Resampler
699+
core.groupers.EncodedGroups

doc/api.rst

Lines changed: 19 additions & 4 deletions
@@ -803,6 +803,18 @@ DataArray
803803
DataArrayGroupBy.dims
804804
DataArrayGroupBy.groups
805805

806+
Grouper Objects
807+
---------------
808+
809+
.. currentmodule:: xarray.core
810+
811+
.. autosummary::
812+
:toctree: generated/
813+
814+
groupers.BinGrouper
815+
groupers.UniqueGrouper
816+
groupers.TimeResampler
817+
806818

807819
Rolling objects
808820
===============
@@ -1028,17 +1040,20 @@ DataArray
10281040
Accessors
10291041
=========
10301042

1031-
.. currentmodule:: xarray
1043+
.. currentmodule:: xarray.core
10321044

10331045
.. autosummary::
10341046
:toctree: generated/
10351047

1036-
core.accessor_dt.DatetimeAccessor
1037-
core.accessor_dt.TimedeltaAccessor
1038-
core.accessor_str.StringAccessor
1048+
accessor_dt.DatetimeAccessor
1049+
accessor_dt.TimedeltaAccessor
1050+
accessor_str.StringAccessor
1051+
10391052

10401053
Custom Indexes
10411054
==============
1055+
.. currentmodule:: xarray
1056+
10421057
.. autosummary::
10431058
:toctree: generated/
10441059

doc/conf.py

Lines changed: 3 additions & 0 deletions
@@ -158,6 +158,8 @@
158158
"Variable": "~xarray.Variable",
159159
"DatasetGroupBy": "~xarray.core.groupby.DatasetGroupBy",
160160
"DataArrayGroupBy": "~xarray.core.groupby.DataArrayGroupBy",
161+
"Grouper": "~xarray.core.groupers.Grouper",
162+
"Resampler": "~xarray.core.groupers.Resampler",
161163
# objects without namespace: numpy
162164
"ndarray": "~numpy.ndarray",
163165
"MaskedArray": "~numpy.ma.MaskedArray",
@@ -169,6 +171,7 @@
169171
"CategoricalIndex": "~pandas.CategoricalIndex",
170172
"TimedeltaIndex": "~pandas.TimedeltaIndex",
171173
"DatetimeIndex": "~pandas.DatetimeIndex",
174+
"IntervalIndex": "~pandas.IntervalIndex",
172175
"Series": "~pandas.Series",
173176
"DataFrame": "~pandas.DataFrame",
174177
"Categorical": "~pandas.Categorical",

doc/user-guide/groupby.rst

Lines changed: 82 additions & 11 deletions
@@ -1,3 +1,5 @@
1+
.. currentmodule:: xarray
2+
13
.. _groupby:
24

35
GroupBy: Group and Bin Data
@@ -15,19 +17,20 @@ __ https://www.jstatsoft.org/v40/i01/paper
1517
- Apply some function to each group.
1618
- Combine your groups back into a single data object.
1719

18-
Group by operations work on both :py:class:`~xarray.Dataset` and
19-
:py:class:`~xarray.DataArray` objects. Most of the examples focus on grouping by
20+
Group by operations work on both :py:class:`Dataset` and
21+
:py:class:`DataArray` objects. Most of the examples focus on grouping by
2022
a single one-dimensional variable, although support for grouping
2123
over a multi-dimensional variable has recently been implemented. Note that for
2224
one-dimensional data, it is usually faster to rely on pandas' implementation of
2325
the same pipeline.
2426

2527
.. tip::
2628

27-
To substantially improve the performance of GroupBy operations, particularly
28-
with dask `install the flox package <https://flox.readthedocs.io>`_. flox
29+
`Install the flox package <https://flox.readthedocs.io>`_ to substantially improve the performance
30+
of GroupBy operations, particularly with dask. flox
2931
`extends Xarray's in-built GroupBy capabilities <https://flox.readthedocs.io/en/latest/xarray.html>`_
30-
by allowing grouping by multiple variables, and lazy grouping by dask arrays. If installed, Xarray will automatically use flox by default.
32+
by allowing grouping by multiple variables, and lazy grouping by dask arrays.
33+
If installed, Xarray will automatically use flox by default.
3134

3235
Split
3336
~~~~~
@@ -87,7 +90,7 @@ Binning
8790
Sometimes you don't want to use all the unique values to determine the groups
8891
but instead want to "bin" the data into coarser groups. You could always create
8992
a customized coordinate, but xarray facilitates this via the
90-
:py:meth:`~xarray.Dataset.groupby_bins` method.
93+
:py:meth:`Dataset.groupby_bins` method.
9194

9295
.. ipython:: python
9396
@@ -110,7 +113,7 @@ Apply
110113
~~~~~
111114

112115
To apply a function to each group, you can use the flexible
113-
:py:meth:`~xarray.core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically
116+
:py:meth:`core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically
114117
concatenated back together along the group axis:
115118

116119
.. ipython:: python
@@ -121,8 +124,8 @@ concatenated back together along the group axis:
121124
122125
arr.groupby("letters").map(standardize)
123126
124-
GroupBy objects also have a :py:meth:`~xarray.core.groupby.DatasetGroupBy.reduce` method and
125-
methods like :py:meth:`~xarray.core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
127+
GroupBy objects also have a :py:meth:`core.groupby.DatasetGroupBy.reduce` method and
128+
methods like :py:meth:`core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
126129
aggregation function:
127130

128131
.. ipython:: python
@@ -183,7 +186,7 @@ Iterating and Squeezing
183186
Previously, Xarray defaulted to squeezing out dimensions of size one when iterating over
184187
a GroupBy object. This behaviour is being removed.
185188
You can always squeeze explicitly later with the Dataset or DataArray
186-
:py:meth:`~xarray.DataArray.squeeze` methods.
189+
:py:meth:`DataArray.squeeze` methods.
187190

188191
.. ipython:: python
189192
@@ -217,7 +220,7 @@ __ https://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_two_dime
217220
da.groupby("lon").map(lambda x: x - x.mean(), shortcut=False)
218221
219222
Because multidimensional groups have the ability to generate a very large
220-
number of bins, coarse-binning via :py:meth:`~xarray.Dataset.groupby_bins`
223+
number of bins, coarse-binning via :py:meth:`Dataset.groupby_bins`
221224
may be desirable:
222225

223226
.. ipython:: python
@@ -232,3 +235,71 @@ applying your function, and then unstacking the result:
232235
233236
stacked = da.stack(gridcell=["ny", "nx"])
234237
stacked.groupby("gridcell").sum(...).unstack("gridcell")
238+
239+
.. _groupby.groupers:
240+
241+
Grouper Objects
242+
~~~~~~~~~~~~~~~
243+
244+
Both ``groupby_bins`` and ``resample`` are specializations of the core ``groupby`` operation for binning
245+
and time resampling. Many problems demand more complex GroupBy application: for example, grouping by multiple
246+
variables with a combination of categorical grouping, binning, and resampling; or more specializations like
247+
spatial resampling; or more complex time grouping like special handling of seasons, or the ability to specify
248+
custom seasons. To handle these use-cases and more, Xarray is evolving to provide an
249+
extension point using ``Grouper`` objects.
250+
251+
.. tip::
252+
253+
See the `grouper design`_ doc for more detail on the motivation and design ideas behind
254+
Grouper objects.
255+
256+
.. _grouper design: https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md
257+
258+
For now Xarray provides three specialized Grouper objects:
259+
260+
1. :py:class:`groupers.UniqueGrouper` for categorical grouping
261+
2. :py:class:`groupers.BinGrouper` for binned grouping
262+
3. :py:class:`groupers.TimeResampler` for resampling along a datetime coordinate
263+
264+
These provide functionality identical to the existing ``groupby``, ``groupby_bins``, and ``resample`` methods.
265+
That is,
266+
267+
.. code-block:: python
268+
269+
ds.groupby("x")
270+
271+
is identical to
272+
273+
.. code-block:: python
274+
275+
from xarray.groupers import UniqueGrouper
276+
277+
ds.groupby(x=UniqueGrouper())
278+
279+
and
280+
281+
.. code-block:: python
282+
283+
ds.groupby_bins("x", bins=bins)
284+
285+
is identical to
286+
287+
.. code-block:: python
288+
289+
from xarray.groupers import BinGrouper
290+
291+
ds.groupby(x=BinGrouper(bins))
292+
293+
and
294+
295+
.. code-block:: python
296+
297+
ds.resample(time="ME")
298+
299+
is identical to
300+
301+
.. code-block:: python
302+
303+
from xarray.groupers import TimeResampler
304+
305+
ds.resample(time=TimeResampler("ME"))
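The equivalences listed above can be checked on a small dataset. The following is a minimal sketch, assuming a recent xarray release in which ``xarray.groupers`` is importable as shown in the documentation hunk. The dataset, its variable name ``foo``, and the bin edges are made up for illustration, and a ``"2D"`` frequency is used instead of ``"ME"`` only to keep the data small.

```python
import numpy as np
import pandas as pd
import xarray as xr
from xarray.groupers import BinGrouper, TimeResampler, UniqueGrouper

# Illustrative data only: "foo" and the coordinate values are assumptions.
ds = xr.Dataset(
    {"foo": ("x", np.arange(6.0))},
    coords={"x": [10, 10, 20, 20, 30, 30]},
)

# Categorical grouping: by name vs. with an explicit UniqueGrouper.
by_name = ds.groupby("x").mean()
by_grouper = ds.groupby(x=UniqueGrouper()).mean()

# Binned grouping: groupby_bins vs. an explicit BinGrouper.
bins = [0, 15, 35]
binned_method = ds.groupby_bins("x", bins=bins).sum()
binned_grouper = ds.groupby(x=BinGrouper(bins=bins)).sum()

# Time resampling: frequency string vs. an explicit TimeResampler.
times = pd.date_range("2000-01-01", periods=6, freq="D")
ts = xr.Dataset({"foo": ("time", np.arange(6.0))}, coords={"time": times})
resampled_str = ts.resample(time="2D").mean()
resampled_obj = ts.resample(time=TimeResampler(freq="2D")).mean()
```

In each pair the two results should hold the same values; the grouper form only makes the grouping strategy an explicit object that can be passed around.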

doc/whats-new.rst

Lines changed: 6 additions & 0 deletions
@@ -22,6 +22,12 @@ v2024.06.1 (unreleased)
2222

2323
New Features
2424
~~~~~~~~~~~~
25+
- Introduce new :py:class:`groupers.UniqueGrouper`, :py:class:`groupers.BinGrouper`, and
26+
:py:class:`groupers.TimeResampler` objects as a step towards supporting grouping by
27+
multiple variables. See the `docs <groupby.groupers_>`_ and the
28+
`grouper design doc <https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md>`_ for more.
29+
(:issue:`6610`, :pull:`8840`).
30+
By `Deepak Cherian <https://github.com/dcherian>`_.
2531
- Allow per-variable specification of ``mask_and_scale``, ``decode_times``, ``decode_timedelta``,
2632
``use_cftime`` and ``concat_characters`` params in :py:func:`~xarray.open_dataset` (:pull:`9218`).
2733
By `Mathijs Verhaegh <https://github.com/Ostheer>`_.

xarray/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,7 @@
1414
from xarray.coding.cftimeindex import CFTimeIndex
1515
from xarray.coding.frequencies import infer_freq
1616
from xarray.conventions import SerializationWarning, decode_cf
17+
from xarray.core import groupers
1718
from xarray.core.alignment import align, broadcast
1819
from xarray.core.combine import combine_by_coords, combine_nested
1920
from xarray.core.common import ALL_DIMS, full_like, ones_like, zeros_like
@@ -55,6 +56,7 @@
5556
# `mypy --strict` running in projects that import xarray.
5657
__all__ = (
5758
# Sub-packages
59+
"groupers",
5860
"testing",
5961
"tutorial",
6062
# Top-level functions

xarray/core/common.py

Lines changed: 19 additions & 12 deletions
@@ -38,6 +38,7 @@
3838

3939
from xarray.core.dataarray import DataArray
4040
from xarray.core.dataset import Dataset
41+
from xarray.core.groupers import Resampler
4142
from xarray.core.indexes import Index
4243
from xarray.core.resample import Resample
4344
from xarray.core.rolling_exp import RollingExp
@@ -876,7 +877,7 @@ def rolling_exp(
876877
def _resample(
877878
self,
878879
resample_cls: type[T_Resample],
879-
indexer: Mapping[Any, str] | None,
880+
indexer: Mapping[Hashable, str | Resampler] | None,
880881
skipna: bool | None,
881882
closed: SideOptions | None,
882883
label: SideOptions | None,
@@ -885,7 +886,7 @@ def _resample(
885886
origin: str | DatetimeLike,
886887
loffset: datetime.timedelta | str | None,
887888
restore_coord_dims: bool | None,
888-
**indexer_kwargs: str,
889+
**indexer_kwargs: str | Resampler,
889890
) -> T_Resample:
890891
"""Returns a Resample object for performing resampling operations.
891892
@@ -1068,7 +1069,7 @@ def _resample(
10681069

10691070
from xarray.core.dataarray import DataArray
10701071
from xarray.core.groupby import ResolvedGrouper
1071-
from xarray.core.groupers import TimeResampler
1072+
from xarray.core.groupers import Resampler, TimeResampler
10721073
from xarray.core.resample import RESAMPLE_DIM
10731074

10741075
# note: the second argument (now 'skipna') used to be 'dim'
@@ -1098,15 +1099,21 @@ def _resample(
10981099
name=RESAMPLE_DIM,
10991100
)
11001101

1101-
grouper = TimeResampler(
1102-
freq=freq,
1103-
closed=closed,
1104-
label=label,
1105-
origin=origin,
1106-
offset=offset,
1107-
loffset=loffset,
1108-
base=base,
1109-
)
1102+
grouper: Resampler
1103+
if isinstance(freq, str):
1104+
grouper = TimeResampler(
1105+
freq=freq,
1106+
closed=closed,
1107+
label=label,
1108+
origin=origin,
1109+
offset=offset,
1110+
loffset=loffset,
1111+
base=base,
1112+
)
1113+
elif isinstance(freq, Resampler):
1114+
grouper = freq
1115+
else:
1116+
raise ValueError("freq must be a str or a Resampler object")
11101117

11111118
rgrouper = ResolvedGrouper(grouper, group, self)
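
The dispatch added in this hunk can be read in isolation as follows. This is a simplified sketch, not the xarray implementation: ``Resampler`` and ``TimeResampler`` here are hypothetical stand-ins for the classes in ``xarray.core.groupers``, ``resolve_resampler`` is a made-up helper name, and the extra keyword arguments (``closed``, ``label``, ``origin``, and so on) are omitted.

```python
from dataclasses import dataclass


class Resampler:
    """Hypothetical stand-in for xarray.core.groupers.Resampler."""


@dataclass
class TimeResampler(Resampler):
    """Hypothetical stand-in that only stores the frequency string."""

    freq: str


def resolve_resampler(freq):
    """Return a Resampler for either a frequency string or a Resampler."""
    if isinstance(freq, str):
        # A plain string is wrapped in the default time-based resampler.
        return TimeResampler(freq=freq)
    elif isinstance(freq, Resampler):
        # An existing Resampler object is passed through unchanged.
        return freq
    raise ValueError("freq must be a str or a Resampler object")


custom = TimeResampler(freq="QS")
from_string = resolve_resampler("ME")
passed_through = resolve_resampler(custom)
```

Accepting either form keeps string-based ``resample`` calls working while letting callers supply their own ``Resampler`` subclass.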
11121119
