+.. currentmodule:: xarray
+
 .. _groupby:

 GroupBy: Group and Bin Data
@@ -15,19 +17,20 @@ __ https://www.jstatsoft.org/v40/i01/paper
 - Apply some function to each group.
 - Combine your groups back into a single data object.

-Group by operations work on both :py:class:`~xarray.Dataset` and
-:py:class:`~xarray.DataArray` objects. Most of the examples focus on grouping by
+Group by operations work on both :py:class:`Dataset` and
+:py:class:`DataArray` objects. Most of the examples focus on grouping by
 a single one-dimensional variable, although support for grouping
 over a multi-dimensional variable has recently been implemented. Note that for
 one-dimensional data, it is usually faster to rely on pandas' implementation of
 the same pipeline.

 .. tip::

-   To substantially improve the performance of GroupBy operations, particularly
-   with dask `install the flox package <https://flox.readthedocs.io>`_. flox
+   `Install the flox package <https://flox.readthedocs.io>`_ to substantially improve the performance
+   of GroupBy operations, particularly with dask. flox
    `extends Xarray's in-built GroupBy capabilities <https://flox.readthedocs.io/en/latest/xarray.html>`_
-   by allowing grouping by multiple variables, and lazy grouping by dask arrays. If installed, Xarray will automatically use flox by default.
+   by allowing grouping by multiple variables, and lazy grouping by dask arrays.
+   If installed, Xarray will automatically use flox by default.

 Split
 ~~~~~
@@ -87,7 +90,7 @@ Binning
 Sometimes you don't want to use all the unique values to determine the groups
 but instead want to "bin" the data into coarser groups. You could always create
 a customized coordinate, but xarray facilitates this via the
-:py:meth:`~xarray.Dataset.groupby_bins` method.
+:py:meth:`Dataset.groupby_bins` method.

 .. ipython:: python

@@ -110,7 +113,7 @@ Apply
 ~~~~~

 To apply a function to each group, you can use the flexible
-:py:meth:`~xarray.core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically
+:py:meth:`core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically
 concatenated back together along the group axis:

 .. ipython:: python
@@ -121,8 +124,8 @@ concatenated back together along the group axis:

     arr.groupby("letters").map(standardize)

-GroupBy objects also have a :py:meth:`~xarray.core.groupby.DatasetGroupBy.reduce` method and
-methods like :py:meth:`~xarray.core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
+GroupBy objects also have a :py:meth:`core.groupby.DatasetGroupBy.reduce` method and
+methods like :py:meth:`core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
 aggregation function:

 .. ipython:: python
@@ -183,7 +186,7 @@ Iterating and Squeezing
 Previously, Xarray defaulted to squeezing out dimensions of size one when iterating over
 a GroupBy object. This behaviour is being removed.
 You can always squeeze explicitly later with the Dataset or DataArray
-:py:meth:`~xarray.DataArray.squeeze` methods.
+:py:meth:`DataArray.squeeze` methods.

 .. ipython:: python

@@ -217,7 +220,7 @@ __ https://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_two_dime
     da.groupby("lon").map(lambda x: x - x.mean(), shortcut=False)

 Because multidimensional groups have the ability to generate a very large
-number of bins, coarse-binning via :py:meth:`~xarray.Dataset.groupby_bins`
+number of bins, coarse-binning via :py:meth:`Dataset.groupby_bins`
 may be desirable:

 .. ipython:: python
@@ -232,3 +235,71 @@ applying your function, and then unstacking the result:

     stacked = da.stack(gridcell=["ny", "nx"])
     stacked.groupby("gridcell").sum(...).unstack("gridcell")
+
+.. _groupby.groupers:
+
+Grouper Objects
+~~~~~~~~~~~~~~~
+
+Both ``groupby_bins`` and ``resample`` are specializations of the core ``groupby`` operation for binning
+and time resampling. Many problems demand more complex GroupBy application: for example, grouping by multiple
+variables with a combination of categorical grouping, binning, and resampling; or further specializations like
+spatial resampling; or more complex time grouping like special handling of seasons, or the ability to specify
+custom seasons. To handle these use-cases and more, Xarray is evolving to provide an
+extension point using ``Grouper`` objects.
+
+.. tip::
+
+   See the `grouper design`_ doc for more detail on the motivation and design ideas behind
+   Grouper objects.
+
+.. _grouper design: https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md
+
+For now Xarray provides three specialized Grouper objects:
+
+1. :py:class:`groupers.UniqueGrouper` for categorical grouping
+2. :py:class:`groupers.BinGrouper` for binned grouping
+3. :py:class:`groupers.TimeResampler` for resampling along a datetime coordinate
+
+These provide functionality identical to the existing ``groupby``, ``groupby_bins``, and ``resample`` methods.
+That is,
+
+.. code-block:: python
+
+    ds.groupby("x")
+
+is identical to
+
+.. code-block:: python
+
+    from xarray.groupers import UniqueGrouper
+
+    ds.groupby(x=UniqueGrouper())
+
+and
+
+.. code-block:: python
+
+    ds.groupby_bins("x", bins=bins)
+
+is identical to
+
+.. code-block:: python
+
+    from xarray.groupers import BinGrouper
+
+    ds.groupby(x=BinGrouper(bins))
+
+and
+
+.. code-block:: python
+
+    ds.resample(time="ME")
+
+is identical to
+
+.. code-block:: python
+
+    from xarray.groupers import TimeResampler
+
+    ds.resample(time=TimeResampler("ME"))
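
The equivalences described in the new "Grouper Objects" section can be checked with a small script. This is only a sketch, not part of the change itself: it assumes an xarray release recent enough to expose the public `xarray.groupers` module and a pandas version that accepts the `"ME"` frequency alias. The dataset, the variable name `foo`, and the bin edges are hypothetical toy values chosen for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr
from xarray.groupers import BinGrouper, TimeResampler, UniqueGrouper

# Hypothetical toy dataset: 90 daily values, plus a non-dimension
# coordinate "x" that cycles through the labels 0, 1, 2.
time = pd.date_range("2000-01-01", periods=90, freq="D")
ds = xr.Dataset(
    {"foo": ("time", np.arange(90.0))},
    coords={"time": time, "x": ("time", np.arange(90) % 3)},
)

# Categorical grouping: string shorthand vs. explicit UniqueGrouper.
by_name = ds.groupby("x").mean()
by_grouper = ds.groupby(x=UniqueGrouper()).mean()

# Binned grouping: groupby_bins vs. explicit BinGrouper.
bins = [-0.5, 0.5, 2.5]  # assumed edges: label 0 alone, labels 1 and 2 together
binned_method = ds.groupby_bins("x", bins=bins).sum()
binned_grouper = ds.groupby(x=BinGrouper(bins=bins)).sum()

# Time resampling: frequency string vs. explicit TimeResampler.
resampled_str = ds.resample(time="ME").mean()
resampled_obj = ds.resample(time=TimeResampler(freq="ME")).mean()

# Each pair should report True if the two spellings are equivalent.
print(by_name.equals(by_grouper))
print(binned_method.equals(binned_grouper))
print(resampled_str.equals(resampled_obj))
```

Comparing reduced results rather than the GroupBy objects themselves keeps the check independent of internal representation; whether flox is installed should not change the outcome.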