Commit c9a4205

Merge remote-tracking branch 'upstream/master' into fix-sparse

* upstream/master:
  Improve interp performance (pydata#4069)
  Auto chunk (pydata#4064)
  xr.cov() and xr.corr() (pydata#4089)
  allow multiindex levels in plots (pydata#3938)
  Fix bool weights (pydata#4075)
  fix dangerous default arguments (pydata#4006)

2 parents: ba7b47a + d1f7cb8

18 files changed: 598 additions, 50 deletions

doc/api.rst (2 additions, 0 deletions)

@@ -29,6 +29,8 @@ Top-level functions
    full_like
    zeros_like
    ones_like
+   cov
+   corr
    dot
    polyval
    map_blocks

doc/plotting.rst (39 additions, 1 deletion)

@@ -13,7 +13,7 @@ labels can also be used to easily create informative plots.
 xarray's plotting capabilities are centered around
 :py:class:`DataArray` objects.
 To plot :py:class:`Dataset` objects
-simply access the relevant DataArrays, ie ``dset['var1']``.
+simply access the relevant DataArrays, i.e. ``dset['var1']``.
 Dataset specific plotting routines are also available (see :ref:`plot-dataset`).
 Here we focus mostly on arrays 2d or larger. If your data fits
 nicely into a pandas DataFrame then you're better off using one of the more

@@ -209,6 +209,44 @@ entire figure (as for matplotlib's ``figsize`` argument).

 .. _plotting.multiplelines:

+=========================
+Determine x-axis values
+=========================
+
+By default, dimension coordinates are used for the x-axis (here the time coordinates).
+However, you can also use non-dimension coordinates, MultiIndex levels, and dimensions
+without coordinates along the x-axis. To illustrate this, let's calculate a 'decimal day' (epoch)
+from the time and assign it as a non-dimension coordinate:
+
+.. ipython:: python
+
+    decimal_day = (air1d.time - air1d.time[0]) / pd.Timedelta('1d')
+    air1d_multi = air1d.assign_coords(decimal_day=("time", decimal_day))
+    air1d_multi
+
+To use ``'decimal_day'`` as the x coordinate, it must be specified explicitly:
+
+.. ipython:: python
+
+    air1d_multi.plot(x="decimal_day")
+
+Creating a new MultiIndex named ``'date'`` from ``'time'`` and ``'decimal_day'``,
+it is also possible to use a MultiIndex level as the x-axis:
+
+.. ipython:: python
+
+    air1d_multi = air1d_multi.set_index(date=("time", "decimal_day"))
+    air1d_multi.plot(x="decimal_day")
+
+Finally, if a dataset does not have any coordinates, it enumerates all data points:
+
+.. ipython:: python
+
+    air1d_multi = air1d_multi.drop("date")
+    air1d_multi.plot()
+
+The same applies to the 2D plots below.
+
 ====================================================
 Multiple lines showing variation along a dimension
 ====================================================
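The 'decimal day' trick above is plain timedelta arithmetic. A standalone sketch of the same computation, using stdlib datetimes in place of the ``air1d.time`` coordinate (the 6-hourly timestamps here are illustrative, not the tutorial data):

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for air1d.time: a list of datetimes at 6 h spacing.
times = [datetime(2013, 1, 1) + timedelta(hours=6 * i) for i in range(5)]

# Elapsed time divided by one day, mirroring
# (air1d.time - air1d.time[0]) / pd.Timedelta('1d').
decimal_day = [(t - times[0]) / timedelta(days=1) for t in times]

print(decimal_day)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Dividing one ``timedelta`` by another yields a float, which is exactly what a plot needs for a numeric x-axis.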

doc/whats-new.rst (18 additions, 1 deletion)

@@ -34,8 +34,21 @@ Breaking changes
   (:pull:`3274`)
   By `Elliott Sales de Andrade <https://github.com/QuLogic>`_

+Enhancements
+~~~~~~~~~~~~
+- Performance improvement of :py:meth:`DataArray.interp` and :py:meth:`Dataset.interp`.
+  For orthogonal linear- and nearest-neighbor interpolation, we do 1d-interpolation sequentially
+  rather than interpolating in multidimensional space. (:issue:`2223`)
+  By `Keisuke Fujii <https://github.com/fujiisoup>`_.
+
 New Features
 ~~~~~~~~~~~~
+
+- ``chunks='auto'`` is now supported in the ``chunks`` argument of
+  :py:meth:`Dataset.chunk`. (:issue:`4055`)
+  By `Andrew Williams <https://github.com/AndrewWilliams3142>`_
+- Added :py:func:`xarray.cov` and :py:func:`xarray.corr` (:issue:`3784`, :pull:`3550`, :pull:`4089`).
+  By `Andrew Williams <https://github.com/AndrewWilliams3142>`_ and `Robin Beer <https://github.com/r-beer>`_.
 - Added :py:meth:`DataArray.polyfit` and :py:func:`xarray.polyval` for fitting polynomials. (:issue:`3349`)
   By `Pascal Bourgault <https://github.com/aulemahal>`_.
 - Control over attributes of result in :py:func:`merge`, :py:func:`concat`,

@@ -63,6 +76,8 @@ New Features
   By `Stephan Hoyer <https://github.com/shoyer>`_.
 - Allow plotting of boolean arrays. (:pull:`3766`)
   By `Marek Jacob <https://github.com/MeraX>`_
+- Enable using MultiIndex levels as coordinates in 1D and 2D plots (:issue:`3927`).
+  By `Mathias Hauser <https://github.com/mathause>`_.
 - A ``days_in_month`` accessor for :py:class:`xarray.CFTimeIndex`, analogous to
   the ``days_in_month`` accessor for a :py:class:`pandas.DatetimeIndex`, which
   returns the days in the month each datetime in the index. Now days in month

@@ -121,6 +136,8 @@ Bug fixes
 - Fix bug in time parsing failing to fall back to cftime. This was causing time
   variables with a time unit of ``'msecs'`` to fail to parse. (:pull:`3998`)
   By `Ryan May <https://github.com/dopplershift>`_.
+- Fix weighted mean when passing boolean weights (:issue:`4074`).
+  By `Mathias Hauser <https://github.com/mathause>`_.
 - Fix html repr in untrusted notebooks: fallback to plain text repr. (:pull:`4053`)
   By `Benoit Bovy <https://github.com/benbovy>`_.

@@ -188,7 +205,7 @@ New Features

 - Weighted array reductions are now supported via the new :py:meth:`DataArray.weighted`
   and :py:meth:`Dataset.weighted` methods. See :ref:`comput.weighted`. (:issue:`422`, :pull:`2922`).
-  By `Mathias Hauser <https://github.com/mathause>`_
+  By `Mathias Hauser <https://github.com/mathause>`_.
 - The new jupyter notebook repr (``Dataset._repr_html_`` and
   ``DataArray._repr_html_``) (introduced in 0.14.1) is now on by default. To
   disable, use ``xarray.set_options(display_style="text")``.

xarray/__init__.py (3 additions, 1 deletion)

@@ -17,7 +17,7 @@
 from .core.alignment import align, broadcast
 from .core.combine import auto_combine, combine_by_coords, combine_nested
 from .core.common import ALL_DIMS, full_like, ones_like, zeros_like
-from .core.computation import apply_ufunc, dot, polyval, where
+from .core.computation import apply_ufunc, corr, cov, dot, polyval, where
 from .core.concat import concat
 from .core.dataarray import DataArray
 from .core.dataset import Dataset

@@ -54,6 +54,8 @@
     "concat",
     "decode_cf",
     "dot",
+    "cov",
+    "corr",
     "full_like",
     "load_dataarray",
     "load_dataset",

xarray/core/computation.py (179 additions, 1 deletion)

@@ -24,7 +24,7 @@
 import numpy as np

 from . import dtypes, duck_array_ops, utils
-from .alignment import deep_align
+from .alignment import align, deep_align
 from .merge import merge_coordinates_without_align
 from .options import OPTIONS
 from .pycompat import dask_array_type

@@ -1069,6 +1069,184 @@ def earth_mover_distance(first_samples,
     return apply_array_ufunc(func, *args, dask=dask)


+def cov(da_a, da_b, dim=None, ddof=1):
+    """
+    Compute covariance between two DataArray objects along a shared dimension.
+
+    Parameters
+    ----------
+    da_a : DataArray object
+        Array to compute.
+    da_b : DataArray object
+        Array to compute.
+    dim : str, optional
+        The dimension along which the covariance will be computed.
+    ddof : int, optional
+        If ddof=1, covariance is normalized by N-1, giving an unbiased estimate;
+        else normalization is by N.
+
+    Returns
+    -------
+    covariance : DataArray
+
+    See also
+    --------
+    pandas.Series.cov : corresponding pandas function
+    xr.corr : respective function to calculate the correlation
+
+    Examples
+    --------
+    >>> da_a = DataArray(np.array([[1, 2, 3], [0.1, 0.2, 0.3], [3.2, 0.6, 1.8]]),
+    ...                  dims=("space", "time"),
+    ...                  coords=[('space', ['IA', 'IL', 'IN']),
+    ...                          ('time', pd.date_range("2000-01-01", freq="1D", periods=3))])
+    >>> da_a
+    <xarray.DataArray (space: 3, time: 3)>
+    array([[1. , 2. , 3. ],
+           [0.1, 0.2, 0.3],
+           [3.2, 0.6, 1.8]])
+    Coordinates:
+      * space    (space) <U2 'IA' 'IL' 'IN'
+      * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
+    >>> da_b = DataArray(np.array([[0.2, 0.4, 0.6], [15, 10, 5], [3.2, 0.6, 1.8]]),
+    ...                  dims=("space", "time"),
+    ...                  coords=[('space', ['IA', 'IL', 'IN']),
+    ...                          ('time', pd.date_range("2000-01-01", freq="1D", periods=3))])
+    >>> da_b
+    <xarray.DataArray (space: 3, time: 3)>
+    array([[ 0.2,  0.4,  0.6],
+           [15. , 10. ,  5. ],
+           [ 3.2,  0.6,  1.8]])
+    Coordinates:
+      * space    (space) <U2 'IA' 'IL' 'IN'
+      * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
+    >>> xr.cov(da_a, da_b)
+    <xarray.DataArray ()>
+    array(-3.53055556)
+    >>> xr.cov(da_a, da_b, dim='time')
+    <xarray.DataArray (space: 3)>
+    array([ 0.2       , -0.5       ,  1.69333333])
+    Coordinates:
+      * space    (space) <U2 'IA' 'IL' 'IN'
+    """
+    from .dataarray import DataArray
+
+    if any(not isinstance(arr, DataArray) for arr in [da_a, da_b]):
+        raise TypeError(
+            "Only xr.DataArray is supported. "
+            "Given {}.".format([type(arr) for arr in [da_a, da_b]])
+        )
+
+    return _cov_corr(da_a, da_b, dim=dim, ddof=ddof, method="cov")
+
+
+def corr(da_a, da_b, dim=None):
+    """
+    Compute the Pearson correlation coefficient between
+    two DataArray objects along a shared dimension.
+
+    Parameters
+    ----------
+    da_a : DataArray object
+        Array to compute.
+    da_b : DataArray object
+        Array to compute.
+    dim : str, optional
+        The dimension along which the correlation will be computed.
+
+    Returns
+    -------
+    correlation : DataArray
+
+    See also
+    --------
+    pandas.Series.corr : corresponding pandas function
+    xr.cov : underlying covariance function
+
+    Examples
+    --------
+    >>> da_a = DataArray(np.array([[1, 2, 3], [0.1, 0.2, 0.3], [3.2, 0.6, 1.8]]),
+    ...                  dims=("space", "time"),
+    ...                  coords=[('space', ['IA', 'IL', 'IN']),
+    ...                          ('time', pd.date_range("2000-01-01", freq="1D", periods=3))])
+    >>> da_a
+    <xarray.DataArray (space: 3, time: 3)>
+    array([[1. , 2. , 3. ],
+           [0.1, 0.2, 0.3],
+           [3.2, 0.6, 1.8]])
+    Coordinates:
+      * space    (space) <U2 'IA' 'IL' 'IN'
+      * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
+    >>> da_b = DataArray(np.array([[0.2, 0.4, 0.6], [15, 10, 5], [3.2, 0.6, 1.8]]),
+    ...                  dims=("space", "time"),
+    ...                  coords=[('space', ['IA', 'IL', 'IN']),
+    ...                          ('time', pd.date_range("2000-01-01", freq="1D", periods=3))])
+    >>> da_b
+    <xarray.DataArray (space: 3, time: 3)>
+    array([[ 0.2,  0.4,  0.6],
+           [15. , 10. ,  5. ],
+           [ 3.2,  0.6,  1.8]])
+    Coordinates:
+      * space    (space) <U2 'IA' 'IL' 'IN'
+      * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
+    >>> xr.corr(da_a, da_b)
+    <xarray.DataArray ()>
+    array(-0.57087777)
+    >>> xr.corr(da_a, da_b, dim='time')
+    <xarray.DataArray (space: 3)>
+    array([ 1., -1.,  1.])
+    Coordinates:
+      * space    (space) <U2 'IA' 'IL' 'IN'
+    """
+    from .dataarray import DataArray
+
+    if any(not isinstance(arr, DataArray) for arr in [da_a, da_b]):
+        raise TypeError(
+            "Only xr.DataArray is supported. "
+            "Given {}.".format([type(arr) for arr in [da_a, da_b]])
+        )
+
+    return _cov_corr(da_a, da_b, dim=dim, method="corr")
+
+
+def _cov_corr(da_a, da_b, dim=None, ddof=0, method=None):
+    """
+    Internal method for xr.cov() and xr.corr(), so the input arrays are
+    sanitized only once and the code is not repeated.
+    """
+    # 1. Align the two arrays along their common coordinates
+    da_a, da_b = align(da_a, da_b, join="inner", copy=False)
+
+    # 2. Ignore the nans
+    valid_values = da_a.notnull() & da_b.notnull()
+
+    if not valid_values.all():
+        da_a = da_a.where(valid_values)
+        da_b = da_b.where(valid_values)
+
+    valid_count = valid_values.sum(dim) - ddof
+
+    # 3. Demean along the given dim
+    demeaned_da_a = da_a - da_a.mean(dim=dim)
+    demeaned_da_b = da_b - da_b.mean(dim=dim)
+
+    # 4. Compute covariance along the given dim
+    # N.B. `skipna=False` is required or there is a bug when computing
+    # auto-covariance. E.g. try xr.cov(da, da) for
+    # da = xr.DataArray([[1, 2], [1, np.nan]], dims=["x", "time"])
+    cov = (demeaned_da_a * demeaned_da_b).sum(dim=dim, skipna=False) / valid_count
+
+    if method == "cov":
+        return cov
+
+    else:
+        # compute std and corr
+        da_a_std = da_a.std(dim=dim)
+        da_b_std = da_b.std(dim=dim)
+        corr = cov / (da_a_std * da_b_std)
+        return corr
+
+
 def dot(*arrays, dims=None, **kwargs):
     """Generalized dot product for xarray objects. Like np.einsum, but
     provides a simpler interface based on array dimensions.
xarray/core/dataset.py (6 additions, 3 deletions)

@@ -1707,7 +1707,10 @@ def chunks(self) -> Mapping[Hashable, Tuple[int, ...]]:
     def chunk(
         self,
         chunks: Union[
-            None, Number, Mapping[Hashable, Union[None, Number, Tuple[Number, ...]]]
+            None,
+            Number,
+            str,
+            Mapping[Hashable, Union[None, Number, str, Tuple[Number, ...]]],
         ] = None,
         name_prefix: str = "xarray-",
         token: str = None,

@@ -1725,7 +1728,7 @@

         Parameters
         ----------
-        chunks : int or mapping, optional
+        chunks : int, 'auto' or mapping, optional
            Chunk sizes along each dimension, e.g., ``5`` or
            ``{'x': 5, 'y': 5}``.
        name_prefix : str, optional

@@ -1742,7 +1745,7 @@
        """
        from dask.base import tokenize

-        if isinstance(chunks, Number):
+        if isinstance(chunks, (Number, str)):
            chunks = dict.fromkeys(self.dims, chunks)

        if chunks is not None:
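The change to ``Dataset.chunk`` boils down to letting the string ``'auto'`` take the same scalar-expansion path as an int. A standalone sketch of that normalization step (hypothetical dimension names; not the actual xarray method):

```python
from numbers import Number


def normalize_chunks(chunks, dims):
    # Mirror of the scalar branch in Dataset.chunk: a single int or the
    # string 'auto' is expanded to a per-dimension mapping before being
    # handed to dask.
    if isinstance(chunks, (Number, str)):
        chunks = dict.fromkeys(dims, chunks)
    return chunks


print(normalize_chunks("auto", ("time", "x", "y")))
# {'time': 'auto', 'x': 'auto', 'y': 'auto'}
```

A mapping passes through unchanged, so ``normalize_chunks({'x': 5}, ('x', 'y'))`` still returns ``{'x': 5}``.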

xarray/core/missing.py (14 additions, 1 deletion)

@@ -619,6 +619,19 @@ def interp(var, indexes_coords, method, **kwargs):
     # default behavior
     kwargs["bounds_error"] = kwargs.get("bounds_error", False)

+    # check if the interpolation can be done in an orthogonal manner
+    if (
+        len(indexes_coords) > 1
+        and method in ["linear", "nearest"]
+        and all(dest[1].ndim == 1 for dest in indexes_coords.values())
+        and len(set(d[1].dims[0] for d in indexes_coords.values()))
+        == len(indexes_coords)
+    ):
+        # interpolate sequentially, one dimension at a time
+        for dim, dest in indexes_coords.items():
+            var = interp(var, {dim: dest}, method, **kwargs)
+        return var
+
     # target dimensions
     dims = list(indexes_coords)
     x, new_x = zip(*[indexes_coords[d] for d in dims])

@@ -659,7 +672,7 @@ def interp_func(var, x, new_x, method, kwargs):
         New coordinates. Should not contain NaN.
     method: string
         {'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic'} for
-        1-dimensional itnterpolation.
+        1-dimensional interpolation.
         {'linear', 'nearest'} for multidimensional interpolation
     **kwargs:
         Optional keyword arguments to be passed to scipy.interpolator
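The sequential shortcut in ``interp`` relies on orthogonal linear interpolation being separable: interpolating one dimension at a time gives the same result as interpolating in the full space. A small pure-Python check (illustrative values, not xarray code) shows two sequential 1-D interpolations reproducing the direct bilinear formula on a 2x2 cell:

```python
def lerp(v0, v1, t):
    """1-D linear interpolation between two values, t in [0, 1]."""
    return v0 + t * (v1 - v0)


grid = [[1.0, 2.0], [3.0, 5.0]]  # values at the corners (x, y) in {0, 1}^2
tx, ty = 0.25, 0.5               # fractional target position in the cell

# sequential: interpolate along x first, then along y
row = [lerp(grid[0][j], grid[1][j], tx) for j in range(2)]
sequential = lerp(row[0], row[1], ty)

# direct bilinear interpolation in the full 2-D space
bilinear = (
    grid[0][0] * (1 - tx) * (1 - ty)
    + grid[1][0] * tx * (1 - ty)
    + grid[0][1] * (1 - tx) * ty
    + grid[1][1] * tx * ty
)

assert abs(sequential - bilinear) < 1e-12
print(sequential)  # 2.125
```

The sequential route does two small 1-D problems instead of one N-D one, which is where the :issue:`2223` speedup comes from.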

xarray/core/weighted.py (8 additions, 1 deletion)

@@ -142,7 +142,14 @@ def _sum_of_weights(
     # we need to mask data values that are nan; else the weights are wrong
     mask = da.notnull()

-    sum_of_weights = self._reduce(mask, self.weights, dim=dim, skipna=False)
+    # bool -> int, because ``xr.dot([True, True], [True, True])`` -> True
+    # (and not 2); GH4074
+    if self.weights.dtype == bool:
+        sum_of_weights = self._reduce(
+            mask, self.weights.astype(int), dim=dim, skipna=False
+        )
+    else:
+        sum_of_weights = self._reduce(mask, self.weights, dim=dim, skipna=False)

     # 0-weights are not valid
     valid_weights = sum_of_weights != 0.0
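The cast above is needed because a dot product over boolean arrays accumulates in bool, so the "sum of weights" collapses to ``True`` instead of a count. Plain ``np.dot`` shows the analogous behavior to the ``xr.dot`` case noted in the code comment (a sketch of the failure mode, not the xarray code path itself):

```python
import numpy as np

weights = np.array([True, True])
mask = np.array([True, True])

bool_sum = np.dot(weights, mask)             # bool result, not the count 2
int_sum = np.dot(weights.astype(int), mask)  # 2

print(bool_sum, int_sum)
```

With the bool accumulation, every nonzero sum of weights looks identical, so downstream normalization by the weight count is wrong; casting to int first recovers the actual count.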
