WIP: Feature/interpolate #1640
Changes from 6 commits
1582c1f
ab727e7
95006c4
4a4f6eb
42d63ef
263ec98
19d21b8
f937c07
8717e38
3d5c1b1
1864e8f
f58d464
1b93808
6f83b7b
33df6af
eafe67a
dd9fa8c
88d1569
3fb9261
37882b7
a04e83e
48505a5
20f957d
282bb65
a6fcb7f
2b0d9e1
d3220f3
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1220,6 +1220,91 @@ def fillna(self, value): | |
out = ops.fillna(self, value) | ||
return out | ||
|
||
def interpolate_na(self, dim=None, method='linear', limit=None, | ||
use_coordinate=True, | ||
**kwargs): | ||
"""Interpolate values according to different methods. | ||
|
||
Parameters | ||
---------- | ||
dim : str | ||
Specifies the dimension along which to interpolate. | ||
method : {'linear', 'time', 'index', 'values', 'nearest'}, optional | ||
Review comment: Should probably add some explanation about
Reply: good catch. We actually dropped the time/index/values methods in favor of a more explicit interface that uses |
||
String indicating which method to use for interpolation: | ||
|
||
- 'linear': linear interpolation (Default). Additional keyword | ||
Review comment: I noticed
Reply: Are you referring to which values Pandas uses for the index?
Reply: Yes. pandas' docstring says
the difference of which may correspond to our Our |
||
arguments are passed to ``numpy.interp`` | ||
- 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', | ||
'polynomial': are passed to ``scipy.interpolate.interp1d``. If | ||
method=='polynomial', the ``order`` keyword argument must also be | ||
provided. | ||
- 'barycentric', 'krog', 'pchip', 'spline': use their respective | ||
``scipy.interpolate`` classes. | ||
use_coordinate : boolean or str, default True | ||
Specifies which index to use as the x values in the interpolation | ||
formulated as `y = f(x)`. If False, values are treated as if | ||
equally-spaced along `dim`. If True, the IndexVariable `dim` is | |
used. If use_coordinate is a string, it specifies the name of a | |
coordinate variable to use as the index. | |
limit : int, default None | ||
Maximum number of consecutive NaNs to fill. Must be greater than 0 | ||
or None for no limit. | ||
|
||
Returns | ||
------- | ||
DataArray | ||
|
||
See also | ||
-------- | ||
numpy.interp | ||
scipy.interpolate | ||
""" | ||
from .missing import interp_na | ||
return interp_na(self, dim=dim, method=method, limit=limit, | ||
use_coordinate=use_coordinate, **kwargs) | ||
|
||
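The effect of `use_coordinate` described in the docstring above can be illustrated without xarray. The following is a minimal pure-Python sketch, not the PR's implementation: `interp_nans` is a hypothetical helper that fills interior NaNs linearly, using positions 0, 1, 2, ... when no `x` is given (the `use_coordinate=False` case) and the supplied coordinate values otherwise. It assumes `x` is strictly increasing.

```python
import math


def interp_nans(y, x=None):
    """Linearly fill interior NaNs in y (hypothetical helper).

    If x is None, the values are treated as equally spaced, which
    mirrors use_coordinate=False. Otherwise x supplies the coordinate.
    """
    if x is None:
        x = list(range(len(y)))
    # Known (x, y) pairs, in increasing x order.
    known = [(xi, yi) for xi, yi in zip(x, y) if not math.isnan(yi)]
    out = list(y)
    for i, (xi, yi) in enumerate(zip(x, y)):
        if not math.isnan(yi):
            continue
        left = [p for p in known if p[0] < xi]
        right = [p for p in known if p[0] > xi]
        # Only interpolate between two known neighbours; edges stay NaN.
        if left and right:
            (x0, y0), (x1, y1) = left[-1], right[0]
            out[i] = y0 + (y1 - y0) * (xi - x0) / (x1 - x0)
    return out


nan = float("nan")
y = [0.0, nan, 10.0]
even = interp_nans(y)                        # x = 0, 1, 2
coord = interp_nans(y, x=[0.0, 1.0, 10.0])   # unevenly spaced coordinate
```

With equal spacing the gap is filled with the midpoint; with the uneven coordinate the missing point sits much closer to the left neighbour, so the filled value does too.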
def ffill(self, dim, limit=None): | ||
'''Fill NaN values by propagating values forward | |
Review comment: no need to change now, but FYI PEP8 is |
||
|
||
Parameters | ||
---------- | ||
dim : str | ||
Specifies the dimension along which to propogate values when | ||
Review comment: propogate -> propagate? |
||
filling. | ||
limit : int, default None | ||
The maximum number of consecutive NaN values to forward fill. In | ||
other words, if there is a gap with more than this number of | ||
consecutive NaNs, it will only be partially filled. Must be greater | ||
than 0 or None for no limit. | ||
|
||
Returns | ||
------- | ||
DataArray | ||
''' | ||
from .missing import ffill | ||
return ffill(self, dim, limit=limit) | ||
|
||
def bfill(self, dim, limit=None): | ||
'''Fill NaN values by propagating values backward | |
|
||
Parameters | ||
---------- | ||
dim : str | ||
Specifies the dimension along which to propagate values when | |
filling. | ||
limit : int, default None | ||
The maximum number of consecutive NaN values to backward fill. In | ||
other words, if there is a gap with more than this number of | ||
consecutive NaNs, it will only be partially filled. Must be greater | ||
than 0 or None for no limit. | ||
|
||
Returns | ||
------- | ||
DataArray | ||
''' | ||
from .missing import bfill | ||
return bfill(self, dim, limit=limit) | ||
Review comment: Do we need bottleneck installed to use |
||
|
||
def combine_first(self, other): | ||
"""Combine two DataArray objects, with union of coordinates. | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,293 @@ | ||
from __future__ import absolute_import | ||
from __future__ import division | ||
from __future__ import print_function | ||
|
||
from collections import Iterable | ||
from functools import partial | ||
|
||
import numpy as np | ||
import pandas as pd | ||
|
||
|
||
from .computation import apply_ufunc | ||
from .utils import is_scalar | ||
|
||
|
||
class BaseInterpolator(object): | ||
'''generic interpolator class for normalizing interpolation methods''' | |
cons_kwargs = {} | ||
call_kwargs = {} | ||
f = None | ||
kind = None | ||
|
||
def __init__(self, xi, yi, kind=None, **kwargs): | ||
self.kind = kind | ||
self.call_kwargs = kwargs | ||
|
||
def __call__(self, x): | ||
return self.f(x, **self.call_kwargs) | ||
|
||
def __repr__(self): | ||
return "{type}: kind={kind}".format(type=self.__class__.__name__, | ||
kind=self.kind) | ||
|
||
|
||
class NumpyInterpolator(BaseInterpolator): | ||
'''One-dimensional linear interpolation. | ||
|
||
See Also | ||
-------- | ||
numpy.interp | ||
''' | ||
def __init__(self, xi, yi, kind='linear', fill_value=None, **kwargs): | ||
self.kind = kind | ||
self.f = np.interp | ||
self.cons_kwargs = kwargs | ||
self.call_kwargs = {'period': self.cons_kwargs.pop('period', None)} | ||
|
||
self._xi = xi | ||
self._yi = yi | ||
|
||
if self.cons_kwargs: | ||
raise ValueError( | ||
'received invalid kwargs: %r' % self.cons_kwargs.keys()) | |
|
||
if fill_value is None: | ||
self._left = np.nan | ||
self._right = yi[-1] | ||
Review comment: This is a little surprising to me -- I would expect both to default to NaN.
Reply: Agreed. I'm not sure if this is a Pandas bug or an intentional feature.
Reply: Would it make sense to have them both default to NaN in Xarray?
Reply: +1 let's use NaN here. |
||
elif isinstance(fill_value, Iterable) and len(fill_value) == 2: | ||
self._left = fill_value[0] | ||
self._right = fill_value[1] | ||
Review comment: If this only works for a pair of values, let's restrict this to |
||
elif is_scalar(fill_value): | ||
self._left = fill_value | ||
self._right = fill_value | ||
else: | ||
raise ValueError('%s is not a valid fill_value' % fill_value) | ||
|
||
def __call__(self, x): | ||
return self.f(x, self._xi, self._yi, left=self._left, | ||
right=self._right, **self.call_kwargs) | ||
|
||
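The `fill_value` handling in `NumpyInterpolator` above maps onto the `left` and `right` arguments of `numpy.interp`. As a small sketch of the choices discussed in the review thread (the sample arrays are made up for illustration), the three defaults behave as follows outside the range of the known points:

```python
import numpy as np

xi = np.array([0.0, 1.0, 2.0])       # known x values
yi = np.array([10.0, 20.0, 30.0])    # known y values
x_new = np.array([-1.0, 0.5, 3.0])   # one point below, inside, above

# numpy.interp's own default: clamp to the end values of yi.
clamped = np.interp(x_new, xi, yi)

# NaN on both sides, the default the reviewers lean toward.
both_nan = np.interp(x_new, xi, yi, left=np.nan, right=np.nan)

# The asymmetric default in the diff: NaN on the left, yi[-1] on the right.
asymmetric = np.interp(x_new, xi, yi, left=np.nan, right=yi[-1])
```

Only the out-of-range points differ between the three calls; the interior point is interpolated the same way each time.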
|
||
class ScipyInterpolator(BaseInterpolator): | ||
'''Interpolate a 1-D function using Scipy interp1d | ||
|
||
See Also | ||
-------- | ||
scipy.interpolate.interp1d | ||
''' | ||
def __init__(self, xi, yi, kind=None, fill_value=None, assume_sorted=True, | ||
copy=False, bounds_error=False, **kwargs): | ||
from scipy.interpolate import interp1d | ||
|
||
if kind is None: | ||
raise ValueError('kind is a required argument') | ||
|
||
if kind == 'polynomial': | ||
kind = kwargs.pop('order', None) | ||
if kind is None: | ||
raise ValueError('order is required when method=polynomial') | ||
|
||
self.kind = kind | ||
|
||
self.cons_kwargs = kwargs | ||
self.call_kwargs = {} | ||
|
||
if fill_value is None and kind == 'linear': | ||
fill_value = kwargs.pop('fill_value', (np.nan, yi[-1])) | ||
elif fill_value is None: | ||
fill_value = np.nan | ||
|
||
self.f = interp1d(xi, yi, kind=self.kind, fill_value=fill_value, | ||
bounds_error=False, **self.cons_kwargs) | ||
|
||
|
||
class SplineInterpolator(BaseInterpolator): | ||
'''One-dimensional smoothing spline fit to a given set of data points. | ||
|
||
See Also | ||
-------- | ||
scipy.interpolate.UnivariateSpline | ||
''' | ||
def __init__(self, xi, yi, kind=None, fill_value=None, order=3, **kwargs): | ||
from scipy.interpolate import UnivariateSpline | ||
|
||
if kind is None: | ||
raise ValueError('kind is a required argument') | ||
Review comment: what should be passed as |
||
|
||
self.kind = kind | ||
self.cons_kwargs = kwargs | ||
self.call_kwargs = {'nu': kwargs.pop('nu', 0), | |
'ext': kwargs.pop('ext', None)} | |
|
||
if fill_value is not None: | ||
raise ValueError('SplineInterpolator does not support fill_value') | ||
|
||
self.f = UnivariateSpline(xi, yi, k=order, **self.cons_kwargs) | ||
|
||
|
||
def get_clean_interp_index(arr, dim, use_coordinate=True, **kwargs): | ||
'''get index to use for x values in interpolation. | ||
|
||
If use_coordinate is True, the coordinate that shares the name of the | ||
dimension along which interpolation is being performed will be used as the | ||
x values. | ||
|
||
If use_coordinate is False, the x values are set as an equally spaced | ||
sequence. | ||
''' | ||
|
||
if use_coordinate: | ||
if use_coordinate is True: | ||
index = arr.get_index(dim) | ||
else: | ||
index = arr.coords[use_coordinate] | ||
Review comment: Do we need to check
Reply: Maybe we can also raise an appropriate error if coordinate is
Reply: @fujiisoup - do we have a handy way to tell if a coordinate is an xarray DataArray/MultiIndex? It's not a subset of isinstance(index, (pd.MultiIndex, xr.MultiIndexCoordinate))
Reply: Probably the cleanest and most comprehensive thing to do is to try to always cast the interpolation index to |
||
if isinstance(index, pd.DatetimeIndex): | ||
index = index.values.astype(np.float64) | ||
# check index sorting now so we can skip it later | ||
if not (np.diff(index) > 0).all(): | ||
Review comment: If we require a monotonically-increasing index, can we pass |
||
raise ValueError("Index must be monotonically increasing") | |
|
||
else: | ||
axis = arr.get_axis_num(dim) | ||
index = np.arange(arr.shape[axis]) | ||
|
||
return index | ||
|
||
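The index preparation in `get_clean_interp_index` above does two things: it converts a datetime index to floats and it rejects an index that is not strictly increasing. A minimal NumPy sketch of both steps follows; `is_strictly_increasing` is a hypothetical helper name, and the sample dates are made up.

```python
import numpy as np


def is_strictly_increasing(index):
    """True when every step of a 1-D numeric index is positive."""
    return bool((np.diff(index) > 0).all())


times = np.array(["2000-01-01", "2000-01-02", "2000-01-04"],
                 dtype="datetime64[ns]")

# Same cast as in the diff: nanoseconds since the epoch, as float64.
as_float = times.astype(np.float64)

ok = is_strictly_increasing(as_float)
repeated = is_strictly_increasing(np.array([0.0, 1.0, 1.0]))
```

Note that the check uses `> 0`, so a repeated coordinate value is rejected as well as a decreasing one.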
|
||
def interp_na(self, dim=None, use_coordinate=True, method='linear', limit=None, | ||
**kwargs): | ||
'''Interpolate values according to different methods.''' | ||
|
||
if dim is None: | ||
raise NotImplementedError('dim is a required argument') | ||
|
||
if limit is not None: | ||
valids = _get_valid_fill_mask(self, dim, limit) | ||
|
||
# method | ||
index = get_clean_interp_index(self, dim, use_coordinate=use_coordinate, | ||
**kwargs) | ||
interpolator = _get_interpolator(method, **kwargs) | ||
|
||
arr = apply_ufunc(interpolator, index, self, | ||
input_core_dims=[[dim], [dim]], | ||
output_core_dims=[[dim]], | ||
output_dtypes=[self.dtype], | ||
dask='parallelized', | ||
vectorize=True, | ||
keep_attrs=True).transpose(*self.dims) | ||
|
||
if limit is not None: | ||
arr = arr.where(valids) | ||
|
||
return arr | ||
|
||
|
||
def wrap_interpolator(interpolator, x, y, **kwargs): | ||
'''helper function to apply interpolation along 1 dimension''' | ||
# it would be nice if this wasn't necessary, works around: | ||
# "ValueError: assignment destination is read-only" in assignment below | ||
out = y.copy() | ||
|
||
nans = pd.isnull(y) | ||
nonans = ~nans | ||
|
||
# fast track for no-nans and all-nans cases | ||
n_nans = nans.sum() | ||
if n_nans == 0 or n_nans == len(y): | ||
return y | ||
|
||
f = interpolator(x[nonans], y[nonans], **kwargs) | ||
out[nans] = f(x[nans]) | ||
return out | ||
|
||
|
||
def _bfill(arr, n=None, axis=-1): | ||
'''inverse of ffill''' | ||
import bottleneck as bn | ||
|
||
arr = np.flip(arr, axis=axis) | ||
Review comment: this requires numpy 1.12 or later so I'll need to add a compat function.
Reply: this was done in 19d21b8
Reply: Maybe |
||
|
||
# fill | ||
arr = bn.push(arr, axis=axis, n=n) | ||
|
||
# reverse back to original | ||
return np.flip(arr, axis=axis) | ||
|
||
|
||
def ffill(arr, dim=None, limit=None): | ||
'''forward fill missing values''' | ||
import bottleneck as bn | ||
|
||
axis = arr.get_axis_num(dim) | ||
|
||
# work around for bottleneck 178 | ||
_limit = limit if limit is not None else arr.shape[axis] | ||
Review comment: 👍 |
||
|
||
return apply_ufunc(bn.push, arr, | ||
dask='parallelized', | ||
keep_attrs=True, | ||
output_dtypes=[arr.dtype], | ||
kwargs=dict(n=_limit, axis=axis)).transpose(*arr.dims) | ||
|
||
|
||
def bfill(arr, dim=None, limit=None): | ||
'''backfill missing values''' | ||
axis = arr.get_axis_num(dim) | ||
|
||
# work around for bottleneck 178 | ||
_limit = limit if limit is not None else arr.shape[axis] | ||
|
||
return apply_ufunc(_bfill, arr, | ||
dask='parallelized', | ||
keep_attrs=True, | ||
output_dtypes=[arr.dtype], | ||
kwargs=dict(n=_limit, axis=axis)).transpose(*arr.dims) | ||
|
||
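The backward fill above is built from the forward fill by reversing the array, pushing values forward, and reversing back. The following pure-Python sketch shows that idea on a list; `push` here is a simplified stand-in written for illustration, not bottleneck's implementation, and `n` plays the role of the limit on consecutive NaNs.

```python
import math


def push(values, n=None):
    """Forward fill NaNs, filling at most n consecutive NaNs (None = no limit)."""
    out, last, run = [], float("nan"), 0
    for v in values:
        if math.isnan(v):
            run += 1
            # Fill only while the current NaN run is within the limit.
            out.append(last if (n is None or run <= n) else v)
        else:
            last, run = v, 0
            out.append(v)
    return out


def bfill(values, n=None):
    """Backward fill: reverse, forward fill, reverse back."""
    return push(values[::-1], n)[::-1]


nan = float("nan")
data = [nan, nan, 1.0, nan, nan, nan, 2.0]
limited = bfill(data, n=1)   # only the NaN directly before each value is filled
unlimited = bfill(data)      # every NaN that has a later value is filled
```

With `n=1`, a gap of three NaNs is only partially filled, which matches the behaviour described in the `limit` docstrings of `ffill` and `bfill`.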
|
||
def _get_interpolator(method, **kwargs): | ||
'''helper function to select the appropriate interpolator class | ||
|
||
returns a partial of wrap_interpolator | ||
''' | ||
interp1d_methods = ['linear', 'nearest', 'zero', 'slinear', 'quadratic', | ||
'cubic', 'polynomial'] | ||
valid_methods = interp1d_methods + ['barycentric', 'krog', 'pchip', | ||
'spline'] | ||
|
||
if (method == 'linear' and not | ||
kwargs.get('fill_value', None) == 'extrapolate'): | ||
kwargs.update(kind=method) | ||
interp_class = NumpyInterpolator | ||
elif method in valid_methods: | ||
try: | ||
from scipy import interpolate | ||
except ImportError: | ||
raise ImportError( | ||
'Interpolation with method `%s` requires scipy' % method) | ||
if method in interp1d_methods: | ||
kwargs.update(kind=method) | ||
interp_class = ScipyInterpolator | ||
elif method == 'barycentric': | ||
interp_class = interpolate.BarycentricInterpolator | ||
elif method == 'krog': | ||
interp_class = interpolate.KroghInterpolator | ||
elif method == 'pchip': | ||
interp_class = interpolate.PchipInterpolator | ||
elif method == 'spline': | ||
kwargs.update(kind=method) | ||
interp_class = SplineInterpolator | ||
else: | ||
raise ValueError('%s is not a valid scipy interpolator' % method) | ||
else: | ||
raise ValueError('%s is not a valid interpolator' % method) | ||
|
||
return partial(wrap_interpolator, interp_class, **kwargs) | ||
|
||
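`_get_interpolator` above returns a `functools.partial` that binds the chosen interpolator class and its keyword arguments, so the caller only has to supply `x` and `y`. A toy sketch of that pattern is shown here; `Constant`, `REGISTRY`, `wrap`, and `get_interpolator` are hypothetical names used only to keep the example self-contained.

```python
from functools import partial


class Constant(object):
    """Hypothetical interpolator: ignores the data, returns a fixed value."""

    def __init__(self, xi, yi, value=0.0):
        self.value = value

    def __call__(self, x):
        return [self.value for _ in x]


def wrap(interp_class, x, y, **kwargs):
    """Construct the interpolator from the data, then evaluate it at x."""
    f = interp_class(x, y, **kwargs)
    return f(x)


REGISTRY = {"constant": Constant}  # hypothetical method table


def get_interpolator(method, **kwargs):
    """Select a class by name and bind it, with kwargs, into wrap."""
    if method not in REGISTRY:
        raise ValueError("%s is not a valid interpolator" % method)
    return partial(wrap, REGISTRY[method], **kwargs)


func = get_interpolator("constant", value=7.0)
result = func([0, 1, 2], [1.0, 2.0, 3.0])
```

Binding the class and options up front is what lets the same callable be handed to `apply_ufunc` and invoked once per 1-D slice.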
|
||
def _get_valid_fill_mask(arr, dim, limit): | ||
'''helper function to determine values that can be filled when limit is not | ||
None''' | ||
kw = {dim: limit + 1} | ||
return arr.isnull().rolling(min_periods=1, **kw).sum() <= limit |
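The mask in `_get_valid_fill_mask` counts NaNs in a trailing window of `limit + 1` elements and marks a position as fillable when that count is at most `limit`. A pure-Python sketch of the same rule, assuming a trailing (non-centered) window that is allowed to be shorter at the start, as `min_periods=1` implies; `valid_fill_mask` is a hypothetical name.

```python
import math


def valid_fill_mask(values, limit):
    """True where the trailing window of limit + 1 items has <= limit NaNs."""
    isnull = [math.isnan(v) for v in values]
    mask = []
    for i in range(len(values)):
        window = isnull[max(0, i - limit): i + 1]  # shorter near the start
        mask.append(sum(window) <= limit)
    return mask


nan = float("nan")
values = [1.0, nan, nan, nan, 5.0]
mask = valid_fill_mask(values, limit=2)
```

A position is masked out only when it and the `limit` positions before it are all NaN, so in a run of three NaNs with `limit=2` the first two stay fillable and the third does not.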
Review comment: Probably too late to be helpful, but are we sure about the name here? We don't generally add `_na` onto methods (`bfill_na`?), and pandas is `interpolate` only.
Reply: See comment from @shoyer above: #1640 (comment)