WIP: Feature/interpolate #1640
Conversation
xarray/core/interpolate.py
Outdated
raise ValueError("Index must be monotonicly increasing")

# TODO: consider using the np_interp_wrapper for linear method
out = np.apply_along_axis(scipy_interp_wrapper, axis, arr, index,
I think scipy.interpolate.interp1d can interpolate a multi-dimensional array along one axis, though scipy is an optional requirement. (But I did not check whether it is actually faster.)
EDIT:
I noticed that I had not understood this correctly. The multi-dimensional interpolation does not work for filling missing values that are distributed randomly, so ignore my comment.
Yes, exactly. For 1- and 2-d arrays, this should behave identically to the pandas interpolate method.
Maybe apply_ufunc with vectorize=True would work here? I guess it would reorder dimensions in an undesired way.
That is on my list of things to try today. Getting things to behave like pandas required a bit of dancing and I've saved the obvious optimizations for later. Specifically, I want to get some more tests written first.
xarray/core/interpolate.py
Outdated
@@ -0,0 +1,95 @@
"""Define core operations for xarray objects.

TODO(shoyer): rewrite this module, making use of xarray.core.computation,
This TODO note doesn't belong here :)
xarray/core/dataarray.py
Outdated
@@ -1175,6 +1175,35 @@ def fillna(self, value):
    out = ops.fillna(self, value)
    return out

def interpolate(self, dim=None, method='linear', inplace=False):
I like calling this interpolate_na rather than interpolate to make it clear what this method does. See here for the related pandas discussion: pandas-dev/pandas#9340
xarray/core/missing.py
Outdated
output_core_dims=[[dim]],
dask='forbidden',
vectorize=True,
keep_attrs=True).transpose(*arr.dims)
@shoyer - can you take a look at this? About half of my tests are passing but I suspect I'm still missing something in the apply_ufunc call.
I'm getting a few errors that look like this:
ValueError: inconsistent size for core dimension 'dim0': 1 vs 6
I think the problem here is that you are treating your interpolator as an argument to vectorize over. This might technically work (by interpreting interpolator as a scalar), but that was certainly not intended. Possibly we should have a more informative error message here.
There are various ways you can fix this (using either a class or a function returned by a function), but either way you probably want the way you use apply_ufunc to be something more like apply_ufunc(wrap_interpolator(interpolator), index, arr, ...).
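A minimal sketch of that pattern, with hypothetical helper names (interp_na_1d and interp_na are illustrative, not the PR's actual code): the interpolator is bound via functools.partial so apply_ufunc only vectorizes over index and arr.

import functools
import numpy as np
import xarray as xr

def interp_na_1d(index, values, interpolator=None, **kwargs):
    # fill NaNs in a single 1-d slice of values against the 1-d index
    nans = np.isnan(values)
    if nans.all() or not nans.any():
        return values
    f = interpolator(index[~nans], values[~nans], **kwargs)
    out = values.copy()
    out[nans] = f(index[nans])
    return out

def interp_na(arr, dim, index, interpolator, **kwargs):
    # close over the interpolator instead of passing it as a vectorized argument
    func = functools.partial(interp_na_1d, interpolator=interpolator, **kwargs)
    return xr.apply_ufunc(func, index, arr,
                          input_core_dims=[[dim], [dim]],
                          output_core_dims=[[dim]],
                          vectorize=True,
                          keep_attrs=True).transpose(*arr.dims)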
@jhamman Would it be possible to have a limit option, as in http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.interpolate.html? I suspect that a lot of the time, users want to fill in "small" gaps but not "large" ones. That is what I usually do.
@dcherian - Yes, we can do that. Just haven't gotten there yet.
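Hypothetical usage once a limit keyword exists (the parameter name is assumed from the discussion above): only gaps of up to two consecutive NaNs along 'time' would be filled, leaving larger gaps untouched.

da.interpolate_na(dim='time', method='linear', limit=2)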
xarray/core/missing.py
Outdated
self._right = yi[-1]
elif isinstance(fill_value, Iterable):
    self._left = fill_value[0]
    self._right = fill_value[1]
If this only works for a pair of values, let's restrict this to isinstance(fill_value, tuple) and explicitly check that it has length 2.
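A minimal sketch of the check being suggested (attribute names follow the snippet above; the final code may differ):

if isinstance(fill_value, tuple):
    if len(fill_value) != 2:
        raise ValueError('fill_value must be a (left, right) tuple of length 2')
    self._left, self._right = fill_value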
xarray/core/missing.py
Outdated

if fill_value is None:
    self._left = np.nan
    self._right = yi[-1]
This is a little surprising to me -- I would expect both to default to NaN.
Agreed. I'm not sure if this is a pandas bug or an intentional feature.
Would it make sense to have them both default to NaN in Xarray?
+1 let's use NaN here.
xarray/core/missing.py
Outdated
def get_clean_interp_args(arr, method, dim, **kwargs):
    '''normalize interolation arguments and choose which index to use

    provides compatability with pandas interolation methods
interolation -> interpolation
xarray/core/missing.py
Outdated
elif method == 'time':
    index = arr.get_index(dim)
    method = 'linear'
    assert isinstance(index, pd.DatetimeIndex)
Should probably raise ValueError instead
xarray/core/dataarray.py
Outdated
specify and order (int) e.g. da.interpolate_na(method='polynomial', order=4)

# TODO: Pandas also support: scipy wrappers of 'krogh',
# 'piecewise_polynomial', 'spline', and 'pchip'
Personally, I find this interface from pandas a little confusing. It combines two separate choices in one argument:
- Choosing the points at which interpolation is done.
- How to interpolate between these points.
I would slightly rather that we had two separate arguments. Maybe:
- points='coordinate' vs points='equal_spacing' (or a boolean use_coordinate=True / use_coordinate=False) for choosing the points.
- method='nearest', method='linear', method='quadratic', etc., for indicating how to interpolate.
The downside is that this would be inconsistent with pandas. But in the existing interface, it is not at all obvious to me whether a method like 'polynomial' uses coordinate points or equally spaced points.
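Hypothetical usage under a two-argument interface of this kind (the kwarg name use_coordinate is the one the PR later adopts; the calls themselves are illustrative):

# interpolate linearly against the actual coordinate values along 'time'
da.interpolate_na(dim='time', method='linear', use_coordinate=True)

# interpolate linearly treating the points as equally spaced (pandas' 'linear')
da.interpolate_na(dim='time', method='linear', use_coordinate=False)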
I totally agree. As I've been working on this I've been thinking we can improve on the Pandas interface and this would be the obvious way to do that.
xarray/core/missing.py
Outdated
    elif method == 'spline':
        return SplineInterpolator
except ImportError:
    pass
It would probably be more useful to raise an error about how scipy is needed for the given interpolator rather than a generic error message.
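A sketch of the kind of message that could be raised instead (wording illustrative; method is assumed to be in scope):

try:
    from scipy import interpolate  # noqa: F401
except ImportError:
    raise ImportError("Interpolation with method='%s' requires scipy to be "
                      "installed" % method)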
xarray/core/missing.py
Outdated
input_core_dims=[[], [dim], [dim]],
output_core_dims=[[dim]],
dask='forbidden',
vectorize=True,
Note that scipy.interpolate.interp1d is already vectorized in the right way: https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.interpolate.interp1d.html
I think we can only use interp1d on 1D slices. The key point here is that we should be able to interpolate at different index positions for different slices. I don't think interp1d will let us do that.
Ah, good point. That would be useful for the interpolate_at variant.
xarray/core/missing.py
Outdated
interpolator, index, arr, kwargs=kwargs,
input_core_dims=[[], [dim], [dim]],
output_core_dims=[[dim]],
dask='forbidden',
I think dask='parallelized' would work here -- I tested it with vectorize=True.
Commits pushed:
- first working version of interpolate method/module
- pep8 after merge master
- update interpolate, more pandas compat tests
- add interpolate_na to api
- encoding fix for py2 in dataarray.py
- working...
- checkin, roughed in interpolator classes
- move tests and some mods to apply_ufunc usage in interp_na
- add method to kwargs
- fixes for scipy and some docs cleanup
- scipy vs numpy interpolator selection cleanups
- add limit to interpolate_na
- bfill/ffill parallelized
- new interface with use_coordinate kwarg
- use partial function to wrap interpolate class
- a few fixes for ffill/bfill, placeholder for interpolate_at method
- add some tests
- fix test
xarray/core/missing.py
Outdated
output_sizes=dict(zip(arr.dims, arr.shape)),
dask='parallelized',
vectorize=True,
keep_attrs=True).transpose(*arr.dims)
@shoyer - I finally got the apply_ufunc call to work. Turns out I just had a bug in wrap_interpolator. Nonetheless, I am now using a partial function to simplify this call.
xarray/core/missing.py
Outdated
raise NotImplementedError('dim is a required argument')

if limit is not None:
    valids = _get_valid_fill_mask(arr, dim, limit)
@dcherian - the limit option is now supported.
Awesome. Thanks!
xarray/core/dataarray.py
Outdated
order given both 'polynomial' and 'spline' require that you also
specify and order (int) e.g. da.interpolate_na(method='polynomial',
order=4)
use_coordinate : boolean, default True
needs a description.
xarray/core/dataarray.py
Outdated
Specifies the dimension along which to interpolate.
method : {'linear', 'time', 'index', 'values', 'nearest'}
    'linear': ignore the index and treat the values as equally
    spaced. default
This is a minor point, but can we format this a little more consistently? The compat arg on the Dataset constructor gets formatted nicely into a list by Sphinx and is still pretty readable as plain text:
Lines 342 to 350 in 0818a36
compat : {'broadcast_equals', 'equals', 'identical'}, optional
    String indicating how to compare variables of the same name for
    potential conflicts when initializing this dataset:
    - 'broadcast_equals': all values must be equal when variables are
      broadcast against each other to ensure common dimensions.
    - 'equals': all values and dimensions must be the same.
    - 'identical': all values, dimensions and attributes must be the
      same.
xarray/core/dataarray.py
Outdated
specify and order (int) e.g. da.interpolate_na(method='polynomial',
order=4)
use_coordinate : boolean, default True
limit : limit : int, default None
"limit :" is repeated twice
xarray/core/dataarray.py
Outdated
order=4)
use_coordinate : boolean, default True
limit : limit : int, default None
    Maximum number of consecutive NaNs to fill. Must be greater than 0.
add "or None for no limit"
xarray/core/dataarray.py
Outdated
limit=None, **kwargs):
# this is just here so I remember the signature we discussed
# dim: the dimension along which to interpolate
# locs: a broadcastable boolean mask describing where interpolation
boolean mask? I'm not quite sure how that would make sense. My immediate thought would be that locs should be a 1d array of coordinate values.
xarray/core/missing.py
Outdated
keep_attrs=True).transpose(*self.dims)

if limit is not None:
    print(valids)
remove :)
xarray/tests/test_missing.py
Outdated

result = da.ffill('time', limit=1)
expected = DataArray(
    [0, 0, np.nan, np.nan, np.nan, 3, 4, 5, 5, 6, 7], dims='time')
It would be nice to add a couple of tests using dask.
xarray/core/missing.py
Outdated
'''inverse of ffill'''
import bottleneck as bn

arr = np.flip(arr, axis=axis)
this requires numpy 1.12 or later so I'll need to add a compat function.
this was done in 19d21b8
Maybe npcompat.flip? And then numpy=1.11 is also supported?
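For reference, a minimal npcompat-style backport could look roughly like this (a sketch; the helper the PR actually adds may differ):

import numpy as np

def flip(m, axis):
    # reverse an array along the given axis; equivalent to np.flip (numpy >= 1.12)
    m = np.asarray(m)
    indexer = [slice(None)] * m.ndim
    indexer[axis] = slice(None, None, -1)
    return m[tuple(indexer)]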
This is ready for some more reviews. I'm seeing some failing tests on the py27-min build that I'm struggling to detangle at the moment but I'm pretty sure they're just pytest related. cc @darothen - I'd particularly appreciate your thoughts on the interpolators.
This feature addition is quite great :)
Can we have interpolate_na, ffill, bfill also for xr.Dataset? dropna is in Dataset.
xarray/core/dataarray.py
Outdated
----------
dim : str
    Specifies the dimension along which to interpolate.
method : {'linear', 'time', 'index', 'values', 'nearest'}, optional
Some explanation of the method='time' option is probably necessary here.
Good catch. We actually dropped the time/index/values methods in favor of a more explicit interface that uses use_coordinate.
xarray/core/missing.py
Outdated
from scipy.interpolate import UnivariateSpline

if kind is None:
    raise ValueError('kind is a required argument')
what should be passed as kind?
xarray/core/missing.py
Outdated
if isinstance(index, pd.DatetimeIndex):
    index = index.values.astype(np.float64)
# check index sorting now so we can skip it later
if not (np.diff(index) > 0).all():
If we require a monotonically-increasing index, can we pass assume_sorted=True to ScipyInterpolator?
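scipy.interpolate.interp1d does accept this flag; a small self-contained illustration (the arrays are made up):

import numpy as np
from scipy.interpolate import interp1d

x = np.array([0.0, 1.0, 2.5, 4.0])          # already validated as increasing
y = np.array([1.0, 2.0, np.nan, 3.0])
good = ~np.isnan(y)
f = interp1d(x[good], y[good], kind='linear',
             bounds_error=False, assume_sorted=True)
f(x[~good])  # interpolated value(s) at the NaN locations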
if use_coordinate is True:
    index = arr.get_index(dim)
else:
    index = arr.coords[use_coordinate]
Do we need to check that arr.coords[use_coordinate] is actually 1-dimensional?
Maybe we can also raise an appropriate error if the coordinate is a MultiIndex.
@fujiisoup - do we have a handy way to tell if a coordinate is an xarray DataArray/MultiIndex? It's not a subclass of pd.MultiIndex, but I need something like:
isinstance(index, (pd.MultiIndex, xr.MultiIndexCoordinate))
Probably the cleanest and most comprehensive thing to do is to try to always cast the interpolation index to float64, while catching any error messages (to replace them with something more informative). When a MultiIndex is cast to a numpy array it ends up with object dtype.
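A sketch of that approach (the helper name and error wording are illustrative):

import numpy as np
import pandas as pd

def _as_float_index(index, dim):
    # datetimes go through .values first, as in the existing code above
    if isinstance(index, pd.DatetimeIndex):
        index = index.values
    try:
        return np.asarray(index).astype(np.float64)
    except (TypeError, ValueError):
        raise TypeError('index for dimension %r cannot be cast to float64; '
                        'note that a MultiIndex becomes an object-dtype array'
                        % dim)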
Some more comments :)
xarray/core/dataarray.py
Outdated
Parameters
----------
dim : str
    Specifies the dimension along which to propogate values when
propogate -> propagate?
xarray/core/missing.py
Outdated
**kwargs):
from scipy.interpolate import UnivariateSpline

if method is not 'spline':
if method is not 'spline':
-> if method != 'spline':
DataArray
'''
from .missing import bfill
return bfill(self, dim, limit=limit)
Do we need bottleneck installed to use bfill or ffill? Maybe it should be noted in the docstrings.
xarray/tests/test_missing.py
Outdated
@pytest.mark.parametrize('frac_nan', [0, 0.5, 1])
@pytest.mark.parametrize('method', ['time', 'index', 'values', 'linear',
                                    'nearest', 'zero', 'slinear',
                                    'quadratic', 'cubic'])
Do we also need to test the 'barycentric', 'krog', 'pchip', and 'spline' methods?
xarray/core/dataarray.py
Outdated
Specifies the dimension along which to interpolate.
method : {'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
          'polynomial', 'barycentric', 'krog', 'pchip',
          'spline'}, optional
Pandas seems to support Akima. Maybe we can also add this? Scipy provides a similar interface.
xarray/core/dataarray.py
Outdated
'spline'}, optional
    String indicating which method to use for interpolation:

    - 'linear': linear interpolation (Default). Additional keyword
I noticed pandas.interpolate(method='linear') behaves differently. Though I think our definition is more natural, could we note this here?
Are you referring to which values Pandas uses for the index?
Yes. pandas' docstring says:
‘linear’: ignore the index and treat the values as equally spaced.
‘index’: use the actual numerical values of the index.
The difference between these corresponds to our use_coordinate option [False/True], while our linear option means linear interpolation itself. I just think that the same name with different meanings is sometimes confusing. It is a very trivial point (and maybe a kind of overconcern).
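For concreteness, a tiny pandas illustration of the distinction (values made up):

import numpy as np
import pandas as pd

s = pd.Series([0.0, np.nan, 2.0], index=[0, 1, 10])
s.interpolate(method='linear')   # ignores the index -> fills 1.0 at label 1
s.interpolate(method='index')    # uses index values  -> fills 0.2 at label 1
# xarray's method='linear' with use_coordinate=True matches pandas' 'index';
# with use_coordinate=False it matches pandas' 'linear'.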
@fujiisoup - care to take another pass here? I think this is done but would appreciate a few more reviews.
A few more comments.
Maybe I am missing something, but do we require numpy=1.12 because of np.flip?
doc/computation.rst
Outdated
.. ipython:: python

    x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=['x'],
                     coords={'xx': [0, 1, 1.1, 1.9, 3]})
I think there is a typo here. Maybe
x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=['x'],
coords={'xx': ('x', [0, 1, 1.1, 1.9, 3])})
or
x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=['x'],
coords={'x': [0, 1, 1.1, 1.9, 3]})
?
This was intentional.
Really?
I got
ValueError: coordinate xx has dimensions ('xx',), but these are not a subset of the DataArray dimensions ['x']
from this script.
You were right. I was missing the Variable constructor.
This looks good to me. I will merge it in this weekend if there are no further comments.
use_coordinate=use_coordinate, **kwargs)

def ffill(self, dim, limit=None):
    '''Fill NaN values by propogating values forward
no need to change now, but FYI PEP8 is """ on its own line IIRC
@@ -1228,6 +1228,97 @@ def fillna(self, value):
    out = ops.fillna(self, value)
    return out

def interpolate_na(self, dim=None, method='linear', limit=None,
Probably too late to be helpful - but are we sure about the name here? We don't generally add _na onto methods (bfill_na?), and pandas uses interpolate only.
See comment from @shoyer above: #1640 (comment)
self.f = UnivariateSpline(xi, yi, k=order, **self.cons_kwargs)


def _apply_over_vars_with_dim(func, self, dim=None, **kwargs):
This looks good! Could we use this elsewhere in the future? Or even put it on Dataset?
Yes, certainly. I'm not 100% convinced we don't already have something like this but I didn't seem to find it. @shoyer, do you have thoughts on this?
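Roughly, such a helper amounts to something like the following sketch (not the PR's exact implementation): apply func to every data variable that contains dim and leave the rest untouched.

def _apply_over_vars_with_dim(func, ds, dim=None, **kwargs):
    # apply func variable-by-variable, only where dim is present
    new = ds.copy()
    for name, var in ds.data_vars.items():
        if dim in var.dims:
            new[name] = func(var, dim=dim, **kwargs)
    return new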
xarray/core/missing.py
Outdated
'''wrapper for datasets'''
from .dataset import Dataset

ds = Dataset(coords=self.coords, attrs=self.attrs)
If people are subclassing Dataset (we do internally), this will void their subclass. I think generally we should use type(self) or self.__class__.
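In the snippet above that would amount to (illustrative only):

ds = type(self)(coords=self.coords, attrs=self.attrs)  # preserves Dataset subclasses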
axis = arr.get_axis_num(dim)

# work around for bottleneck 178
_limit = limit if limit is not None else arr.shape[axis]
👍
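For context, the pattern around this workaround, roughly (function name illustrative; bottleneck's push does the actual forward fill):

import bottleneck as bn

def _ffill(arr, axis, limit=None):
    # pass the axis length instead of None to sidestep the bottleneck issue above
    n = limit if limit is not None else arr.shape[axis]
    return bn.push(arr, n=n, axis=axis)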
xarray/tests/test_missing.py
Outdated
for dim in ['time', 'x']:
    actual = da.interpolate_na(method=method, dim=dim)
    expected = df.interpolate(method=method, axis=da.get_axis_num(dim))
    np.testing.assert_allclose(actual.values, expected.values)
Perennial note for future tests: xarray.tests.assert_close
expected is a pandas.DataFrame, so I don't think this will work with xarray.tests.assert_close
def test_interpolate_pd_compat_non_uniform_index():
    shapes = [(8, 8), (1, 20), (20, 1), (100, 100)]
    frac_nans = [0, 0.5, 1]
    methods = ['time', 'index', 'values']
obv don't change but for future tests: pytest handles these parameterizations v well
@maxim-lian - I was using the pytest parameterizations here but they weren't playing nice with the @requires_foo decorators.
I can have a look. Do you remember what the setup was that didn't work? No stress if not
I think skipped tests are returned as functions without arguments. The parameterizations then were passed to functions that didn't take arguments and that obviously didn't work.
So:
@parameterize('foo', [1, 2])
@skip
def bar(foo):
pass
becomes
@parameterize('foo', [1, 2])
def bar():
pass
when skipped.
I think we should use pytest.importorskip - while @jreback is right that it doesn't allow as much control as writing a function ourselves, and @WillAyd gives some more benefits, we're not using anything that the standard pytest function can't do in any of our functions.
If people agree, I could add a note to this function: https://github.com/pydata/xarray/pull/1640/files/48505a51e69d6b3663783db414962f3011e94aa1#diff-e84d0339b6da4d18c02f6b2d54fbd5bdR47
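For reference, the standard pytest call in its simplest form (module names and minversion are illustrative):

import pytest

scipy = pytest.importorskip('scipy')                      # skip if scipy is absent
bn = pytest.importorskip('bottleneck', minversion='1.0')  # optionally pin a version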
I recall we had some difficulty with decorators based on pytest and subclass methods, but that doesn't apply here.
This is great! While I left some comments, I don't think any warrant changes now, as long as ppl are good with the
xarray/core/dataarray.py
Outdated
def bfill(self, dim, limit=None):
    '''Fill NaN values by propogating values backward

    *Requires bottleneck and Numpy v1.12.0 or later.*
I don't think NumPy 1.12 is still required with our flip backport
xarray/core/dataset.py
Outdated
def bfill(self, dim, limit=None):
    '''Fill NaN values by propogating values backward

    *Requires bottleneck and Numpy v1.12.0 or later.*
as above: numpy 1.12 is not required with our backport.
…match pandas -- adjusted tests and docs to reflect this change
ready to go?
Thanks for this great work!
git diff upstream/master | flake8 --diff
whats-new.rst for all changes and api.rst for new API
Rough draft of interpolate method for filling of arbitrary nans.
cc @darothen