WIP: Feature/interpolate #1640


Merged
merged 27 commits into pydata:master from feature/interpolate on Dec 30, 2017

Conversation

jhamman
Copy link
Member

@jhamman jhamman commented Oct 20, 2017

Rough draft of interpolate method for filling of arbitrary nans.

cc @darothen

raise ValueError("Index must be monotonicly increasing")

# TODO: consider using the np_interp_wrapper for linear method
out = np.apply_along_axis(scipy_interp_wrapper, axis, arr, index,
Member

I think scipy.interpolate.interp1d can interpolate a multi-dimensional array along one axis, though scipy is an optional requirement.
(But I have not checked whether it is actually faster.)


EDIT:
I noticed that I had not understood correctly.
The multi-dimensional interpolation does not work for filling missing values that are distributed randomly.

So ignore my comment.
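The point above can be made concrete with a small numpy sketch (avoiding the optional scipy dependency): `interp1d`'s `axis` argument evaluates every slice at the same x-points, but filling randomly placed NaNs needs different x-points per slice, so each 1-D slice gets its own interpolation call. The helper name `interp_na_1d` is hypothetical.

```python
import numpy as np

def interp_na_1d(row):
    # Fill NaNs in one 1-D slice by linear interpolation. Each slice
    # needs its own call because the NaN positions differ per slice.
    x = np.arange(row.size)
    nans = np.isnan(row)
    out = row.copy()
    out[nans] = np.interp(x[nans], x[~nans], row[~nans])
    return out

arr = np.array([[0.0, np.nan, 2.0],
                [np.nan, 1.0, 2.0]])   # NaNs in different columns
filled = np.apply_along_axis(interp_na_1d, 1, arr)
```

Note that `np.interp` clamps outside the data range, so a leading NaN is filled with the first valid value rather than extrapolated.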

Member Author

yes, exactly. For 1 and 2d arrays, this should behave identically to the pandas interpolate method.

Member

Maybe apply_ufunc with vectorize=True would work here? I guess it would reorder dimensions in an undesired way.

Member Author

That is on my list of things to try today. Getting things to behave like pandas required a bit of dancing and I've saved the obvious optimizations for later. Specifically, I want to get some more tests written first.


@@ -0,0 +1,95 @@
"""Define core operations for xarray objects.

TODO(shoyer): rewrite this module, making use of xarray.core.computation,
Member

This TODO note doesn't belong here :)

@@ -1175,6 +1175,35 @@ def fillna(self, value):
out = ops.fillna(self, value)
return out

def interpolate(self, dim=None, method='linear', inplace=False):
Member

I like calling the interpolate_na rather than interpolate to make it clear what this method does. See here for the related pandas discussion: pandas-dev/pandas#9340

output_core_dims=[[dim]],
dask='forbidden',
vectorize=True,
keep_attrs=True).transpose(*arr.dims)
Member Author

@shoyer - can you take a look at this? About half of my tests are passing but I suspect I'm still missing something in the apply_ufunc call.

I'm getting a few errors that look like this:

ValueError: inconsistent size for core dimension 'dim0': 1 vs 6

Member

I think the problem here is that you are treating your interpolator as an argument to vectorize over. This might technically work (by interpreting interpolator as a scalar), but it certainly was not intended.

Possibly we should have a more informative error message here.

There are various ways you can fix this (using either a class or a function returned by a function), but either way you probably want the apply_ufunc call to look more like apply_ufunc(wrap_interpolator(interpolator), index, arr, ...).
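The closure pattern suggested here can be sketched with plain numpy; the names `wrap_interpolator` and `NumpyInterpolator` follow the discussion but the bodies below are hypothetical: the key idea is that closing over the interpolator leaves only array arguments for the vectorized apply machinery.

```python
import numpy as np

class NumpyInterpolator:
    # Minimal linear interpolator with the assumed call signature.
    def __init__(self, xi, yi):
        self.xi, self.yi = xi, yi

    def __call__(self, x):
        return np.interp(x, self.xi, self.yi)

def wrap_interpolator(interp_class, **kwargs):
    # Return a function of (index, values) only, so the interpolator
    # itself is never passed through the vectorized apply call.
    def func(index, values):
        nans = np.isnan(values)
        f = interp_class(index[~nans], values[~nans], **kwargs)
        out = values.copy()
        out[nans] = f(index[nans])
        return out
    return func

interp_na = wrap_interpolator(NumpyInterpolator)
index = np.arange(4.0)
values = np.array([0.0, np.nan, 2.0, 3.0])
result = interp_na(index, values)
```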

@dcherian
Contributor

dcherian commented Nov 4, 2017

@jhamman Would it be possible to have a limit option like pandas?

Quoting http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.interpolate.html

limit : int, default None.

    Maximum number of consecutive NaNs to fill. Must be greater than 0.

I suspect that a lot of the time, users want to fill in "small" gaps but not "large" ones. It is what I usually do.
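One way a limit option could work (a sketch, not the PR's implementation) is to compute up front a mask of NaNs that sit in runs of at most `limit` consecutive NaNs, and only fill those positions. The helper name `valid_fill_mask` is hypothetical.

```python
import numpy as np

def valid_fill_mask(arr, limit):
    # True where a NaN sits in a run of at most `limit` consecutive
    # NaNs; only these positions would be interpolated.
    nans = np.isnan(arr)
    mask = np.zeros(arr.size, dtype=bool)
    start = None
    for i, isnan in enumerate(nans):
        if isnan and start is None:
            start = i                      # run begins
        elif not isnan and start is not None:
            if i - start <= limit:         # run short enough to fill
                mask[start:i] = True
            start = None
    if start is not None and arr.size - start <= limit:
        mask[start:] = True                # trailing run
    return mask

arr = np.array([0.0, np.nan, 2.0, np.nan, np.nan, 5.0])
```

With `limit=1` only the single-NaN gap is marked fillable; with `limit=2` the two-NaN gap qualifies as well.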

@jhamman
Member Author

jhamman commented Nov 4, 2017

@dcherian - Yes, we can do that. Just haven't gotten there yet.

self._right = yi[-1]
elif isinstance(fill_value, Iterable):
self._left = fill_value[0]
self._right = fill_value[1]
Member

If this only works for a pair of values, let's restrict this to isintance(fill_value, tuple) and explicitly check that it has length 2.


if fill_value is None:
self._left = np.nan
self._right = yi[-1]
Member

This is a little surprising to me -- I would expect both to default to NaN.

Member Author

Agreed. I'm not sure if this is a pandas bug or an intentional feature.


Would it make sense to have them both default to NaN in Xarray?

Member

+1 let's use NaN here.

def get_clean_interp_args(arr, method, dim, **kwargs):
'''normalize interolation arguments and choose which index to use

provides compatability with pandas interolation methods
Member

interolation -> interpolation

elif method == 'time':
index = arr.get_index(dim)
method = 'linear'
assert isinstance(index, pd.DatetimeIndex)
Member

Should probably raise a ValueError instead.

specify and order (int) e.g. da.interpolate_na(method='polynomial', order=4)

# TODO: Pandas also support: scipy wrappers of 'krogh',
# 'piecewise_polynomial', 'spline', and 'pchip'
Member

Personally, I find this interface from pandas a little confusing. It combines two separate choices in one argument:

  • Choosing the points at which interpolation is done.
  • How to interpolate between these points.

I would slightly rather that we had two separate arguments. Maybe:

  • points='coordinate' vs points='equal_spacing' or a boolean use_coordinate=True/use_coordinate=False
  • method='nearest', method='linear', method='quadratic', etc, for indicating how to interpolate.

The downside is that this would be inconsistent with pandas. But in the existing interface, it is not at all obvious to me whether a method like 'polynomial' uses coordinate points or equally spaced points.
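The two separate choices described above can be illustrated with `np.interp` on a non-uniform coordinate (the `use_coordinate` naming follows this proposal; the snippet itself is just a sketch):

```python
import numpy as np

values = np.array([0.0, np.nan, 10.0])
coord = np.array([0.0, 1.0, 10.0])   # non-uniform coordinate
nans = np.isnan(values)

# use_coordinate=False: treat the points as equally spaced
x_eq = np.arange(values.size, dtype=float)
eq = np.interp(x_eq[nans], x_eq[~nans], values[~nans])

# use_coordinate=True: interpolate against the actual coordinate
co = np.interp(coord[nans], coord[~nans], values[~nans])
```

The same linear method gives 5.0 for the equally spaced case but 1.0 when the coordinate is used, which is exactly the ambiguity the two-argument interface would make explicit.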

Member Author

I totally agree. As I've been working on this I've been thinking we can improve on the Pandas interface and this would be the obvious way to do that.

elif method == 'spline':
return SplineInterpolator
except ImportError:
pass
Member

It would probably be more useful to raise an error about how scipy is needed for the given interpolator rather than a generic error message.


input_core_dims=[[], [dim], [dim]],
output_core_dims=[[dim]],
dask='forbidden',
vectorize=True,
Member

Note that the scipy.interpolate.interp1d is already vectorized in the right way:
https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.interpolate.interp1d.html

Member Author

I think we can only use interp1d on 1D slices. The key point here is that we should be able to interpolate at different index positions for different slices. I don't think interp1d will let us do that.

Member

Ah, good point. That would be useful for the interpolate_at variant.


interpolator, index, arr, kwargs=kwargs,
input_core_dims=[[], [dim], [dim]],
output_core_dims=[[dim]],
dask='forbidden',
Member

I think dask='parallelized' would work here -- I tested it with vectorize=True.

@shoyer shoyer mentioned this pull request Nov 8, 2017
Joe Hamman added 2 commits November 10, 2017 23:25
first working version of interpolate method/module

pep8 after merge master

update interpolate, more pandas compat tests

add interpolate_na to api

encoding fix for py2 in dataarray.py

working...

checkin, roughed in interpolator classes

move tests and some mods to apply_ufunc usage in interp_na

add method to kwargs

fixes for scipy and some docs

cleanup scipy vs numpy interpolator selection

cleanups

add limit to interpolate_na

bfill/ffill parallelized

new interface with use_coordinate kwarg

use partial function to wrap interpolate class

a few fixes for ffill/bfill, placeholder for interpolate_at method

add some tests

fix test
output_sizes=dict(zip(arr.dims, arr.shape)),
dask='parallelized',
vectorize=True,
keep_attrs=True).transpose(*arr.dims)
Member Author

@shoyer - I finally got the apply_ufunc to work. Turns out I just had a bug in wrap_interpolator. Nonetheless, I am now using a partial function to simplify this call.

raise NotImplementedError('dim is a required argument')

if limit is not None:
valids = _get_valid_fill_mask(arr, dim, limit)
Member Author

@dcherian - the limit option is now supported.

Contributor

Awesome. Thanks!

order given both 'polynomial' and 'spline' require that you also
specify and order (int) e.g. da.interpolate_na(method='polynomial',
order=4)
use_coordinate : boolean, default True
Member

needs a description.

Specifies the dimension along which to interpolate.
method : {'linear', 'time', 'index', 'values', 'nearest'}
'linear': ignore the index and treat the values as equally
spaced. default
Member

This is a minor point, but can we format this a little more consistently?

The compat arg on the Dataset constructor gets formatted nicely into a list by Sphinx and is still pretty readable as plain text:

compat : {'broadcast_equals', 'equals', 'identical'}, optional
String indicating how to compare variables of the same name for
potential conflicts when initializing this dataset:
- 'broadcast_equals': all values must be equal when variables are
broadcast against each other to ensure common dimensions.
- 'equals': all values and dimensions must be the same.
- 'identical': all values, dimensions and attributes must be the
same.

specify and order (int) e.g. da.interpolate_na(method='polynomial',
order=4)
use_coordinate : boolean, default True
limit : limit : int, default None
Member

limit : is repeated twice

order=4)
use_coordinate : boolean, default True
limit : limit : int, default None
Maximum number of consecutive NaNs to fill. Must be greater than 0.
Member

add "or None for no limit"

limit=None, **kwargs):
# this is just here so I remember the signature we discussed
# dim: the dimension along which to interpolate
# locs: a broadcastable boolean mask describing where interpolation
Member

boolean mask? I'm not quite sure how that would make sense. My immediate thought would be that locs should be a 1d array of coordinate values.

keep_attrs=True).transpose(*self.dims)

if limit is not None:
print(valids)
Member

remove :)


result = da.ffill('time', limit=1)
expected = DataArray(
[0, 0, np.nan, np.nan, np.nan, 3, 4, 5, 5, 6, 7], dims='time')
Member

It would be nice to add a couple of tests using dask.
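The expected values quoted in the test above can be reproduced with a small 1-D sketch of forward-fill with a limit (not the PR's bottleneck-based implementation): fill at most `limit` consecutive NaNs after each valid value.

```python
import numpy as np

def ffill(arr, limit=None):
    # Forward-fill NaNs in a 1-D array, filling at most `limit`
    # consecutive NaNs after each valid value.
    out = arr.astype(float).copy()
    count = 0
    for i in range(1, out.size):
        if np.isnan(arr[i]):
            count += 1
            if limit is None or count <= limit:
                out[i] = out[i - 1]
        else:
            count = 0   # valid value resets the run counter
    return out

data = np.array([0, np.nan, np.nan, np.nan, np.nan, 3, 4, 5,
                 np.nan, 6, 7])
result = ffill(data, limit=1)
```

With `limit=1` this gives `[0, 0, nan, nan, nan, 3, 4, 5, 5, 6, 7]`, matching the expected DataArray in the test.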

'''inverse of ffill'''
import bottleneck as bn

arr = np.flip(arr, axis=axis)
Member Author

this requires numpy 1.12 or later so I'll need to add a compat function.

Member Author

this was done in 19d21b8

Member

Maybe npcompat.flip?
And then numpy=1.11 is also supported?
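A backport like the suggested npcompat.flip can be written with a reversed slice, which works on NumPy releases older than 1.12 (a sketch of the idea, not the exact xarray code):

```python
import numpy as np

def flip(m, axis):
    # Reverse `m` along `axis` with a reversed slice; equivalent to
    # np.flip (added in NumPy 1.12) but works on older releases too.
    index = [slice(None)] * m.ndim
    index[axis] = slice(None, None, -1)
    return m[tuple(index)]

a = np.arange(6).reshape(2, 3)
```

The result is a view, not a copy, just like np.flip.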

@jhamman
Member Author

jhamman commented Nov 14, 2017

This is ready for some more reviews. I'm seeing some failing tests on the py27-min build that I'm struggling to untangle at the moment, but I'm pretty sure they're just pytest-related.

cc @darothen - I'd particularly appreciate your thoughts on the interpolators.

Member

@fujiisoup fujiisoup left a comment

This feature addition is quite great :)

Can we have interpolate_na, ffill, bfill also for xr.Dataset?
dropna is on Dataset.

----------
dim : str
Specifies the dimension along which to interpolate.
method : {'linear', 'time', 'index', 'values', 'nearest'}, optional
Member

Some explanation of the method='time' option is probably necessary here.

Member Author

good catch. we actually dropped the time/index/values methods in favor of a more explicit interface that uses use_coordinate.

from scipy.interpolate import UnivariateSpline

if kind is None:
raise ValueError('kind is a required argument')
Member

what should be passed as kind?

if isinstance(index, pd.DatetimeIndex):
index = index.values.astype(np.float64)
# check index sorting now so we can skip it later
if not (np.diff(index) > 0).all():
Member

If we require a monotonically-increasing index, can we pass assume_sorted=True to ScipyInterpolator?

if use_coordinate is True:
index = arr.get_index(dim)
else:
index = arr.coords[use_coordinate]
Member

Do we need to check that arr.coords[use_coordinate] is actually 1-dimensional?

Member

Maybe we can also raise an appropriate error if the coordinate is a MultiIndex.

Member Author

@fujiisoup - do we have a handy way to tell if a coordinate is an xarray DataArray/MultiIndex? It's not a subclass of pd.MultiIndex but I need something like:

isinstance(index, (pd.MultiIndex, xr.MultiIndexCoordinate))

Member

Probably the cleanest and most comprehensive thing to do is to try to always cast the interpolation index to float64, while catching any error messages (to replace them with something more informative). When a MultiIndex is cast to numpy array it ends up with object dtype.
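This suggestion can be sketched as a small helper (the name `index_to_float64` is hypothetical): always cast the index to float64 and replace the raw numpy error with an informative one. A MultiIndex becomes an object array of tuples, so the cast fails cleanly.

```python
import numpy as np
import pandas as pd

def index_to_float64(index):
    # Cast an interpolation index to float64, replacing the raw numpy
    # error with something more informative.
    values = np.asarray(index)
    if values.dtype.kind in 'mM':      # timedelta64 / datetime64
        values = values.astype(np.int64)   # nanoseconds since epoch
    try:
        return values.astype(np.float64)
    except (TypeError, ValueError):
        raise TypeError('index must be castable to float64 for '
                        'interpolation, got dtype %r' % values.dtype)

times = pd.date_range('2000-01-01', periods=3)
midx = pd.MultiIndex.from_product([[0, 1], ['a', 'b']])
```

A DatetimeIndex casts cleanly via its integer nanosecond representation, while the MultiIndex raises the informative TypeError.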

Member

@fujiisoup fujiisoup left a comment

Some more comments :)

Parameters
----------
dim : str
Specifies the dimension along which to propogate values when
Member

propogate -> propagate?

**kwargs):
from scipy.interpolate import UnivariateSpline

if method is not 'spline':
Member

if method is not 'spline': -> if method != 'spline':

DataArray
'''
from .missing import bfill
return bfill(self, dim, limit=limit)
Member

Do we need bottleneck installed to use bfill or ffill? Maybe it should be noted in docstrings.

@pytest.mark.parametrize('frac_nan', [0, 0.5, 1])
@pytest.mark.parametrize('method', ['time', 'index', 'values', 'linear',
'nearest', 'zero', 'slinear',
'quadratic', 'cubic'])
Member

Do we also need tests for the 'barycentric', 'krog', 'pchip', and 'spline' methods?

Specifies the dimension along which to interpolate.
method : {'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
'polynomial', 'barycentric', 'krog', 'pchip',
'spline'}, optional
Member

Pandas seems to support Akima; maybe we can also add this?
SciPy provides a similar interface.

'spline'}, optional
String indicating which method to use for interpolation:

- 'linear': linear interpolation (Default). Additional keyword
Member

I noticed pandas.interpolate(method='linear') behaves differently.
Though I think our definition is more natural, should we note this here?

Member Author

Are you referring to which values Pandas uses for the index?

Member

Yes.

pandas' docstring says

‘linear’: ignore the index and treat the values as equally spaced.
‘index’: use the actual numerical values of the index.

the difference between which may correspond to our use_coordinate option [False/True].

Our 'linear' option means linear interpolation.
I just think that the same name with different meanings can be confusing.
It is a very minor point (and maybe overconcern on my part).


This was referenced Nov 21, 2017
@jhamman
Member Author

jhamman commented Dec 18, 2017

@fujiisoup - care to take another pass here? I think this is done but would appreciate a few more reviews.

Member

@fujiisoup fujiisoup left a comment

A few more comments.
Maybe I'm missing something, but do we require numpy>=1.12 because of np.flip?

.. ipython:: python

x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=['x'],
coords={'xx': [0, 1, 1.1, 1.9, 3]})
Member

I think there is a typo here. Maybe

x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=['x'],
                    coords={'xx': ('x', [0, 1, 1.1, 1.9, 3])})

or

x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=['x'],
          coords={'x': [0, 1, 1.1, 1.9, 3]})

?

Member Author

This was intentional.

Member

Really?
I got
ValueError: coordinate xx has dimensions ('xx',), but these are not a subset of the DataArray dimensions ['x']
from this script.

Member Author

You were right. I was missing the Variable constructor.


@fujiisoup
Member

This looks good to me.
Thanks for the great work :)

I will merge it this weekend if there are no further comments.

use_coordinate=use_coordinate, **kwargs)

def ffill(self, dim, limit=None):
'''Fill NaN values by propogating values forward
Collaborator

no need to change now, but FYI PEP8 is """ on its own line IIRC

@@ -1228,6 +1228,97 @@ def fillna(self, value):
out = ops.fillna(self, value)
return out

def interpolate_na(self, dim=None, method='linear', limit=None,
Collaborator

Probably too late to be helpful - but are we sure about the name here? We don't generally add _na onto methods (bfill_na?), and pandas is interpolate only

Member Author

See comment from @shoyer above: #1640 (comment)

self.f = UnivariateSpline(xi, yi, k=order, **self.cons_kwargs)


def _apply_over_vars_with_dim(func, self, dim=None, **kwargs):
Collaborator

This looks good! Could we use this elsewhere in the future? Or even put it on Dataset?

Member Author

Yes, certainly. I'm not 100% convinced we don't already have something like this but I didn't seem to find it. @shoyer, do you have thoughts on this?

'''wrapper for datasets'''
from .dataset import Dataset

ds = Dataset(coords=self.coords, attrs=self.attrs)
Collaborator

If people are subclassing Dataset (we do internally), this will discard their subclass. I think generally we should use type(self) or self.__class__.

axis = arr.get_axis_num(dim)

# work around for bottleneck 178
_limit = limit if limit is not None else arr.shape[axis]
Collaborator

👍

for dim in ['time', 'x']:
actual = da.interpolate_na(method=method, dim=dim)
expected = df.interpolate(method=method, axis=da.get_axis_num(dim))
np.testing.assert_allclose(actual.values, expected.values)
Collaborator

Perennial note for future tests: xarray.tests.assert_close

Member Author

expected is a pandas.DataFrame, so I don't think this will work with xarray.tests.assert_close.

def test_interpolate_pd_compat_non_uniform_index():
shapes = [(8, 8), (1, 20), (20, 1), (100, 100)]
frac_nans = [0, 0.5, 1]
methods = ['time', 'index', 'values']
Collaborator

Obviously don't change this now, but for future tests: pytest handles these parameterizations very well.

Member Author

@maxim-lian - I was using the pytest parameterizations here, but they weren't playing nicely with the @requires_foo decorators.

Collaborator

I can have a look. Do you remember what the setup was that didn't work? No stress if not

Member Author

I think skipped tests are returned as functions without arguments. The parameterizations then were passed to functions that didn't take arguments and that obviously didn't work.

So:

@parameterize('foo', [1, 2])
@skip
def bar(foo):
    pass

becomes

@parameterize('foo', [1, 2])
def bar():
    pass

when skipped.

Collaborator

@max-sixty max-sixty Dec 21, 2017

I think we should use pytest.importorskip - while @jreback is right that it doesn't allow as much control as writing a function ourselves, and @WillAyd gives some more benefits, we're not using anything that the standard pytest function can't do in any of our functions.

If people agree, I could add a note to this function: https://github.com/pydata/xarray/pull/1640/files/48505a51e69d6b3663783db414962f3011e94aa1#diff-e84d0339b6da4d18c02f6b2d54fbd5bdR47

Member

I recall we had some difficulty with decorators based on pytest and subclass methods, but that doesn't apply here.

@max-sixty
Collaborator

This is great! While I left some comments, I don't think any warrant changes now, as long as people are good with the interpolate_na name.



def bfill(self, dim, limit=None):
'''Fill NaN values by propogating values backward

*Requires bottleneck and Numpy v1.12.0 or later.*
Member

I don't think NumPy 1.12 is still required with our flip backport



@fujiisoup
Member

ready to go?

@fujiisoup fujiisoup merged commit 783b527 into pydata:master Dec 30, 2017
@fujiisoup
Member

Thanks for this great work!

@jhamman jhamman deleted the feature/interpolate branch December 30, 2017 06:58
@max-sixty max-sixty mentioned this pull request Jan 13, 2018

6 participants