
WIP: Feature/interpolate #1640

Merged
merged 27 commits into from
Dec 30, 2017
Changes from 12 commits
Commits
1582c1f
initial interpolate commit
Oct 17, 2017
ab727e7
fix to interpolate wrapper function
Nov 11, 2017
95006c4
remove duplicate limit handling in ffill/bfill
Nov 11, 2017
4a4f6eb
tests are passing
Nov 12, 2017
42d63ef
more docs, more tests
Nov 13, 2017
263ec98
Merge branch 'master' of github.com:pydata/xarray into feature/interp…
Nov 13, 2017
19d21b8
backward compat and add benchmarks
Nov 13, 2017
f937c07
skip tests for numpy versions before 1.12
Nov 13, 2017
8717e38
test fixes for py27 fixture
Nov 13, 2017
3d5c1b1
try reording decorators
Nov 13, 2017
1864e8f
minor reorg of travis to make the flake8 check useful
Nov 16, 2017
f58d464
cleanup following @fujiisoup's comments
Nov 18, 2017
1b93808
dataset missing methods, some more docs, and more scipy interpolators
Nov 27, 2017
6f83b7b
Merge branch 'master' of github.com:pydata/xarray into feature/interp…
Dec 6, 2017
33df6af
Merge branch 'master' of github.com:pydata/xarray into feature/interp…
Dec 10, 2017
eafe67a
Merge remote-tracking branch 'upstream/master' into feature/interpolate
Dec 16, 2017
dd9fa8c
workaround for parameterized tests that are skipped in missing.py module
Dec 16, 2017
88d1569
requires_np112 for dataset interpolate test
Dec 18, 2017
3fb9261
Merge branch 'master' of github.com:pydata/xarray into feature/interp…
Dec 21, 2017
37882b7
remove req for np 112
Dec 21, 2017
a04e83e
fix typo in docs
Dec 21, 2017
48505a5
@requires_np112 for methods that use apply_ufunc in missing.py
Dec 21, 2017
20f957d
Merge branch 'master' of github.com:pydata/xarray into feature/interp…
Dec 21, 2017
282bb65
reuse type in apply over vars with dim
Dec 21, 2017
a6fcb7f
rework the fill value convention for linear interpolation, no longer …
Dec 22, 2017
2b0d9e1
flake8
Dec 30, 2017
d3220f3
Merge branch 'master' of github.com:pydata/xarray into feature/interp…
Dec 30, 2017
2 changes: 1 addition & 1 deletion .travis.yml
@@ -89,9 +89,9 @@ install:
- python xarray/util/print_versions.py

script:
- git diff upstream/master xarray/**/*py | flake8 --diff --exit-zero || true
- python -OO -c "import xarray"
- py.test xarray --cov=xarray --cov-config ci/.coveragerc --cov-report term-missing --verbose $EXTRA_FLAGS
- git diff upstream/master **/*py | flake8 --diff --exit-zero || true

after_success:
- coveralls
74 changes: 74 additions & 0 deletions asv_bench/benchmarks/dataarray_missing.py
@@ -0,0 +1,74 @@
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import pandas as pd

try:
    import dask
    import dask.multiprocessing
except ImportError:
    pass

import xarray as xr

from . import randn, requires_dask


def make_bench_data(shape, frac_nan, chunks):
    vals = randn(shape, frac_nan)
    coords = {'time': pd.date_range('2000-01-01', freq='D',
                                    periods=shape[0])}
    da = xr.DataArray(vals, dims=('time', 'x', 'y'), coords=coords)

    if chunks is not None:
        da = da.chunk(chunks)

    return da


def time_interpolate_na(shape, chunks, method, limit):
    if chunks is not None:
        requires_dask()
    da = make_bench_data(shape, 0.1, chunks=chunks)
    actual = da.interpolate_na(dim='time', method=method, limit=limit)

    if chunks is not None:
        actual = actual.compute()


time_interpolate_na.param_names = ['shape', 'chunks', 'method', 'limit']
time_interpolate_na.params = ([(3650, 200, 400), (100, 25, 25)],
                              [None, {'x': 25, 'y': 25}],
                              ['linear', 'spline', 'quadratic', 'cubic'],
                              [None, 3])


def time_ffill(shape, chunks, limit):
    if chunks is not None:
        requires_dask()
    da = make_bench_data(shape, 0.1, chunks=chunks)
    actual = da.ffill(dim='time', limit=limit)

    if chunks is not None:
        actual = actual.compute()


time_ffill.param_names = ['shape', 'chunks', 'limit']
time_ffill.params = ([(3650, 200, 400), (100, 25, 25)],
                     [None, {'x': 25, 'y': 25}],
                     [None, 3])


def time_bfill(shape, chunks, limit):
    if chunks is not None:
        requires_dask()
    da = make_bench_data(shape, 0.1, chunks=chunks)
    actual = da.bfill(dim='time', limit=limit)

    if chunks is not None:
        actual = actual.compute()


time_bfill.param_names = ['shape', 'chunks', 'limit']
time_bfill.params = ([(3650, 200, 400), (100, 25, 25)],
                     [None, {'x': 25, 'y': 25}],
                     [None, 3])
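For reference, asv times a parameterized benchmark once for every combination of the lists in its params attribute. The snippet below is only an illustration (it imports neither asv nor xarray): it copies the parameter lists from the benchmarks above and enumerates the grid with itertools.product to show how many timed cases time_interpolate_na and time_ffill expand to.

```python
import itertools

# Parameter lists copied from time_interpolate_na.params above.
shapes = [(3650, 200, 400), (100, 25, 25)]
chunks = [None, {'x': 25, 'y': 25}]
methods = ['linear', 'spline', 'quadratic', 'cubic']
limits = [None, 3]

# asv runs one timing per element of the Cartesian product of the lists.
cases = list(itertools.product(shapes, chunks, methods, limits))
print(len(cases))  # 32

# time_ffill / time_bfill have no 'method' axis, so their grid is smaller.
ffill_cases = list(itertools.product(shapes, chunks, limits))
print(len(ffill_cases))  # 8
```

Since the large shape appears in every combination, trimming either list is the quickest way to shorten a full benchmark run.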
3 changes: 3 additions & 0 deletions doc/api.rst
@@ -298,6 +298,9 @@ Computation
:py:attr:`~DataArray.count`
:py:attr:`~DataArray.dropna`
:py:attr:`~DataArray.fillna`
:py:attr:`~DataArray.ffill`
:py:attr:`~DataArray.bfill`
:py:attr:`~DataArray.interpolate_na`
:py:attr:`~DataArray.where`

**ndarray methods**:
87 changes: 87 additions & 0 deletions xarray/core/dataarray.py
@@ -1220,6 +1220,93 @@ def fillna(self, value):
        out = ops.fillna(self, value)
        return out

    def interpolate_na(self, dim=None, method='linear', limit=None,
Collaborator

Probably too late to be helpful, but are we sure about the name here? We don't generally add _na onto method names (bfill_na?), and pandas uses plain interpolate.

Member Author

See comment from @shoyer above: #1640 (comment)

                       use_coordinate=True,
                       **kwargs):
        """Fill in NaNs by interpolating according to different methods.

        Parameters
        ----------
        dim : str
            Specifies the dimension along which to interpolate.
        method : {'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
                  'polynomial', 'barycentric', 'krog', 'pchip',
                  'spline'}, optional
Member

pandas seems to support Akima.
Maybe we can also add this?
SciPy provides a similar interface.

            String indicating which method to use for interpolation:

            - 'linear': linear interpolation (Default). Additional keyword
Member

I noticed pandas.interpolate(method='linear') behaves differently.
Though I think our definition is more natural, should we note this here?

Member Author

Are you referring to which values pandas uses for the index?

Member

Yes.

pandas' docstring says

‘linear’: ignore the index and treat the values as equally spaced.
‘index’: use the actual numerical values of the index.

The difference between the two may correspond to our use_coordinate option [False/True].

Our linear option means a linear interpolation.
I just thought that using the same name with different meanings can be confusing.
It is a very minor point (and maybe an overconcern).

              arguments are passed to ``numpy.interp``.
            - 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
              'polynomial': passed to ``scipy.interpolate.interp1d``. If
              method=='polynomial', the ``order`` keyword argument must also be
              provided.
            - 'barycentric', 'krog', 'pchip', 'spline': use their respective
              ``scipy.interpolate`` classes.
        use_coordinate : boolean or str, default True
            Specifies which index to use as the x values in the interpolation
            formulated as `y = f(x)`. If False, values are treated as if
            equally spaced along `dim`. If True, the IndexVariable `dim` is
            used. If use_coordinate is a string, it specifies the name of a
            coordinate variable to use as the index.
        limit : int, default None
            Maximum number of consecutive NaNs to fill. Must be greater than 0
            or None for no limit.

        Returns
        -------
        DataArray

        See also
        --------
        numpy.interp
        scipy.interpolate
        """
        from .missing import interp_na
        return interp_na(self, dim=dim, method=method, limit=limit,
                         use_coordinate=use_coordinate, **kwargs)
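A minimal pure-NumPy sketch of what the 'linear' method and use_coordinate mean for a single 1-D series. interp_na_1d is a hypothetical helper, not the interp_na function imported from .missing: passing x mimics use_coordinate=True, and leaving it out mimics use_coordinate=False (the same distinction as pandas' 'index' vs 'linear' in the review thread). Leaving leading/trailing NaNs unfilled and filling only the first limit NaNs of each gap are assumptions of this sketch, not confirmed behaviour of the PR.

```python
import numpy as np


def interp_na_1d(y, x=None, limit=None):
    """Linearly fill NaNs in a 1-D array (hypothetical helper).

    x stands in for use_coordinate: pass coordinate values to interpolate
    against them, or leave it as None to treat points as equally spaced.
    """
    if limit is not None and limit <= 0:
        raise ValueError('limit must be greater than 0 or None')
    y = np.asarray(y, dtype=float)
    if x is None:
        x = np.arange(y.size, dtype=float)  # equally spaced positions
    else:
        x = np.asarray(x, dtype=float)      # actual coordinate values
    nans = np.isnan(y)
    out = y.copy()
    if nans.any() and not nans.all():
        # Evaluate the piecewise-linear curve through the valid points at
        # each NaN position; left/right=NaN means no extrapolation at edges.
        out[nans] = np.interp(x[nans], x[~nans], y[~nans],
                              left=np.nan, right=np.nan)
    if limit is not None:
        # Assumed convention: keep only the first `limit` NaNs of each gap.
        run = 0
        for i, was_nan in enumerate(nans):
            run = run + 1 if was_nan else 0
            if run > limit:
                out[i] = np.nan
    return out


y = np.array([0.0, np.nan, 10.0])
evenly = interp_na_1d(y)                  # evenly[1] == 5.0
by_coord = interp_na_1d(y, x=[0, 1, 10])  # by_coord[1] == 1.0
partial = interp_na_1d([0.0, np.nan, np.nan, np.nan, 4.0], limit=2)
# partial[1] == 1.0, partial[2] == 2.0, partial[3] stays NaN
```

The same NaN gets 5.0 when points are treated as equally spaced but 1.0 when the coordinate values 0, 1, 10 are used, which is exactly why the name 'linear' can mean different things in pandas and here.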

    def ffill(self, dim, limit=None):
        '''Fill NaN values by propagating values forward
Collaborator

no need to change now, but FYI PEP8 is """ on its own line IIRC


        Parameters
        ----------
        dim : str
            Specifies the dimension along which to propagate values when
Member

propogate -> propagate?

            filling.
        limit : int, default None
            The maximum number of consecutive NaN values to forward fill. In
            other words, if there is a gap with more than this number of
            consecutive NaNs, it will only be partially filled. Must be greater
            than 0 or None for no limit.

        Returns
        -------
        DataArray
        '''
        from .missing import ffill
        return ffill(self, dim, limit=limit)
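A minimal sketch of forward fill with limit, following the docstring above: a gap longer than limit is only partially filled. ffill_1d is a hypothetical helper for one 1-D array, written as a plain loop for readability rather than speed; it is not the ffill function imported from .missing.

```python
import numpy as np


def ffill_1d(values, limit=None):
    """Forward-fill NaNs in a 1-D array (hypothetical helper).

    At most `limit` consecutive NaNs after each valid value are filled;
    limit=None fills every gap completely.
    """
    if limit is not None and limit <= 0:
        raise ValueError('limit must be greater than 0 or None')
    out = np.array(values, dtype=float)  # always a copy of the input
    last = np.nan  # most recent valid value (NaN until one is seen)
    gap = 0        # NaNs seen since that value
    for i in range(out.size):
        if np.isnan(out[i]):
            gap += 1
            if limit is None or gap <= limit:
                out[i] = last
        else:
            last = out[i]
            gap = 0
    return out


filled = ffill_1d([1.0, np.nan, np.nan, np.nan, 5.0], limit=2)
# filled[1] and filled[2] become 1.0; filled[3] stays NaN (gap exceeds limit)
```

Leading NaNs have no earlier value to copy, so they stay NaN regardless of limit.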

    def bfill(self, dim, limit=None):
        '''Fill NaN values by propagating values backward

        Parameters
        ----------
        dim : str
            Specifies the dimension along which to propagate values when
            filling.
        limit : int, default None
            The maximum number of consecutive NaN values to backward fill. In
            other words, if there is a gap with more than this number of
            consecutive NaNs, it will only be partially filled. Must be greater
            than 0 or None for no limit.

        Returns
        -------
        DataArray
        '''
        from .missing import bfill
        return bfill(self, dim, limit=limit)
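Backward fill is the mirror image. The hypothetical bfill_1d below walks the array from the end so that each NaN takes the next valid value, again filling at most limit NaNs per gap. It is a sketch that uses only NumPy; whether the real ffill/bfill need an extra dependency such as bottleneck is not visible in this diff.

```python
import numpy as np


def bfill_1d(values, limit=None):
    """Backward-fill NaNs in a 1-D array (hypothetical helper).

    At most `limit` consecutive NaNs before each valid value are filled;
    limit=None fills every gap completely.
    """
    if limit is not None and limit <= 0:
        raise ValueError('limit must be greater than 0 or None')
    out = np.array(values, dtype=float)  # always a copy of the input
    nxt = np.nan  # nearest valid value to the right (NaN until one is seen)
    gap = 0       # NaNs seen (moving left) since that value
    for i in range(out.size - 1, -1, -1):
        if np.isnan(out[i]):
            gap += 1
            if limit is None or gap <= limit:
                out[i] = nxt
        else:
            nxt = out[i]
            gap = 0
    return out


filled = bfill_1d([1.0, np.nan, np.nan, np.nan, 5.0], limit=2)
# filled[2] and filled[3] become 5.0; filled[1] stays NaN (gap exceeds limit)
```

With a limit, backward fill keeps the NaNs farthest from the next valid value, while forward fill keeps those farthest from the previous one.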
Member

Do we need bottleneck installed to use bfill or ffill? Maybe it should be noted in the docstrings.


    def combine_first(self, other):
        """Combine two DataArray objects, with union of coordinates.
