ffill & bfill methods #1696

Closed
3 changes: 3 additions & 0 deletions xarray/core/dataarray.py
@@ -1214,6 +1214,9 @@ def fillna(self, value):
        out = ops.fillna(self, value)
        return out

    def ffill(self, dim, limit=None):
        return ops.ffill(self, dim, limit)

    def combine_first(self, other):
        """Combine two DataArray objects, with union of coordinates.

16 changes: 16 additions & 0 deletions xarray/core/ops.py
@@ -153,6 +153,22 @@ def fillna(data, other, join="left", dataset_join="left"):
                       dataset_fill_value=np.nan,
                       keep_attrs=True)

def ffill(data, dim, limit=None):
    axis_num = data.get_axis_num(dim)

    if not has_bottleneck:
        raise ImportError('ffill requires bottleneck to be installed')

    data = data.copy()

    if limit is None:
        # bottleneck raises an error if you pass `None`
        data.values = bn.push(data.values, axis=axis_num)
Review comment (Member):

Try apply_ufunc + transpose() afterwards instead? That would work on arbitrary xarray objects. It's awkward to need to restore the original dimension order, but it adds almost no overhead given NumPy's memory model. @jhamman is doing something like that in #1640.

I think you could even use dask='parallelized' here, to allow for doing multiple forward fills at the same time.

Parallelizing across the pushed axis with dask would be harder (and certainly isn't necessary for a first pass), because you would need something hierarchical that can push data between chunks. But I think you could still make this work with multiple passes to get forward fill-values on the boundaries.
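
A minimal sketch of the apply_ufunc approach suggested in this comment, not the code that was eventually merged. It assumes a DataArray input and uses a hypothetical NumPy helper, `_push_last_axis`, as a stand-in for `bn.push` so that the example runs without bottleneck; `ffill_sketch` is likewise an illustrative name. Declaring `dim` as a core dimension moves it to the last axis, which is why the result is transposed back at the end. For dask-backed input, the sketch assumes `dim` is not split across chunks.

```python
import numpy as np
import xarray as xr


def _push_last_axis(values, limit=None):
    """Forward-fill NaNs along the last axis (hypothetical stand-in for bn.push)."""
    values = np.asarray(values, dtype=float)
    positions = np.arange(values.shape[-1])
    # Position of the most recent non-NaN entry at or before each element.
    last_valid = np.where(~np.isnan(values), positions, 0)
    last_valid = np.maximum.accumulate(last_valid, axis=-1)
    filled = np.take_along_axis(values, last_valid, axis=-1)
    if limit is not None:
        # Undo fills that reach further than `limit` steps from their source.
        filled = np.where(positions - last_valid > limit, np.nan, filled)
    return filled


def ffill_sketch(arr, dim, limit=None):
    """Forward-fill a DataArray along `dim` via apply_ufunc."""
    filled = xr.apply_ufunc(
        _push_last_axis,
        arr,
        input_core_dims=[[dim]],   # moves `dim` to the last axis
        output_core_dims=[[dim]],  # the result keeps `dim`, still last
        kwargs={"limit": limit},
        dask="parallelized",
        output_dtypes=[float],
        keep_attrs=True,
    )
    # Restore the caller's dimension order.
    return filled.transpose(*arr.dims)
```

Because each 1-D slice along `dim` is filled independently, slices along the other dimensions can be processed in separate dask chunks, which is the "multiple forward fills at the same time" case mentioned above.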

    else:
        data.values = bn.push(data.values, axis=axis_num, n=limit)

    return data
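
The `limit` argument caps how many consecutive missing values are filled after each valid one. A pure-Python sketch of those semantics for the 1-D case, as implied by the tests in this PR (`forward_fill` is a hypothetical name, not part of xarray or bottleneck):

```python
import math


def forward_fill(values, limit=None):
    """Forward-fill NaNs in a 1-D sequence, at most `limit` in a row."""
    result = []
    last = math.nan  # most recent non-NaN value seen so far
    run = 0          # consecutive NaNs since that value
    for v in values:
        if math.isnan(v):
            run += 1
            within_limit = limit is None or run <= limit
            result.append(last if within_limit else math.nan)
        else:
            last, run = v, 0
            result.append(v)
    return result
```

With `limit=1`, only the first NaN after each valid value is filled; any further NaNs in the same gap stay missing, and leading NaNs are never filled because there is nothing to carry forward.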


def where_method(self, cond, other=dtypes.NA):
    """Return elements from `self` or `other` depending on `cond`.
18 changes: 18 additions & 0 deletions xarray/tests/test_dataarray.py
@@ -3146,3 +3146,21 @@ def test_raise_no_warning_for_nan_in_binary_ops():
    with pytest.warns(None) as record:
        xr.DataArray([1, 2, np.NaN]) > 0
    assert len(record) == 0


@pytest.mark.parametrize('da', (1, 2), indirect=True)
def test_ffill_functions(da):
    result = da.ffill('time')
    assert result.isnull().sum() == 0

def test_ffill_limit(da):
    da = DataArray(
        [0, np.nan, np.nan, np.nan, np.nan, 3, 4, 5, np.nan, 6, 7],
        dims='time')
    result = da.ffill('time')
    expected = DataArray([0, 0, 0, 0, 0, 3, 4, 5, 5, 6, 7], dims='time')
    assert_array_equal(result, expected)

    result = da.ffill('time', limit=1)
    expected = DataArray(
        [0, 0, np.nan, np.nan, np.nan, 3, 4, 5, 5, 6, 7], dims='time')