Skip to content

Feature Request: Efficient rolling with strides #3608

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
niowniow opened this issue Dec 10, 2019 · 8 comments
Open

Feature Request: Efficient rolling with strides #3608

niowniow opened this issue Dec 10, 2019 · 8 comments

Comments

@niowniow
Copy link
Contributor

niowniow commented Dec 10, 2019

Xarray is facing the same issues in its current rolling implementation (DataArrayRolling and DatasetRolling) as described in this pandas issue. Namely, the construct methods stride parameter is applied after the rolling is computed. Technically, we are computing more than we would need to because we partially throwing it away due to striding.

In PR #3607 the issue is solved for the ...Rolling's __iter__ function but not for the construct, reduce and _bottleneck_reduce methods.
Since the way Xarray's rolling is implemented relies on numpy, we could introduce a sliding window function as described here.

Any opinions?

@niowniow
Copy link
Contributor Author

Previous enhancement requests asking for a stride argument to rolling: pandas-dev/pandas#15354, pandas-dev/pandas#22976, pandas-dev/pandas#27654 (comment), dask/dask#4659, numpy/numpy#7753

Originally posted by @pilkibun in pandas-dev/pandas#26959 (comment)

@GenevieveBuckley
Copy link

Is this useful/relevant for you? dask/dask#7234

@keewis
Copy link
Collaborator

keewis commented Mar 31, 2021

@dcherian, should this have been closed by #4977?

@dcherian
Copy link
Contributor

No. but this should be really easy to fix.

construct already supports stride.

reduce can easily support stride by passing it on here:

windows = self._construct(
obj, rolling_dim, keep_attrs=keep_attrs, fill_value=fillna
)

We should also add it to _reduce_method

bottleneck does not support stride so we can only use a .isel call at the end of _bottleneck_reduce

return DataArray(values, self.obj.coords, attrs=attrs, name=self.obj.name)

Note: sliding_window_view does not support stride because it's easy to stride after constructing that view (i saw this on some numpy issue)

@niowniow
Copy link
Contributor Author

Quickly glancing over sliding_window_view I didn't immediately understand how to use it with stride. Would I need to

  1. transform the DataArray to dask array using chunk (which may involve an overhead!?),
  2. then use rolling which itself uses sliding_window_view because its a dask array!?
  3. Then use isel with stride on the new dimension?

reduce can easily support stride by passing it on here:

I think that's what I did in #3607. It's been a while

@dcherian
Copy link
Contributor

sliding_window_view is a numpy function (see npcompat.py), so you need not transform to dask.

For reduce I think we just have to pass stride to _construct as in #3608 (comment) and for bottleneck insert the .isel call (copied from the end of _construct) after the DataArray is constructed.

@kmsquire
Copy link
Contributor

Question: instead of adding stride to reduce and _reduce_method, why not add it as a member of DataArrayRolling directly? This would allow, e.g., __iter__ to use it as well, and seems like a cleaner interface.

I've been confused why some parameters are available only in construct (stride, fill_value), some are available both in construct and in the DataArrayRolling constructor (keep_attrs), and some are only available in the constructor (min_periods, center, and soon pad).

@niowniow
Copy link
Contributor Author

You are right. It's quite confusing. I've already added a stride parameter in my PR #3607
I didn't follow through with it and at the moment the checks are not successful anymore.
Maybe someone else could give an opinion on the pro/cons of a stride parameter in rolling?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants