ENH: add time-window capability to .rolling #13513

Closed
14 changes: 13 additions & 1 deletion ci/lint.sh
@@ -17,7 +17,19 @@ if [ "$LINT" ]; then
fi

done
echo "Linting DONE"
echo "Linting *.py DONE"

echo "Linting *.pyx"
for path in 'window.pyx'
do
echo "linting -> pandas/$path"
flake8 pandas/$path --filename '*.pyx' --select=E501,E302,E203,E226,E111,E114,E221,E303,E128,E231,E126,E128
if [ $? -ne "0" ]; then
RET=1
fi

done
echo "Linting *.pyx DONE"

echo "Check for invalid testing"
grep -r -E --include '*.py' --exclude nosetester.py --exclude testing.py '(numpy|np)\.testing' pandas
85 changes: 85 additions & 0 deletions doc/source/computation.rst
@@ -391,6 +391,91 @@ For some windowing functions, additional parameters must be specified:
such that the weights are normalized with respect to each other. Weights
of ``[1, 1, 1]`` and ``[2, 2, 2]`` yield the same result.

.. _stats.moments.ts:

Time-aware Rolling
~~~~~~~~~~~~~~~~~~

.. versionadded:: 0.19.0

New in version 0.19.0 is the ability to pass an offset (or convertible) to a ``.rolling()`` method and have it produce
variable-sized windows based on the passed time window. For each time point, this includes all preceding values occurring
within the indicated time delta.

This can be particularly useful for a non-regular time frequency index.

.. ipython:: python

   dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
                      index=pd.date_range('20130101 09:00:00', periods=5, freq='s'))
   dft

This is a regular frequency index. Using an integer window parameter works to roll along the window frequency.

.. ipython:: python

   dft.rolling(2).sum()
   dft.rolling(2, min_periods=1).sum()

Specifying an offset allows a more intuitive specification of the rolling frequency.

.. ipython:: python

   dft.rolling('2s').sum()

Using a non-regular, but still monotonic index, rolling with an integer window does not impart any special calculation.

Review comment by @wesm (Member), Jul 7, 2016:

Code examples aside, from this description it isn't precisely clear what the time window does. Essentially we need to distinguish this clearly from resample (which has its own complexities with anchored vs non-anchored time offsets).

Perhaps something like: "For each time point, includes all preceding values occurring within the indicated time delta." If it confuses people we can always create a diagram (for example: I will probably include this in the 2nd ed of my book and almost certainly create a diagram to make it clear).



.. ipython:: python


   dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
                      index=pd.Index([pd.Timestamp('20130101 09:00:00'),
                                      pd.Timestamp('20130101 09:00:02'),
                                      pd.Timestamp('20130101 09:00:03'),
                                      pd.Timestamp('20130101 09:00:05'),
                                      pd.Timestamp('20130101 09:00:06')],
                                     name='foo'))
   dft
   dft.rolling(2).sum()


Using the time-specification generates variable windows for this sparse data.

.. ipython:: python

   dft.rolling('2s').sum()
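
As a rough illustration of the window semantics (not how pandas implements it internally), the sketch below rebuilds ``rolling('2s').sum()`` by hand for a Series holding the same values as column ``B``. It assumes the default behaviour of an offset window, where the window excludes its start point and includes its end point, and a pandas version in which ``Series.sum`` accepts ``min_count``. The names ``s``, ``window``, ``window_sum``, ``manual`` and ``rolled`` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same irregular, monotonic timestamps and values as the example data.
idx = pd.DatetimeIndex(['2013-01-01 09:00:00',
                        '2013-01-01 09:00:02',
                        '2013-01-01 09:00:03',
                        '2013-01-01 09:00:05',
                        '2013-01-01 09:00:06'], name='foo')
s = pd.Series([0, 1, 2, np.nan, 4], index=idx, name='B')
window = pd.Timedelta('2s')


def window_sum(t):
    # Assumed window for time point t: (t - window, t],
    # i.e. start point excluded, end point included.
    in_window = (s.index > t - window) & (s.index <= t)
    # min_count=1 yields NaN when the window holds no non-NaN value.
    return s[in_window].sum(min_count=1)


manual = pd.Series([window_sum(t) for t in s.index], index=s.index, name='B')
rolled = s.rolling('2s').sum()

# Side-by-side comparison of the hand-built and built-in results.
print(pd.DataFrame({'manual': manual, 'rolled': rolled}))
```

Each output row depends only on the observations whose timestamps fall in the trailing two seconds, so the number of values contributing to a row varies from one time point to the next.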

Furthermore, we now allow an optional ``on`` parameter to specify a column (rather than the
default of the index) in a DataFrame.

.. ipython:: python

   dft = dft.reset_index()
   dft
   dft.rolling('2s', on='foo').sum()

.. _stats.moments.ts-versus-resampling:

Time-aware Rolling vs. Resampling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Using ``.rolling()`` with a time-based index is quite similar to :ref:`resampling <timeseries.resampling>`. They
both operate and perform reductive operations on time-indexed pandas objects.

When using ``.rolling()`` with an offset, the offset is a time-delta. Take a backwards-in-time looking window, and
aggregate all of the values in that window (including the end-point, but not the start-point). This is the new value
at that point in the result. These are variable-sized windows in time-space for each point of the input. The result
has the same size as the input.

When using ``.resample()`` with an offset, construct a new index at the frequency of the offset. For each frequency
bin, aggregate points from the input within a backwards-in-time looking window that fall in that bin. The result of this
aggregation is the output for that frequency point. The windows are of fixed size in the frequency space. The result
will have the shape of a regular frequency between the min and the max of the original input object.

To summarize, ``.rolling()`` is a time-based window operation, while ``.resample()`` is a frequency-based window operation.
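
To make the contrast concrete, here is a small sketch that applies both operations to an irregularly spaced frame with the same values as the examples above. The names ``rolled`` and ``resampled`` are illustrative. The exact bin edges and labels that ``.resample()`` produces depend on its ``closed`` and ``label`` defaults, so only the result shapes and index spacing are inspected here.

```python
import numpy as np
import pandas as pd

# Irregular, monotonic timestamps (gaps of 2s, 1s, 2s, 1s).
idx = pd.DatetimeIndex(['2013-01-01 09:00:00',
                        '2013-01-01 09:00:02',
                        '2013-01-01 09:00:03',
                        '2013-01-01 09:00:05',
                        '2013-01-01 09:00:06'], name='foo')
dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]}, index=idx)

# Time-based window: one output row per input row, same index as the input.
rolled = dft.rolling('2s').sum()

# Frequency-based bins: output is indexed on a regular 2-second grid.
resampled = dft.resample('2s').sum()

print(rolled.index.equals(dft.index))
print(len(dft), len(rolled), len(resampled))
print(resampled.index)
```

The rolling result keeps the five original timestamps, while the resampled result is indexed on a regular 2-second grid spanning the input.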

Centering Windows
~~~~~~~~~~~~~~~~~

6 changes: 5 additions & 1 deletion doc/source/timeseries.rst
@@ -1324,7 +1324,11 @@ performing resampling operations during frequency conversion (e.g., converting
secondly data into 5-minutely data). This is extremely common in, but not
limited to, financial applications.

``resample`` is a time-based groupby, followed by a reduction method on each of its groups.
``.resample()`` is a time-based groupby, followed by a reduction method on each of its groups.

.. note::

   ``.resample()`` is similar to using a ``.rolling()`` operation with a time-based offset, see a discussion :ref:`here <stats.moments.ts-versus-resampling>`.

See some :ref:`cookbook examples <cookbook.resample>` for some advanced strategies

63 changes: 61 additions & 2 deletions doc/source/whatsnew/v0.19.0.txt
@@ -3,16 +3,17 @@
v0.19.0 (August ??, 2016)
-------------------------

This is a major release from 0.18.2 and includes a small number of API changes, several new features,
This is a major release from 0.18.1 and includes a small number of API changes, several new features,

Review comment by the PR author (Contributor):

@jorisvandenbossche slight changes need here (which are in master). done here i think is fine.

enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

Highlights include:

- :func:`merge_asof` for asof-style time-series joining, see :ref:`here <whatsnew_0190.enhancements.asof_merge>`
- ``.rolling()`` is now time-series aware, see :ref:`here <whatsnew_0190.enhancements.rolling_ts>`
- pandas development api, see :ref:`here <whatsnew_0190.dev_api>`

.. contents:: What's new in v0.18.2
.. contents:: What's new in v0.19.0
:local:
:backlinks: none

@@ -131,6 +132,64 @@ that forward filling happens automatically taking the most recent non-NaN value.
This returns a merged DataFrame with the entries in the same order as the original left
passed DataFrame (``trades`` in this case), with the fields of the ``quotes`` merged.

.. _whatsnew_0190.enhancements.rolling_ts:

``.rolling()`` is now time-series aware
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``.rolling()`` objects are now time-series aware and can accept a time-series offset (or convertible) for the ``window`` argument (:issue:`13327`, :issue:`12995`).
See the full documentation :ref:`here <stats.moments.ts>`.

.. ipython:: python

   dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
                      index=pd.date_range('20130101 09:00:00', periods=5, freq='s'))
   dft

This is a regular frequency index. Using an integer window parameter works to roll along the window frequency.

.. ipython:: python

   dft.rolling(2).sum()
   dft.rolling(2, min_periods=1).sum()

Specifying an offset allows a more intuitive specification of the rolling frequency.

.. ipython:: python

   dft.rolling('2s').sum()

Using a non-regular, but still monotonic index, rolling with an integer window does not impart any special calculation.

.. ipython:: python


   dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
                      index=pd.Index([pd.Timestamp('20130101 09:00:00'),
                                      pd.Timestamp('20130101 09:00:02'),
                                      pd.Timestamp('20130101 09:00:03'),
                                      pd.Timestamp('20130101 09:00:05'),
                                      pd.Timestamp('20130101 09:00:06')],
                                     name='foo'))
   dft
   dft.rolling(2).sum()

Using the time-specification generates variable windows for this sparse data.

.. ipython:: python

   dft.rolling('2s').sum()

Furthermore, we now allow an optional ``on`` parameter to specify a column (rather than the
default of the index) in a DataFrame.

.. ipython:: python

   dft = dft.reset_index()
   dft
   dft.rolling('2s', on='foo').sum()

.. _whatsnew_0190.enhancements.read_csv_dupe_col_names_support:

:func:`read_csv` has improved support for duplicate column names
5 changes: 3 additions & 2 deletions pandas/core/generic.py
@@ -5343,11 +5343,12 @@ def _add_series_or_dataframe_operations(cls):

@Appender(rwindow.rolling.__doc__)
def rolling(self, window, min_periods=None, freq=None, center=False,
win_type=None, axis=0):
win_type=None, on=None, axis=0):
axis = self._get_axis_number(axis)
return rwindow.rolling(self, window=window,
min_periods=min_periods, freq=freq,
center=center, win_type=win_type, axis=axis)
center=center, win_type=win_type,
on=on, axis=axis)

cls.rolling = rolling
