implement+test mean for datetimelike EA/Index/Series #24757

Merged 44 commits on Jun 10, 2019
434e2cd
implement+test mean for datetimelike EA/Index/Series
jbrockmendel Jan 13, 2019
c8abf33
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jan 14, 2019
30eeb64
update imports
jbrockmendel Jan 14, 2019
d48e2ef
isort fixup
jbrockmendel Jan 14, 2019
0e32be2
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jan 14, 2019
231458d
params for docstring
jbrockmendel Jan 14, 2019
1129e8c
test for numeric_only
jbrockmendel Jan 14, 2019
38f829a
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jan 16, 2019
aba90ec
copy/paste fixup
jbrockmendel Jan 16, 2019
176b355
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jan 17, 2019
ccb790b
Disable for PeriodArray
jbrockmendel Jan 17, 2019
4f4cb6d
Delete assertions missed in previous commit
jbrockmendel Jan 17, 2019
ec83db1
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jan 17, 2019
5fb1db9
xfail numeric_only=False case
jbrockmendel Jan 17, 2019
028c789
Merge branch 'means' of https://github.com/jbrockmendel/pandas into m…
jbrockmendel Jan 17, 2019
50e714e
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jan 29, 2019
1b24e7d
add todo comment
jbrockmendel Jan 29, 2019
4d40906
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jan 29, 2019
a49da37
dont expect datetime in dataframe.mean
jbrockmendel Jan 29, 2019
94d3466
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Feb 1, 2019
e4e6a03
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Feb 7, 2019
7953a7b
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Feb 11, 2019
15307da
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Feb 18, 2019
da719e1
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Feb 22, 2019
637b415
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Feb 28, 2019
58bca36
whatsnew
jbrockmendel Feb 28, 2019
abcb87a
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Mar 3, 2019
4df0b1c
add versionadded
jbrockmendel Mar 3, 2019
c9736d7
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Mar 9, 2019
d2f5e6f
change NotImplementedError to TypeError
jbrockmendel Mar 9, 2019
de9025c
add mean to class docstrings and docs/source/reference/indexing.rst
jbrockmendel Mar 9, 2019
4c553a9
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Mar 14, 2019
330fc41
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Mar 20, 2019
7c6201b
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Apr 5, 2019
642d4e2
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Apr 12, 2019
7b9fd42
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Apr 20, 2019
4be012f
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel May 12, 2019
5682b65
remove axis keyword for now
jorisvandenbossche May 14, 2019
581ff1a
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel May 14, 2019
34a83a4
Merge branch 'means' of https://github.com/jbrockmendel/pandas into m…
jbrockmendel May 14, 2019
14150ee
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel May 16, 2019
450c8ce
don't pass axis to methods
jorisvandenbossche May 16, 2019
3e31ca1
add returns to docstring
jorisvandenbossche May 16, 2019
111c345
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jun 3, 2019
14 changes: 14 additions & 0 deletions doc/source/reference/indexing.rst
@@ -403,6 +403,13 @@ Conversion
   DatetimeIndex.to_series
   DatetimeIndex.to_frame

Methods
~~~~~~~
.. autosummary::
   :toctree: api/

   DatetimeIndex.mean

TimedeltaIndex
--------------
.. autosummary::
@@ -435,6 +442,13 @@ Conversion
   TimedeltaIndex.ceil
   TimedeltaIndex.to_frame

Methods
~~~~~~~
.. autosummary::
   :toctree: api/

   TimedeltaIndex.mean

.. currentmodule:: pandas

PeriodIndex
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
@@ -82,6 +82,7 @@ Other Enhancements
- :meth:`DataFrame.query` and :meth:`DataFrame.eval` now supports quoting column names with backticks to refer to names with spaces (:issue:`6508`)
- :func:`merge_asof` now gives a more clear error message when merge keys are categoricals that are not equal (:issue:`26136`)
- :meth:`pandas.core.window.Rolling` supports exponential (or Poisson) window type (:issue:`21303`)
- :class:`DatetimeIndex` and :class:`TimedeltaIndex` now have a `mean` method (:issue:`24757`)
-

.. _whatsnew_0250.api_breaking:
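As a rough illustration of the whatsnew entry, the sketch below calls the new method on two small indexes. The sample values are invented for this example, and it assumes an installed pandas that includes this change (0.25.0 or later).

```python
import pandas as pd

# Sample data invented for this illustration.
dti = pd.DatetimeIndex(['2001-01-01', '2001-01-03', pd.NaT])
tdi = pd.to_timedelta([1, 2, 3], unit='D')

# NaT is skipped by default, so only the two real dates are averaged.
dti_mean = dti.mean()

# With skipna=False a single NaT makes the whole result NaT.
dti_mean_strict = dti.mean(skipna=False)

# Timedeltas average to another Timedelta.
tdi_mean = tdi.mean()
```

Under these assumptions `dti_mean` is the midpoint of the two dates and `tdi_mean` is two days.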
50 changes: 49 additions & 1 deletion pandas/core/arrays/datetimelike.py
@@ -1382,7 +1382,7 @@ def _ensure_localized(self, arg, ambiguous='raise', nonexistent='raise',
     def _reduce(self, name, axis=0, skipna=True, **kwargs):
         op = getattr(self, name, None)
         if op:
-            return op(axis=axis, skipna=skipna, **kwargs)
+            return op(skipna=skipna, **kwargs)
         else:
             return super()._reduce(name, skipna, **kwargs)
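The `_reduce` hunk looks a reduction up by name and calls it when it exists, without forwarding `axis`. A standalone sketch of that dispatch pattern follows. `Base` and `Durations` are hypothetical classes invented for this illustration and are not the pandas classes.

```python
class Base:
    def _reduce(self, name, skipna=True, **kwargs):
        # Fallback used when a subclass has no method called `name`.
        raise TypeError("cannot perform {name}".format(name=name))


class Durations(Base):
    """Hypothetical container of plain numbers."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self, skipna=True):
        # `skipna` is accepted for signature compatibility only;
        # this toy container never holds missing values.
        return sum(self.values) / len(self.values)

    def _reduce(self, name, axis=0, skipna=True, **kwargs):
        op = getattr(self, name, None)
        if op:
            # `axis` is deliberately not forwarded: `mean` does not take it.
            return op(skipna=skipna, **kwargs)
        return super()._reduce(name, skipna=skipna, **kwargs)


d = Durations([1, 2, 3])
mean_result = d._reduce('mean')
```

Calling `d._reduce('std')` in this sketch reaches the fallback and raises `TypeError`, because `Durations` defines no `std` method.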

@@ -1438,6 +1438,54 @@ def max(self, axis=None, skipna=True, *args, **kwargs):
        # Don't have to worry about NA `result`, since no NA went in.
        return self._box_func(result)

    def mean(self, skipna=True):
        """
        Return the mean value of the Array.

        .. versionadded:: 0.25.0

        Parameters
        ----------
        skipna : bool, default True
            Whether to ignore any NaT elements.

        Returns
        -------
        scalar (Timestamp or Timedelta)

        See Also
        --------
        numpy.ndarray.mean
        Series.mean : Return the mean value in a Series.

        Notes
        -----
        mean is only defined for Datetime and Timedelta dtypes, not for Period.
        """
        if is_period_dtype(self):
            # See discussion in GH#24757
            raise TypeError(
                "mean is not implemented for {cls} since the meaning is "
                "ambiguous. An alternative is "
                "obj.to_timestamp(how='start').mean()"
                .format(cls=type(self).__name__))

        mask = self.isna()
        if skipna:
            values = self[~mask]
        elif mask.any():
            return NaT
        else:
            values = self

        if not len(values):
            # short-circuit for empty input
            return NaT

        result = nanops.nanmean(values.view('i8'), skipna=skipna)
        # Don't have to worry about NA `result`, since no NA went in.
        return self._box_func(result)


    # -------------------------------------------------------------------
    # Shared Constructor Helpers
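The control flow of `mean` (mask missing values, return NaT early, average the integer representation, box the result) can be sketched with the standard library alone. In this sketch `None` stands in for NaT, microseconds since a fixed epoch stand in for the `i8` view, and `datetime_mean` is a hypothetical helper, not the pandas implementation.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
NAT = None  # stand-in for pandas' NaT in this sketch


def datetime_mean(values, skipna=True):
    """Mean of a list of aware datetimes; ``None`` marks a missing value."""
    present = [v for v in values if v is not NAT]
    if not skipna and len(present) != len(values):
        # A missing value decides the result when it is not skipped.
        return NAT
    if not present:
        # Nothing left to average.
        return NAT
    # Average integer offsets from the epoch, then box back to a datetime.
    micros = [(v - EPOCH) // timedelta(microseconds=1) for v in present]
    return EPOCH + timedelta(microseconds=sum(micros) // len(micros))


first = datetime(2001, 1, 1, tzinfo=timezone.utc)
days = [first + timedelta(days=d) for d in range(11)]
with_gap = list(days)
with_gap[2] = NAT
```

Here `datetime_mean(days)` gives 2001-01-06, `datetime_mean(with_gap)` gives 2001-01-06 07:12, and `datetime_mean(with_gap, skipna=False)` gives `None`.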
1 change: 1 addition & 0 deletions pandas/core/indexes/datetimelike.py
@@ -73,6 +73,7 @@ class DatetimeIndexOpsMixin(ExtensionOpsMixin):
    _maybe_mask_results = ea_passthrough(
        DatetimeLikeArrayMixin._maybe_mask_results)
    __iter__ = ea_passthrough(DatetimeLikeArrayMixin.__iter__)
    mean = ea_passthrough(DatetimeLikeArrayMixin.mean)

    @property
    def freq(self):
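The hunk above exposes the array's `mean` on the index through `ea_passthrough`. One way such a forwarding helper could be written is sketched below. `passthrough`, `ToyArray`, and `ToyIndex` are hypothetical names, the `_data` attribute is assumed to hold the wrapped array, and this is not the pandas implementation.

```python
def passthrough(array_method):
    """Build an index method that forwards to the wrapped array."""
    def method(self, *args, **kwargs):
        return array_method(self._data, *args, **kwargs)

    # Keep the original name and docstring visible on the index method.
    method.__name__ = array_method.__name__
    method.__doc__ = array_method.__doc__
    return method


class ToyArray:
    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        """Return the arithmetic mean of the stored values."""
        return sum(self.values) / len(self.values)


class ToyIndex:
    def __init__(self, values):
        self._data = ToyArray(values)

    # The index gets `mean` without re-implementing it.
    mean = passthrough(ToyArray.mean)


idx = ToyIndex([2, 4, 6])
idx_mean = idx.mean()
```

The appeal of this design is that the reduction logic lives in one place (the array), while the index only declares which methods it forwards.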
1 change: 1 addition & 0 deletions pandas/core/indexes/datetimes.py
@@ -203,6 +203,7 @@ class DatetimeIndex(DatetimeIndexOpsMixin, Int64Index, DatetimeDelegateMixin):
    to_frame
    month_name
    day_name
    mean

    See Also
    --------
1 change: 1 addition & 0 deletions pandas/core/indexes/timedeltas.py
@@ -129,6 +129,7 @@ class TimedeltaIndex(DatetimeIndexOpsMixin, dtl.TimelikeOps, Int64Index,
    floor
    ceil
    to_frame
    mean

    See Also
    --------
4 changes: 4 additions & 0 deletions pandas/core/series.py
@@ -3729,6 +3729,10 @@ def _reduce(self, op, name, axis=0, skipna=True, numeric_only=None,
        elif is_datetime64_dtype(delegate):
            # use DatetimeIndex implementation to handle skipna correctly
            delegate = DatetimeIndex(delegate)
        elif is_timedelta64_dtype(delegate) and hasattr(TimedeltaIndex, name):
            # use TimedeltaIndex to handle skipna correctly
            # TODO: remove hasattr check after TimedeltaIndex has `std` method
            delegate = TimedeltaIndex(delegate)

        # dispatch to numpy arrays
        elif isinstance(delegate, np.ndarray):
41 changes: 41 additions & 0 deletions pandas/tests/frame/test_analytics.py
@@ -1205,6 +1205,47 @@ def test_mean_corner(self, float_frame, float_string_frame):
        means = float_frame.mean(0)
        assert means['bool'] == float_frame['bool'].values.mean()

    def test_mean_datetimelike(self):
        # GH#24757 check that datetime64 and period columns are excluded by
        # default (timedelta64 is kept), and that numeric_only=True keeps
        # only the numeric column

        df = pd.DataFrame({
            'A': np.arange(3),
            'B': pd.date_range('2016-01-01', periods=3),
            'C': pd.timedelta_range('1D', periods=3),
            'D': pd.period_range('2016', periods=3, freq='A')
        })
        result = df.mean(numeric_only=True)
        expected = pd.Series({'A': 1.})
        tm.assert_series_equal(result, expected)

        result = df.mean()
        expected = pd.Series({
            'A': 1.,
            'C': df.loc[1, 'C']
        })
        tm.assert_series_equal(result, expected)

    @pytest.mark.xfail(reason="casts to object-dtype and then tries to "
                              "add timestamps",
                       raises=TypeError, strict=True)
    def test_mean_datetimelike_numeric_only_false(self):
        df = pd.DataFrame({
            'A': np.arange(3),
            'B': pd.date_range('2016-01-01', periods=3),
            'C': pd.timedelta_range('1D', periods=3),
            'D': pd.period_range('2016', periods=3, freq='A')
        })

        result = df.mean(numeric_only=False)
        expected = pd.Series({
            'A': 1,
            'B': df.loc[1, 'B'],
            'C': df.loc[1, 'C'],
            'D': df.loc[1, 'D']
        })
        tm.assert_series_equal(result, expected)

    def test_stats_mixed_type(self, float_string_frame):
        # don't blow up
        float_string_frame.std(1)
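The xfail reason mentions adding timestamps. The same limitation shows up with the standard library alone: durations can be summed, while points in time cannot. The sketch below illustrates that arithmetic problem only and is not the pandas code path. `naive_mean` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def naive_mean(values):
    """Sum with ``+`` and divide, roughly what a generic object mean does."""
    total = values[0]
    for value in values[1:]:
        total = total + value
    return total / len(values)


deltas = [timedelta(days=d) for d in (1, 2, 3)]
stamps = [datetime(2016, 1, d) for d in (1, 2, 3)]

# timedelta supports both + and /, so this works.
delta_mean = naive_mean(deltas)

try:
    naive_mean(stamps)  # datetime + datetime is undefined
    stamp_error = None
except TypeError as err:
    stamp_error = err
```

Averaging timestamps therefore needs a detour through offsets from a reference point, which is what operating on the integer representation achieves.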
69 changes: 69 additions & 0 deletions pandas/tests/reductions/test_stat_reductions.py
@@ -10,9 +10,78 @@

import pandas as pd
from pandas import DataFrame, Series
from pandas.core.arrays import DatetimeArray, PeriodArray, TimedeltaArray
import pandas.util.testing as tm


class TestDatetimeLikeStatReductions:

    @pytest.mark.parametrize('box', [Series, pd.Index, DatetimeArray])
    def test_dt64_mean(self, tz_naive_fixture, box):
        tz = tz_naive_fixture

        dti = pd.date_range('2001-01-01', periods=11, tz=tz)
        # shuffle so that we are not just working with monotone-increasing
        dti = dti.take([4, 1, 3, 10, 9, 7, 8, 5, 0, 2, 6])
        dtarr = dti._data

        obj = box(dtarr)
        assert obj.mean() == pd.Timestamp('2001-01-06', tz=tz)
        assert obj.mean(skipna=False) == pd.Timestamp('2001-01-06', tz=tz)

        # dtarr[-2] is 2001-01-03 (position 2 before the shuffle)
        dtarr[-2] = pd.NaT

        obj = box(dtarr)
        assert obj.mean() == pd.Timestamp('2001-01-06 07:12:00', tz=tz)
        assert obj.mean(skipna=False) is pd.NaT

    @pytest.mark.parametrize('box', [Series, pd.Index, PeriodArray])
    def test_period_mean(self, box):
        # GH#24757
        dti = pd.date_range('2001-01-01', periods=11)
        # shuffle so that we are not just working with monotone-increasing
        dti = dti.take([4, 1, 3, 10, 9, 7, 8, 5, 0, 2, 6])

        # use hourly frequency to avoid rounding errors in expected results
        # TODO: flesh this out with different frequencies
        parr = dti._data.to_period('H')
        obj = box(parr)
        with pytest.raises(TypeError, match="ambiguous"):
            obj.mean()
        with pytest.raises(TypeError, match="ambiguous"):
            obj.mean(skipna=True)

        # parr[-2] is the period starting 2001-01-03 00:00
        parr[-2] = pd.NaT

        with pytest.raises(TypeError, match="ambiguous"):
            obj.mean()
        with pytest.raises(TypeError, match="ambiguous"):
            obj.mean(skipna=True)

    @pytest.mark.parametrize('box', [Series, pd.Index, TimedeltaArray])
    def test_td64_mean(self, box):
        tdi = pd.TimedeltaIndex([0, 3, -2, -7, 1, 2, -1, 3, 5, -2, 4],
                                unit='D')

        tdarr = tdi._data
        obj = box(tdarr)

        result = obj.mean()
        expected = np.array(tdarr).mean()
        assert result == expected

        tdarr[0] = pd.NaT
        assert obj.mean(skipna=False) is pd.NaT

        result2 = obj.mean(skipna=True)
        assert result2 == tdi[1:].mean()

        # exact equality fails by 1 nanosecond
        assert result2.round('us') == (result * 11. / 10).round('us')


class TestSeriesStatReductions:
    # Note: the name TestSeriesStatReductions indicates these tests
    # were moved from a series-specific test file, _not_ that these tests are
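The expected values in `test_dt64_mean` can be checked by hand. The following standard-library sketch repeats the shuffle and the averaging with plain `datetime` objects. It does not use pandas, and the names `shuffled`, `mean_of`, `dropped`, and `remaining` are chosen for this illustration.

```python
from datetime import datetime, timedelta

start = datetime(2001, 1, 1)
order = [4, 1, 3, 10, 9, 7, 8, 5, 0, 2, 6]
shuffled = [start + timedelta(days=i) for i in order]


def mean_of(dates):
    # Average the offsets from `start`, then shift back.
    offsets = [d - start for d in dates]
    return start + sum(offsets, timedelta()) / len(offsets)


full_mean = mean_of(shuffled)

# Drop the element at position -2, mirroring the NaT assignment in the test.
dropped = shuffled[-2]
remaining = shuffled[:-2] + shuffled[-1:]
partial_mean = mean_of(remaining)
```

Position -2 of the shuffled values holds 2001-01-03, so removing it leaves offsets that sum to 53 days over ten dates. The mean of the remaining dates is therefore 5.3 days after the start, which is 2001-01-06 07:12.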