-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
WIP: implement reductions for DatetimeArray/TimedeltaArray/PeriodArray #23890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @jbrockmendel! Thanks for submitting the PR.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is your intent to share the same implementation between the arrays and index classes? Having _make_reduction
and _get_reduction_vals
be top-level functions instead of bound methods is a bit unfortunate, but OK if we get to have one implementation.
|
||
def method(self, skipna=True, **kwargs): | ||
if only_timedelta: | ||
raise TypeError('"{meth}" reduction is not valid for {cls}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this. only_timedelta
makes me think we shouldn't raise here if the array is timedelta dtype, but right now it looks like we raise unconditionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woops, that should be if only_timedelta and not is_timedelta64_dtype(self)
if vals is NaT: | ||
return NaT | ||
|
||
# Try to minimize floating point error by rounding before casting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does DatetimeIndex do this casting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reductions DatetimeIndex has is min and max, for which it is not relevant.
@@ -460,6 +460,14 @@ def nanmean(values, axis=None, skipna=True, mask=None): | |||
elif is_float_dtype(dtype): | |||
dtype_sum = dtype | |||
dtype_count = dtype | |||
elif is_datetime64_dtype(dtype) or is_datetime64tz_dtype(dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_datetime64_any_dtype
I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, but I'm kind of hoping to get rid of that
@@ -460,6 +460,14 @@ def nanmean(values, axis=None, skipna=True, mask=None): | |||
elif is_float_dtype(dtype): | |||
dtype_sum = dtype | |||
dtype_count = dtype | |||
elif is_datetime64_dtype(dtype) or is_datetime64tz_dtype(dtype): | |||
from pandas import DatetimeIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be boxed in an index? Shouldn't values
be a DatetimeArray right now? Or is it not yet since we haven't implemented DTA as an extension array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know this module especially well, but my assumption is that it could be a numpy array at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that Ideally nanmean
will only be called via Series.mean
and our ExtensionArray's .mean
methods. Though I may be missing some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose the exception will be Series[datetime64[ns]]
, which may pass an ndarray here...
|
||
from pandas.tseries import frequencies | ||
from pandas.tseries.offsets import DateOffset, Tick | ||
|
||
from .base import ExtensionOpsMixin | ||
|
||
|
||
def _get_reduction_vals(obj, skipna): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you seem to be reinventing the wheel here. we already do all of this in nanops.py for timedelta. I am not sure how this should be integrated here, but this is not the way. (meaning re-write all of the code we already have)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is definitely a Proof Of Concept. For now the main question for you is if you're on board with the idea of bringing the Index reduction signatures in line with everything else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes about the signatures generally. but maybe let's start with min/max as more straightforward? To do what you are suggestnig here will require using nanops (the interface can certainly be here), but the implementation is already there (in nanops).
@@ -795,7 +795,7 @@ def _ndarray_values(self): | |||
def empty(self): | |||
return not self.size | |||
|
|||
def max(self): | |||
def max(self, skipna=True, axis=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Series, these are all (self, axis=None, skipna=None, ...)
.
masked_vals = values | ||
if mask is not None: | ||
masked_vals = values[~mask] | ||
the_mean = DatetimeIndex(masked_vals).mean(skipna=skipna) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls try to follow the patterns in this module
_wrap_results exists for a reason
again i see lots of reinventing the wheel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing in this PR has been updated since previous conversation. I’ll get around to this suggestion before too long.
Closing in favor of #24757. |
Some idiosyncrasies in signatures between Series vs Index make the parametrized test cases ugly. Do we want to adapt the Index reductions to have a
skipna
kwarg like the Series and EA methods?Still needs tests for timedelta and period dtypes.