-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
REF: Separate groupby, rolling, and window agg/apply list/dict-like #53986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/apply.py
Outdated
# For SeriesGroupBy this matches _obj_with_exclusions | ||
selected_obj = obj._selected_obj | ||
else: | ||
assert False | ||
selected_obj = obj._obj_with_exclusions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
selected_obj
is always the same as obj
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct; I cleaned this up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this, but a few comments.
I've always wondered if it is possible to combine compute_(dict|list)_like
by converting dictlike into listlikes, like we already do in Series.replace
, but that can be for a future discussion.
pandas/core/apply.py
Outdated
else: | ||
selected_obj = obj._selected_obj | ||
selection = obj._selection | ||
selected_obj = obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obj
here will no longer be a groupby object, but mypy still thinks this can be groupby object. Can you fix the type situation also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an assert here for mypy
|
||
class ResamplerWindowApply(Apply): | ||
class ResamplerWindowApply(GroupByApply): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think Resampler
is more similar to Groupby
than to BaseWindows
, because it inherits from BaseGroupBy
. Can you explain why it shouldn't be treated in GroupByApply
and rename this to ResamplerWindowApply
?
Also, this means GroupByApply.obj
can be Resampler | BaseWindow
, which may not be precise/what we want?
Can we have a class BaseGroupbyApply(Apply)
as a parent of both GroupByApply
and ResamplerWindowApply
and keep the common functionality there in order to have correct type hints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and rename this to
ResamplerWindowApply
?
It is named ResamplerWindowApply
, so I'm not sure what you're asking.
The main reason to separate this off now is that GroupBy requires the context (to use as_index=True
) whereas this does not.
In general, I'm thinking of removing the inheritance structure in this file, keeping a single class (something like ApplyOp
or maybe UdfOp
) that stores the state we need to pass around and making everything else methods on this class. Currently, the use of inheritance makes it hard to understand the code (both for humans and IDEs).
What do you think of moving toward this structure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is named
ResamplerWindowApply
, so I'm not sure what you're asking.
Sorry, I meant just WindowApply
, but you answered my question (i.e. the difference is as_index=True
), thanks.
These new methods are very similar to the methods on GroupbyApply
, the difference being only the numba kwargs and the context manager. Can you look again if it's possible to avoid the repetition?
What do you think of moving toward this structure?
Fro a surface view if looks to me like it would be simplest to have the current Apply
class only be used by Series
and DataFrame
and let the groupby/window cases be handled in a separate structure. That would simplify the type hints a lot, for example, and maybe the code also. Is that the plan you're describing, or do you have something else in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you look again if it's possible to avoid the repetition?
An alternative is to introduce a condition
argument to temp_setattr
so that optionally setting an attribute is a bit easier. I've taken this approach in the latest revision.
Is that the plan you're describing, or do you have something else in mind?
I was more thinking of still having groupby / window / resample in core.apply
, just not having the inheritance structure. I attempted this today and it's a bit of a mixed bag. The code may be more straightforward to follow, but there is also more branching. I think your improvements that will take hold in 3.0 and some general cleanups will improve this code alone; I'm going to hold off on pursuing this more for now.
pandas/core/apply.py
Outdated
@@ -465,6 +452,7 @@ def agg_or_apply_dict_like( | |||
assert op_name in ["agg", "apply"] | |||
|
|||
obj = self.obj | |||
assert isinstance(obj, (ABCSeries, ABCDataFrame)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this method to NDFrameApply
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: if your idea is to make Apply.obj
only be Series | DataFrame
then there's no reason to ove because I guess you will want to merge Apply
with NDFrameApply
later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved this to NDFrameApply
pandas/core/apply.py
Outdated
@@ -343,6 +343,7 @@ def agg_or_apply_list_like( | |||
self, op_name: Literal["agg", "apply"] | |||
) -> DataFrame | Series: | |||
obj = self.obj | |||
assert isinstance(obj, (ABCSeries, ABCDataFrame)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this ethod to NDFrameApply?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
class ResamplerWindowApply(Apply): | ||
class ResamplerWindowApply(GroupByApply): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is named
ResamplerWindowApply
, so I'm not sure what you're asking.
Sorry, I meant just WindowApply
, but you answered my question (i.e. the difference is as_index=True
), thanks.
These new methods are very similar to the methods on GroupbyApply
, the difference being only the numba kwargs and the context manager. Can you look again if it's possible to avoid the repetition?
What do you think of moving toward this structure?
Fro a surface view if looks to me like it would be simplest to have the current Apply
class only be used by Series
and DataFrame
and let the groupby/window cases be handled in a separate structure. That would simplify the type hints a lot, for example, and maybe the code also. Is that the plan you're describing, or do you have something else in mind?
pandas/core/apply.py
Outdated
@@ -1309,6 +1353,50 @@ def apply(self): | |||
def transform(self): | |||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is superfluous now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly. We currently have this implemented on the base class Apply
. It should not be called for e.g. GroupBy. Without this, if it accidentally gets called it would generate a perhaps confusing error; this is more straightforward to understand. Also - users should never encounter this; only developers.
That said, this is definitely a code smell. I think it may be good to move transform into core.apply
.
Just two suggestion about explaining the return value types for Else looks good to me. |
There's a conflict that needs to be resolved, else looks good. |
…rate_groupby_apply # Conflicts: # pandas/core/apply.py
Thanks, @rhshadrach. |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Should enable work on #53839.