REF: Separate groupby, rolling, and window agg/apply list/dict-like #53986

rhshadrach · 2023-07-03T17:30:22Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Should enable work on #53839.

topper-123 · 2023-07-05T08:34:30Z

pandas/core/apply.py

            # For SeriesGroupBy this matches _obj_with_exclusions
            selected_obj = obj._selected_obj
        else:
+            assert False
            selected_obj = obj._obj_with_exclusions


selected_obj is always the same as obj here?

Correct; I cleaned this up.

pandas/core/apply.py

topper-123

I like this, but a few comments.

I've always wondered if it is possible to combine compute_(dict|list)_like by converting dictlike into listlikes, like we already do in Series.replace, but that can be for a future discussion.

topper-123 · 2023-07-05T08:39:31Z

pandas/core/apply.py

-        else:
-            selected_obj = obj._selected_obj
-            selection = obj._selection
+        selected_obj = obj


obj here will no longer be a groupby object, but mypy still thinks it can be one. Can you also fix the type situation?

Added an assert here for mypy
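
For readers unfamiliar with the pattern, here is a minimal sketch of how an `assert isinstance(...)` lets mypy narrow a union type. The class and function names (`FrameLike`, `GroupByLike`, `list_columns`) are hypothetical stand-ins for illustration, not the actual pandas classes.

```python
from __future__ import annotations


class FrameLike:
    """Hypothetical stand-in for a Series/DataFrame-like object."""

    def columns(self) -> list[str]:
        return ["a", "b"]


class GroupByLike:
    """Hypothetical stand-in for a groupby-like object (no columns method)."""


def list_columns(obj: FrameLike | GroupByLike) -> list[str]:
    # Without this assert, mypy would report that GroupByLike has no
    # attribute "columns". After it, obj is narrowed to FrameLike.
    assert isinstance(obj, FrameLike)
    return obj.columns()
```

The assert also fails loudly at runtime if the assumption about the object's type is ever violated.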

pandas/core/apply.py

topper-123 · 2023-07-05T08:54:23Z

pandas/core/apply.py


-class ResamplerWindowApply(Apply):
+class ResamplerWindowApply(GroupByApply):


I would think Resampler is more similar to Groupby than to BaseWindows, because it inherits from BaseGroupBy. Can you explain why it shouldn't be treated in GroupByApply and rename this to ResamplerWindowApply?

Also, this means GroupByApply.obj can be Resampler | BaseWindow, which may not be precise/what we want?

Can we have a class BaseGroupbyApply(Apply) as a parent of both GroupByApply and ResamplerWindowApply and keep the common functionality there in order to have correct type hints?

and rename this to ResamplerWindowApply?

It is named ResamplerWindowApply, so I'm not sure what you're asking.

The main reason to separate this off now is that GroupBy requires the context (to use as_index=True) whereas this does not.

In general, I'm thinking of removing the inheritance structure in this file, keeping a single class (something like ApplyOp or maybe UdfOp) that stores the state we need to pass around and making everything else methods on this class. Currently, the use of inheritance makes it hard to understand the code (both for humans and IDEs).

What do you think of moving toward this structure?

It is named ResamplerWindowApply, so I'm not sure what you're asking.

Sorry, I meant just WindowApply, but you answered my question (i.e. the difference is as_index=True), thanks.

These new methods are very similar to the methods on GroupByApply; the only differences are the numba kwargs and the context manager. Can you take another look at whether the repetition can be avoided?

What do you think of moving toward this structure?

From a surface view, it looks to me like it would be simplest to have the current Apply class only be used by Series and DataFrame and let the groupby/window cases be handled in a separate structure. That would simplify the type hints a lot, for example, and maybe the code also. Is that the plan you're describing, or do you have something else in mind?

Can you look again if it's possible to avoid the repetition?

An alternative is to introduce a condition argument to temp_setattr so that optionally setting an attribute is a bit easier. I've taken this approach in the latest revision.
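
As an illustration of the idea, here is a minimal sketch of a context manager that temporarily sets an attribute only when a condition holds. The names `temp_setattr_sketch` and `Grouper` are assumptions made for this example; it is not presented as the actual pandas implementation.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def temp_setattr_sketch(
    obj: Any, attr: str, value: Any, condition: bool = True
) -> Iterator[Any]:
    """Temporarily set obj.attr to value, but only when condition is true."""
    if condition:
        old_value = getattr(obj, attr)
        setattr(obj, attr, value)
    try:
        yield obj
    finally:
        if condition:
            # Restore the original value even if the body raised.
            setattr(obj, attr, old_value)


class Grouper:
    """Hypothetical object carrying an as_index flag."""

    def __init__(self, as_index: bool) -> None:
        self.as_index = as_index


g = Grouper(as_index=False)

# Condition true: the attribute is overridden inside the block only.
with temp_setattr_sketch(g, "as_index", True, condition=True):
    inside_enabled = g.as_index
after_enabled = g.as_index

# Condition false: the context manager is a no-op.
with temp_setattr_sketch(g, "as_index", True, condition=False):
    inside_disabled = g.as_index
```

With a flag like this, one shared code path can serve both the case that needs the temporary override and the case that does not, instead of duplicating the method body.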

Is that the plan you're describing, or do you have something else in mind?

I was more thinking of still having groupby / window / resample in core.apply, just not having the inheritance structure. I attempted this today and it's a bit of a mixed bag. The code may be more straightforward to follow, but there is also more branching. I think your improvements that will take hold in 3.0, along with some general cleanups, will improve this code on their own; I'm going to hold off on pursuing this further for now.

topper-123 · 2023-07-05T21:57:37Z

pandas/core/apply.py

@@ -465,6 +452,7 @@ def agg_or_apply_dict_like(
        assert op_name in ["agg", "apply"]

        obj = self.obj
+        assert isinstance(obj, (ABCSeries, ABCDataFrame))


Move this method to NDFrameApply?

EDIT: if your idea is to make Apply.obj only be Series | DataFrame, then there's no reason to move it, because I guess you will want to merge Apply with NDFrameApply later?

I've moved this to NDFrameApply

topper-123 · 2023-07-05T21:58:59Z

pandas/core/apply.py

@@ -343,6 +343,7 @@ def agg_or_apply_list_like(
        self, op_name: Literal["agg", "apply"]
    ) -> DataFrame | Series:
        obj = self.obj
+        assert isinstance(obj, (ABCSeries, ABCDataFrame))


Move this method to NDFrameApply?

pandas/core/apply.py

topper-123 · 2023-07-05T22:58:31Z

pandas/core/apply.py


-class ResamplerWindowApply(Apply):
+class ResamplerWindowApply(GroupByApply):



topper-123 · 2023-07-05T22:59:35Z

pandas/core/apply.py

@@ -1309,6 +1353,50 @@ def apply(self):
    def transform(self):
        raise NotImplementedError


This is superfluous now?

Not exactly. We currently have this implemented on the base class Apply. It should not be called for e.g. GroupBy. Without this, if it accidentally gets called it would generate a perhaps confusing error; this is more straightforward to understand. Also - users should never encounter this; only developers.

That said, this is definitely a code smell. I think it may be good to move transform into core.apply.
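
The reasoning above can be sketched as follows. The class names `BaseApplySketch` and `GroupByApplySketch` are hypothetical and the method bodies are simplified; this only illustrates why an explicit override gives a clearer failure than silently inheriting base-class behavior.

```python
class BaseApplySketch:
    """Hypothetical base class that implements transform."""

    def transform(self) -> str:
        return "transformed"


class GroupByApplySketch(BaseApplySketch):
    """Hypothetical subclass for which transform must never be called."""

    def transform(self):
        # Fail fast with an unambiguous error instead of running the
        # inherited base-class logic on an object it was not written for.
        raise NotImplementedError
```

Calling `GroupByApplySketch().transform()` raises `NotImplementedError` immediately, which points a developer at the unsupported call rather than at a failure deep inside the inherited method.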

pandas/core/apply.py

topper-123 · 2023-07-07T12:05:16Z

Just two suggestions about explaining the return value types for compute_list_like and compute_dict_like, because return value types can't be expressed in the type system and IMO it's not always super clear how we can know the return value types in result_data from the input data.

Else looks good to me.

topper-123 · 2023-07-07T23:01:17Z

There's a conflict that needs to be resolved, else looks good.

topper-123 · 2023-07-09T08:59:11Z

Thanks, @rhshadrach.

rhshadrach added 2 commits July 3, 2023 13:08

REF: Separate groupby, rolling, and window agg/apply list/dict-like

205d3b3

type-hints

e99ed99

rhshadrach added the Refactor (Internal refactoring of code) and Apply (Apply, Aggregate, Transform, Map) labels Jul 3, 2023

rhshadrach added this to the 2.1 milestone Jul 3, 2023

rhshadrach requested a review from topper-123 July 4, 2023 13:26

topper-123 reviewed Jul 5, 2023

topper-123 reviewed Jul 5, 2023

rhshadrach added 2 commits July 5, 2023 09:51

cleanup

9e10456

mypy fixup

fb06909

topper-123 reviewed Jul 5, 2023

Rework

3e181aa

topper-123 reviewed Jul 7, 2023

topper-123 reviewed Jul 7, 2023

docstrings and comment

2d8317d

Merge branch 'main' of https://github.com/pandas-dev/pandas into separate_groupby_apply # Conflicts: # pandas/core/apply.py

4b5f0fb

topper-123 merged commit c126eeb into pandas-dev:main Jul 9, 2023

rhshadrach deleted the separate_groupby_apply branch July 9, 2023 21:59

