Skip to content

DOC/CLN: move NDFrame.groupby to (DataFrame|Series).groupby #30314

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 1, 2020

Conversation

topper-123
Copy link
Contributor

By moving this method up to dataFrame/Series, we get better doc strings and more precise type hints.

@topper-123 topper-123 force-pushed the move_ndframe_groupby branch 2 times, most recently from f64fe81 to fdf3e8c Compare December 18, 2019 00:06
@WillAyd
Copy link
Member

WillAyd commented Dec 18, 2019

Makes sense. Can we get rid of the get_grouper function altogether as part of this?

@topper-123
Copy link
Contributor Author

get_groupby is used in Resampler, so its used elsewhere.

@topper-123 topper-123 force-pushed the move_ndframe_groupby branch 2 times, most recently from 51868aa to 0d866ce Compare December 18, 2019 02:26
@@ -113,6 +114,7 @@ def _groupby_and_merge(
by = [by]

lby = left.groupby(by, sort=False)
rby = None # type: Optional[groupby.DataFrameGroupBy]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed to silence mypy.

group_keys: bool = True,
squeeze: bool = False,
observed: bool = False,
) -> "grp_generic.DataFrameGroupBy":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you do from pandas.core.groupby.generic import DataFrameGroupBy? That would make the signature a bit cleaner for end users ("grp_generic" is meaningless for them)

Copy link
Contributor Author

@topper-123 topper-123 Dec 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not possible, because it would give circular import errors.

Maybe in the future, when we we're om min python 3.8, we can do from __future__ import annotations and get it to work, but that's a long way in the future.

EDIT: I've renamed "grp_generic" to "groupby_generic", which is a bit clearer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not if you put it behind a if TYPE_CHECKING: ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually also don't understand this aspect of importing: what's the difference between from pandas.core.groupby import generic vs from pandas.core.groupby.generic import DataFrameGroupBy? Both import the pandas.core.groupby.generic (and thus run all imports in that file), no?

Copy link
Contributor Author

@topper-123 topper-123 Dec 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I can articulate it, but I understand it like this: If you attach a name to the new module (like you'd do in from pandas.core.groupby.generic import DataFrameGroupBy) you force the pandas.core.groupby.generic module to be run at import time. This causes an error, if the calling module (here frame.py) itself must be run before groupby.generic, casing a circular error.

By not importing a name (here DataFrameGroupby), running the module is delayed, and things work.

@WillAyd
Copy link
Member

WillAyd commented Dec 18, 2019

@topper-123 IIUC the get_groupby usage in Resample sends the object as an argument so with this refactor wouldn't we just be able to change code like:

grouped = get_groupby(subset, by=None, grouper=grouper, axis=self.axis)

To

subset.groupby(by=None, grouper=grouper, axis=self.axis)

? I think would be good to go all the way on that cleanup if so

@topper-123
Copy link
Contributor Author

@WillAyd , good observation, I've removed get_groupby from the code base in the newest version.

I've also renamed grp_generic to groupby_generic to make it's content a bit clearer.

@topper-123
Copy link
Contributor Author

topper-123 commented Dec 18, 2019

@WillAyd , looking more into it, that doesn't work after all, because the signatures to get_groupby and (DataFrame|Series).groupby are different. E.g. .groupby does not have parameters grouper and exclusions etc.

I'll revert the removal of get_groupby.

@topper-123
Copy link
Contributor Author

topper-123 commented Dec 21, 2019

Is this ok to merge? Notice that DataFrame.groupby doesn't have a grouper argument, while DataFrame.GroupBy does, so calling the the two are not equivalent.

@WillAyd
Copy link
Member

WillAyd commented Dec 23, 2019

Hmm so I think would be nice if we replaced get_grouper and aligned signatures; as it currently stands I'm not really sure this adds much though by moving the functions from the base class to subclasses so +/-0 on the change but let's see what @jreback thinks

@jreback
Copy link
Contributor

jreback commented Dec 23, 2019

ok with aligning signatures but in a dedicated PR

@topper-123 topper-123 force-pushed the move_ndframe_groupby branch 2 times, most recently from ce3dc84 to 138185e Compare December 23, 2019 23:44
@jreback jreback added this to the 1.0 milestone Dec 26, 2019
@topper-123
Copy link
Contributor Author

I've rebased the PR.

I'm not sure I understand how the signatures can be aligned: e.g. GroupBy has a grouper argument, which NDFrame.groupby doesn't have. I don't think it's a good idea to add the internal parameters of GroupBy to the public-facing NDFrame.groupby.

@jreback
Copy link
Contributor

jreback commented Jan 1, 2020

I've rebased the PR.

I'm not sure I understand how the signatures can be aligned: e.g. GroupBy has a grouper argument, which NDFrame.groupby doesn't have. I don't think it's a good idea to add the internal parameters of GroupBy to the public-facing NDFrame.groupby.

yeah I am now confused on what was said about aligning things. The Grouper and GroupBy are different things here.

thanks for the PR.

@jreback jreback merged commit 3c8030f into pandas-dev:master Jan 1, 2020
@topper-123 topper-123 deleted the move_ndframe_groupby branch January 1, 2020 11:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants