
BUG: Can't pass arguments to DataFrameGroupBy.agg when using list/dict-like and string aliases #53839


Open
1 of 3 tasks
sartajsartaj opened this issue Jun 25, 2023 · 3 comments
Labels
Apply (Apply, Aggregate, Transform, Map) · Bug · Groupby · Reduction Operations (sum, mean, min, max, etc.)

Comments

@sartajsartaj

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

When I call sum() on a groupby object, there is an option to restrict it to numeric columns via numeric_only, like:

movies.groupby('Genre').sum(numeric_only=True)  # movies here is imdb-top-1000.csv

But if I want to apply multiple aggregate functions to ALL numeric columns, the documentation gives no place to pass numeric_only=True.
movies.groupby('Genre').agg(['min', 'max', 'mean'])  # This code raises an error in pandas 2.0.2

This wasn't a problem in earlier versions of pandas (version 1.5), which used to select the numeric columns automatically.
Please fix this!
At least provide an option to pass numeric_only=True, like:
movies.groupby('Genre').agg(['min', 'max', 'mean'], numeric_only=True)
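
A minimal reproduction sketch of the report above. The small frame below is a made-up stand-in for imdb-top-1000.csv; its column names and values are assumptions, not the real dataset, and the exact exception raised may differ between pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for imdb-top-1000.csv: a grouping key,
# one non-numeric column, and two numeric columns.
movies = pd.DataFrame(
    {
        "Genre": ["Drama", "Drama", "Action", "Action"],
        "Title": ["A", "B", "C", "D"],
        "Rating": [9.3, 9.2, 9.0, 8.8],
        "Votes": [2300, 1600, 2300, 1200],
    }
)

# Works: the dedicated reduction method accepts numeric_only.
totals = movies.groupby("Genre").sum(numeric_only=True)

# On pandas 2.x this is expected to fail, because 'mean' is attempted on
# the non-numeric Title column. Older versions may drop the column instead.
try:
    stats = movies.groupby("Genre").agg(["min", "max", "mean"])
except (TypeError, NotImplementedError) as exc:
    stats = None
    print(f"agg with a list of aliases failed: {exc}")
```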

Feature Description

At least provide an option to pass numeric_only=True, like:
movies.groupby('Genre').agg(['min', 'max', 'mean'], numeric_only=True)

Alternative Solutions

At least provide an option to pass numeric_only=True, like:
movies.groupby('Genre').agg(['min', 'max', 'mean'], numeric_only=True)

Additional Context

No response

@sartajsartaj added the Enhancement and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Jun 25, 2023
@samukweku
Contributor

You can use a lambda function to pass the extra arguments, instead of strings. You can also prefilter, selecting only numeric columns, before running the groupby operation.
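
A rough sketch of both workarounds, assuming a small made-up frame in place of the reporter's data (the column names, the values, and the `std_pop` helper are hypothetical). Prefiltering handles the numeric_only case; a callable is what lets you forward other arguments such as `ddof`, at the cost of running Python code per group.

```python
import pandas as pd

# Hypothetical data standing in for the reporter's movies frame.
movies = pd.DataFrame(
    {
        "Genre": ["Drama", "Drama", "Action", "Action"],
        "Title": ["A", "B", "C", "D"],
        "Rating": [9.3, 9.2, 9.0, 8.8],
        "Votes": [2300, 1600, 2300, 1200],
    }
)

# Workaround 1: prefilter. Keep the grouping key plus the numeric columns,
# so the string aliases never see a non-numeric column.
numeric_cols = movies.select_dtypes(include="number").columns
gb = movies[["Genre", *numeric_cols]].groupby("Genre")
prefiltered = gb.agg(["min", "max", "mean"])

# Workaround 2: use a callable instead of a string alias to pass arguments.
# The function name becomes the column label in the result.
def std_pop(s):
    # Population standard deviation, i.e. Series.std with ddof=0.
    return s.std(ddof=0)

with_args = gb.agg(["mean", std_pop])
```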

@rhshadrach
Member

You can use a lambda function to pass the extra arguments, instead of strings. You can also prefilter, selecting only numeric columns, before running the groupby operation.

Agreed on each of these workarounds, though the former will be significantly less performant. We allow passing numeric_only on non-list inputs (e.g. gb.agg("sum", numeric_only=True)) and in many other groupby ops; I think we should be doing the same here.

The fix for #45658 would address this (internally we'd be using DataFrameGroupBy.sum instead of SeriesGroupBy.sum), but is currently blocked by the DataFrame case because of a transpose that happens with the results. That transpose behavior does not occur in groupby, and so we should be able to implement the fix proposed for #45658. First we'll need to split off the handling of groupby from that of DataFrame (I was already strongly leaning toward doing this).
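
Until that lands, the idea of going through the DataFrameGroupBy methods can be approximated in user code. The sketch below is only an illustration, not how pandas implements or will implement the fix: `agg_numeric_only` is a hypothetical helper, the data is made up, and it assumes every requested alias accepts `numeric_only` and returns the same set of columns.

```python
import pandas as pd

def agg_numeric_only(gb, funcs):
    """Hypothetical helper: run each string alias through DataFrameGroupBy.agg
    with numeric_only=True, then combine into the usual (column, func) layout."""
    parts = {name: gb.agg(name, numeric_only=True) for name in funcs}
    # concat with a dict puts the function name on the outer column level;
    # swap the levels so the original column name comes first.
    combined = pd.concat(parts, axis=1).swaplevel(axis=1)
    # Order columns as agg([...]) would: per column, functions in given order.
    columns = pd.MultiIndex.from_product([parts[funcs[0]].columns, funcs])
    return combined.reindex(columns=columns)

# Made-up data standing in for the reporter's movies frame.
movies = pd.DataFrame(
    {
        "Genre": ["Drama", "Drama", "Action", "Action"],
        "Title": ["A", "B", "C", "D"],
        "Rating": [9.3, 9.2, 9.0, 8.8],
        "Votes": [2300, 1600, 2300, 1200],
    }
)

result = agg_numeric_only(movies.groupby("Genre"), ["min", "max", "mean"])
```

Because each alias is dispatched once per frame rather than once per column through a Python callable, this should stay closer to the performance of the built-in reductions than the lambda workaround.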

cc @topper-123

@rhshadrach added the Bug, Groupby, Apply (Apply, Aggregate, Transform, Map), and Reduction Operations (sum, mean, min, max, etc.) labels and removed the Enhancement and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Jun 25, 2023
@rhshadrach self-assigned this Jun 25, 2023
@rhshadrach changed the title ENH: BUG: Can't pass arguments to DataFrameGroupBy.agg when using list/dict-like and string aliases Jun 27, 2023
@topper-123
Contributor

I think we should be doing the same here.

Yeah, I agree.
