PERF: Use DataFrame-level reductions in DataFrame.agg with list of funcs #65031
Merged
When DataFrame.agg receives a list of function names (e.g. ["sum"]), use DataFrame-level reductions per dtype group instead of extracting each column as a Series and calling Series.agg per column.
closes pandas-dev#45658
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cc @rhshadrach
rhshadrach reviewed on Apr 10, 2026
    # Compute reductions per dtype group to preserve per-column dtypes.
    # Using to_frame().T for each result avoids the slow
    # DataFrame(list-of-Series) construction path.
    groups = obj.columns.groupby(obj.dtypes)  # type: ignore[arg-type]
I like this - but do you feel certain that we can rely on equality of dtypes here? I don't know of any examples that would cause problems; I'm just wondering whether there are edge cases where two dtypes would compare as equal despite some subtle difference (e.g. time resolution).
As long as we would call it a bug whenever two dtypes say they are equal without being precisely equal, I'm good here.
We can't prevent a hypothetical third-party ExtensionDtype from lying about its equality, but I'm pretty confident this works as expected for all our dtypes.
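The time-resolution case raised above can be checked directly for NumPy's datetime64 dtypes. The snippet below is only an illustrative sketch: the helper `group_labels_by_dtype` and the column labels are hypothetical, and it says nothing about third-party extension dtypes. It shows that dtypes differing only in resolution do not compare equal, so they land in separate groups.

```python
import numpy as np


def group_labels_by_dtype(dtypes_by_label):
    """Hypothetical helper: bucket labels by dtype, relying on dtype equality and hashing."""
    groups = {}
    for label, dtype in dtypes_by_label.items():
        groups.setdefault(dtype, []).append(label)
    return groups


ns = np.dtype("datetime64[ns]")
s = np.dtype("datetime64[s]")

# Same kind, different resolution: NumPy does not treat these as equal.
resolution_differs = ns != s

# "M8[ns]" is another spelling of datetime64[ns], so it joins the ns group.
groups = group_labels_by_dtype({"t_ns": ns, "t_s": s, "t_ns_alias": np.dtype("M8[ns]")})
```

Here `groups` ends up with two entries, one per resolution, which is the behaviour the grouping in this PR depends on.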
Thanks @jbrockmendel
Summary
- When DataFrame.agg receives a list of string function names (e.g. ["sum", "mean"]), use DataFrame-level reductions per dtype group instead of extracting each column as a Series and calling Series.agg per column.
- Speeds up df.agg(["sum"]) from ~110ms to ~0.5ms (~220x speedup).
- closes #45658
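The idea in the summary can be sketched in user-level code. This is a minimal illustration, not the implementation merged in this PR: `agg_by_dtype_group` is a hypothetical name, it assumes unique column labels, and it only handles names of no-argument DataFrame reductions such as "sum" or "mean".

```python
import pandas as pd


def agg_by_dtype_group(df: pd.DataFrame, funcs: list[str]) -> pd.DataFrame:
    """Hypothetical sketch: run each reduction once per dtype group, not once per column."""
    # Bucket column labels by dtype so each sub-frame is homogeneous.
    groups = {}
    for col, dtype in df.dtypes.items():
        groups.setdefault(dtype, []).append(col)

    blocks = []
    for cols in groups.values():
        sub = df[cols]
        # Each reduction returns a Series indexed by column label;
        # to_frame(name).T turns it into a one-row frame labelled by the function name.
        rows = [getattr(sub, name)().to_frame(name).T for name in funcs]
        blocks.append(pd.concat(rows, axis=0))

    # Stitch the per-dtype blocks together and restore the original column order.
    return pd.concat(blocks, axis=1)[list(df.columns)]


df = pd.DataFrame({"a": [1, 2, 3], "x": [0.5, 1.5, 2.5], "b": [4, 5, 6]})
result = agg_by_dtype_group(df, ["sum"])
```

Because each block is built from columns of a single dtype, the integer columns keep an integer result while the float column stays float, which is the per-column dtype preservation the PR's code comment mentions.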
Test plan
- pandas/tests/apply/ tests pass (922 passed)
- pandas/tests/reductions/ tests pass (546 passed)

🤖 Generated with Claude Code