PERF: Use DataFrame-level reductions in DataFrame.agg with list of funcs by jbrockmendel · Pull Request #65031 · pandas-dev/pandas

jbrockmendel · 2026-04-02T18:25:36Z

Summary

When DataFrame.agg receives a list of string function names (e.g. ["sum", "mean"]), use DataFrame-level reductions per dtype group instead of extracting each column as a Series and calling Series.agg per column.
For a 1000-column DataFrame, this reduces the time for df.agg(["sum"]) from ~110ms to ~0.5ms (~220x speedup).
Falls back to the existing per-column path for non-string functions, duplicate column names, or non-reduction methods.
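
The idea in the summary can be sketched roughly as follows. This is an illustration only, not the code from the PR: `agg_by_dtype_group` is a hypothetical helper, the dict-based grouping stands in for the PR's `obj.columns.groupby(obj.dtypes)`, and the sketch assumes unique column names and string names of reduction methods (the cases the description says take the new path).

```python
import numpy as np
import pandas as pd


def agg_by_dtype_group(df: pd.DataFrame, funcs: list[str]) -> pd.DataFrame:
    """Hypothetical helper: one DataFrame-level reduction per (func, dtype group)."""
    # Group column labels by dtype (plain-Python stand-in for Index.groupby).
    groups: dict = {}
    for col, dtype in df.dtypes.items():
        groups.setdefault(dtype, []).append(col)

    rows = []
    for name in funcs:
        # Reduce each same-dtype block in one call, e.g. df[cols].sum().
        # to_frame(name).T turns the resulting Series into a one-row frame,
        # so concatenating along axis=1 keeps each group's result dtype.
        parts = [getattr(df[cols], name)().to_frame(name).T for cols in groups.values()]
        rows.append(pd.concat(parts, axis=1))

    # Stack one row per function and restore the original column order.
    return pd.concat(rows)[df.columns]


df = pd.DataFrame({"a": [1, 2, 3], "b": [1.5, 2.5, 3.5], "c": [4, 5, 6]})
result = agg_by_dtype_group(df, ["sum", "mean"])
sum_only = agg_by_dtype_group(df, ["sum"])
```

With a single function, the integer columns keep an integer result dtype in this sketch, because each dtype group is reduced and assembled separately rather than being funneled through one mixed-dtype Series.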

Test plan

All existing pandas/tests/apply/ tests pass (922 passed)
All pandas/tests/reductions/ tests pass (546 passed)
Verified correctness with mixed dtypes (int/float), extension types (Int64, Float64), string columns, empty DataFrames, duplicate column names, and lambda fallback
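
A minimal sketch of the kind of check listed above, assuming a small mixed int/float frame. It is not taken from the PR's test suite; the frame and the lambda are made up for illustration. Both calls should return the same values regardless of which internal path handles them.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [1.5, 2.5, 3.5]})

# List of string names: the case the PR targets.
by_name = df.agg(["sum", "mean"])

# A lambda in the list: per the PR description, this uses the existing
# per-column path instead of the DataFrame-level reductions.
with_lambda = df.agg(["sum", lambda s: s.max() - s.min()])
```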

🤖 Generated with Claude Code

When DataFrame.agg receives a list of function names (e.g. ["sum"]), use DataFrame-level reductions per dtype group instead of extracting each column as a Series and calling Series.agg per column.

closes pandas-dev#45658

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jbrockmendel · 2026-04-09T15:04:54Z

cc @rhshadrach

rhshadrach · 2026-04-10T11:41:13Z

+        # Compute reductions per dtype group to preserve per-column dtypes.
+        # Using to_frame().T for each result avoids the slow
+        # DataFrame(list-of-Series) construction path.
+        groups = obj.columns.groupby(obj.dtypes)  # type: ignore[arg-type]


I like this, but do you feel certain that we can rely on equality of dtypes here? I don't know of any examples that would cause problems; I'm just wondering whether there are edge cases where two dtypes compare as equal despite some subtle difference (e.g. time resolution).

As long as we would treat it as a bug whenever two dtypes compare as equal without being precisely equal, I'm good here.

jbrockmendel · Apr 10, 2026

We can't prevent a hypothetical third-party EADtype from lying about its equality, but I'm pretty confident this works as expected for all our dtypes.
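
The time-resolution concern can be checked directly. The sketch below is an illustration rather than code from the PR (the column names are made up): it shows that datetime dtypes with different units compare unequal, so columns carrying them land in separate groups when grouped by dtype, and that a nullable pandas dtype does not compare equal to its NumPy counterpart.

```python
import numpy as np
import pandas as pd

ns = np.dtype("datetime64[ns]")
sec = np.dtype("datetime64[s]")

# The unit is part of the dtype, so these are distinct dtypes.
units_differ = ns != sec

# Grouping hypothetical column labels by dtype keeps the two units apart.
column_dtypes = {"t1": ns, "t2": sec, "t3": ns}
groups: dict = {}
for col, dtype in column_dtypes.items():
    groups.setdefault(dtype, []).append(col)

# A nullable extension dtype is not equal to the NumPy dtype of the same width.
nullable_differs = pd.Int64Dtype() != np.dtype("int64")
```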

rhshadrach

lgtm

rhshadrach · 2026-04-10T21:12:48Z

Thanks @jbrockmendel

…-comparison

* upstream/main:
  PERF: use lookup instead of hash_inner_join for merge with unique right keys (pandas-dev#64691)
  BUG : update `SeriesGroupBy.ohlc()` to honor `as_index=False` (pandas-dev#65141)
  PERF: Use DataFrame-level reductions in DataFrame.agg with list of funcs (pandas-dev#65031)
  DOC: document required external libraries in read_* I/O docstrings (pandas-dev#65143)
  DOC: improve MultiIndex.is_monotonic_increasing/decreasing docstrings (pandas-dev#65154)
  BUG: Raise ValueError for non-boolean numeric_only in DataFrame/Series reductions (GH#53098) (pandas-dev#65131)
  BUG: Timedelta.round() raises ZeroDivisionError when internal unit is 's' and target frequency is sub-second (pandas-dev#64836)
  ENH: Add replace method to Index (closes pandas-dev#19495) (pandas-dev#65099)
  PERF: improve StringArray.isna (pandas-dev#57733)
  BUG: read parquet files with older pytz (DEP: keep lower pytz minimum version) (pandas-dev#65133)
  DEPR: deprecate dates-with-datetime64 in _maybe_downcast_for_indexing (pandas-dev#64871)
  DOC: note that DataFrame.values is not writeable (pandas-dev#65142)
  CLN: Update groupby observed defaults (pandas-dev#65148)
  PERF: avoid materializing values[indexer] in Block.setitem (pandas-dev#64251)
  DOC: update GroupBy.sum/min/max See Also sections (pandas-dev#65144)

jbrockmendel added the Performance (Memory or execution speed performance) label Apr 2, 2026

fix mypy error in _agg_list_like_frame_reductions

215b80c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jbrockmendel marked this pull request as ready for review April 9, 2026 15:04

rhshadrach reviewed Apr 10, 2026

rhshadrach added the Apply (Apply, Aggregate, Transform, Map) label Apr 10, 2026

rhshadrach approved these changes Apr 10, 2026

rhshadrach added this to the 3.1 milestone Apr 10, 2026

rhshadrach merged commit 593c2df into pandas-dev:main Apr 10, 2026
51 checks passed

jbrockmendel deleted the perf-45658 branch April 10, 2026 21:17

rhshadrach merged 2 commits into pandas-dev:main from jbrockmendel:perf-45658
