PERF: Dataframe reductions with EA dtypes #54509

Merged: 1 commit, Aug 12, 2023

Conversation

lukemanley
Member

Avoids re-checking the _reduce method signature on every block, which can be slow. No whatsnew entry, as I believe the method signature check was only added in the 2.1 cycle.

import pandas as pd
import numpy as np

df_wide = pd.DataFrame(np.random.randn(4, 10_000), dtype="float64[pyarrow]")

%timeit df_wide.sum()

# 3.2 s ± 54.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   <- main
# 2.02 s ± 89.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <- PR    
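
For readers who want to see the general idea, here is a minimal sketch of checking a method signature once per array type rather than once per block. It is not the code from this PR. ToyArray, accepts_keepdims_uncached, accepts_keepdims_cached and _type_accepts_keepdims are hypothetical names, and the assumption that the check looks for a keepdims parameter on _reduce is mine, not something stated in the description above.

```python
import inspect
from functools import lru_cache


class ToyArray:
    """Hypothetical stand-in for an extension array; not a pandas class."""

    def __init__(self, values):
        self.values = list(values)

    def _reduce(self, name, *, skipna=True, keepdims=False):
        # Only "sum" is supported in this toy example.
        if name != "sum":
            raise NotImplementedError(name)
        total = sum(self.values)
        return [total] if keepdims else total


def accepts_keepdims_uncached(arr):
    # Inspects the signature on every call; repeated per block this adds up.
    return "keepdims" in inspect.signature(arr._reduce).parameters


@lru_cache(maxsize=None)
def _type_accepts_keepdims(cls):
    # Inspects the signature once per array type; later calls hit the cache.
    return "keepdims" in inspect.signature(cls._reduce).parameters


def accepts_keepdims_cached(arr):
    return _type_accepts_keepdims(type(arr))


# Simulate a wide frame: many blocks that all share one array type.
blocks = [ToyArray([i, i + 1]) for i in range(1000)]
flags = [accepts_keepdims_cached(block) for block in blocks]
info = _type_accepts_keepdims.cache_info()
print(all(flags), info.misses, info.hits)
```

With 1,000 blocks of the same type, the signature is inspected a single time and the remaining lookups come from the cache, which is the kind of saving a wide frame like df_wide would benefit from.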

@lukemanley lukemanley added the Performance, ExtensionArray, and Reduction Operations labels Aug 12, 2023
@lukemanley lukemanley added this to the 2.1 milestone Aug 12, 2023
@phofl phofl merged commit 1f94a1b into pandas-dev:main Aug 12, 2023
@phofl
Member

phofl commented Aug 12, 2023

thx @lukemanley

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Aug 12, 2023
phofl pushed a commit that referenced this pull request Aug 12, 2023
…A dtypes) (#54514)

Backport PR #54509: PERF: Dataframe reductions with EA dtypes

Co-authored-by: Luke Manley <[email protected]>
mroeschke pushed a commit to mroeschke/pandas that referenced this pull request Aug 18, 2023
@lukemanley lukemanley deleted the perf-dataframe-reduce-ea branch September 6, 2023 00:54
@wzrycj

wzrycj commented Oct 19, 2023

Hello @lukemanley,

Great work on the performance enhancement! I'm still learning and was curious about how you identified this for improvement. Could you share a bit about your process?

Thank you for your time and contributions!
