PERF: Add var to masked arrays #48379

Merged: 8 commits merged on Sep 13, 2022

phofl
Member

@phofl phofl commented Sep 3, 2022

  • closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@phofl phofl added the labels "NA - MaskedArrays" (Related to pd.NA and nullable extension arrays) and "Reduction Operations" (sum, mean, min, max, etc.) Sep 3, 2022
# Conflicts:
#	pandas/core/array_algos/masked_reductions.py
#	pandas/core/arrays/masked.py
#	pandas/tests/reductions/test_reductions.py
@@ -787,6 +787,15 @@ def test_mean_masked_overflow(self):
assert result_masked - result_numpy == 0
assert result_masked == 1e17

def test_var_masked_array(self):
Member

Could you parameterize over ddof=[0, 1]?

Member Author

added
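The review above asks for the variance test to cover both `ddof=0` and `ddof=1`. As a rough illustration of the semantics being tested (not the pandas implementation), here is a minimal pure-Python sketch of a variance that skips masked entries. The helper name `masked_var` is hypothetical, and returning `None` when too few valid values remain is an assumption standing in for a missing result; the exact pandas behavior in that edge case is not shown here.

```python
import math
import statistics
from typing import Optional, Sequence


def masked_var(
    values: Sequence[float], mask: Sequence[bool], ddof: int = 1
) -> Optional[float]:
    """Hypothetical helper: variance of the entries whose mask is False.

    mask[i] is True where values[i] is missing. Returns None (a stand-in for
    a missing result, an assumption of this sketch) when there are no more
    than ddof valid values.
    """
    # Keep only the entries that are not masked out.
    valid = [float(v) for v, is_missing in zip(values, mask) if not is_missing]
    n = len(valid)
    if n <= ddof:
        return None
    mean = math.fsum(valid) / n
    # Divide the sum of squared deviations by n - ddof (ddof=1: sample variance).
    return math.fsum((v - mean) ** 2 for v in valid) / (n - ddof)


if __name__ == "__main__":
    data = [1.0, 2.0, 4.0, 100.0]
    missing = [False, False, False, True]  # 100.0 is masked out
    # One check per ddof value, mirroring a test parametrized over ddof=[0, 1].
    references = {0: statistics.pvariance, 1: statistics.variance}
    for ddof, reference in references.items():
        result = masked_var(data, missing, ddof=ddof)
        assert math.isclose(result, reference([1.0, 2.0, 4.0]))
    print("ok")
```

In a pytest suite the loop over `references` would usually become `@pytest.mark.parametrize("ddof", [0, 1])` on the test function, so each `ddof` value is reported as its own test case.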

@mroeschke mroeschke added this to the 1.6 milestone Sep 13, 2022
Member

@mroeschke mroeschke left a comment

LGTM. Can merge on green once the merge conflict is resolved.

# Conflicts:
#	doc/source/whatsnew/v1.6.0.rst
@phofl phofl merged commit 6b396cf into pandas-dev:main Sep 13, 2022
@phofl phofl deleted the masked_var branch September 13, 2022 16:16
@mroeschke mroeschke modified the milestones: 1.6, 2.0 Oct 13, 2022
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
@rhshadrach
Member

This patch may have induced a potential regression. Please check the links below. If any ASVs are parameterized, the combinations of parameters for which a regression has been detected appear as sub-bullets. This is a partially automated message.

@rhshadrach
Member

Here is a more direct link (I still need to work out how to incorporate parameterizations in the link):

https://asv-runner.github.io/asv-collection/pandas/#series_methods.NanOps.time_func?p-func='var'&p-dtype='boolean'&p-N=1000000

@phofl
Member Author

phofl commented Dec 24, 2022

Hm, I guess boolean is a special case here. We could cast the array to float; this would get most of the regression back, but I'm not sure it's worth it.
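The cast-to-float idea can be sketched with NumPy as follows. This is an illustrative assumption, not the change that pandas made: the helper name `masked_var_bool_as_float` is hypothetical, the mask convention (`True` means missing) mirrors nullable arrays, and returning `None` for too few valid values is a stand-in for a missing result. No performance numbers are claimed here; the sketch only shows where the cast would happen.

```python
import numpy as np


def masked_var_bool_as_float(values: np.ndarray, mask: np.ndarray, ddof: int = 1):
    """Hypothetical helper: variance of a boolean masked array.

    values: boolean ndarray holding the data
    mask:   boolean ndarray, True where the entry is missing
    Returns None (a stand-in for a missing result) if too few valid values remain.
    """
    # Drop masked entries, then cast to float64 up front so the reduction
    # operates on a plain float array instead of booleans.
    valid = values[~mask].astype(np.float64)
    if valid.size <= ddof:
        return None
    return float(np.var(valid, ddof=ddof))


if __name__ == "__main__":
    values = np.array([True, False, True, True])
    mask = np.array([False, False, False, True])  # last entry is missing
    # Valid data is [1.0, 0.0, 1.0]: mean 2/3, squared deviations sum to 2/3.
    print(masked_var_bool_as_float(values, mask, ddof=0))  # 2/9
    print(masked_var_bool_as_float(values, mask, ddof=1))  # 1/3
```

Because `True`/`False` become `1.0`/`0.0`, the result equals the variance of the 0/1 indicator values, so casting changes the dtype of the computation but not the answer.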

@rhshadrach
Member

Taking the variance of Booleans seems pretty rare to me, but at the same time, the solution sounds very easy. I'm thinking of at least opening a tracking issue.
