Skip to content

ENH: Optimize CoW for fillna with ea dtypes #51411

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Feb 25, 2023
Merged

Conversation

phofl
Copy link
Member

@phofl phofl commented Feb 15, 2023

new_values = self.values.fillna(value=value, method=None, limit=limit)
nb = self.make_block_same_class(new_values)
return nb._maybe_downcast([nb], downcast)
if using_cow and self._can_hold_na and not self.values.isna().any():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if using_cow and self._can_hold_na and not self.values.isna().any():
if using_cow and self._can_hold_na and not self.values._hasna:

(for some EAs that can be optimized)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@@ -240,23 +240,26 @@ def test_factorize_empty(self, data):
tm.assert_numpy_array_equal(codes, expected_codes)
self.assert_extension_array_equal(uniques, expected_uniques)

def test_fillna_copy_frame(self, data_missing):
def test_fillna_copy_frame(self, data_missing, using_copy_on_write):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to use the fixture here (in the /extension) tests, then ideally we redefine it in extension/conftest.py, to make it easier for downstream authors.

It might become unavoidable to use it at some point, but for this specific case we could maybe also verify the "copy" by modifying the result and ensuring the parent didn't update. In the end, that's the behaviour we care about for users, the values being identical is more an implementation detail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I restructured the tests, but we need using_copy_on_write for the sparse test, so I copied it to conftest.py in extension

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we need using_copy_on_write for the sparse test, so I copied it to conftest.py in extension

Sidenote: if it's only for the sparse test override, and not in the base class definitions, then we don't need it in the conftest.py in extension/ (because it's only needed if it is relevant for external packages subclassing those base test classes as well)

# Conflicts:
#	pandas/tests/copy_view/test_interp_fillna.py
#	pandas/tests/extension/json/test_json.py
@phofl phofl added this to the 2.0 milestone Feb 25, 2023
@phofl
Copy link
Member Author

phofl commented Feb 25, 2023

Merging, happy to address potential comments in a follow up

@phofl phofl merged commit 5de7c76 into pandas-dev:main Feb 25, 2023
@phofl phofl deleted the cow_fillna_ea branch February 25, 2023 22:46
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Feb 25, 2023
phofl added a commit that referenced this pull request Feb 26, 2023
… ea dtypes) (#51639)

Backport PR #51411: ENH: Optimize CoW for fillna with ea dtypes

Co-authored-by: Patrick Hoefler <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants