BUG: Fix behavior of replace_list with mixed types. #40555

Merged
merged 7 commits into from
Jun 1, 2021

Conversation

hasan-yaman
Contributor

Member

@mzeitlin11 mzeitlin11 left a comment

Thanks for the PR, @hasan-yaman!

@mzeitlin11
Member

@mzeitlin11 mzeitlin11 added Regression Functionality that used to work in a prior pandas version replace replace method labels Mar 21, 2021
@pytest.mark.parametrize(
"s, to_replace, value, expected",
[
(DataFrame([1]), np.array([1.0]), [0], DataFrame([0])),
Member

May want to test this with different container types for to_replace rather than only np.array.

Contributor Author

I added lists too. Should I add more types?

@hasan-yaman
Contributor Author

Thanks for the comments.

  • I reverted the changes to cast.py, since changing that file turned out not to be a good idea.
  • I updated the tests.
  • I need some help with the PR. The relevant code is:

        # Exclude anything that we know we won't contain
        pairs = [
            (x, y) for x, y in zip(src_list, dest_list) if self._can_hold_element(x)
        ]

Removing the if clause here would solve the problem, but I am wondering whether there is a better way to solve it.
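
To illustrate why that filter matters, here is a minimal sketch of how a pair can be dropped before any replacement happens. `toy_can_hold` and `filter_pairs` are hypothetical names invented for this example; `toy_can_hold` is only a stand-in predicate that rejects NumPy scalars and is not the pandas `_can_hold_element` implementation.

```python
import numpy as np


def toy_can_hold(element):
    # Hypothetical predicate (an assumption for illustration, not pandas code):
    # accept plain Python ints and integral Python floats only.
    # np.float64 subclasses float, so exact types are compared on purpose.
    if type(element) is int:
        return True
    return type(element) is float and element.is_integer()


def filter_pairs(src_list, dest_list, can_hold):
    # Same shape as the comprehension quoted in the comment above.
    return [(x, y) for x, y in zip(src_list, dest_list) if can_hold(x)]


python_pairs = filter_pairs([1.0], [0], toy_can_hold)
numpy_pairs = filter_pairs(np.array([1.0]), [0], toy_can_hold)

print(python_pairs)
print(numpy_pairs)
```

With this stand-in, the Python float survives the filter while the NumPy scalar is silently discarded, so no replacement pair remains for the array input.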

@@ -1659,3 +1658,25 @@ def test_replace_bytes(self, frame_or_series):
expected = obj.copy()
obj = obj.replace({None: np.nan})
tm.assert_equal(obj, expected)

@pytest.mark.parametrize(
"s, to_replace, value, expected",
Member

Can you give this a multi-letter name instead of 's'?

@simonjayhawkins simonjayhawkins added this to the 1.2.5 milestone Apr 12, 2021
@simonjayhawkins
Member

Thanks @hasan-yaman for the PR. Can you fix up the failing tests (it appears the actual fix has been lost in a subsequent commit) and add a release note? It will be in 1.2.5, but you can add it to doc\source\whatsnew\v1.2.4.rst in the first instance and move it once doc\source\whatsnew\v1.2.5.rst is available.

@jreback
Contributor

jreback commented Apr 12, 2021

Thanks @hasan-yaman for the PR. Can you fix up the failing tests (it appears the actual fix has been lost in a subsequent commit) and add a release note? It will be in 1.2.5, but you can add it to doc\source\whatsnew\v1.2.4.rst in the first instance and move it once doc\source\whatsnew\v1.2.5.rst is available.

This is just tests, no whatsnew needed.

@simonjayhawkins
Member

This is just tests, no whatsnew needed.

IIUC this PR is a regression fix for #40371, but the fix, added in the first commit, has been lost.

@hasan-yaman
Contributor Author

Actually, the first commit was causing too many problems with the tests, and I couldn't fix them, so I reverted the changes from the first commit.
I am open to any ideas for fixing the problem. (Maybe others who know pandas better than I do can attempt to fix the problem in a separate PR.)

@simonjayhawkins
Member

I am open to any ideas for fixing the problem. (Maybe others who know pandas better than I do can attempt to fix the problem in a separate PR.)

The regression was caused by #38097, so the fix probably involves code changed in that PR. cc @jbrockmendel

@jreback
Contributor

jreback commented Apr 13, 2021

This is just tests, no whatsnew needed.

IIUC this PR is a regression fix for #40371, but the fix, added in the first commit, has been lost.

What do you mean? If this doesn't have a patch, then the tests shouldn't pass (or is this *already* fixed?)

@simonjayhawkins
Member

the regression is not fixed.

@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label May 14, 2021
@simonjayhawkins
Member

This is just tests, no whatsnew needed.

IIUC this PR is a regression fix for #40371, but the fix, added in the first commit, has been lost.

What do you mean? If this doesn't have a patch, then the tests shouldn't pass (or is this *already* fixed?)

The tests are not passing.

@simonjayhawkins
Member

simonjayhawkins commented May 24, 2021

to_replace is dispatched to _replace_list if is_list_like(to_replace)

_replace_list only accepts lists, but we also need to make sure that the list contains Python floats rather than NumPy scalars:

>>> import numpy as np
>>> 
>>> type(np.array([1.0])[0])
<class 'numpy.float64'>
>>> 
>>> type(list(np.array([1.0]))[0])
<class 'numpy.float64'>
>>> 
>>> type(np.array([1.0]).tolist()[0])
<class 'float'>
>>> 

or change _can_hold_element

@simonjayhawkins
Member

or change _can_hold_element

This may be a better option, since this is where we have the inconsistency:

>>> 
>>> pd.core.dtypes.cast.can_hold_element(arr=pd.Series([1]).values, element=1.0)
True
>>> 
>>> pd.core.dtypes.cast.can_hold_element(
...     arr=pd.Series([1]).values, element=np.array([1.0])[0]
... )
False
>>> 

@simonjayhawkins
Member

or change _can_hold_element

This may be a better option, since this is where we have the inconsistency.

Changed my mind again! Since we have a mature backport branch and may not have any more patch releases on 1.2.x, let's not make this change and instead just patch the code that was added in #38097 to convert a NumPy array to a list of Python objects.
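
A rough sketch of that conversion, assuming a standalone helper: `to_python_list` is a hypothetical name used only here, and the merged patch applies the same `hasattr(..., "tolist")` check inline rather than through a function like this.

```python
import numpy as np


def to_python_list(src_list):
    # Hypothetical helper for illustration only (not part of pandas).
    # ndarray.tolist() converts NumPy scalars to native Python objects,
    # whereas list(ndarray) keeps them as NumPy scalars.
    if hasattr(src_list, "tolist"):
        return src_list.tolist()
    return list(src_list)


converted = to_python_list(np.array([1.0]))
kept_as_numpy = list(np.array([1.0]))

print(type(converted[0]))
print(type(kept_as_numpy[0]))
```

The first element of `converted` is a built-in `float`, while `list(...)` leaves a `numpy.float64` in place, which matches the interpreter session shown earlier in the thread.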

([1.0], [1], [0], [0.0]),
],
)
@pytest.mark.parametrize("box", [list, tuple, np.array])

Would it be possible to test pd.array too? Better safe than sorry. :)

Member

@simonjayhawkins simonjayhawkins May 25, 2021

That would fail. EA (ExtensionArray) is not a documented allowed type for the to_replace argument of the replace method. To be fair, neither is a NumPy array or a tuple. The safest way to restore the old undocumented behavior is to revert #38097, but I'm not sure that a performance hit for users passing the correct type is the way forward.

Any container holding NumPy scalars, even a list, will still fail. This fixes one specific case of arguably incorrect usage without a performance hit for correct usage.
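
The limitation can be seen without pandas. In the sketch below, the per-element `.item()` conversion is an assumption added for illustration; it is not what this PR does, and it would add per-element work that the PR deliberately avoids.

```python
import numpy as np

# A plain Python list that happens to hold a NumPy scalar.
src_list = [np.float64(1.0)]

# The tolist-based check does not trigger: lists have no tolist method.
list_has_tolist = hasattr(src_list, "tolist")

# So the element stays a NumPy scalar.
element_is_python_float = type(src_list[0]) is float

# Illustrative alternative (not the PR's approach): unwrap each NumPy scalar.
normalized = [x.item() if isinstance(x, np.generic) else x for x in src_list]

print(list_has_tolist, element_is_python_float, type(normalized[0]))
```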

@simonjayhawkins
Member

@jreback can you take a look? This is not the best fix, but the issue will be fixed properly (I think) in #41644, which would be updated to remove the code added here on master straight after this is backported.

# until the issue is fixed properly in can_hold_element

# error: "Iterable[Any]" has no attribute "tolist"
if hasattr(src_list, "tolist"):
Member

I could maybe change this to a simple isinstance(src_list, np.ndarray)
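
For comparison, here is a small sketch of how the two checks differ. The helper names `is_ndarray` and `has_tolist` are made up for this example. Note that the class to test against is `np.ndarray`; `np.array` is a factory function, so passing it to `isinstance` raises `TypeError`.

```python
import array

import numpy as np


def is_ndarray(obj):
    # Strict type check: only NumPy arrays (and subclasses) pass.
    return isinstance(obj, np.ndarray)


def has_tolist(obj):
    # Duck-typed check: anything exposing a tolist attribute passes.
    return hasattr(obj, "tolist")


numpy_values = np.array([1.0])
stdlib_values = array.array("d", [1.0])  # stdlib array also has tolist()
plain_list = [1.0]

results = {
    "ndarray": (is_ndarray(numpy_values), has_tolist(numpy_values)),
    "array.array": (is_ndarray(stdlib_values), has_tolist(stdlib_values)),
    "list": (is_ndarray(plain_list), has_tolist(plain_list)),
}
print(results)

# np.array is a function, not a type, so isinstance rejects it.
try:
    isinstance(numpy_values, np.array)
    raised_type_error = False
except TypeError:
    raised_type_error = True
print(raised_type_error)
```

The duck-typed check is broader, which may or may not be desirable; the strict check limits the conversion to NumPy arrays.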

@simonjayhawkins
Member

will merge (and backport) this shortly if no further comments

@jreback
Contributor

jreback commented Jun 1, 2021

yep looks ok (rebase if u can though)

@simonjayhawkins
Member

yep looks ok (rebase if u can though)

merged master an hour ago, no commits to master since then

@simonjayhawkins simonjayhawkins merged commit 9231b49 into pandas-dev:master Jun 1, 2021
@simonjayhawkins
Member

Thanks @hasan-yaman

simonjayhawkins pushed a commit to simonjayhawkins/pandas that referenced this pull request Jun 1, 2021
simonjayhawkins added a commit that referenced this pull request Jun 1, 2021
TLouf pushed a commit to TLouf/pandas that referenced this pull request Jun 1, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Labels
Regression Functionality that used to work in a prior pandas version replace replace method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Change in behavior of replace with integer series and float to_replace
7 participants