
BUG: series with dtype "Int64" don't work correctly with replace() method #34530

Closed
2 tasks done
MarcusJellinghaus opened this issue Jun 2, 2020 · 2 comments · Fixed by #34733
Labels
Bug; Regression (functionality that used to work in a prior pandas version); replace (replace method)

MarcusJellinghaus commented Jun 2, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas (version 1.0.4).


I get an AssertionError from the following code:

Code Sample

import pandas as pd

Int_series = pd.Series(data=[1, 2, 3], dtype="Int64")
Int_series.replace("", "ABC", inplace=True)  # throws exception
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/sandbox/pandas/pandas/core/internals/blocks.py in replace(self, to_replace, value, inplace, regex, convert)
    731         try:
--> 732             blocks = self.putmask(mask, value, inplace=inplace)
    733             # Note: it is _not_ the case that self._can_hold_element(value)

~/sandbox/pandas/pandas/core/internals/blocks.py in putmask(self, mask, new, inplace, axis, transpose)
   1608
-> 1609         new_values[mask] = new
   1610         return [self.make_block(values=new_values)]

~/sandbox/pandas/pandas/core/arrays/masked.py in __setitem__(self, key, value)
     93             value = [value]
---> 94         value, mask = self._coerce_to_array(value)
     95

~/sandbox/pandas/pandas/core/arrays/integer.py in _coerce_to_array(self, value)
    417     def _coerce_to_array(self, value) -> Tuple[np.ndarray, np.ndarray]:
--> 418         return coerce_to_array(value, dtype=self.dtype)
    419

~/sandbox/pandas/pandas/core/arrays/integer.py in coerce_to_array(values, dtype, mask, copy)
    241     elif not (is_integer_dtype(values) or is_float_dtype(values)):
--> 242         raise TypeError(f"{values.dtype} cannot be converted to an IntegerDtype")
    243

TypeError: <U3 cannot be converted to an IntegerDtype

During handling of the above exception, another exception occurred:

AssertionError                            Traceback (most recent call last)
<ipython-input-3-36a959f691c3> in <module>
      1 Int_series = pd.Series(data=[1, 2, 3], dtype="Int64")
----> 2 Int_series.replace("", "ABC", inplace=True)  # throws exception

~/sandbox/pandas/pandas/core/series.py in replace(self, to_replace, value, inplace, limit, regex, method)
   4449             limit=limit,
   4450             regex=regex,
-> 4451             method=method,
   4452         )
   4453

~/sandbox/pandas/pandas/core/generic.py in replace(self, to_replace, value, inplace, limit, regex, method)
   6662                 elif not is_list_like(value):  # NA -> 0
   6663                     new_data = self._mgr.replace(
-> 6664                         to_replace=to_replace, value=value, inplace=inplace, regex=regex
   6665                     )
   6666                 else:

~/sandbox/pandas/pandas/core/internals/managers.py in replace(self, value, **kwargs)
    592     def replace(self, value, **kwargs) -> "BlockManager":
    593         assert np.ndim(value) == 0, value
--> 594         return self.apply("replace", value=value, **kwargs)
    595
    596     def replace_list(

~/sandbox/pandas/pandas/core/internals/managers.py in apply(self, f, align_keys, **kwargs)
    400                 applied = b.apply(f, **kwargs)
    401             else:
--> 402                 applied = getattr(b, f)(**kwargs)
    403             result_blocks = _extend_blocks(applied, result_blocks)
    404

~/sandbox/pandas/pandas/core/internals/blocks.py in replace(self, to_replace, value, inplace, regex, convert)
    743                 raise
    744
--> 745             assert not self._can_hold_element(value), value
    746
    747             # try again with a compatible block

AssertionError: ABC

Problem description

The code raises an exception on pandas 1.0.4. It works on pandas 1.0.1, and it also works with dtype="int64".

Expected Output

No exception. replace() should leave the series unchanged, since there is no "" in the series.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.0.4
numpy : 1.18.4
pytz : 2020.1
dateutil : 2.8.1
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : 5.4.2
hypothesis : None
sphinx : 3.0.3
blosc : None
feather : None
xlsxwriter : None

MarcusJellinghaus added the labels Bug and Needs Triage (issue that has not been reviewed by a pandas team member) on Jun 2, 2020
@TomAugspurger (Contributor)

Added the traceback to the original post.

The line it raises on comes from #27768 (cc @jbrockmendel).

Not sure what to do here. I think the choices are

  1. remove the assert
  2. only assert for non extension blocks
  3. push the _can_hold_element into the EA interface
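
As a rough illustration of option 2, here is a toy model in plain Python. It is not pandas code: `ToyBlock`, `is_extension`, and the list-based "object" fallback are hypothetical names invented for this sketch, and it assumes that an extension block reports it can hold any value.

```python
class ToyBlock:
    """Hypothetical block; `is_extension` marks an extension-array block."""

    def __init__(self, values, is_extension):
        self.values = list(values)
        self.is_extension = is_extension

    def can_hold_element(self, value):
        # Assumption: extension blocks always answer "yes";
        # other blocks in this toy accept only ints.
        return self.is_extension or isinstance(value, int)

    def putmask(self, mask, new):
        # Coercion happens before the mask is applied, so a bad value
        # raises even when nothing would be replaced.
        if not isinstance(new, int):
            raise TypeError(f"cannot store {new!r} in an integer block")
        self.values = [new if m else v for v, m in zip(self.values, mask)]

    def replace(self, to_replace, value):
        mask = [v == to_replace for v in self.values]
        try:
            self.putmask(mask, value)
            return self.values
        except TypeError:
            # Option 2: only trust the assertion for non-extension blocks.
            if not self.is_extension:
                assert not self.can_hold_element(value), value
            # Retry on a plain list, which stands in for an object block.
            return [value if m else v for v, m in zip(self.values, mask)]
```

With the guard in place, `ToyBlock([1, 2, 3], is_extension=True).replace("", "ABC")` falls through to the retry and returns the values unchanged instead of tripping the assertion.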

TomAugspurger added the labels Bug, Regression (functionality that used to work in a prior pandas version), and replace (replace method), and removed the labels Bug and Needs Triage (issue that has not been reviewed by a pandas team member) on Jun 2, 2020
@jbrockmendel (Member)

IIUC the call to self.putmask (L810 in the linked PR) raises and then the assert not self._can_hold_element(value) (L823) fails?
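
The sequence described above can be modelled with a small plain-Python sketch. This is not the pandas implementation: `ToyIntegerArray` and `ToyBlock` are hypothetical stand-ins, and the sketch assumes the extension block's check returns True for every value, which is what would make the assertion fail.

```python
class ToyIntegerArray:
    """Hypothetical stand-in for a nullable-integer array."""

    def __init__(self, values):
        self.values = list(values)

    def putmask(self, mask, new):
        # The value is coerced before the mask is applied, so a string
        # raises even if the mask is all False.
        if not isinstance(new, int):
            raise TypeError(f"{type(new).__name__} cannot be converted to an integer dtype")
        self.values = [new if m else v for v, m in zip(self.values, mask)]


class ToyBlock:
    """Hypothetical block wrapping the toy array."""

    def __init__(self, array):
        self.array = array

    def can_hold_element(self, value):
        # Assumption: the extension block says "yes" to everything.
        return True

    def replace(self, to_replace, value):
        mask = [v == to_replace for v in self.array.values]
        try:
            self.array.putmask(mask, value)
        except TypeError:
            # The fallback path assumes the block cannot hold `value`,
            # but the check above claims it can, so this assert fails.
            assert not self.can_hold_element(value), value
            # Not reached in this toy: retry on a plain "object" list.
            return [value if m else v for v, m in zip(self.array.values, mask)]
        return self.array.values


block = ToyBlock(ToyIntegerArray([1, 2, 3]))
try:
    block.replace("", "ABC")
    error = None
except AssertionError as exc:
    error = exc  # carries "ABC", like the traceback in the original post
```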

I think _can_hold_element or something like it will need to be in the EA interface eventually. The same method would also be used for validating a fill_value for e.g. take or fillna or...
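
A minimal sketch of what an array-level check might look like, in plain Python rather than pandas. Everything here is hypothetical: `ToyArray`, its `can_hold_element` method, and the string dtype labels are invented for illustration, and the real extension array interface may end up looking quite different.

```python
class ToyArray:
    """Hypothetical nullable-integer array; None plays the role of NA."""

    def __init__(self, values, dtype="Int64"):
        self.values = list(values)
        self.dtype = dtype

    def can_hold_element(self, value):
        # Hypothetical interface method: can `value` be stored
        # without changing the dtype?
        if self.dtype == "object":
            return True
        return value is None or (isinstance(value, int) and not isinstance(value, bool))

    def replace(self, to_replace, value):
        mask = [v == to_replace for v in self.values]
        new_values = [value if m else v for v, m in zip(self.values, mask)]
        # Keep the dtype when nothing matched or the value fits;
        # otherwise widen to object.
        if not any(mask) or self.can_hold_element(value):
            return ToyArray(new_values, self.dtype)
        return ToyArray(new_values, "object")

    def fillna(self, fill_value):
        # The same check validates a fill value.
        if not self.can_hold_element(fill_value):
            raise TypeError(f"invalid fill value {fill_value!r} for dtype {self.dtype}")
        filled = [fill_value if v is None else v for v in self.values]
        return ToyArray(filled, self.dtype)
```

In this toy, `ToyArray([1, 2, None]).replace("", "ABC")` is a no-op that keeps the `"Int64"` label, which matches the expected output in the original post, while `fillna("x")` is rejected by the same check.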
