Skip to content

BUG: Fix replace for different dtype equal value #34878

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from
Closed

BUG: Fix replace for different dtype equal value #34878

wants to merge 15 commits into from

Conversation

dsaxton
Copy link
Member

@dsaxton dsaxton commented Jun 20, 2020

Bug noticed when looking at a different issue #34871 (comment)

@gfyoung gfyoung added Bug DataFrame DataFrame data structure Series Series data structure labels Jun 20, 2020
if inplace:
return [self]
return [self.copy()]
return self.astype(object).replace(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only want to do this if these are different types, not generally. please show asv's for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the if not self._can_hold_element(to_replace) check is making sure we have different types.

Not seeing any performance regressions from the change (at least in the replace benchmarks):

(pandas-dev) > asv continuous -f 1.1 upstream/master replace-diff-dtype -b ^replace
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.
·· Installing 1e014777 <replace-diff-dtype> into conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
· Running 12 total benchmarks (2 commits * 1 environments * 6 benchmarks)
[  0.00%] · For pandas commit 9114b4bc <master> (round 1/2):
[  0.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[  0.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[  4.17%] ··· Running (replace.Convert.time_replace--)....
[ 20.83%] ··· Running (replace.ReplaceList.time_replace_list--)..
[ 25.00%] · For pandas commit 1e014777 <replace-diff-dtype> (round 1/2):
[ 25.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 25.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 29.17%] ··· Running (replace.Convert.time_replace--)....
[ 45.83%] ··· Running (replace.ReplaceList.time_replace_list--)..
[ 50.00%] · For pandas commit 1e014777 <replace-diff-dtype> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 54.17%] ··· replace.Convert.time_replace                                                                                                                                                                 ok
[ 54.17%] ··· ============= ============ ============
              --                   replace_data      
              ------------- -------------------------
               constructor   Timestamp    Timedelta  
              ============= ============ ============
                DataFrame     98.6±3ms    95.2±0.5ms 
                  Series     74.0±0.3ms    74.6±1ms  
              ============= ============ ============

[ 58.33%] ··· replace.FillNa.time_fillna                                                                                                                                                                   ok
[ 58.33%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.04±0.01ms 
                False     3.01±0.3ms 
              ========= =============

[ 62.50%] ··· replace.FillNa.time_replace                                                                                                                                                                  ok
[ 62.50%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.10±0.02ms 
                False    4.02±0.07ms 
              ========= =============

[ 66.67%] ··· replace.ReplaceDict.time_replace_series                                                                                                                                                      ok
[ 66.67%] ··· ========= ============
               inplace              
              --------- ------------
                 True    4.27±0.02s 
                False    4.24±0.04s 
              ========= ============

[ 70.83%] ··· replace.ReplaceList.time_replace_list                                                                                                                                                        ok
[ 70.83%] ··· ========= ============
               inplace              
              --------- ------------
                 True    51.6±0.6μs 
                False     405±5ms   
              ========= ============

[ 75.00%] ··· replace.ReplaceList.time_replace_list_one_match                                                                                                                                              ok
[ 75.00%] ··· ========= ==========
               inplace            
              --------- ----------
                 True    89.2±3ms 
                False    467±4ms  
              ========= ==========

[ 75.00%] · For pandas commit 9114b4bc <master> (round 2/2):
[ 75.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 75.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 79.17%] ··· replace.Convert.time_replace                                                                                                                                                                 ok
[ 79.17%] ··· ============= =========== ===========
              --                  replace_data     
              ------------- -----------------------
               constructor   Timestamp   Timedelta 
              ============= =========== ===========
                DataFrame     102±4ms     103±3ms  
                  Series      80.0±4ms    83.1±4ms 
              ============= =========== ===========

[ 83.33%] ··· replace.FillNa.time_fillna                                                                                                                                                                   ok
[ 83.33%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.10±0.04ms 
                False     3.04±0.1ms 
              ========= =============

[ 87.50%] ··· replace.FillNa.time_replace                                                                                                                                                                  ok
[ 87.50%] ··· ========= =============
               inplace               
              --------- -------------
                 True    1.20±0.04ms 
                False     4.26±0.2ms 
              ========= =============

[ 91.67%] ··· replace.ReplaceDict.time_replace_series                                                                                                                                                      ok
[ 91.67%] ··· ========= ============
               inplace              
              --------- ------------
                 True    4.26±0.01s 
                False    4.24±0.01s 
              ========= ============

[ 95.83%] ··· replace.ReplaceList.time_replace_list                                                                                                                                                        ok
[ 95.83%] ··· ========= ==========
               inplace            
              --------- ----------
                 True    49.6±1μs 
                False    412±6ms  
              ========= ==========

[100.00%] ··· replace.ReplaceList.time_replace_list_one_match                                                                                                                                              ok
[100.00%] ··· ========= ==========
               inplace            
              --------- ----------
                 True    89.3±2ms 
                False    464±30ms 
              ========= ==========


BENCHMARKS NOT SIGNIFICANTLY CHANGED.

@jreback
Copy link
Contributor

jreback commented Jul 17, 2020

can u merge master and will look

@simonjayhawkins
Copy link
Member

@dsaxton can you move release note

@simonjayhawkins
Copy link
Member

marking as stale 😉

@@ -46,6 +46,10 @@ Bug fixes
- Bug in ``.groupby(..).rolling(..)`` where passing ``closed`` with column selection would raise a ``ValueError`` (:issue:`35549`)
- Bug in :class:`DataFrame` constructor failing to raise ``ValueError`` in some cases when ``data`` and ``index`` have mismatched lengths (:issue:`33437`)

**Reshaping**

- Bug where :meth:`DataFrame.replace` and :meth:`Series.replace` would fail when replacement value was of a different dtype but equal to values in the original object (:issue:`34871`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to 1.2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks similar to regression reported in #35376 with open PR (with different fix) #36444

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it's the same issue. If another fix avoids astyping to object I would think it's preferred.

@jreback jreback added this to the 1.2 milestone Sep 19, 2020
@jreback jreback removed the Stale label Sep 19, 2020
@dsaxton
Copy link
Member Author

dsaxton commented Sep 19, 2020

Closing since the other PR looks like a better fix

@dsaxton dsaxton closed this Sep 19, 2020
@dsaxton dsaxton deleted the replace-diff-dtype branch September 19, 2020 14:04
@QuentinN42 QuentinN42 mentioned this pull request Sep 21, 2020
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug DataFrame DataFrame data structure Series Series data structure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants