BUG: DataFrame.replace with dict doesn't work when value=None #46606

Closed
2 of 3 tasks
zeromh opened this issue Apr 1, 2022 · 6 comments · Fixed by #46930
Labels: DataFrame, Docs, good first issue, replace

Comments

@zeromh

zeromh commented Apr 1, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame(dict(a=[1,2,3],
                      b=[1,2,3]))
df.replace({1:5}, value=None) # does not replace values at all
df.replace({1:5}) # correctly replaces 1s with 5s

Issue Description

The documentation for DataFrame.replace says, for the to_replace argument, that:

Dicts can be used to specify different replacement values for different existing values. For example, {'a': 'b', 'y': 'z'} replaces the value ‘a’ with ‘b’ and ‘y’ with ‘z’. To use a dict in this way the value parameter should be None.

However, specifying value=None results in no replacement taking place. If I don't specify value at all, it works.

Expected Behavior

I expect (based on the docs) that specifying value=None will allow using a dictionary for replacement.

Installed Versions

INSTALLED VERSIONS

commit : 06d2301
python : 3.8.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.93-87.444.amzn2.x86_64
Version : #1 SMP Thu Jan 20 22:50:50 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.1
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : None
pytest : 7.1.1
hypothesis : None
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.1
IPython : 8.1.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : 0.8.1
fsspec : 2022.3.0
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
sqlalchemy : None
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
zstandard : None

zeromh added the Bug and Needs Triage labels Apr 1, 2022
@Kyrpel

Kyrpel commented Apr 2, 2022

take

@simonjayhawkins
Member

Thanks @zeromh for the report

The documentation for DataFrame.replace says, for the to_replace argument, that:

This does look like a documentation error. It should probably change from "To use a dict in this way the value parameter should be None." to "To use a dict in this way, the optional value parameter should not be given."

Since df.replace({1: 5}, value=10) does not replace anything either, the problem is not limited to value=None.

Also note that if you put a dict in the DataFrame, it still does not work:

df = pd.DataFrame(dict(a=[{1: 5}, 2, 3], b=[1, 2, 3]), dtype=object)
df.replace({1: 5}, value=10)

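because a dict combined with value is reserved for a special case in which the values to replace are restricted to certain columns: the dict keys are column labels and the dict values are the values to look for in those columns.

So this does work: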

df = pd.DataFrame({1: [5, 2, 3], "b": [1, 2, 3]})
df.replace({1: 5}, value=10)

So passing a mapping for to_replace together with a value (used as the replacement within the narrowed columns) is a valid combination, which is why the example in the OP gets through validation.
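To make the two readings of the same dict concrete, here is a small illustrative sketch. The frame contents and the variable names `mapped` and `narrowed` are invented for this example; the calls themselves are the ones discussed above.

```python
import pandas as pd

# Column label 1 exists, and the value 5 appears in both columns.
df = pd.DataFrame({1: [5, 2, 3], "b": [1, 5, 3]})

# No `value`: the dict maps old values to new values in every column,
# so each 1 becomes 5.
mapped = df.replace({1: 5})

# With `value`: the dict key is read as a column label, so only the 5s
# in column 1 are replaced by 10; column "b" is left alone.
narrowed = df.replace({1: 5}, value=10)

print(mapped)
print(narrowed)
```

In the OP's frame there is no column labelled 1, so the second reading finds nothing to replace, which matches the "does not replace values at all" observation.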

This gets more tricky with

df = pd.DataFrame(dict(a=[{1: 5}, 2, 3], b=[1, 2, 3]), dtype=object)
df.replace({"a": {1: 5}}, value=10)

which is then interpreted as a nested replacement dictionary, so it does not necessarily give the expected result, and it raises ValueError: Series.replace cannot use dict-like to_replace and non-None value

and if you have a None replacement value, where you might expect the result to be cast to object dtype, say

df = pd.DataFrame(dict(a=[1, 2, 3], b=[1, 2, 3]))
df.replace({"a": 1}, value=None)

you get a RecursionError!

whereas

df = pd.DataFrame(dict(a=[1, 2, 3], b=[1, 2, 3]))
df.replace({"a": {1: None}})

does work. The docs for the nested dictionary case probably need updating too, from "The value parameter should be None to use a nested dict in this way." to "The optional value parameter should not be specified to use a nested dict in this way."
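As a minimal sketch of the nested-dict form with value left unspecified: the outer key selects the column and the inner dict maps old values to new ones. The replacement 100 is an arbitrary choice for this illustration, used instead of None so that dtype changes stay out of the picture.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]})

# Nested dict, no `value` argument: in column "a" only, replace 1 with 100.
# The 1 in column "b" is not touched.
result = df.replace({"a": {1: 100}})

print(result)
```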

simonjayhawkins added the Docs and good first issue labels and removed the Bug and Needs Triage labels Apr 3, 2022
simonjayhawkins added this to the Contributions Welcome milestone Apr 3, 2022
@ivan-afonichkin

take

simonjayhawkins added the replace and DataFrame labels Apr 6, 2022
@simonjayhawkins
Member

xref #46004 for the related issue for Series.

@nkathy

nkathy commented Apr 20, 2022

take

@stanleycai95
Contributor

take

6 participants