BUG: Series.replace converts np.nan into pd.NaT implicitly #48034
Comments
Hi, thanks for your report. Sorry, I tested on 1.3.5 initially. Yes, you are correct, this should not convert. Labelling as a regression, since this worked in 1.3.5. |
This is a general issue,
returns
|
Yes, it seems that I wonder if we can disable this behaviour for NaNs? That is, don't treat NaNs as datetimes, and therefore don't convert NaNs to NaT. |
I looked into this a bit and found that we change the whole array when we use |
first bad commit: [b1a2f48] BUG: inconsistency in dtype of replace() (#44897). The change looks intentional, but I agree with the OP that one would expect the code sample to be a no-op. |
#46393 (comment) may be related. |
The change that caused the regression was in Now there is a I assume the last line should have been changed to now that won't fix this issue since there does not appear to be any logic to determine if the op is a no-op based on the regex not matching any values. There is code dotted around that short circuits if some other conditions determine that the op is a no-op. But also stepping back, whether the "fix" to be consistent with |
Removing from the 1.4.x milestone, as I think this needs further discussion. |
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
np.nan values are implicitly converted to pd.NaT when both of the following hold: Series.replace is called with the regex=True flag, and the values are pd.Timestamp or np.nan.
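For reference, the conditions described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the original example from the report: the Series contents, the non-matching pattern, the replacement value "X", and the helper name missing_value_types are all assumptions chosen for illustration. The result depends on the pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical data: an object-dtype Series holding only a Timestamp and np.nan.
s = pd.Series([pd.Timestamp("2022-01-01"), np.nan], dtype=object)

# The pattern matches none of the values, so the call would be expected to be a no-op.
result = s.replace(to_replace=r"no-such-text", value="X", regex=True)


def missing_value_types(series):
    """Return the type name of each missing value in the Series.

    np.nan reports as 'float', pd.NaT reports as 'NaTType'.
    """
    return [type(v).__name__ for v in series if pd.isna(v)]


print("before:", s.dtype, missing_value_types(s))
print("after: ", result.dtype, missing_value_types(result))
```

If the "after" line reports NaTType, the missing value was coerced to pd.NaT by the call; float means np.nan was preserved.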
Expected Behavior
This behaviour is inconsistent. The documentation of Series.replace suggests that the function only replaces strings matching the to_replace regex with value. I did not expect it to implicitly coerce values.
Expected behaviour is like the a2 case in the example: np.nan stays np.nan and does not get converted.
Where it happens
In L763-764 of blocks.py, the block variable in L763 has np.nan. But after running block.convert(...), the result has pd.NaT. I believe the coercion happens specifically in L490 of blocks.py:
Installed Versions
INSTALLED VERSIONS
commit : 4bfe3d0
python : 3.9.13.final.0
python-bits : 64
OS : Darwin
OS-release : 21.5.0
Version : Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:29 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T8101
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 62.1.0
Cython : None
pytest : 7.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.3
jinja2 : 3.0.3
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : 0.8.1
fsspec : 2022.02.0
gcsfs : None
markupsafe : 2.1.1
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 6.0.1
pyreadstat : None
pyxlsb : 1.0.9
s3fs : 0.4.2
scipy : None
snappy : None
sqlalchemy : 1.4.36
tables : None
tabulate : 0.8.9
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None