Misbehavior for DataFrame updates on nonexistent DateTime index #28249

@crbl1122

Code Sample, a copy-pastable example if possible

import pandas as pd

# Build a small frame with a daily DatetimeIndex.
df = pd.DataFrame({'Date': ['2017-01-02', '2017-01-03', '2017-01-04'],
                   'T': [10, 11, 12],
                   'RM': [28, 29, 30]})
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
df = df.asfreq('D')
print(df)
print('Dataframe index of dtype: {} and freq: {}'.format(df.index.dtype, df.index.freq))

# Drop the middle row so that 2017-01-03 is no longer in the index.
print("Dropping one row")
df = df.drop(df.index[1])
print(df)
print('The new index is of dtype: {} and freq: {}'.format(df.index.dtype, df.index.freq))

# Assign to a label that does not exist in the index.
print("Let's change in place the nonexistent index: 2017-01-03")
df.loc['2017-01-03', 'RM'] = 290
print(df)
print("The dataframe has shape: {} and its new index is of dtype: {}".format(df.shape, df.index.dtype))

Problem description

According to the pandas documentation, updating a cell value by index lookup (df.loc and df.at) should work only when the label already exists in the dataframe.
The problem occurs when I try to update a cell through a DateTime index label (which is actually a date) that does not exist in the dataframe. According to the documentation, an exception should be raised in this case.

What actually happens is that, without raising any exception, pandas 1) converts the DateTime index into an object index (making it unusable for time series processing), and 2) inserts a new row with the specified label into the dataframe, with every column set to NaN except the updated one.

I worked around the problem by wrapping the update commands in conditional 'if' checks, but according to the documentation this appears to be a misbehavior of pandas.
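
For reference, the conditional workaround could look roughly like the sketch below. The helper name set_if_present is hypothetical (it is not a pandas API), and the sketch assumes the frame has a DatetimeIndex and that the date can be parsed by pd.Timestamp. It only illustrates the guard and does not address the underlying behavior.

```python
import pandas as pd


def set_if_present(df, date, column, value):
    """Assign `value` only when `date` is already a label in df.index.

    Hypothetical helper: returns True if the assignment happened,
    False if the label was missing and the frame was left untouched.
    """
    ts = pd.Timestamp(date)  # normalise a string date to a Timestamp before the lookup
    if ts not in df.index:
        return False
    df.loc[ts, column] = value
    return True


# Same data as in the code sample, with the middle row dropped.
df = pd.DataFrame(
    {"T": [10, 11, 12], "RM": [28, 29, 30]},
    index=pd.to_datetime(["2017-01-02", "2017-01-03", "2017-01-04"]),
)
df = df.drop(df.index[1])

updated_missing = set_if_present(df, "2017-01-03", "RM", 290)   # label was dropped
updated_existing = set_if_present(df, "2017-01-04", "RM", 300)  # label still exists
print(updated_missing, updated_existing)
print(df)
```

With this guard the frame is never enlarged, so the index stays a DatetimeIndex regardless of how the installed pandas version handles assignment to a missing label.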

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-60-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 4.5.0
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.17.0
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: 2.0.1
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.8
feather: None
matplotlib: 3.1.0
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.8
lxml: 4.3.0
bs4: 4.7.1
html5lib: 0.9999999
sqlalchemy: 1.3.3
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.7.0

Labels

Datetime, Indexing, Needs Tests, good first issue
