API: NaT/nan mixed input coerces to timedelta64 #13467

Closed
sinhrks opened this issue Jun 16, 2016 · 5 comments · Fixed by #13477
Labels: API Design; Clean; Dtype Conversions (unexpected or buggy dtype conversions); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

@sinhrks
Member

sinhrks commented Jun 16, 2016

Code Sample, a copy-pastable example if possible

Index/Series results in datetime64 dtype if all inputs are pd.NaT.

pd.Index([pd.NaT, pd.NaT])
# DatetimeIndex(['NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

pd.Series([pd.NaT, pd.NaT])
#0   NaT
#1   NaT
# dtype: datetime64[ns]

However, it coerces to timedelta64 if np.nan is included.

pd.Index([pd.NaT, np.nan])
# TimedeltaIndex([NaT, NaT], dtype='timedelta64[ns]', freq=None)

pd.Series([pd.NaT, np.nan])
#0   NaT
#1   NaT
# dtype: timedelta64[ns]

This appears to be caused by the following internals:

pd.tslib.is_timestamp_array(np.array([pd.NaT, np.nan]))
# False
pd.lib.is_timedelta_array(np.array([pd.NaT, np.nan]))
# True
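The `pd.lib` and `pd.tslib` modules used above are internal. As a minimal sketch, the same kind of probe can be run through the public `pd.api.types.infer_dtype` function. It is an assumption here that the public function reflects the internal inference being discussed, and the label it returns depends on the installed pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd

# Object array mixing pd.NaT and np.nan, as in the report above.
values = np.array([pd.NaT, np.nan], dtype=object)

# skipna=False asks the inference to take the missing values into account
# instead of ignoring them.
label = pd.api.types.infer_dtype(values, skipna=False)
print(label)
```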

Expected Output

datetime64 dtype?

output of pd.show_versions()

on current master

@sinhrks sinhrks added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Dtype Conversions Unexpected or buggy dtype conversions API Design labels Jun 16, 2016
@jreback
Contributor

jreback commented Jun 16, 2016

tslib.is_timestamp_array looks totally out of place / is old, need to remove that and replace with routine from src/inference.pyx (which is where is_timedelta_array is defined, along with all other inference routines); these get put into lib.pyx at compile time.

but your result should be the same (e.g. M8[ns])

@sinhrks
Member Author

sinhrks commented Jun 16, 2016

OK. Also, there are no tests to guarantee the behavior. Will submit a PR.

@sinhrks sinhrks changed the title BUG/API: NaT/nan mixed input coerces to timedelta64 API: NaT/nan mixed input coerces to timedelta64 Jun 16, 2016
@sinhrks sinhrks added Clean and removed Bug labels Jun 16, 2016
@sinhrks
Member Author

sinhrks commented Jun 19, 2016

I found inconsistencies that depend on the order of the elements. Is it OK to require that all of these be datetime64?

# OK
pd.Index([np.nan, pd.NaT])
# DatetimeIndex(['NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

# NG, must be DatetimeIndex
pd.Index([pd.NaT, np.nan])
# TimedeltaIndex([NaT, NaT], dtype='timedelta64[ns]', freq=None)

# OK
pd.lib.infer_dtype([np.nan, pd.NaT])
# 'datetime64'

# NG, must be 'datetime64'
pd.lib.infer_dtype([pd.NaT, np.nan])
# 'mixed'
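The order dependence shown above can be checked mechanically. The following is a rough sketch of such a check, not the test added by the eventual fix; the helper names `inferred_kinds` and `is_order_independent` are hypothetical. It builds an Index from every ordering of the input and compares the kind of the inferred dtype ('M' for datetime64, 'm' for timedelta64).

```python
import itertools

import numpy as np
import pandas as pd


def inferred_kinds(values):
    """Map each ordering of `values` to the dtype kind pandas infers for it."""
    return {
        repr(order): pd.Index(list(order)).dtype.kind
        for order in itertools.permutations(values)
    }


def is_order_independent(values):
    """Return True if every ordering of `values` infers the same dtype kind."""
    return len(set(inferred_kinds(values).values())) == 1


# On a version affected by this issue, the two orderings are expected to differ.
print(inferred_kinds([pd.NaT, np.nan]))
print(is_order_independent([pd.NaT, np.nan]))
```

Comparing `dtype.kind` rather than the full dtype keeps the check independent of the time unit a given pandas version picks.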

@sinhrks
Member Author

sinhrks commented Jun 19, 2016

Actually, it is not as simple as I initially thought. On current master, we don't distinguish between datetime64 and timedelta64 NaT, so the input below is regarded as timedelta.

pd.lib.infer_dtype(np.array([np.datetime64('nat'), pd.Timedelta('1 day')]))
# 'timedelta'

However, if the input is all nan/NaT-like, the dtype is inferred from the last element.

pd.lib.infer_dtype(np.array([np.datetime64('nat'), np.timedelta64('nat')], dtype=object))
# 'timedelta'

pd.lib.infer_dtype(np.array([np.timedelta64('nat'), np.datetime64('nat')], dtype=object))
# 'datetime64'

If we regard np.datetime64/np.timedelta64 NaT as the same as pd.NaT, both results should be "datetime64". Otherwise, a mixture of np.datetime64 and np.timedelta64 NaT should be inferred as "mixed".
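To make the distinction concrete, here is a small sketch of how the NumPy NaT scalars differ from pd.NaT. It only illustrates the scalar types involved; it does not show how pandas inference treats them.

```python
import numpy as np
import pandas as pd

dt_nat = np.datetime64("NaT")
td_nat = np.timedelta64("NaT")

# NumPy NaT scalars are typed: one is a datetime64, the other a timedelta64.
print(dt_nat.dtype.kind, td_nat.dtype.kind)  # M m

# pd.NaT is a single object shared by Timestamp and Timedelta.
print(pd.Timestamp("NaT") is pd.NaT, pd.Timedelta("NaT") is pd.NaT)  # True True

# All three are treated as missing values.
print(pd.isna(dt_nat), pd.isna(td_nat), pd.isna(pd.NaT))  # True True True
```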

@jreback
Contributor

jreback commented Jun 24, 2016

Ah, I see. Since we have a single pd.NaT for both M8 and m8, I think we need to keep all-nan input -> M8 when the dtype is not specified.
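When the default is not what the caller wants, the ambiguity can be avoided by passing the dtype explicitly. A minimal sketch, assuming nanosecond resolution is wanted:

```python
import numpy as np
import pandas as pd

data = [pd.NaT, np.nan]

# An explicit dtype removes the need for inference on all-missing input.
as_datetime = pd.Series(data, dtype="datetime64[ns]")
as_timedelta = pd.Series(data, dtype="timedelta64[ns]")

print(as_datetime.dtype)   # datetime64[ns]
print(as_timedelta.dtype)  # timedelta64[ns]
```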

@jreback jreback added this to the Next Major Release milestone Jun 24, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.18.2, Next Major Release Jul 6, 2016
nateGeorge pushed a commit to nateGeorge/pandas that referenced this issue Aug 15, 2016