ENH: add option to tz_localize to return NaT instead of raising a NonExistentTimeError #13057

Closed
dancsi opened this issue May 2, 2016 · 7 comments
Labels
API Design Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Timezones Timezone data dtype
Comments

@dancsi
Contributor

dancsi commented May 2, 2016

It would be nice if the tz_localize method of a DatetimeIndex had an optional flag for silently returning NaT instead of raising a NonExistentTimeError when a timestamp is not valid in the given timezone (for example, because of a DST change).
I ran into this problem while trying to tz_localize a large index, and it seems to me that this would be a much better solution than manually handling the exception with a lambda expression in a (slow) Python loop.
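
For illustration, the slow per-element workaround mentioned above might look roughly like the sketch below. The helper name `localize_or_nat` is hypothetical, and the broad `except` is an assumption made so the sketch does not depend on which exception class a given pandas/pytz combination raises.

```python
import pandas as pd


def localize_or_nat(index, tz):
    """Localize each timestamp on its own, substituting NaT on failure.

    Hypothetical helper: it loops in Python, so it is slow on large indexes.
    """
    localized = []
    for ts in index:
        try:
            localized.append(ts.tz_localize(tz))
        except Exception:
            # Broad on purpose: the exception class for a nonexistent time
            # varies by version. Note this also hides ambiguous-time errors.
            localized.append(pd.NaT)
    return pd.DatetimeIndex(localized)


naive = pd.DatetimeIndex(["2015-03-08 02:30:00", "2015-03-08 12:00:00"])
result = localize_or_nat(naive, "America/Los_Angeles")
print(result)
```

The loop visits every element even when only a handful are invalid, which is why a built-in option would be preferable for an index with hundreds of thousands of entries.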

@jreback
Contributor

jreback commented May 2, 2016

pls show an example

tz_localize already has the ambiguous argument for this purpose

@jreback
Contributor

jreback commented May 2, 2016

and pd.show_versions()

@dancsi
Contributor Author

dancsi commented May 2, 2016

Here is a minimal example

import pandas as pd

df = pd.DataFrame({'large_series': [pd.Timestamp('2015-03-08 02:30:00')]})
ind = pd.DatetimeIndex(df['large_series']) 
ind = ind.tz_localize('America/Los_Angeles')

(imagine that large_series is indeed a long column, with some timestamps that are invalid)
The error that is thrown:

Traceback (most recent call last):
  File "C:/Dev/temp/pandas_demo.py", line 5, in <module>
    ind = ind.tz_localize('America/Los_Angeles')
  File "C:\Dev\Python35\lib\site-packages\pandas\util\decorators.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Dev\Python35\lib\site-packages\pandas\tseries\index.py", line 1843, in tz_localize
    ambiguous=ambiguous)
  File "pandas\tslib.pyx", line 3914, in pandas.tslib.tz_localize_to_utc (pandas\tslib.c:67511)
pytz.exceptions.NonExistentTimeError: 2015-03-08 02:30:00

Note that the exception is pytz.exceptions.NonExistentTimeError, not pytz.AmbiguousTimeError, which is the one handled by the ambiguous flag. It seems that in the current master, this line is responsible.
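
The difference between the two cases can be sketched as follows. This assumes a pandas version whose `ambiguous` argument accepts `'NaT'`; the exception is caught broadly because its exact class depends on the installed pandas/pytz versions.

```python
import pandas as pd

tz = "America/Los_Angeles"

# 01:30 on 2015-11-01 happens twice (clocks fall back): an ambiguous time.
ambiguous_idx = pd.DatetimeIndex(["2015-11-01 01:30:00"])
# 02:30 on 2015-03-08 never happens (clocks spring forward): a nonexistent time.
nonexistent_idx = pd.DatetimeIndex(["2015-03-08 02:30:00"])

# The ambiguous flag covers the first case and yields NaT.
amb_result = ambiguous_idx.tz_localize(tz, ambiguous="NaT")
print(amb_result)

# The same flag does nothing for the second case, which still raises.
try:
    nonexistent_idx.tz_localize(tz, ambiguous="NaT")
    raised = False
except Exception as exc:
    raised = True
    print(type(exc).__name__, exc)
```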
Finally, the output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: en_US.UTF_8
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.10.1
Cython: 0.23.1
numpy: 1.11.0
scipy: 0.17.0
statsmodels: None
xarray: 0.7.2
IPython: 4.2.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: None
numexpr: 2.5.2
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.0
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None

@jreback
Contributor

jreback commented May 2, 2016

So you want an errors='coerce' option, with the default being 'raise', which would set the nonexistent datetime to NaT.

ok I suppose, though this indicates a fundamental issue that you have. I don't think hiding this is the right answer. How did you generate this in the first place?

cc @rockg
cc @ischwabacher
cc @adamgreenhall

@jreback jreback added Timezones Timezone data dtype API Design Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels May 2, 2016
@dancsi
Contributor Author

dancsi commented May 2, 2016

Exactly. I got the data from an external source (here, if you are interested). Only a few timestamps out of 500k fall a few minutes after 2am on the day when DST becomes active, so I believe they are just errors in the dataset.

@jreback
Contributor

jreback commented May 2, 2016

OK, unless there are other objections, I don't see a problem with adding a coercion option. Pull requests welcome!
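
As a rough sketch of what coercion looks like in practice: the errors='coerce' spelling is the proposal made in this thread, and the code below does not rely on it. Instead it assumes a newer pandas release (well after the 0.18.0 reported above), where tz_localize takes a `nonexistent` argument that accepts `'NaT'`.

```python
import pandas as pd

naive = pd.DatetimeIndex(["2015-03-08 02:30:00", "2015-03-08 12:00:00"])

# Nonexistent wall-clock times become NaT instead of raising.
localized = naive.tz_localize("America/Los_Angeles", nonexistent="NaT")
print(localized)

# Rows that were valid before localization but are NaT afterwards are the
# ones that fell into the DST gap, so they can be inspected rather than hidden.
bad_rows = naive[localized.isna() & naive.notna()]
print(bad_rows)
```

Keeping the mask of coerced rows addresses the concern that coercion hides a data problem: the handful of invalid timestamps can still be listed and checked against the source.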

@dancsi
Contributor Author

dancsi commented May 2, 2016

Here it is: #13058. Hopefully I didn't miss anything, as this is my first time contributing to a large OSS project.
