Timezones silently dropped in parsing #18702

jbrockmendel · 2017-12-09T05:04:34Z

TLDR: pandas should pass a tzinfos kwarg to the dateutil parser using sensible defaults.

dateutil has a bug that silently drops most timezones. That bug is inherited by pandas. The following is run on a machine located in US/Pacific:

>>> pd.Timestamp('2017-12-08 08:20 PM PST')     # <-- only parsed correctly because of locale
Timestamp('2017-12-08 20:20:00-0800', tz='tzlocal()')
>>> pd.Timestamp('2017-12-08 08:20 PM EST')     # <-- timezone silently dropped
Timestamp('2017-12-08 20:20:00')

There is a partial fix in progress over at dateutil, the most likely outcome of which is that these cases will raise in the future unless a tzinfos kwarg is explicitly passed to dateutil.parser.parse. The issue for pandas is then to decide on what tzinfos to pass (a suggestion to handle the most common use cases by default within dateutil went nowhere).

The tzinfos kwarg is a dictionary mapping a timezone abbreviation (a string) to a tzinfo object, e.g.

import dateutil.tz

unambiguous_tzinfos = {
    'PDT': dateutil.tz.gettz('US/Pacific'),
    'PT': dateutil.tz.gettz('US/Pacific'),
    'MDT': dateutil.tz.gettz('US/Mountain'),
    'MT': dateutil.tz.gettz('US/Mountain'),
    'ET': dateutil.tz.gettz('US/Eastern'),
    'CET': dateutil.tz.gettz('Europe/Amsterdam'),
    'NZDT': dateutil.tz.gettz('Pacific/Auckland')}

This example includes only abbreviations for which there are no other alternatives listed here. So e.g. "CST" is excluded since it could also be "China Standard Time", "EST" is excluded since it could refer to "Australian Eastern Standard Time". Note this is only a subset of the unambiguous abbreviations.
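
For illustration, here is a minimal sketch of how such a mapping would be handed to dateutil.parser.parse. The two-entry dict and the use of the IANA names 'America/Los_Angeles' and 'America/New_York' (instead of the 'US/...' aliases above) are assumptions made for the example, not defaults that pandas or dateutil ship.

```python
from dateutil import parser, tz

# Illustrative mapping only (an assumption for this sketch, not a shipped default).
tzinfos = {
    "PT": tz.gettz("America/Los_Angeles"),
    "ET": tz.gettz("America/New_York"),
}

# With tzinfos supplied, the trailing abbreviation is resolved to a tzinfo
# object instead of being dropped.
aware = parser.parse("2017-12-08 08:20 PM ET", tzinfos=tzinfos)

print(aware.tzinfo is not None)
print(aware.utcoffset())
```

Because the abbreviation maps to a full zone rather than a fixed offset, the resulting offset follows that zone's rules for the parsed date.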


jreback · 2017-12-09T14:34:59Z

hmm ok, I would rather hand off non-ISO 8601 parsing to dateutil directly, so this would qualify. note that this only applies when format is not passed, and in a very limited set of cases.

jbrockmendel · 2017-12-09T20:26:38Z

I'd prefer that dateutil handle this internally too; my hope is that consensus will develop over there once more people report that it doesn't Just Work. But until then, it's still a nontrivial question of exactly what we want to recognize by default and whether/how to let users customize it.

I see two viable options:

1. The most convenient thing to do -- at least in my comfortably Anglo-centric seat -- would be to pass defaults for a) abbreviations that are unambiguous and b) abbreviations for the most common timezones, e.g. assume CDT means "Central Daylight Time" and not "Cuba Daylight Time". Users who want to override that would need to do the parsing step before passing to the Timestamp/to_datetime constructor.
2. Same as 1, but allow users a mechanism to override the tzinfos dict that pandas passes to dateutil.
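
A rough sketch of what the second option might look like, written against dateutil directly. DEFAULT_TZINFOS and parse_with_tzinfos are hypothetical names invented for this example; nothing like them exists in pandas, and the single default entry is only there to show the override.

```python
from dateutil import parser, tz

# Hypothetical defaults (assumption): treat CDT as US Central Daylight Time.
DEFAULT_TZINFOS = {
    "CDT": tz.gettz("America/Chicago"),
}


def parse_with_tzinfos(text, tzinfos=None):
    """Parse text, letting caller-supplied abbreviations override the defaults."""
    merged = dict(DEFAULT_TZINFOS)
    if tzinfos:
        merged.update(tzinfos)
    return parser.parse(text, tzinfos=merged)


# Default interpretation: Central Daylight Time.
central = parse_with_tzinfos("2017-07-01 12:00 CDT")

# User override: Cuba Daylight Time.
cuba = parse_with_tzinfos(
    "2017-07-01 12:00 CDT",
    tzinfos={"CDT": tz.gettz("America/Havana")},
)

print(central.utcoffset(), cuba.utcoffset())
```

Merging into a copy keeps the defaults untouched between calls, so one caller's override does not leak into another's parse.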

jreback · 2017-12-09T21:04:24Z

we shouldn’t be hard coding any time zones
i would think u can simply pull out the string and just try to localize

jbrockmendel · 2017-12-09T23:26:02Z

> i would think u can simply pull out the string and just try to localize

Can you expand on that? Are you suggesting users should do this before passing to Timestamp/to_datetime?

jreback · 2017-12-10T00:00:16Z

of course not

when parsing if u hit something that looks like a tz
rather than an offset u can simply take the string and localize

jbrockmendel · 2017-12-10T00:04:57Z

> of course not

Good. That seemed unlikely (and altogether silly).

> when parsing if u hit something that looks like a tz rather than an offset u can simply take the string and localize

It's the "simply" that I'm having trouble with here. This sounds like you're suggesting the parsing be done within pandas, which I thought was what we're trying to avoid. Can you give an example of what you have in mind?

jreback added Bug Difficulty Intermediate Timezones Timezone data dtype labels Dec 9, 2017

jreback added this to the Next Major Release milestone Dec 9, 2017

mroeschke mentioned this issue Sep 9, 2018

Inconsistency in array_to_datetime "utc_convert" #19623

Closed

jbrockmendel mentioned this issue Dec 5, 2018

WIP/ENH: Pass tzinfos to dateutil parser #24104

Closed

7 tasks

mroeschke mentioned this issue Jun 25, 2019

to_excel Mishandles Mixing of tz-aware datetimes #27008

Closed

jbrockmendel removed Effort Medium labels Oct 21, 2019

mroeschke mentioned this issue Jan 4, 2021

BUG: UnknownTimezoneWarning from df.to_datetime() #38928

Closed

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

mroeschke mentioned this issue Nov 17, 2022

BUG: parsing ISO8601 string with format= and timezone name fails #49747

Merged

6 tasks

mroeschke mentioned this issue Jan 17, 2023

DEPR: parsing tzlocal depending on user's system timezone #50791

Closed

This was referenced Feb 18, 2023

BUG: inferring incorrect datetime format #51476

Closed

DEPR: silently ignoring unrecognized timezones #51477

Merged

MarcoGorelli closed this as completed in #51477 Feb 28, 2023
