Wrong hour of DST end for Europe TZ #11481

izderadicka · 2015-10-30T11:01:22Z

In Europe, DST ends on the last Sunday of October (this year on Sunday 2015-10-25): at 3 am the clock is moved back one hour to 2 am, so the hour from 2 am to 3 am occurs twice. However, Timestamp places this transition one hour earlier, at 1 am (which I think is when the transition happens in the US). pytz works as expected.

See the code (tested in pandas 0.17.0 and 0.16.2):

In [1]: import pandas as pd
In [2]: import pytz
In [3]: from pandas import Timestamp

In [7]: pd.__version__
Out[7]: u'0.17.0'

In [8]: cz_tz=pytz.timezone('Europe/Prague')

In [9]: Timestamp('2015-10-25 00:00', tz=cz_tz)
Out[9]: Timestamp('2015-10-25 00:00:00+0200', tz='Europe/Prague')

In [10]: Timestamp('2015-10-25 01:00', tz=cz_tz)
Out[10]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Prague')

In [11]: Timestamp('2015-10-25 02:00', tz=cz_tz)
Out[11]: Timestamp('2015-10-25 02:00:00+0100', tz='Europe/Prague')

In [13]: from pandas.tseries.offsets import Hour

In [14]: Timestamp('2015-10-25 01:00', tz=cz_tz) + Hour(1)
Out[14]: Timestamp('2015-10-25 02:00:00+0100', tz='Europe/Prague')

In [16]: from datetime import datetime

In [17]: cz_tz.dst(datetime(2015,10,25, 1, 0))
Out[17]: datetime.timedelta(0, 3600)

In [18]: cz_tz.dst(datetime(2015,10,25, 2, 0))
---------------------------------------------------------------------------
AmbiguousTimeError                        Traceback (most recent call last)
<ipython-input-18-4b6d0ae06c09> in <module>()
----> 1 cz_tz.dst(datetime(2015,10,25, 2, 0))

/home/ivan/tmp/pandas-test/local/lib/python2.7/site-packages/pytz/tzinfo.pyc in dst(self, dt, is_dst)
    445             return None
    446         elif dt.tzinfo is not self:
--> 447             dt = self.localize(dt, is_dst)
    448             return dt.tzinfo._dst
    449         else:

/home/ivan/tmp/pandas-test/local/lib/python2.7/site-packages/pytz/tzinfo.pyc in localize(self, dt, is_dst)
    347         # ambiguous case
    348         if is_dst is None:
--> 349             raise AmbiguousTimeError(dt)
    350 
    351         # Filter out the possiblilities that don't match the requested

AmbiguousTimeError: 2015-10-25 02:00:00

In [19]: cz_tz.dst(datetime(2015,10,25, 2, 0), is_dst=True)
Out[19]: datetime.timedelta(0, 3600)

In [20]: cz_tz.dst(datetime(2015,10,25, 2, 0), is_dst=False)
Out[20]: datetime.timedelta(0)

In [21]: cz_tz.dst(datetime(2015,10,25, 3, 0))
Out[21]: datetime.timedelta(0)
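For reference, the EU rule puts the end of summer time at 01:00 UTC on the last Sunday of October, which is 03:00 CEST becoming 02:00 CET. The sketch below is a hypothetical stdlib-only illustration (the helper names are made up, and the fixed CET/CEST offsets ignore historical rule changes; this is not how pandas or pytz compute transitions). It derives that instant for 2015 and shows the wall clock just before and at the transition:

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1), "CET")    # standard time
CEST = timezone(timedelta(hours=2), "CEST")  # summer time (DST)

def last_sunday_0100_utc(year: int, month: int) -> datetime:
    """01:00 UTC on the last Sunday of `month` (assumes month < 12)."""
    last_day = datetime(year, month + 1, 1, 1, tzinfo=timezone.utc) - timedelta(days=1)
    # weekday(): Monday == 0 ... Sunday == 6; step back to the most recent Sunday.
    return last_day - timedelta(days=(last_day.weekday() - 6) % 7)

def eu_dst_end(year: int) -> datetime:
    """Instant (in UTC) at which EU summer time ends in `year`."""
    return last_sunday_0100_utc(year, 10)

end = eu_dst_end(2015)
print(end)                                            # 2015-10-25 01:00:00+00:00
print((end - timedelta(seconds=1)).astimezone(CEST))  # 2015-10-25 02:59:59+02:00
print(end.astimezone(CET))                            # 2015-10-25 02:00:00+01:00
```

So 01:00 local time on that day is still plain summer time; only wall times from 02:00 up to 03:00 occur twice.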


jorisvandenbossche · 2015-10-30T11:55:23Z

I think it is an issue in the datetime string parsing rather than in the underlying datetime type itself:

In [18]: tz=pytz.timezone('Europe/Brussels')

In [19]: pd.Timestamp('2015-10-25 00:00', tz=tz)
Out[19]: Timestamp('2015-10-25 00:00:00+0200', tz='Europe/Brussels')

In [20]: pd.Timestamp('2015-10-25 01:00', tz=tz)   # <-- this is still unambiguous, yet it returns a plainly wrong timestamp
Out[20]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Brussels')

In [21]: pd.Timestamp('2015-10-25 03:00', tz=tz)
Out[21]: Timestamp('2015-10-25 03:00:00+0100', tz='Europe/Brussels')

In [22]: pd.Timestamp('2015-10-25 00:00', tz=tz) + pd.Timedelta('1 hour')  # <-- manually adding one hour, however, gives the correct value
Out[22]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Brussels')

In [23]: pd.Timestamp('2015-10-25 00:00', tz=tz) + pd.Timedelta('2 hour')
Out[23]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Brussels')

In [24]: pd.Timestamp('2015-10-25 00:00', tz=tz) + pd.Timedelta('3 hour')
Out[24]: Timestamp('2015-10-25 02:00:00+0100', tz='Europe/Brussels')

In [25]: pd.Timestamp('2015-10-25 00:00', tz=tz) + pd.Timedelta('4 hour')
Out[25]: Timestamp('2015-10-25 03:00:00+0100', tz='Europe/Brussels')
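Adding a duration works on the absolute instant, and only the result is mapped back to a wall clock, which is why the sequence above comes out right. A minimal stdlib sketch of that idea, assuming hard-coded 2015 EU transition instants and fixed CET/CEST offsets (`to_local` is a hypothetical helper, not a pandas function), reproduces Out[22] to Out[25]:

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1), "CET")
CEST = timezone(timedelta(hours=2), "CEST")
# Assumption: hard-coded 2015 EU transitions (01:00 UTC on 29 March and 25 October).
DST_START = datetime(2015, 3, 29, 1, tzinfo=timezone.utc)
DST_END = datetime(2015, 10, 25, 1, tzinfo=timezone.utc)

def to_local(instant: datetime) -> datetime:
    """Convert an aware instant to Europe/Prague-style wall time (2015 only)."""
    in_dst = DST_START <= instant < DST_END
    return instant.astimezone(CEST if in_dst else CET)

# Midnight local time on 2015-10-25 is still summer time (+02:00).
start = datetime(2015, 10, 25, 0, tzinfo=CEST)
for hours in range(5):
    # The offset of `start` is fixed, so this addition moves the absolute
    # instant; the result is then mapped back to the local wall clock.
    print(hours, to_local(start + timedelta(hours=hours)).isoformat())
```

Hours 2 and 3 both print a 02:00 wall time, once with +02:00 and once with +01:00, matching the two occurrences of the repeated hour.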

jreback · 2015-10-30T12:27:16Z

Not a bug at all. You are just using this incorrectly. You need to localize a time; simply constructing is exactly that, it just takes what you give it.

In [16]: Timestamp('2015-10-25').tz_localize(cz_tz)
Out[16]: Timestamp('2015-10-25 00:00:00+0200', tz='Europe/Prague')

In [17]: Timestamp('2015-10-25 02:00:00').tz_localize(cz_tz)
AmbiguousTimeError: Cannot infer dst time from Timestamp('2015-10-25 02:00:00'), try using the 'ambiguous' argument

In [18]: Timestamp('2015-10-25 01:00:00').tz_localize(cz_tz)
Out[18]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Prague')

In [19]: Timestamp('2015-10-25 03:00:00').tz_localize(cz_tz)
Out[19]: Timestamp('2015-10-25 03:00:00+0100', tz='Europe/Prague')

In [21]: Timestamp('2015-10-25 02:00:00').tz_localize(cz_tz, ambiguous=False)
Out[21]: Timestamp('2015-10-25 02:00:00+0100', tz='Europe/Prague')

In [22]: Timestamp('2015-10-25 02:00:00').tz_localize(cz_tz, ambiguous=True)
Out[22]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Prague')
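Conceptually, localizing a naive wall time means trying each candidate offset and keeping those that are consistent with the zone's rules: one match is the normal case, two means the time is ambiguous, none means it falls in a spring gap. The sketch below is a hypothetical stdlib illustration of that idea (the `localize` helper and its `ambiguous` values only loosely mirror the `tz_localize` calls above; 2015 EU transition instants are hard-coded), not pandas' implementation:

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1), "CET")
CEST = timezone(timedelta(hours=2), "CEST")
# Assumption: 2015 EU transitions only.
DST_START = datetime(2015, 3, 29, 1, tzinfo=timezone.utc)
DST_END = datetime(2015, 10, 25, 1, tzinfo=timezone.utc)

class AmbiguousTimeError(ValueError):
    """Hypothetical stand-in for pytz's/pandas' exception of the same name."""

def is_dst_instant(instant: datetime) -> bool:
    return DST_START <= instant < DST_END

def localize(naive: datetime, ambiguous="raise") -> datetime:
    """Attach the correct offset to a naive wall time.

    ambiguous: "raise", True (pick summer time) or False (pick standard time).
    """
    candidates = []
    for tz, dst in ((CEST, True), (CET, False)):
        aware = naive.replace(tzinfo=tz)
        # Keep the offset only if the rules agree it is in effect at that instant.
        if is_dst_instant(aware) == dst:
            candidates.append((dst, aware))
    if not candidates:
        raise ValueError(f"{naive} does not exist (spring-forward gap)")
    if len(candidates) == 1:
        return candidates[0][1]
    if ambiguous == "raise":
        raise AmbiguousTimeError(f"Cannot infer dst time from {naive}")
    return next(aware for dst, aware in candidates if dst == bool(ambiguous))

print(localize(datetime(2015, 10, 25, 1)))                   # 2015-10-25 01:00:00+02:00
print(localize(datetime(2015, 10, 25, 2), ambiguous=True))   # 2015-10-25 02:00:00+02:00
print(localize(datetime(2015, 10, 25, 2), ambiguous=False))  # 2015-10-25 02:00:00+01:00
```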

jorisvandenbossche · 2015-10-30T12:38:34Z

@jreback I agree that localizing is the better option, but I don't really see how the below can be correct:

In [3]: pd.Timestamp('2015-10-25 01:00', tz='Europe/Brussels')
Out[3]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Brussels')

1 o'clock should not become 2 o'clock, as both are in the same DST period, right?

jreback · 2015-10-30T12:40:43Z

You misunderstand what is happening.

You are taking a naive time and simply SETTING it to a tz. You HAVE to localize. If you don't, you get the CURRENT time zone.

jorisvandenbossche · 2015-10-30T12:54:16Z

In any case, I agree this is not well documented. It is not really clear how the tz parameter is handled in Timestamp (there are no docs on that). I would expect it to actually do a tz_localize afterwards?

BTW, DatetimeIndex handles this fine:

In [15]: pd.DatetimeIndex(['2015-10-25 01:00'], tz='Europe/Brussels')
Out[15]: DatetimeIndex(['2015-10-25 01:00:00+02:00'], dtype='datetime64[ns]', freq=None, tz='Europe/Brussels')

and pytz itself also gives another result:

In [26]: tz = pytz.timezone('Europe/Brussels')

In [27]: print tz.localize(datetime.datetime(2015,10,25,1))
2015-10-25 01:00:00+02:00

In [28]: print pd.Timestamp('2015-10-25 01:00', tz=tz)
2015-10-25 02:00:00+02:00

(there is actually nothing ambiguous to pick in this case, 1am can only be 1am)

DatetimeIndex also raises an AmbiguousTimeError for '2015-10-25 02:00' (while Timestamp picks the second occurrence), which seems the more correct behaviour.

@jreback It is true I don't really understand what is happening under the hood, but IMO this is not 'incorrect code' from a user perspective. This is perfectly allowed in our API (why is there otherwise a tz argument in a public class?), and is handled correctly in a related class (DatetimeIndex).

jreback · 2015-10-30T13:03:17Z

@jorisvandenbossche

I think this is using a different path for the localization when it passes through the Timestamp constructor. It should first create it as naive, THEN localize.

In [1]: pd.DatetimeIndex(['2015-10-25 01:00']).tz_localize(tz='Europe/Brussels')
Out[1]: DatetimeIndex(['2015-10-25 01:00:00+02:00'], dtype='datetime64[ns, Europe/Brussels]', freq=None)

In [2]: pd.DatetimeIndex(['2015-10-25 01:00'], tz='Europe/Brussels')
Out[2]: DatetimeIndex(['2015-10-25 01:00:00+02:00'], dtype='datetime64[ns, Europe/Brussels]', freq=None)

In [3]: Timestamp('2015-10-25 01:00').tz_localize('Europe/Brussels')
Out[3]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Brussels')

In [4]: Timestamp('2015-10-25 01:00',tz='Europe/Brussels')
Out[4]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Brussels')

I suspect the path for a dateutil tz actually works, but this does look incorrect for pytz (e.g. [3] == [4] should be True)

jreback · 2015-10-30T13:04:45Z

our DST experts
cc @rockg
cc @ischwabacher
cc @adamgreenhall
cc @sinhrks

izderadicka · 2015-10-30T13:54:23Z

Thanks for all the comments. Just from a user perspective, it would be quite weird if Timestamp(x, tz=tz) behaved differently from Timestamp(x).tz_localize(tz); the result in both cases is a Timestamp in the same tz, right? And it should behave correctly, e.g. the repeated hour at the DST end starts at 2 am in this time zone.

ischwabacher · 2015-10-30T14:34:27Z

I agree that this is a bug. This is a fall transition, not a spring one, so unless there's political hoo-hah going on, this is an ambiguous time rather than a nonexistent one. I'm not sure which offset the result should have, but it should have the original local time in that offset. This is regardless of whether you follow the Timestamp(x, tz=tz) or Timestamp(x).tz_localize(tz) route.

Also I'm not sure whether I deserve the designation of "expert". I just have loud opinions. ;D

jreback · 2015-10-30T14:35:43Z

@ischwabacher hahah, ok, let's revise to 'interested' parties!

ischwabacher · 2015-10-30T14:45:50Z

Also, I agree that this is a dupe of #8225.

jreback · 2015-10-30T14:51:05Z

@ischwabacher hmm, isn't #8225 about parsing though? (Or is it really just that the parsing is OK, but it's localizing incorrectly?)

jorisvandenbossche · 2015-10-30T14:55:32Z

@ischwabacher note that the initial example in this thread is not an ambiguous time. '2015-10-25 01:00' is unambiguously defined, as the transition happens from 03:00 -> 02:00, so only the timestamps between 02:00 and 03:00 are ambiguous.
(but anyway, it has of course something to do with the code that handles this fall transition, regardless of it being an ambiguous time or not)

As a side note, I would also prefer that pd.Timestamp('2015-10-25 02:00', tz='Europe/Brussels') raise an AmbiguousTimeError (as tz_localize does) instead of using the second occurrence. But this may indeed be costly to check (and using the second occurrence matches the default of pytz's localize, is_dst=False).
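The ambiguous window Joris describes follows directly from the transition instant: it runs from that instant expressed in standard time (02:00) up to the same instant expressed in summer time (03:00). A small stdlib sketch under the assumption of the 2015 EU rule (01:00 UTC on 2015-10-25) and fixed offsets; `is_ambiguous` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1))
CEST = timezone(timedelta(hours=2))
DST_END = datetime(2015, 10, 25, 1, tzinfo=timezone.utc)  # assumed 2015 EU rule

# Repeated wall-clock interval: [instant as CET wall time, instant as CEST wall time)
overlap_start = DST_END.astimezone(CET).replace(tzinfo=None)   # 02:00
overlap_end = DST_END.astimezone(CEST).replace(tzinfo=None)    # 03:00

def is_ambiguous(naive: datetime) -> bool:
    return overlap_start <= naive < overlap_end

for hour in (0, 1, 2, 3):
    print(f"{hour:02d}:00", is_ambiguous(datetime(2015, 10, 25, hour)))
# 00:00 False, 01:00 False, 02:00 True, 03:00 False
```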

ischwabacher · 2015-10-30T15:09:24Z

@jorisvandenbossche I'm not sure about the default handling of an ambiguous time. The latest news from python-dev is PEP 495. One of the themes in the discussion of that PEP was avoiding raising AmbiguousTimeError/NonexistentTimeError. If python-dev wants to avoid exceptions in its environment of individual datetime instances, I think it's definitely worth avoiding in our vectorized context. (FWIW, I think they're wrong, but "they" here refers to Tim and Guido, so the degree of certainty I have in that belief is low.)
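PEP 495 (shipped in Python 3.6) resolves the repeated hour with a `fold` attribute instead of an exception: `fold=0` selects the first occurrence and `fold=1` the second. The toy tzinfo below is a hypothetical sketch with 2015 EU dates hard-coded (it is not a tz database and not how pandas or pytz work); it shows `fold` choosing between the two occurrences of 02:30:

```python
from datetime import datetime, timedelta, timezone, tzinfo

class Prague2015(tzinfo):
    """Toy fixed-rule zone for 2015 only (hypothetical; not a tz database)."""

    # Naive wall times: summer time starts at 03:00 on 29 March (02:xx in the
    # spring gap is treated as standard time) and the repeated hour starts at
    # 02:00 on 25 October.
    DST_START = datetime(2015, 3, 29, 3)
    DST_END = datetime(2015, 10, 25, 2)

    def utcoffset(self, dt):
        return timedelta(hours=1) + self.dst(dt)

    def dst(self, dt):
        wall = dt.replace(tzinfo=None, fold=0)
        if self.DST_START <= wall < self.DST_END:
            return timedelta(hours=1)
        if self.DST_END <= wall < self.DST_END + timedelta(hours=1):
            # Repeated hour: fold=0 is the first (summer time) occurrence,
            # fold=1 the second (standard time) occurrence.
            return timedelta(0) if dt.fold else timedelta(hours=1)
        return timedelta(0)

    def tzname(self, dt):
        return "CEST" if self.dst(dt) else "CET"

tz = Prague2015()
first = datetime(2015, 10, 25, 2, 30, tzinfo=tz)           # fold=0 by default
second = datetime(2015, 10, 25, 2, 30, tzinfo=tz, fold=1)
print(first.utcoffset(), second.utcoffset())  # 2:00:00 1:00:00
print(first.astimezone(timezone.utc))         # 2015-10-25 00:30:00+00:00
print(second.astimezone(timezone.utc))        # 2015-10-25 01:30:00+00:00
```

Note that, per PEP 495, `first == second` is True because `fold` is ignored when comparing datetimes that share a tzinfo; compare via UTC to tell them apart.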

@jreback It's definitely a dupe:

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: u'0.17.0'

In [3]: pd.Timestamp('2015-10-25 01:00', tz='Europe/Prague')
Out[3]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Prague')

In [4]: pd.Timestamp('2015-10-25 1:00', tz='Europe/Prague')
Out[4]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Prague')

In [5]: pd.Timestamp('2015-10-25 01:00').tz_localize('Europe/Prague')
Out[5]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Prague')

In [6]: pd.Timestamp('2015-10-25 1:00').tz_localize('Europe/Prague')
Out[6]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Prague')

Relevant:

The problem is in 6732306, presumably somewhere in parse_iso_8601_datetime.

jreback · 2015-10-30T15:11:19Z

@ischwabacher right, so the parsing is a red-herring. ok then!

jreback · 2015-10-30T15:12:26Z

Actually, I don't think the problem is parsing at all. The time is naive when parsed; I think it is localized incorrectly when the tz is assigned.

jorisvandenbossche · 2015-10-30T15:15:45Z

But given that the result is correct/incorrect depending on slight changes in the string format, it seems that it has something to do with the parsing (or triggering another code route, so maybe not in the parsing itself but in only one of the code paths depending on which type of parsing was done):

In [58]:  pd.Timestamp('2015-10-25 01:00', tz='Europe/Prague') 
Out[58]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Prague')   #  <---- incorrect

In [59]:  pd.Timestamp('2015-10-25 1:00', tz='Europe/Prague')
Out[59]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Prague')   #  <---- correct

In [60]:  pd.Timestamp('20151025 01:00', tz='Europe/Prague')
Out[60]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Prague')   #  <---- correct

ischwabacher · 2015-10-30T15:31:55Z

Exactly. If you make the ISO 8601 parser give up, it passes it on to dateutil, which parses it correctly. But you can't get rid of the ISO 8601 parser, because you need a fast path for reading in large CSV files.

But that third example baffles me. I hadn't noticed that before: it's exactly the /YYYY-MM-DD[ T]hh(:mm(:ss)?)?/ format that fails, AFAICT.
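If, as suggested earlier in the thread, the value is created as naive first and localized afterwards, the fast and slow parsing paths only have to agree on the naive wall time. A minimal stdlib sketch of that split (`FAST_RE` and `parse_naive` are hypothetical; `strptime` stands in for the dateutil fallback; this is not pandas' parse_iso_8601_datetime):

```python
import re
from datetime import datetime

# Hypothetical fast path covering only /YYYY-MM-DD[ T]hh(:mm(:ss)?)?/.
FAST_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})[ T](\d{2})(?::(\d{2})(?::(\d{2}))?)?$")

def parse_naive(text: str) -> datetime:
    """Parse to a naive datetime; no time zone is applied here."""
    m = FAST_RE.match(text)
    if m:
        year, month, day, hour, minute, second = (int(g) if g else 0 for g in m.groups())
        return datetime(year, month, day, hour, minute, second)
    # Slow path: strptime stands in for the dateutil fallback (two formats only).
    for fmt in ("%Y-%m-%d %H:%M", "%Y%m%d %H:%M"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"cannot parse {text!r}")

for s in ("2015-10-25 01:00", "2015-10-25 1:00", "20151025 01:00"):
    print(s, "->", parse_naive(s))  # all three give 2015-10-25 01:00:00
```

Because every path yields the same naive value, a single localization step applied afterwards cannot depend on which parser ran.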

jorisvandenbossche · 2015-10-30T15:41:31Z

Further note that the route taken by DatetimeIndex for parsing these ISO 8601 strings also works correctly (but I don't know if this is a fast path):

In [61]:  pd.DatetimeIndex(['2015-10-25 01:00'], tz='Europe/Prague')
Out[61]: DatetimeIndex(['2015-10-25 01:00:00+02:00'], dtype='datetime64[ns]', freq=None, tz='Europe/Prague')

jreback · 2015-10-30T15:46:36Z

Hmm, these use exactly the same parser. The difference is that in the array processing (e.g. [61]) the tz is handled afterwards, while in a Timestamp it is handled via the out_tz_local variable (IIRC), which is where the issue is.

ischwabacher · 2015-10-30T16:23:23Z

WTB: debugger that runs same code in parallel with different inputs, breaks when code paths diverge.

ischwabacher · 2015-10-30T20:21:41Z

I don't think that's it, despite the fact that it definitely looks like a bug, since Pacific/Chatham doesn't show this behavior:

In [15]: pd.Timestamp('2015-9-27 03:00:00', tz='Pacific/Chatham')
Out[15]: Timestamp('2015-09-27 03:00:00+1245', tz='Pacific/Chatham')

In [16]: pd.Timestamp('2015-9-27 3:00:00', tz='Pacific/Chatham')
Out[16]: Timestamp('2015-09-27 03:00:00+1245', tz='Pacific/Chatham')

In [17]: pd.Timestamp('2015-9-27 04:00:00', tz='Pacific/Chatham')
Out[17]: Timestamp('2015-09-27 04:00:00+1345', tz='Pacific/Chatham')

In [18]: pd.Timestamp('2015-9-27 4:00:00', tz='Pacific/Chatham')
Out[18]: Timestamp('2015-09-27 04:00:00+1345', tz='Pacific/Chatham')

sinhrks · 2015-10-31T00:12:30Z

As @jreback says, the tz localization logic after _string_to_dts (the ISO 8601 parser) looks incorrect, even though Timestamp and DatetimeIndex use the same parser. Currently _string_to_dts doesn't take DST into account and treats a tz-like string as a pytz.FixedOffset.
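The wrong value reported at the top of the thread can be reproduced arithmetically: if the wall time 01:00 is combined with the standard offset (+01:00) instead of the summer offset (+02:00), the resulting instant is 00:00 UTC, and converting that back to local time gives 02:00+02:00. This is a hedged reconstruction of the symptom under an assumed 2015 EU rule with fixed offsets, not a trace of the actual pandas code path:

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1))
CEST = timezone(timedelta(hours=2))
DST_END = datetime(2015, 10, 25, 1, tzinfo=timezone.utc)  # assumed 2015 EU rule

def to_local(instant: datetime) -> datetime:
    # Only the autumn transition matters for this example.
    return instant.astimezone(CEST if instant < DST_END else CET)

naive = datetime(2015, 10, 25, 1)
wrong = to_local(naive.replace(tzinfo=CET))   # standard offset applied, then converted
right = to_local(naive.replace(tzinfo=CEST))  # offset actually in effect at 01:00
print(wrong.isoformat())  # 2015-10-25T02:00:00+02:00
print(right.isoformat())  # 2015-10-25T01:00:00+02:00
```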

closes pandas-dev#11481 closes pandas-dev#15777

* BUG: Timestamp doesn't respect tz DST, closes #11481, closes #15777
* DOC: add doc-strings to tz_convert/tz_localize in tslib.pyx
* TST: more tests, xref #15823, xref #11708

jorisvandenbossche added Bug Datetime Datetime data dtype labels Oct 30, 2015

jreback closed this as completed Oct 30, 2015

jreback reopened this Oct 30, 2015

jreback added this to the Next Major Release milestone Oct 30, 2015

jreback added Difficulty Intermediate labels Oct 30, 2015

jreback mentioned this issue Oct 30, 2015

Timestamp constructor parses ISO 8601 incorrectly near DST boundaries #8225

Closed

ischwabacher referenced this issue Oct 30, 2015

BUG: Timestamp cannot parse nanosecond from string

6732306

rockg mentioned this issue Nov 27, 2015

pd.to_datetime("2015-11-18 15:30:00+05:30").tz_localize('UTC').tz_convert('Asia/Kolkata') returns '2015-11-18 16:30:00+0530' #11708

Closed

jreback mentioned this issue Dec 6, 2015

BUG: Parsing offset strings with non-zero minutes was incorrect #11774

Merged

jreback mentioned this issue Mar 22, 2017

Bug: Timestamp removes timezone localization #15777

Closed

mroeschke mentioned this issue Apr 7, 2017

BUG: Correct Timestamp localization with tz near DST (#11481) #15934

Merged


jreback modified the milestones: 0.20.0, Next Major Release Apr 7, 2017

jreback pushed a commit to mroeschke/pandas that referenced this issue Apr 8, 2017

BUG: Timestamp doesn't respect tz DST

44ff21d


jreback closed this as completed in #15934 Apr 8, 2017
