Wrong hour of DST end for Europe TZ #11481

Closed
izderadicka opened this issue Oct 30, 2015 · 23 comments · Fixed by #15934
Labels
Bug Datetime Datetime data dtype
Milestone

Comments

@izderadicka

xref #11708

In Europe, DST ends when the clock is moved back one hour to 2 am (this year on Sunday 2015-10-25). However, in Timestamp this transition happens one hour earlier (at 1 am, which I think is the hour at which the transition happens in the US). pytz works as expected.

See the code (tested in pandas 0.17.0 and 0.16.2):

In [1]: import pandas as pd
In [2]: import pytz
In [3]: from pandas import Timestamp

In [7]: pd.__version__
Out[7]: u'0.17.0'

In [8]: cz_tz=pytz.timezone('Europe/Prague')

In [9]: Timestamp('2015-10-25 00:00', tz=cz_tz)
Out[9]: Timestamp('2015-10-25 00:00:00+0200', tz='Europe/Prague')

In [10]: Timestamp('2015-10-25 01:00', tz=cz_tz)
Out[10]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Prague')

In [11]: Timestamp('2015-10-25 02:00', tz=cz_tz)
Out[11]: Timestamp('2015-10-25 02:00:00+0100', tz='Europe/Prague')

In [13]: from pandas.tseries.offsets import Hour

In [14]: Timestamp('2015-10-25 01:00', tz=cz_tz) + Hour(1)
Out[14]: Timestamp('2015-10-25 02:00:00+0100', tz='Europe/Prague')

In [16]: from datetime import datetime

In [17]: cz_tz.dst(datetime(2015,10,25, 1, 0))
Out[17]: datetime.timedelta(0, 3600)

In [18]: cz_tz.dst(datetime(2015,10,25, 2, 0))
---------------------------------------------------------------------------
AmbiguousTimeError                        Traceback (most recent call last)
<ipython-input-18-4b6d0ae06c09> in <module>()
----> 1 cz_tz.dst(datetime(2015,10,25, 2, 0))

/home/ivan/tmp/pandas-test/local/lib/python2.7/site-packages/pytz/tzinfo.pyc in dst(self, dt, is_dst)
    445             return None
    446         elif dt.tzinfo is not self:
--> 447             dt = self.localize(dt, is_dst)
    448             return dt.tzinfo._dst
    449         else:

/home/ivan/tmp/pandas-test/local/lib/python2.7/site-packages/pytz/tzinfo.pyc in localize(self, dt, is_dst)
    347         # ambiguous case
    348         if is_dst is None:
--> 349             raise AmbiguousTimeError(dt)
    350 
    351         # Filter out the possiblilities that don't match the requested

AmbiguousTimeError: 2015-10-25 02:00:00

In [19]: cz_tz.dst(datetime(2015,10,25, 2, 0), is_dst=True)
Out[19]: datetime.timedelta(0, 3600)

In [20]: cz_tz.dst(datetime(2015,10,25, 2, 0), is_dst=False)
Out[20]: datetime.timedelta(0)

In [21]: cz_tz.dst(datetime(2015,10,25, 3, 0))
Out[21]: datetime.timedelta(0)

@jorisvandenbossche
Member

I think it is an issue in the datetime string parsing, instead of the underlying datetime type itself:

In [18]: tz=pytz.timezone('Europe/Brussels')

In [19]: pd.Timestamp('2015-10-25 00:00', tz=tz)
Out[19]: Timestamp('2015-10-25 00:00:00+0200', tz='Europe/Brussels')

In [20]: pd.Timestamp('2015-10-25 01:00', tz=tz)   # <-- this should still be unambiguous and returns just a plain wrong timestamp
Out[20]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Brussels')

In [21]: pd.Timestamp('2015-10-25 03:00', tz=tz)
Out[21]: Timestamp('2015-10-25 03:00:00+0100', tz='Europe/Brussels')

In [22]: pd.Timestamp('2015-10-25 00:00', tz=tz) + pd.Timedelta('1 hour')  # <-- manually adding one hour however gives the correct value
Out[22]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Brussels')

In [23]: pd.Timestamp('2015-10-25 00:00', tz=tz) + pd.Timedelta('2 hour')
Out[23]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Brussels')

In [24]: pd.Timestamp('2015-10-25 00:00', tz=tz) + pd.Timedelta('3 hour')
Out[24]: Timestamp('2015-10-25 02:00:00+0100', tz='Europe/Brussels')

In [25]: pd.Timestamp('2015-10-25 00:00', tz=tz) + pd.Timedelta('4 hour')
Out[25]: Timestamp('2015-10-25 03:00:00+0100', tz='Europe/Brussels')
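
The Timedelta results above can be cross-checked without pandas. The following is a minimal sketch, assuming Python 3.9+ with the standard-library `zoneinfo` module and an installed tz database (neither is used anywhere in this thread). It walks consecutive UTC hours across the transition and prints the Europe/Brussels wall-clock time and offset for each; the helper name `wall_clock_sequence` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

BRUSSELS = ZoneInfo("Europe/Brussels")


def wall_clock_sequence(start_utc, hours):
    """Convert consecutive UTC hours to (wall-clock time, UTC offset in hours)."""
    rows = []
    for step in range(hours):
        local = (start_utc + timedelta(hours=step)).astimezone(BRUSSELS)
        offset_hours = local.utcoffset() // timedelta(hours=1)
        rows.append((local.strftime("%H:%M"), offset_hours))
    return rows


# 22:00 UTC on 2015-10-24 is local midnight on 2015-10-25 (UTC+2).
rows = wall_clock_sequence(datetime(2015, 10, 24, 22, 0, tzinfo=timezone.utc), 5)
for wall, offset in rows:
    print(f"{wall} UTC+{offset}")
```

The wall-clock hour 02:00 shows up twice (once at UTC+2, once at UTC+1), while 01:00 shows up only once, which is consistent with Out[22] through Out[25].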

@jorisvandenbossche jorisvandenbossche added Bug Datetime Datetime data dtype labels Oct 30, 2015
@jreback
Contributor

jreback commented Oct 30, 2015

Not a bug at all. You are just using this in an incorrect manner. You need to localize a time; simply constructing is exactly that: it just takes what you give it.

In [16]: Timestamp('2015-10-25').tz_localize(cz_tz)
Out[16]: Timestamp('2015-10-25 00:00:00+0200', tz='Europe/Prague')

In [17]: Timestamp('2015-10-25 02:00:00').tz_localize(cz_tz)
AmbiguousTimeError: Cannot infer dst time from Timestamp('2015-10-25 02:00:00'), try using the 'ambiguous' argument

In [18]: Timestamp('2015-10-25 01:00:00').tz_localize(cz_tz)
Out[18]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Prague')

In [19]: Timestamp('2015-10-25 03:00:00').tz_localize(cz_tz)
Out[19]: Timestamp('2015-10-25 03:00:00+0100', tz='Europe/Prague')

In [21]: Timestamp('2015-10-25 02:00:00').tz_localize(cz_tz, ambiguous=False)
Out[21]: Timestamp('2015-10-25 02:00:00+0100', tz='Europe/Prague')

In [22]: Timestamp('2015-10-25 02:00:00').tz_localize(cz_tz, ambiguous=True)
Out[22]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Prague')
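
For a whole index rather than a single Timestamp, the same `ambiguous` argument can be applied in one call. This is a hedged sketch, assuming a reasonably recent pandas release in which `DatetimeIndex.tz_localize` accepts `ambiguous="NaT"`; it was not run against 0.17.0.

```python
import pandas as pd

# Naive wall-clock times around the 2015-10-25 fall transition.
naive = pd.DatetimeIndex(
    ["2015-10-25 01:00", "2015-10-25 02:00", "2015-10-25 03:00"]
)

# Instead of raising AmbiguousTimeError, mark ambiguous entries as NaT.
localized = naive.tz_localize("Europe/Prague", ambiguous="NaT")
print(localized)
```

Only the 02:00 entry is ambiguous, so it is the only one expected to come back as NaT; the other two keep their wall-clock time.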

@jreback jreback closed this as completed Oct 30, 2015
@jorisvandenbossche
Member

@jreback I agree that localizing is the better option, but I don't really see how the below can be correct:

In [3]: pd.Timestamp('2015-10-25 01:00', tz='Europe/Brussels')
Out[3]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Brussels')

1 o'clock should not become 2 o'clock, as both are in the same DST period?

@jreback
Contributor

jreback commented Oct 30, 2015

You misunderstand what is happening.

You are taking a naive time and simply SETTING it to a tz. You HAVE to localize. If you don't, you get the CURRENT time zone.

@jorisvandenbossche
Member

In any case, I agree this is not well documented. It is not really clear how the tz parameter is handled in Timestamp (there are no docs on that). I would actually expect it to do a tz_localize afterwards?

BTW, DatetimeIndex handles this fine:

In [15]: pd.DatetimeIndex(['2015-10-25 01:00'], tz='Europe/Brussels')
Out[15]: DatetimeIndex(['2015-10-25 01:00:00+02:00'], dtype='datetime64[ns]', freq=None, tz='Europe/Brussels')

and pytz itself also gives another result:

In [26]: tz = pytz.timezone('Europe/Brussels')

In [27]: print tz.localize(datetime.datetime(2015,10,25,1))
2015-10-25 01:00:00+02:00

In [28]: print pd.Timestamp('2015-10-25 01:00', tz=tz)
2015-10-25 02:00:00+02:00

(there is actually nothing ambiguous to pick in this case, 1am can only be 1am)

DatetimeIndex also raises an AmbiguousTimeError for '2015-10-25 02:00' (while Timestamp picks the second occurrence), which seems the more correct behaviour.

@jreback It is true I don't really understand what is happening under the hood, but IMO this is not 'incorrect code' from a user perspective. This is perfectly allowed in our API (why is there otherwise a tz argument in a public class?), and is handled correctly in a related class (DatetimeIndex).

@jreback
Contributor

jreback commented Oct 30, 2015

@jorisvandenbossche

I think this is using a different path for the localization when it passes through the Timestamp constructor. It should first create it as naive, THEN localize.

In [1]: pd.DatetimeIndex(['2015-10-25 01:00']).tz_localize(tz='Europe/Brussels')
Out[1]: DatetimeIndex(['2015-10-25 01:00:00+02:00'], dtype='datetime64[ns, Europe/Brussels]', freq=None)

In [2]: pd.DatetimeIndex(['2015-10-25 01:00'], tz='Europe/Brussels')
Out[2]: DatetimeIndex(['2015-10-25 01:00:00+02:00'], dtype='datetime64[ns, Europe/Brussels]', freq=None)

In [3]: Timestamp('2015-10-25 01:00').tz_localize('Europe/Brussels')
Out[3]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Brussels')

In [4]: Timestamp('2015-10-25 01:00',tz='Europe/Brussels')
Out[4]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Brussels')

I suspect the path for a dateutil tz actually works, but this does look incorrect for pytz (e.g. [3] == [4] should be True)
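
The expectation that [3] == [4] can be written as a small check. This is only a sketch: the helper `constructor_matches_localize` is a made-up name, and the list is restricted to unambiguous wall-clock times so that neither route has to resolve an ambiguity. On releases affected by this issue the 01:00 entry would presumably come out False.

```python
import pandas as pd


def constructor_matches_localize(text, tz):
    """Compare Timestamp(text, tz=tz) with Timestamp(text).tz_localize(tz)."""
    direct = pd.Timestamp(text, tz=tz)
    two_step = pd.Timestamp(text).tz_localize(tz)
    return direct == two_step


# Unambiguous times on the day of the fall transition.
unambiguous = ["2015-10-25 00:00", "2015-10-25 01:00", "2015-10-25 03:00"]
results = {
    text: constructor_matches_localize(text, "Europe/Brussels")
    for text in unambiguous
}
print(results)
```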

@jreback jreback reopened this Oct 30, 2015
@jreback jreback added this to the Next Major Release milestone Oct 30, 2015
@jreback
Contributor

jreback commented Oct 30, 2015

our DST experts
cc @rockg
cc @ischwabacher
cc @adamgreenhall
cc @sinhrks

@izderadicka
Author

Thanks for all the comments. Just from a user perspective: it would be quite weird if Timestamp(x, tz=tz) behaved differently from Timestamp(x).tz_localize(tz), since the result in both cases is a timestamp in the same tz, right? And it should behave correctly, e.g. DST ends at 2 am in this time zone.

@ischwabacher
Contributor

I agree that this is a bug. This is a fall transition, not a spring one, so unless there's political hoo-hah going on, this is an ambiguous time rather than a nonexistent one. I'm not sure which offset the result should have, but it should have the original local time in that offset. This is regardless of whether you follow the Timestamp(x, tz=tz) or Timestamp(x).tz_localize(tz) route.

Also I'm not sure whether I deserve the designation of "expert". I just have loud opinions. ;D

@jreback
Contributor

jreback commented Oct 30, 2015

@ischwabacher hahah, ok, let's revise to 'interested' parties!

@ischwabacher
Contributor

Also, I agree that this is a dupe of #8225.

@jreback
Contributor

jreback commented Oct 30, 2015

@ischwabacher hmm, isn't #8225 about parsing though? (or is it really just that the parsing is ok, but it's localizing incorrectly?)

@jorisvandenbossche
Member

@ischwabacher note that the initial example in this thread is not an ambiguous time. '2015-10-25 01:00' is unambiguously defined, as the transition happens from 03:00 -> 02:00, so only the timestamps between 02:00 and 03:00 are ambiguous.
(but anyway, it has of course something to do with the code that handles this fall transition, regardless of it being an ambiguous time or not)

As a side note, I would also prefer that pd.Timestamp('2015-10-25 02:00', tz='Europe/Brussels') raised an AmbiguousTimeError (as tz_localize does) instead of using the second occurrence. But this is indeed maybe costly to check (and it is the default of pytz's localize, is_dst=False)

@ischwabacher
Contributor

@jorisvandenbossche I'm not sure about the default handling of an ambiguous time. The latest news from python-dev is PEP 495. One of the themes in the discussion of that PEP was avoiding raising AmbiguousTimeError/NonexistentTimeError. If python-dev wants to avoid exceptions in its environment of individual datetime instances, I think it's definitely worth avoiding in our vectorized context. (FWIW, I think they're wrong, but "they" here refers to Tim and Guido, so the degree of certainty I have in that belief is low.)
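
For readers unfamiliar with PEP 495: it adds a `fold` attribute to `datetime` so that an ambiguous wall-clock time can be disambiguated without raising. A minimal sketch, assuming Python 3.9+ with the standard-library `zoneinfo` module (which comes from a later PEP and is not part of this thread):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

BRUSSELS = ZoneInfo("Europe/Brussels")

# 02:00 on 2015-10-25 occurs twice; fold selects which occurrence is meant.
first = datetime(2015, 10, 25, 2, 0, tzinfo=BRUSSELS, fold=0)
second = datetime(2015, 10, 25, 2, 0, tzinfo=BRUSSELS, fold=1)
print(first.utcoffset(), second.utcoffset())

# 01:00 occurs only once, so fold makes no difference there.
plain = datetime(2015, 10, 25, 1, 0, tzinfo=BRUSSELS, fold=0)
folded = datetime(2015, 10, 25, 1, 0, tzinfo=BRUSSELS, fold=1)
print(plain.utcoffset(), folded.utcoffset())
```

No exception is raised in either case; the default `fold=0` silently picks the first (DST) occurrence.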

@jreback It's definitely a dupe:

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: u'0.17.0'

In [3]: pd.Timestamp('2015-10-25 01:00', tz='Europe/Prague')
Out[3]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Prague')

In [4]: pd.Timestamp('2015-10-25 1:00', tz='Europe/Prague')
Out[4]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Prague')

In [5]: pd.Timestamp('2015-10-25 01:00').tz_localize('Europe/Prague')
Out[5]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Prague')

In [6]: pd.Timestamp('2015-10-25 1:00').tz_localize('Europe/Prague')
Out[6]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Prague')

Relevant:

The problem is in 6732306, presumably somewhere in parse_iso_8601_datetime.

@jreback
Contributor

jreback commented Oct 30, 2015

@ischwabacher right, so the parsing is a red herring. ok then!

@jreback
Contributor

jreback commented Oct 30, 2015

Actually, I don't think the problem is parsing at all. It is a naive time when parsed. It is when the tz is assigned that it is localized incorrectly, I think.

@jorisvandenbossche
Member

But given that the result is correct/incorrect depending on slight changes in the string format, it seems that it has something to do with the parsing (or triggering another code route, so maybe not in the parsing itself but in only one of the code paths depending on which type of parsing was done):

In [58]:  pd.Timestamp('2015-10-25 01:00', tz='Europe/Prague') 
Out[58]: Timestamp('2015-10-25 02:00:00+0200', tz='Europe/Prague')   #  <---- incorrect

In [59]:  pd.Timestamp('2015-10-25 1:00', tz='Europe/Prague')
Out[59]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Prague')   #  <---- correct

In [60]:  pd.Timestamp('20151025 01:00', tz='Europe/Prague')
Out[60]: Timestamp('2015-10-25 01:00:00+0200', tz='Europe/Prague')   #  <---- correct

@ischwabacher
Contributor

Exactly. If you make the ISO 8601 parser give up, it passes it on to dateutil, which parses it correctly. But you can't get rid of the ISO 8601 parser, because you need a fast path for reading in large CSV files.

But that third example baffles me. I hadn't noticed that before; it's exactly the /YYYY-MM-DD[ T]hh(:mm(:ss)?)?/ format that fails AFAICT.
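
To make that informal pattern concrete, here is a sketch that turns it into a Python regular expression and classifies the three strings from the examples above. The regex is a literal transcription of the pattern in this comment, not the actual pandas ISO 8601 parser, and `matches_failing_format` is a made-up helper name.

```python
import re

# Literal transcription of /YYYY-MM-DD[ T]hh(:mm(:ss)?)?/ (an assumption, not pandas code).
FAILING_FORMAT = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}(:\d{2}(:\d{2})?)?")


def matches_failing_format(text):
    """Return True if the whole string has the shape described in the comment."""
    return FAILING_FORMAT.fullmatch(text) is not None


examples = ["2015-10-25 01:00", "2015-10-25 1:00", "20151025 01:00"]
for text in examples:
    print(text, matches_failing_format(text))
```

Only the first string matches, which lines up with it being the only one of the three that came back with the wrong hour.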

@jorisvandenbossche
Member

Further note that the route taken by DatetimeIndex for parsing these ISO 8601 strings also gives the correct result (but I don't know if this is a fast path):

In [61]:  pd.DatetimeIndex(['2015-10-25 01:00'], tz='Europe/Prague')
Out[61]: DatetimeIndex(['2015-10-25 01:00:00+02:00'], dtype='datetime64[ns]', freq=None, tz='Europe/Prague')

@jreback
Contributor

jreback commented Oct 30, 2015

hmm, these use exactly the same parser. The difference is that in the array processing (e.g. [61]), the tz is handled afterwards, while in a Timestamp it is handled as the result of the out_tz_local variable, IIRC (which is where the issue is).

@ischwabacher
Contributor

WTB: debugger that runs same code in parallel with different inputs, breaks when code paths diverge.

@ischwabacher
Contributor

I don't think that's it, despite the fact that it definitely looks like a bug, since Pacific/Chatham doesn't show this behavior:

In [15]: pd.Timestamp('2015-9-27 03:00:00', tz='Pacific/Chatham')
Out[15]: Timestamp('2015-09-27 03:00:00+1245', tz='Pacific/Chatham')

In [16]: pd.Timestamp('2015-9-27 3:00:00', tz='Pacific/Chatham')
Out[16]: Timestamp('2015-09-27 03:00:00+1245', tz='Pacific/Chatham')

In [17]: pd.Timestamp('2015-9-27 04:00:00', tz='Pacific/Chatham')
Out[17]: Timestamp('2015-09-27 04:00:00+1345', tz='Pacific/Chatham')

In [18]: pd.Timestamp('2015-9-27 4:00:00', tz='Pacific/Chatham')
Out[18]: Timestamp('2015-09-27 04:00:00+1345', tz='Pacific/Chatham')

@sinhrks
Member

sinhrks commented Oct 31, 2015

As @jreback says, the tz localization logic after _string_to_dts (the ISO 8601 parser) looks incorrect, even though Timestamp and DatetimeIndex use the same parser. Currently _string_to_dts doesn't take DST into account and treats a tz-like string as pytz.FixedOffset.
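
The difference between a fixed offset and a DST-aware zone can be illustrated outside pandas. This sketch assumes Python 3.9+ and uses the standard-library `timezone` and `zoneinfo.ZoneInfo` as stand-ins for pytz.FixedOffset and a pytz zone; it does not reproduce the pandas code path.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

fixed = timezone(timedelta(hours=2))  # always UTC+2, like a "+02:00" suffix
zone = ZoneInfo("Europe/Prague")      # follows the DST rules of the zone

# Compare the offset each tzinfo reports before and after the transition.
offsets = {}
for hour in (1, 3):
    wall = datetime(2015, 10, 25, hour, 0)
    offsets[hour] = (
        wall.replace(tzinfo=fixed).utcoffset(),
        wall.replace(tzinfo=zone).utcoffset(),
    )
    print(hour, offsets[hour])
```

Before the transition both agree on UTC+2; after it, the fixed offset still reports UTC+2 while the zone reports UTC+1, which is the kind of one-hour discrepancy discussed in this thread.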

@jreback jreback modified the milestones: 0.20.0, Next Major Release Apr 7, 2017
jreback pushed a commit to mroeschke/pandas that referenced this issue Apr 8, 2017
jreback pushed a commit that referenced this issue Apr 8, 2017
* BUG: Timestamp doesn't respect tz DST

closes #11481
closes #15777

* DOC: add doc-strings to tz_convert/tz_localize in tslib.pyx
TST: more tests, xref #15823, xref #11708

5 participants