BUG/ENH: Handle NonexistentTimeError in date rounding #23406

Merged
7 commits merged into pandas-dev:master from nonexistent_round on Nov 2, 2018

mroeschke
Member

Similar strategy to #22647: this adds a nonexistent keyword argument to round, ceil, and floor to control rounding when a NonExistentTimeError is encountered.

This also fixes a bug in the nonexistent='shift' implementation from #22644, where dates in timezones with negative UTC offsets were shifted by an additional hour. The rounding tests added here exercise that bug.
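A minimal sketch of the keyword described above, assuming a pandas release that includes this change (0.24.0 or later). The timestamp and the "120min" frequency are illustrative choices, not taken from the PR's tests. The 'shift' option is left out on purpose because its spelling may differ between releases, and the exact exception class raised by the default may also vary by version, so the sketch only catches a generic Exception.

```python
import pandas as pd

# 03:01 local time floors to 02:00 wall-clock time, which does not exist in
# America/Chicago on 2018-03-11 (clocks jump from 02:00 straight to 03:00).
ts = pd.Timestamp("2018-03-11 03:01:00", tz="America/Chicago")

# nonexistent="NaT" returns NaT instead of raising.
as_nat = ts.floor("120min", nonexistent="NaT")

# The default, nonexistent="raise", propagates the error.
try:
    ts.floor("120min")
    raised = False
except Exception:  # pytz.NonExistentTimeError in the version under review
    raised = True

# The same keyword is accepted by DatetimeIndex.floor / ceil / round.
idx_result = pd.DatetimeIndex([ts]).floor("120min", nonexistent="NaT")
```

Rounding operates on the wall-clock value and then re-localizes it, which is why a valid timestamp can land on a nonexistent one.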

@pep8speaks

Hello @mroeschke! Thanks for submitting the PR.

@codecov

codecov bot commented Oct 29, 2018

Codecov Report

Merging #23406 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23406      +/-   ##
==========================================
+ Coverage   92.18%   92.19%   +<.01%     
==========================================
  Files         161      161              
  Lines       51184    51192       +8     
==========================================
+ Hits        47185    47194       +9     
+ Misses       3999     3998       -1

Flag         Coverage Δ
#multiple    90.62% <100%> (ø) ⬆️
#single      42.22% <55.55%> (-0.01%) ⬇️

Impacted Files                         Coverage Δ
pandas/core/indexes/datetimelike.py    98.01% <100%> (ø) ⬆️
pandas/util/_decorators.py             91.34% <0%> (ø) ⬆️
pandas/core/indexes/multi.py           95.46% <0%> (ø) ⬆️
pandas/util/testing.py                 86.84% <0%> (ø) ⬆️
pandas/util/_test_decorators.py        93.24% <0%> (ø) ⬆️
pandas/util/_doctools.py               12.87% <0%> (ø) ⬆️
pandas/compat/pickle_compat.py         75.6% <0%> (ø) ⬆️
pandas/core/indexes/base.py            96.62% <0%> (ø) ⬆️
pandas/core/ops.py                     94.25% <0%> (+0.01%) ⬆️
pandas/core/indexes/range.py           95.79% <0%> (+0.02%) ⬆️
... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a2e5994...b31f034. Read the comment docs.

@gfyoung gfyoung added Bug Datetime Datetime data dtype Timezones Timezone data dtype labels Oct 29, 2018
result = getattr(ts, method)(freq, nonexistent='NaT')
assert result is NaT

with pytest.raises(pytz.NonExistentTimeError):
Member

@gfyoung gfyoung Oct 29, 2018

Anything meaningful in the error message? (question applies for all of your tests)

Member Author

I think this just raises NonExistentTimeError: [time that was nonexistent] but I can test for the message as well.

@mroeschke mroeschke added this to the 0.24.0 milestone Oct 29, 2018
@@ -227,6 +227,7 @@ Other Enhancements
- :class:`Series` and :class:`DataFrame` now support :class:`Iterable` in constructor (:issue:`2193`)
- :class:`DatetimeIndex` gained :attr:`DatetimeIndex.timetz` attribute. Returns local time with timezone information. (:issue:`21358`)
- :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support an ``ambiguous`` argument for handling datetimes that are rounded to ambiguous times (:issue:`18946`)
- :meth:`round`, :meth:`ceil`, and meth:`floor` for :class:`DatetimeIndex` and :class:`Timestamp` now support an ``nonexistent`` argument for handling datetimes that are rounded to nonexistent times (:issue:`22647`)
Member

an nonexistent --> a nonexistent

Member

Do you think this requires a mini-section to explain behavior?

Member Author

I can link the section in the timeseries.rst that explains nonexistent behavior.

Member

Awesome. That works just as well.


with pytest.raises(pytz.NonExistentTimeError,
                   message='2018-03-11 02:00:00'):
    getattr(s.dt, method)(freq, nonexistent='raise')
Member

@gfyoung gfyoung Oct 30, 2018

I'm almost inclined to have parameterization on nonexistent, but up to you.

Member Author

I'm inclined to keep the format of this test. IMO parameterization over nonexistent would obfuscate the test too much.

Member

No problem. Just a thought. 👍

More general question: to what extent are we using pytest.raises vs. tm.assert_raises_regex?

cc @jreback

Contributor

OK with pytest.raises to the extent that we are just checking the type of the error, so OK here.

Member

@gfyoung gfyoung Oct 30, 2018

@jreback : pytest.raises is also checking the error message here...

(that's why I asked about this)
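For readers following the message-checking question, here is a hedged sketch of asserting on an exception's text with pytest. It uses the `match` argument of `pytest.raises`, which is applied as a regular-expression search against the string form of the exception. As far as I know, the `message` argument used in the diff above only customized the failure text when nothing was raised, and later pytest releases dropped it. `NonExistentTimeError` and `localize` below are hypothetical stand-ins, not pandas or pytz code.

```python
import pytest


class NonExistentTimeError(Exception):
    """Hypothetical stand-in for pytz.NonExistentTimeError."""


def localize(wall_time):
    # Hypothetical helper: always fails, echoing the offending wall time.
    raise NonExistentTimeError(wall_time)


def check_error_message():
    # `match` is checked with re.search against str(exception), so this
    # passes only if the error text contains the expected timestamp.
    with pytest.raises(NonExistentTimeError, match="2018-03-11 02:00:00"):
        localize("2018-03-11 02:00:00")
    return True
```

Inside a real test module the body of `check_error_message` would simply be the test function itself.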

@@ -852,6 +851,8 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
Py_ssize_t i, idx, pos, ntrans, n = len(vals)
int64_t *tdata
int64_t v, left, right, val, v_left, v_right
int64_t remaining_minutes, new_local
Contributor

Can you combine this with the previous line here?

@@ -852,6 +851,8 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
Py_ssize_t i, idx, pos, ntrans, n = len(vals)
int64_t *tdata
int64_t v, left, right, val, v_left, v_right
int64_t remaining_minutes, new_local
int delta_idx_offset, delta_idx
ndarray[int64_t] result, result_a, result_b, dst_hours
Contributor

don't we use Py_ssize_t for indexers?

@@ -484,6 +484,17 @@ class NaTType(_NaT):
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

.. versionadded:: 0.24.0
nonexistent : 'shift', 'NaT', default 'raise'
Contributor

If there is any way to share these docstrings, that would be great. Maybe templating?

Member Author

@jbrockmendel do you happen to know why, for some of these NaT methods, we aren't just passing Timestamp.[method].__doc__ instead of copying these docstrings?

Member

Because of the dependency structure. We import NaT in timestamps but not the other way around.
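One way the templating idea in this thread could look, as a rough sketch only: the template lives in a module that both classes can import, so neither has to import the other. Every name here (`ROUND_DOC`, `fill_doc`, `FakeNaT`, `FakeTimestamp`) is hypothetical, and this is not how pandas implements or plans to implement it.

```python
# Shared template; in a real layout this would sit in a module that sits
# below both classes in the import graph (an assumption for illustration).
ROUND_DOC = """
{verb} the {klass} to the specified resolution.

Parameters
----------
freq : str
    Frequency string indicating the resolution.
nonexistent : 'shift', 'NaT', default 'raise'
    How to handle results that fall on nonexistent wall-clock times.
"""


def fill_doc(template, **kwargs):
    """Return a decorator that formats `template` into the function's docstring."""
    def decorator(func):
        func.__doc__ = template.format(**kwargs)
        return func
    return decorator


class FakeNaT:
    @fill_doc(ROUND_DOC, verb="Floor", klass="NaT")
    def floor(self, freq, nonexistent="raise"):
        return self  # placeholder behaviour; only the docstring matters here


class FakeTimestamp:
    @fill_doc(ROUND_DOC, verb="Floor", klass="Timestamp")
    def floor(self, freq, nonexistent="raise"):
        return self  # placeholder behaviour; only the docstring matters here
```

Both docstrings then come from one source of truth, so a parameter description only has to be edited once.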

existing time
- 'NaT' will return NaT where there are nonexistent times
- 'raise' will raise an NonExistentTimeError if there are
nonexistent times
Contributor

same comment as above about the doc-strings

@mroeschke
Member Author

Addressed your comments @jreback. I can open another issue about docstring templating between Timestamp and NaT, as many methods share the same docstring (not just tz_localize).

@jreback jreback merged commit 2073fc2 into pandas-dev:master Nov 2, 2018
@jreback
Contributor

jreback commented Nov 2, 2018

thanks @mroeschke

yes would like to template the doc-strings to avoid duplication.

@mroeschke mroeschke deleted the nonexistent_round branch November 2, 2018 15:18
thoo added a commit to thoo/pandas that referenced this pull request Nov 3, 2018
…xamples

* repo_org/master: (66 commits)
  CLN: doc string (pandas-dev#23469)
  DOC: Add cookbook entry for triangular correlation matrix (GH22840) (pandas-dev#23032)
  add number of Errors, Warnings to scripts/validate_docstrings.py (pandas-dev#23150)
  BUG: Allow freq conversion from dt64 to period (pandas-dev#23460)
  ENH: Add FrozenList.union and .difference (pandas-dev#23394)
  REF: cython cleanup, typing, optimizations (pandas-dev#23464)
  strictness and checks for Timedelta _simple_new (pandas-dev#23433)
  Fixing flake8 problems new to flake8 3.6.0 (pandas-dev#23472)
  DOC: Updating the docstring of Series.dot  (pandas-dev#22890)
  TST: Fixturize series/test_analytics.py (pandas-dev#22755)
  BUG/ENH: Handle NonexistentTimeError in date rounding (pandas-dev#23406)
  PERF: speed up concat on Series by making _get_axis_number() a classmethod (pandas-dev#23404)
  REF: Remove DatetimelikeArrayMixin._shallow_copy (pandas-dev#23430)
  REF: strictness/simplification in DatetimeArray/Index _simple_new (pandas-dev#23431)
  REF: cython cleanup, typing, optimizations (pandas-dev#23456)
  TST: tweak Hypothesis configuration and idioms (pandas-dev#23441)
  BUG: fix HDFStore.append with all empty strings error (GH12242) (pandas-dev#23435)
  TST: Skip 32bit failing IntervalTree tests (pandas-dev#23442)
  BUG: Deprecate nthreads argument (pandas-dev#23112)
  style: fix import format at pandas/core/reshape (pandas-dev#23387)
  ...
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Labels
Bug Datetime Datetime data dtype Timezones Timezone data dtype
Development

Successfully merging this pull request may close these issues.

Rounding valid timestamps near daylight savings jumps should not throw NonExistentTimeError
5 participants