Commit 3f39a5b

Merge branch 'pandas-dev:main' into raise-on-parse-int-overflow
2 parents: 3d72cf2 + 8b72297

41 files changed: 876 additions, 599 deletions


.pre-commit-config.yaml (7 additions, 8 deletions)

@@ -11,7 +11,7 @@ repos:
     -   id: absolufy-imports
         files: ^pandas/
 -   repo: https://github.com/jendrikseipp/vulture
-    rev: 'v2.4'
+    rev: 'v2.5'
     hooks:
     -   id: vulture
         entry: python scripts/run_vulture.py
@@ -46,20 +46,19 @@ repos:
         exclude: ^pandas/_libs/src/(klib|headers)/
         args: [--quiet, '--extensions=c,h', '--headers=h', --recursive, '--filter=-readability/casting,-runtime/int,-build/include_subdir']
 -   repo: https://github.com/PyCQA/flake8
-    rev: 4.0.1
+    rev: 5.0.4
     hooks:
     -   id: flake8
         additional_dependencies: &flake8_dependencies
-        -   flake8==4.0.1
-        -   flake8-comprehensions==3.7.0
-        -   flake8-bugbear==21.3.2
+        -   flake8==5.0.4
+        -   flake8-bugbear==22.7.1
         -   pandas-dev-flaker==0.5.0
 -   repo: https://github.com/PyCQA/isort
     rev: 5.10.1
     hooks:
     -   id: isort
 -   repo: https://github.com/asottile/pyupgrade
-    rev: v2.34.0
+    rev: v2.37.3
     hooks:
     -   id: pyupgrade
         args: [--py38-plus]
@@ -239,8 +238,8 @@ repos:
         types: [pyi]
         language: python
         additional_dependencies:
-        -   flake8==4.0.1
-        -   flake8-pyi==22.7.0
+        -   flake8==5.0.4
+        -   flake8-pyi==22.8.1
     -   id: future-annotations
         name: import annotations from __future__
         entry: 'from __future__ import annotations'

doc/source/whatsnew/v1.5.0.rst (5 additions, 0 deletions)

@@ -287,11 +287,13 @@ Other enhancements
 - ``times`` argument in :class:`.ExponentialMovingWindow` now accepts ``np.timedelta64`` (:issue:`47003`)
 - :class:`.DataError`, :class:`.SpecificationError`, :class:`.SettingWithCopyError`, :class:`.SettingWithCopyWarning`, :class:`.NumExprClobberingError`, :class:`.UndefinedVariableError`, :class:`.IndexingError`, :class:`.PyperclipException`, :class:`.PyperclipWindowsException`, :class:`.CSSWarning`, :class:`.PossibleDataLossError`, :class:`.ClosedFileError`, :class:`.IncompatibilityWarning`, :class:`.AttributeConflictWarning`, :class:`.DatabaseError`, :class:`.PossiblePrecisionLoss`, :class:`.ValueLabelTypeMismatch`, :class:`.InvalidColumnName`, and :class:`.CategoricalConversionWarning` are now exposed in ``pandas.errors`` (:issue:`27656`)
 - Added ``check_like`` argument to :func:`testing.assert_series_equal` (:issue:`47247`)
+- Add support for :meth:`GroupBy.ohlc` for extension array dtypes (:issue:`37493`)
 - Allow reading compressed SAS files with :func:`read_sas` (e.g., ``.sas7bdat.gz`` files)
 - :meth:`DatetimeIndex.astype` now supports casting timezone-naive indexes to ``datetime64[s]``, ``datetime64[ms]``, and ``datetime64[us]``, and timezone-aware indexes to the corresponding ``datetime64[unit, tzname]`` dtypes (:issue:`47579`)
 - :class:`Series` reducers (e.g. ``min``, ``max``, ``sum``, ``mean``) will now successfully operate when the dtype is numeric and ``numeric_only=True`` is provided; previously this would raise a ``NotImplementedError`` (:issue:`47500`)
 - :meth:`RangeIndex.union` now can return a :class:`RangeIndex` instead of a :class:`Int64Index` if the resulting values are equally spaced (:issue:`47557`, :issue:`43885`)
 - :meth:`DataFrame.compare` now accepts an argument ``result_names`` to allow the user to specify the result's names of both left and right DataFrame which are being compared. This is by default ``'self'`` and ``'other'`` (:issue:`44354`)
+- :class:`Interval` now supports checking whether one interval is contained by another interval (:issue:`46613`)
 - :meth:`Series.add_suffix`, :meth:`DataFrame.add_suffix`, :meth:`Series.add_prefix` and :meth:`DataFrame.add_prefix` support a ``copy`` argument. If ``False``, the underlying data is not copied in the returned object (:issue:`47934`)
 - :meth:`DataFrame.set_index` now supports a ``copy`` keyword. If ``False``, the underlying data is not copied when a new :class:`DataFrame` is returned (:issue:`48043`)

@@ -845,6 +847,7 @@ Other Deprecations
 - Deprecated unused arguments ``encoding`` and ``verbose`` in :meth:`Series.to_excel` and :meth:`DataFrame.to_excel` (:issue:`47912`)
 - Deprecated producing a single element when iterating over a :class:`DataFrameGroupBy` or a :class:`SeriesGroupBy` that has been grouped by a list of length 1; a tuple of length one will be returned instead (:issue:`42795`)
 - Fixed up warning message of deprecation of :meth:`MultiIndex.lesort_depth` as public method, as the message previously referred to :meth:`MultiIndex.is_lexsorted` instead (:issue:`38701`)
+- Deprecated the ``sort_columns`` argument in :meth:`DataFrame.plot` and :meth:`Series.plot` (:issue:`47563`)

 .. ---------------------------------------------------------------------------
 .. _whatsnew_150.performance:

@@ -899,6 +902,7 @@ Datetimelike
 - Bug in :meth:`DatetimeIndex.resolution` incorrectly returning "day" instead of "nanosecond" for nanosecond-resolution indexes (:issue:`46903`)
 - Bug in :class:`Timestamp` with an integer or float value and ``unit="Y"`` or ``unit="M"`` giving slightly-wrong results (:issue:`47266`)
 - Bug in :class:`.DatetimeArray` construction when passed another :class:`.DatetimeArray` and ``freq=None`` incorrectly inferring the freq from the given array (:issue:`47296`)
+- Bug in :func:`to_datetime` where ``OutOfBoundsDatetime`` would be thrown even if ``errors="coerce"`` if there were more than 50 rows (:issue:`45319`)
 - Bug when adding a :class:`DateOffset` to a :class:`Series` would not add the ``nanoseconds`` field (:issue:`47856`)
 -

@@ -1077,6 +1081,7 @@ Groupby/resample/rolling
 - Bug in :meth:`.GroupBy.cummin` and :meth:`.GroupBy.cummax` with nullable dtypes incorrectly altering the original data in place (:issue:`46220`)
 - Bug in :meth:`DataFrame.groupby` raising error when ``None`` is in first level of :class:`MultiIndex` (:issue:`47348`)
 - Bug in :meth:`.GroupBy.cummax` with ``int64`` dtype with leading value being the smallest possible int64 (:issue:`46382`)
+- Bug in :meth:`GroupBy.cumprod` where ``NaN`` influenced the calculation in different columns with ``skipna=False`` (:issue:`48064`)
 - Bug in :meth:`.GroupBy.max` with empty groups and ``uint64`` dtype incorrectly raising ``RuntimeError`` (:issue:`46408`)
 - Bug in :meth:`.GroupBy.apply` would fail when ``func`` was a string and args or kwargs were supplied (:issue:`46479`)
 - Bug in :meth:`SeriesGroupBy.apply` would incorrectly name its result when there was a unique group (:issue:`46369`)

environment.yml (2 additions, 3 deletions)

@@ -85,9 +85,8 @@ dependencies:
   # code checks
   - black=22.3.0
   - cpplint
-  - flake8=4.0.1
-  - flake8-bugbear=21.3.2  # used by flake8, find likely bugs
-  - flake8-comprehensions=3.7.0  # used by flake8, linting of unnecessary comprehensions
+  - flake8=5.0.4
+  - flake8-bugbear=22.7.1  # used by flake8, find likely bugs
   - isort>=5.2.1  # check that imports are in the right order
   - mypy=0.971
   - pre-commit>=2.15.0

pandas/_libs/groupby.pyi (4 additions, 2 deletions)

@@ -86,11 +86,13 @@ def group_mean(
     result_mask: np.ndarray | None = ...,
 ) -> None: ...
 def group_ohlc(
-    out: np.ndarray,  # floating[:, ::1]
+    out: np.ndarray,  # floatingintuint_t[:, ::1]
     counts: np.ndarray,  # int64_t[::1]
-    values: np.ndarray,  # ndarray[floating, ndim=2]
+    values: np.ndarray,  # ndarray[floatingintuint_t, ndim=2]
     labels: np.ndarray,  # const intp_t[:]
     min_count: int = ...,
+    mask: np.ndarray | None = ...,
+    result_mask: np.ndarray | None = ...,
 ) -> None: ...
 def group_quantile(
     out: npt.NDArray[np.float64],

pandas/_libs/groupby.pyx (36 additions, 7 deletions)

@@ -204,7 +204,6 @@ def group_cumprod_float64(
                     out[i, j] = NaN
                     if not skipna:
                         accum[lab, j] = NaN
-                        break


 @cython.boundscheck(False)
@@ -835,21 +834,32 @@ def group_mean(
                 out[i, j] = sumx[i, j] / count


+ctypedef fused int64float_t:
+    float32_t
+    float64_t
+    int64_t
+    uint64_t
+
+
 @cython.wraparound(False)
 @cython.boundscheck(False)
 def group_ohlc(
-    floating[:, ::1] out,
+    int64float_t[:, ::1] out,
     int64_t[::1] counts,
-    ndarray[floating, ndim=2] values,
+    ndarray[int64float_t, ndim=2] values,
     const intp_t[::1] labels,
     Py_ssize_t min_count=-1,
+    const uint8_t[:, ::1] mask=None,
+    uint8_t[:, ::1] result_mask=None,
 ) -> None:
     """
     Only aggregates on axis=0
     """
     cdef:
         Py_ssize_t i, j, N, K, lab
-        floating val
+        int64float_t val
+        uint8_t[::1] first_element_set
+        bint isna_entry, uses_mask = not mask is None

     assert min_count == -1, "'min_count' only used in sum and prod"

@@ -863,7 +873,15 @@ def group_ohlc(

     if K > 1:
         raise NotImplementedError("Argument 'values' must have only one dimension")
-    out[:] = np.nan
+
+    if int64float_t is float32_t or int64float_t is float64_t:
+        out[:] = np.nan
+    else:
+        out[:] = 0
+
+    first_element_set = np.zeros((<object>counts).shape, dtype=np.uint8)
+    if uses_mask:
+        result_mask[:] = True

     with nogil:
         for i in range(N):
@@ -873,11 +891,22 @@ def group_ohlc(

             counts[lab] += 1
             val = values[i, 0]
-            if val != val:
+
+            if uses_mask:
+                isna_entry = mask[i, 0]
+            elif int64float_t is float32_t or int64float_t is float64_t:
+                isna_entry = val != val
+            else:
+                isna_entry = False
+
+            if isna_entry:
                 continue

-            if out[lab, 0] != out[lab, 0]:
+            if not first_element_set[lab]:
                 out[lab, 0] = out[lab, 1] = out[lab, 2] = out[lab, 3] = val
+                first_element_set[lab] = True
+                if uses_mask:
+                    result_mask[lab] = False
             else:
                 out[lab, 1] = max(out[lab, 1], val)
                 out[lab, 2] = min(out[lab, 2], val)
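The rewritten loop above drops the old sentinel test `out[lab, 0] != out[lab, 0]` (a NaN probe that cannot work for integer dtypes) in favor of an explicit `first_element_set` flag per group. A plain-Python sketch of the same single-pass OHLC aggregation (hypothetical helper, not a pandas API):

```python
# Plain-Python sketch of the group_ohlc loop: one pass over `values`,
# recording open/high/low/close per group. A boolean flag per group marks
# whether its first valid element has been seen, so the logic also works
# for integer inputs that cannot represent NaN. `mask`, when given, plays
# the role of the nullable-dtype validity mask (True = missing).
def group_ohlc_sketch(values, labels, ngroups, mask=None):
    out = [[None, None, None, None] for _ in range(ngroups)]
    counts = [0] * ngroups
    first_element_set = [False] * ngroups
    for i, val in enumerate(values):
        lab = labels[i]
        if lab == -1:          # -1 means "not in any group"
            continue
        counts[lab] += 1
        isna_entry = mask[i] if mask is not None else (val != val)
        if isna_entry:
            continue
        if not first_element_set[lab]:
            out[lab] = [val, val, val, val]   # open = high = low = close
            first_element_set[lab] = True
        else:
            out[lab][1] = max(out[lab][1], val)  # high
            out[lab][2] = min(out[lab][2], val)  # low
            out[lab][3] = val                    # close
    return out, counts
```

For example, values `[1, 5, 2]` in one group yield `[1, 5, 1, 2]` (open, high, low, close).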

pandas/_libs/interval.pyi (9 additions, 2 deletions)

@@ -79,10 +79,17 @@ class Interval(IntervalMixin, Generic[_OrderableT]):
     def __hash__(self) -> int: ...
     @overload
     def __contains__(
-        self: Interval[_OrderableTimesT], key: _OrderableTimesT
+        self: Interval[Timedelta], key: Timedelta | Interval[Timedelta]
     ) -> bool: ...
     @overload
-    def __contains__(self: Interval[_OrderableScalarT], key: float) -> bool: ...
+    def __contains__(
+        self: Interval[Timestamp], key: Timestamp | Interval[Timestamp]
+    ) -> bool: ...
+    @overload
+    def __contains__(
+        self: Interval[_OrderableScalarT],
+        key: _OrderableScalarT | Interval[_OrderableScalarT],
+    ) -> bool: ...
     @overload
     def __add__(
         self: Interval[_OrderableTimesT], y: Timedelta

pandas/_libs/interval.pyx (14 additions, 2 deletions)

@@ -299,10 +299,12 @@ cdef class Interval(IntervalMixin):
     >>> iv
     Interval(0, 5, inclusive='right')

-    You can check if an element belongs to it
+    You can check if an element belongs to it, or if it contains another interval:

     >>> 2.5 in iv
     True
+    >>> pd.Interval(left=2, right=5, inclusive='both') in iv
+    True

     You can test the bounds (``inclusive='right'``, so ``0 < x <= 5``):

@@ -412,7 +414,17 @@ cdef class Interval(IntervalMixin):

     def __contains__(self, key) -> bool:
         if _interval_like(key):
-            raise TypeError("__contains__ not defined for two intervals")
+            key_closed_left = key.inclusive in ('left', 'both')
+            key_closed_right = key.inclusive in ('right', 'both')
+            if self.open_left and key_closed_left:
+                left_contained = self.left < key.left
+            else:
+                left_contained = self.left <= key.left
+            if self.open_right and key_closed_right:
+                right_contained = key.right < self.right
+            else:
+                right_contained = key.right <= self.right
+            return left_contained and right_contained
         return ((self.left < key if self.open_left else self.left <= key) and
                 (key < self.right if self.open_right else key <= self.right))
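The interval-in-interval rule added to `__contains__` can be restated as a stand-alone predicate: the outer interval contains the inner one unless an open bound of the outer meets a closed bound of the inner at the same endpoint. A sketch with hypothetical parameter names, mirroring the Cython logic above:

```python
# Hypothetical stand-alone version of the interval-in-interval check.
# `*_inclusive` takes the same values as pandas' Interval.inclusive:
# 'left', 'right', 'both', or 'neither'.
def interval_contains(outer_left, outer_right, outer_inclusive,
                      inner_left, inner_right, inner_inclusive):
    open_left = outer_inclusive not in ("left", "both")
    open_right = outer_inclusive not in ("right", "both")
    key_closed_left = inner_inclusive in ("left", "both")
    key_closed_right = inner_inclusive in ("right", "both")
    # Strict comparison only where the outer bound is open but the inner
    # bound is closed; otherwise touching endpoints are allowed.
    if open_left and key_closed_left:
        left_ok = outer_left < inner_left
    else:
        left_ok = outer_left <= inner_left
    if open_right and key_closed_right:
        right_ok = inner_right < outer_right
    else:
        right_ok = inner_right <= outer_right
    return left_ok and right_ok
```

So an interval `[2, 5]` closed on both sides sits inside `(0, 5]`, but `[0, 3]` does not, because the closed left endpoint 0 falls on the outer interval's open bound.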

pandas/_libs/tslib.pyx (3 additions, 1 deletion)

@@ -799,6 +799,7 @@ cdef _array_to_datetime_object(
     # We return an object array and only attempt to parse:
     # 1) NaT or NaT-like values
     # 2) datetime strings, which we return as datetime.datetime
+    # 3) special strings - "now" & "today"
     for i in range(n):
         val = values[i]
         if checknull_with_nat_and_na(val) or PyDateTime_Check(val):
@@ -817,7 +818,8 @@ cdef _array_to_datetime_object(
                                          yearfirst=yearfirst)
                 pydatetime_to_dt64(oresult[i], &dts)
                 check_dts_bounds(&dts)
-            except (ValueError, OverflowError):
+            except (ValueError, OverflowError) as ex:
+                ex.args = (f"{ex} present at position {i}", )
                 if is_coerce:
                     oresult[i] = <object>NaT
                     continue
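The `except ... as ex` change illustrates a general pattern: mutate `ex.args` to append context (here, the row position) and let the same exception object propagate. A minimal stand-alone demonstration (the `parse_all` helper is hypothetical, not the pandas function):

```python
# Augment an exception's message with the index of the failing element,
# then re-raise the original exception object.
def parse_all(strings):
    results = []
    for i, s in enumerate(strings):
        try:
            results.append(int(s))
        except ValueError as ex:
            # Rewriting ex.args changes str(ex) without losing the type
            # or traceback of the original exception.
            ex.args = (f"{ex} present at position {i}",)
            raise
    return results
```

A caller catching `ValueError` now sees which element failed, which is what the pandas change does for unparseable datetime strings.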

pandas/_libs/tslibs/parsing.pyx (8 additions, 0 deletions)

@@ -298,6 +298,14 @@ def parse_datetime_string(
     if dt is not None:
         return dt

+    # Handling special case strings today & now
+    if date_string == "now":
+        dt = datetime.now()
+        return dt
+    elif date_string == "today":
+        dt = datetime.today()
+        return dt
+
     try:
         dt, _ = _parse_dateabbr_string(date_string, _DEFAULT_DATETIME, freq=None)
         return dt
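The new branch short-circuits the strings `"now"` and `"today"` before the abbreviation parser runs. A simplified stand-in for the dispatch (the `fallback` parameter is hypothetical, standing in for `_parse_dateabbr_string`):

```python
from datetime import datetime

# Sketch of the special-case handling added to parse_datetime_string:
# "now" and "today" are resolved directly, everything else falls through
# to the ordinary parser.
def parse_datetime_string_sketch(date_string, fallback=None):
    if date_string == "now":
        return datetime.now()      # current date and time
    elif date_string == "today":
        return datetime.today()    # conventional alias, same clock
    if fallback is not None:
        return fallback(date_string)
    raise ValueError(f"Unable to parse {date_string!r}")
```

Handling these strings here (rather than letting them fail and be retried elsewhere) is what lets the object-path parser in `tslib.pyx` advertise them as case 3 in its comment.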

pandas/_testing/asserters.py (2 additions, 2 deletions)

@@ -866,7 +866,7 @@ def assert_series_equal(
     left,
     right,
     check_dtype: bool | Literal["equiv"] = True,
-    check_index_type="equiv",
+    check_index_type: bool | Literal["equiv"] = "equiv",
     check_series_type=True,
     check_less_precise: bool | int | NoDefault = no_default,
     check_names=True,
@@ -1134,7 +1134,7 @@ def assert_frame_equal(
     left,
     right,
     check_dtype: bool | Literal["equiv"] = True,
-    check_index_type="equiv",
+    check_index_type: bool | Literal["equiv"] = "equiv",
     check_column_type="equiv",
     check_frame_type=True,
     check_less_precise=no_default,
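The annotation change narrows `check_index_type` from an unannotated default to `bool | Literal["equiv"]`, matching the three values the docstring has always documented. A minimal runtime check expressing the same contract (a hypothetical validator; pandas enforces this only through the static annotation):

```python
from typing import Literal, Union

# Hypothetical runtime counterpart of the `bool | Literal["equiv"]`
# annotation: accept True, False, or the string "equiv" and reject
# anything else.
def validate_check_index_type(value: Union[bool, Literal["equiv"]]):
    if isinstance(value, bool) or value == "equiv":
        return value
    raise ValueError(
        f"check_index_type must be True, False, or 'equiv'; got {value!r}"
    )
```

With the annotation in place, a type checker flags calls like `assert_series_equal(a, b, check_index_type="strict")` before they run.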
