Commit b639c09

Merge remote-tracking branch 'upstream/main' into warn-on-du-parse
2 parents 6ecd4fa + 1ab02e4 commit b639c09

104 files changed, +887 -307 lines changed

ci/deps/actions-310.yaml

+1
@@ -47,6 +47,7 @@ dependencies:
   - scipy
   - sqlalchemy
   - tabulate
+  - tzdata>=2022a
   - xarray
   - xlrd
   - xlsxwriter

ci/deps/actions-38-minimum_versions.yaml

+1
@@ -49,6 +49,7 @@ dependencies:
   - scipy=1.7.1
   - sqlalchemy=1.4.16
   - tabulate=0.8.9
+  - tzdata=2022a
   - xarray=0.19.0
   - xlrd=2.0.1
   - xlsxwriter=1.4.3

ci/deps/actions-39.yaml

+1
@@ -47,6 +47,7 @@ dependencies:
   - scipy
   - sqlalchemy
   - tabulate
+  - tzdata>=2022a
   - xarray
   - xlrd
   - xlsxwriter

doc/source/getting_started/install.rst

+17
@@ -270,6 +270,23 @@ For example, :func:`pandas.read_hdf` requires the ``pytables`` package, while
 optional dependency is not installed, pandas will raise an ``ImportError`` when
 the method requiring that dependency is called.

+Timezones
+^^^^^^^^^
+
+========================= ========================= =============================================================
+Dependency                Minimum Version           Notes
+========================= ========================= =============================================================
+tzdata                    2022.1(pypi)/             Allows the use of ``zoneinfo`` timezones with pandas.
+                          2022a(for system tzdata)  **Note**: You only need to install the pypi package if your
+                                                    system does not already provide the IANA tz database.
+                                                    However, the minimum tzdata version still applies, even if it
+                                                    is not enforced through an error.
+
+                                                    If you would like to keep your system tzdata version updated,
+                                                    it is recommended to use the ``tzdata`` package from
+                                                    conda-forge.
+========================= ========================= =============================================================
+
 Visualization
 ^^^^^^^^^^^^^

doc/source/whatsnew/v1.5.0.rst

+4
@@ -932,6 +932,8 @@ Conversion
 - Bug in :meth:`DataFrame.to_dict` for ``orient="list"`` or ``orient="index"`` was not returning native types (:issue:`46751`)
 - Bug in :meth:`DataFrame.apply` that returns a :class:`DataFrame` instead of a :class:`Series` when applied to an empty :class:`DataFrame` and ``axis=1`` (:issue:`39111`)
 - Bug when inferring the dtype from an iterable that is *not* a NumPy ``ndarray`` consisting of all NumPy unsigned integer scalars did not result in an unsigned integer dtype (:issue:`47294`)
+- Bug in :meth:`DataFrame.eval` when pandas objects (e.g. ``'Timestamp'``) were column names (:issue:`44603`)
+-

 Strings
 ^^^^^^^
@@ -1066,6 +1068,7 @@ Groupby/resample/rolling
 - Bug when using ``engine="numba"`` would return the same jitted function when modifying ``engine_kwargs`` (:issue:`46086`)
 - Bug in :meth:`.DataFrameGroupBy.transform` fails when ``axis=1`` and ``func`` is ``"first"`` or ``"last"`` (:issue:`45986`)
 - Bug in :meth:`DataFrameGroupBy.cumsum` with ``skipna=False`` giving incorrect results (:issue:`46216`)
+- Bug in :meth:`GroupBy.sum` with integer dtypes losing precision (:issue:`37493`)
 - Bug in :meth:`.GroupBy.cumsum` with ``timedelta64[ns]`` dtype failing to recognize ``NaT`` as a null value (:issue:`46216`)
 - Bug in :meth:`.GroupBy.cummin` and :meth:`.GroupBy.cummax` with nullable dtypes incorrectly altering the original data in place (:issue:`46220`)
 - Bug in :meth:`DataFrame.groupby` raising error when ``None`` is in first level of :class:`MultiIndex` (:issue:`47348`)
@@ -1095,6 +1098,7 @@ Reshaping
 - Bug in :func:`concat` not sorting the column names when ``None`` is included (:issue:`47331`)
 - Bug in :func:`concat` with identical key leads to error when indexing :class:`MultiIndex` (:issue:`46519`)
 - Bug in :func:`pivot_table` raising ``TypeError`` when ``dropna=True`` and aggregation column has extension array dtype (:issue:`47477`)
+- Bug in :func:`merge` raising error for ``how="cross"`` when using ``FIPS`` mode in ssl library (:issue:`48024`)
 - Bug in :meth:`DataFrame.join` with a list when using suffixes to join DataFrames with duplicate column names (:issue:`46396`)
 - Bug in :meth:`DataFrame.pivot_table` with ``sort=False`` results in sorted index (:issue:`17041`)
 - Bug in :meth:`concat` when ``axis=1`` and ``sort=False`` where the resulting Index was a :class:`Int64Index` instead of a :class:`RangeIndex` (:issue:`46675`)
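
The GroupBy.sum entry above (issue 37493) concerns integer sums losing precision. One plausible mechanism, consistent with the groupby.pyx change in this commit that adds integer types to `sum_t`, is a round trip through float64, which cannot represent every integer above 2**53. The sketch below illustrates that effect with plain Python numbers. It is an assumption about the cause, and it does not call pandas.

```python
# float64 has a 53-bit significand, so 2**53 + 1 is not representable.
big = 2**53 + 1

# Round trip through float, as a float64-only sum kernel would do.
round_tripped = int(float(big))

# Adding as integers keeps every digit; adding as floats does not.
int_total = 2**53 + 1
float_total = int(float(2**53) + 1.0)

print(big, round_tripped)      # the two values differ
print(int_total, float_total)  # the two values differ
```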

environment.yml

+1
@@ -48,6 +48,7 @@ dependencies:
   - scipy
   - sqlalchemy
   - tabulate
+  - tzdata>=2022a
   - xarray
   - xlrd
   - xlsxwriter

pandas/_libs/groupby.pyi

+4-2
@@ -51,10 +51,12 @@ def group_any_all(
     skipna: bool,
 ) -> None: ...
 def group_sum(
-    out: np.ndarray,  # complexfloating_t[:, ::1]
+    out: np.ndarray,  # complexfloatingintuint_t[:, ::1]
     counts: np.ndarray,  # int64_t[::1]
-    values: np.ndarray,  # ndarray[complexfloating_t, ndim=2]
+    values: np.ndarray,  # ndarray[complexfloatingintuint_t, ndim=2]
     labels: np.ndarray,  # const intp_t[:]
+    mask: np.ndarray | None,
+    result_mask: np.ndarray | None = ...,
     min_count: int = ...,
     is_datetimelike: bool = ...,
 ) -> None: ...

pandas/_libs/groupby.pyx

+44-7
@@ -513,6 +513,15 @@ ctypedef fused mean_t:

 ctypedef fused sum_t:
     mean_t
+    int8_t
+    int16_t
+    int32_t
+    int64_t
+
+    uint8_t
+    uint16_t
+    uint32_t
+    uint64_t
     object


@@ -523,6 +532,8 @@ def group_sum(
     int64_t[::1] counts,
     ndarray[sum_t, ndim=2] values,
     const intp_t[::1] labels,
+    const uint8_t[:, :] mask,
+    uint8_t[:, ::1] result_mask=None,
     Py_ssize_t min_count=0,
     bint is_datetimelike=False,
 ) -> None:
@@ -535,6 +546,8 @@ def group_sum(
         sum_t[:, ::1] sumx, compensation
         int64_t[:, ::1] nobs
         Py_ssize_t len_values = len(values), len_labels = len(labels)
+        bint uses_mask = mask is not None
+        bint isna_entry

     if len_values != len_labels:
         raise ValueError("len(index) != len(labels)")
@@ -572,7 +585,8 @@ def group_sum(
         for i in range(ncounts):
             for j in range(K):
                 if nobs[i, j] < min_count:
-                    out[i, j] = NAN
+                    out[i, j] = None
+
                 else:
                     out[i, j] = sumx[i, j]
     else:
@@ -590,11 +604,18 @@ def group_sum(
                     # With dt64/td64 values, values have been cast to float64
                     # instead if int64 for group_sum, but the logic
                     # is otherwise the same as in _treat_as_na
-                    if val == val and not (
-                        sum_t is float64_t
-                        and is_datetimelike
-                        and val == <float64_t>NPY_NAT
-                    ):
+                    if uses_mask:
+                        isna_entry = mask[i, j]
+                    elif (sum_t is float32_t or sum_t is float64_t
+                          or sum_t is complex64_t or sum_t is complex64_t):
+                        # avoid warnings because of equality comparison
+                        isna_entry = not val == val
+                    elif sum_t is int64_t and is_datetimelike and val == NPY_NAT:
+                        isna_entry = True
+                    else:
+                        isna_entry = False
+
+                    if not isna_entry:
                         nobs[lab, j] += 1
                         y = val - compensation[lab, j]
                         t = sumx[lab, j] + y
@@ -604,7 +625,23 @@ def group_sum(
             for i in range(ncounts):
                 for j in range(K):
                     if nobs[i, j] < min_count:
-                        out[i, j] = NAN
+                        # if we are integer dtype, not is_datetimelike, and
+                        # not uses_mask, then getting here implies that
+                        # counts[i] < min_count, which means we will
+                        # be cast to float64 and masked at the end
+                        # of WrappedCythonOp._call_cython_op. So we can safely
+                        # set a placeholder value in out[i, j].
+                        if uses_mask:
+                            result_mask[i, j] = True
+                        elif (sum_t is float32_t or sum_t is float64_t
+                              or sum_t is complex64_t or sum_t is complex64_t):
+                            out[i, j] = NAN
+                        elif sum_t is int64_t:
+                            out[i, j] = NPY_NAT
+                        else:
+                            # placeholder, see above
+                            out[i, j] = 0
+
                     else:
                         out[i, j] = sumx[i, j]
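
The `group_sum` hunks above combine a per-group compensated (Kahan) sum with an optional mask and a `min_count` threshold. The following is a simplified one-dimensional sketch of that logic in pure Python, not the Cython kernel itself. The function name `grouped_sum`, the use of `None` for a masked result, the compensation update line (not visible in the hunk) and the handling of negative labels are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def grouped_sum(
    values: Sequence[float],
    labels: Sequence[int],
    ngroups: int,
    mask: Optional[Sequence[bool]] = None,
    min_count: int = 0,
) -> List[Optional[float]]:
    """Sum values per group label, skipping missing entries."""
    sumx = [0] * ngroups          # running sum per group
    compensation = [0] * ngroups  # Kahan error term per group
    nobs = [0] * ngroups          # non-missing observations per group

    for i, (val, lab) in enumerate(zip(values, labels)):
        if lab < 0:
            continue  # assumption: a negative label means "no group"
        if mask is not None:
            isna_entry = bool(mask[i])  # explicit mask wins, as in the diff
        else:
            isna_entry = val != val     # True only for NaN
        if not isna_entry:
            nobs[lab] += 1
            y = val - compensation[lab]
            t = sumx[lab] + y
            compensation[lab] = t - sumx[lab] - y
            sumx[lab] = t

    # Groups with fewer than min_count observations are reported as missing.
    return [s if n >= min_count else None for s, n in zip(sumx, nobs)]
```

With integer inputs the arithmetic stays exact, which is the point of adding integer types to `sum_t`; with float inputs the compensation term recovers low-order bits that a naive running sum would drop.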

pandas/_libs/interval.pyx

+3-1
@@ -1,3 +1,4 @@
+import inspect
 import numbers
 from operator import (
     le,
@@ -45,6 +46,7 @@ cnp.import_array()
 import warnings

 from pandas._libs import lib
+
 from pandas._libs cimport util
 from pandas._libs.hashtable cimport Int64Vector
 from pandas._libs.tslibs.timedeltas cimport _Timedelta
@@ -394,7 +396,7 @@ cdef class Interval(IntervalMixin):
         warnings.warn(
             "Attribute `closed` is deprecated in favor of `inclusive`.",
             FutureWarning,
-            stacklevel=find_stack_level(),
+            stacklevel=find_stack_level(inspect.currentframe()),
         )
         return self.inclusive
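
The last hunk passes `inspect.currentframe()` to the pandas-internal `find_stack_level` helper so that the `FutureWarning` is attributed to the calling code. The sketch below does not reproduce that helper. It uses a fixed `stacklevel=2` from the standard-library `warnings` module to show the same intent. The class `Span` and the function `read_closed` are hypothetical names.

```python
import warnings
from typing import List, Tuple


class Span:
    """Hypothetical stand-in for a class with a deprecated attribute."""

    def __init__(self, inclusive: str = "right") -> None:
        self.inclusive = inclusive

    @property
    def closed(self) -> str:
        warnings.warn(
            "Attribute `closed` is deprecated in favor of `inclusive`.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the code reading `.closed`
        )
        return self.inclusive


def read_closed(span: Span) -> Tuple[str, List[type]]:
    """Read the deprecated attribute and report the warning categories seen."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        value = span.closed
    return value, [w.category for w in caught]
```

A fixed stack level only works when the call depth is known in advance; a helper that inspects frames, as the diff uses, is one way to handle deeper or variable call chains.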
