Commit ba40923

Merge branch 'pandas-dev:main' into raise-on-parse-int-overflow
2 parents 3e5f929 + 08fd9c0 commit ba40923

111 files changed (+1054, -346 lines)
ci/deps/actions-310.yaml

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ dependencies:
   - scipy
   - sqlalchemy
   - tabulate
+  - tzdata>=2022a
   - xarray
   - xlrd
   - xlsxwriter

ci/deps/actions-38-minimum_versions.yaml

Lines changed: 1 addition & 0 deletions
@@ -49,6 +49,7 @@ dependencies:
   - scipy=1.7.1
   - sqlalchemy=1.4.16
   - tabulate=0.8.9
+  - tzdata=2022a
   - xarray=0.19.0
   - xlrd=2.0.1
   - xlsxwriter=1.4.3

ci/deps/actions-39.yaml

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ dependencies:
   - scipy
   - sqlalchemy
   - tabulate
+  - tzdata>=2022a
   - xarray
   - xlrd
   - xlsxwriter

doc/source/getting_started/install.rst

Lines changed: 17 additions & 0 deletions
@@ -270,6 +270,23 @@ For example, :func:`pandas.read_hdf` requires the ``pytables`` package, while
 optional dependency is not installed, pandas will raise an ``ImportError`` when
 the method requiring that dependency is called.
 
+Timezones
+^^^^^^^^^
+
+========================= ========================= =============================================================
+Dependency                Minimum Version           Notes
+========================= ========================= =============================================================
+tzdata                    2022.1(pypi)/             Allows the use of ``zoneinfo`` timezones with pandas.
+                          2022a(for system tzdata)  **Note**: You only need to install the pypi package if your
+                                                    system does not already provide the IANA tz database.
+                                                    However, the minimum tzdata version still applies, even if it
+                                                    is not enforced through an error.
+
+                                                    If you would like to keep your system tzdata version updated,
+                                                    it is recommended to use the ``tzdata`` package from
+                                                    conda-forge.
+========================= ========================= =============================================================
+
 Visualization
 ^^^^^^^^^^^^^

doc/source/user_guide/visualization.rst

Lines changed: 5 additions & 0 deletions
@@ -6,6 +6,11 @@
 Chart visualization
 *******************
 
+
+.. note::
+
+   The examples below assume that you're using `Jupyter <https://jupyter.org/>`_.
+
 This section demonstrates visualization through charting. For information on
 visualization of tabular data please see the section on `Table Visualization <style.ipynb>`_.
1116

doc/source/whatsnew/v1.5.0.rst

Lines changed: 26 additions & 2 deletions
@@ -14,6 +14,16 @@ including other versions of pandas.
 Enhancements
 ~~~~~~~~~~~~
 
+.. _whatsnew_150.enhancements.pandas-stubs:
+
+``pandas-stubs``
+^^^^^^^^^^^^^^^^
+
+The ``pandas-stubs`` library is now supported by the pandas development team, providing type stubs for the pandas API. Please visit
+https://github.com/pandas-dev/pandas-stubs for more information.
+
+We thank VirtusLab and Microsoft for their initial, significant contributions to ``pandas-stubs``.
+
 .. _whatsnew_150.enhancements.dataframe_interchange:
 
 DataFrame interchange protocol implementation
@@ -282,6 +292,7 @@ Other enhancements
 - :class:`Series` reducers (e.g. ``min``, ``max``, ``sum``, ``mean``) will now successfully operate when the dtype is numeric and ``numeric_only=True`` is provided; previously this would raise a ``NotImplementedError`` (:issue:`47500`)
 - :meth:`RangeIndex.union` now can return a :class:`RangeIndex` instead of a :class:`Int64Index` if the resulting values are equally spaced (:issue:`47557`, :issue:`43885`)
 - :meth:`DataFrame.compare` now accepts an argument ``result_names`` to allow the user to specify the result's names of both left and right DataFrame which are being compared. This is by default ``'self'`` and ``'other'`` (:issue:`44354`)
+- :meth:`Series.add_suffix`, :meth:`DataFrame.add_suffix`, :meth:`Series.add_prefix` and :meth:`DataFrame.add_prefix` support a ``copy`` argument. If ``False``, the underlying data is not copied in the returned object (:issue:`47934`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_150.notable_bug_fixes:
@@ -544,6 +555,14 @@ Other API changes
 Deprecations
 ~~~~~~~~~~~~
 
+.. warning::
+
+    In the next major version release, 2.0, several larger API changes are being considered without a formal deprecation such as
+    making the standard library `zoneinfo <https://docs.python.org/3/library/zoneinfo.html>`_ the default timezone implementation instead of ``pytz``,
+    having the :class:`Index` support all data types instead of having multiple subclasses (:class:`CategoricalIndex`, :class:`Int64Index`, etc.), and more.
+    The changes under consideration are logged in `this GitHub issue <https://github.com/pandas-dev/pandas/issues/44823>`_, and any
+    feedback or concerns are welcome.
+
 .. _whatsnew_150.deprecations.int_slicing_series:
 
 Label-based integer slicing on a Series with an Int64Index or RangeIndex
@@ -824,6 +843,7 @@ Other Deprecations
 - Deprecated setting a categorical's categories with ``cat.categories = ['a', 'b', 'c']``, use :meth:`Categorical.rename_categories` instead (:issue:`37643`)
 - Deprecated unused arguments ``encoding`` and ``verbose`` in :meth:`Series.to_excel` and :meth:`DataFrame.to_excel` (:issue:`47912`)
 - Deprecated producing a single element when iterating over a :class:`DataFrameGroupBy` or a :class:`SeriesGroupBy` that has been grouped by a list of length 1; A tuple of length one will be returned instead (:issue:`42795`)
+- Fixed up warning message of deprecation of :meth:`MultiIndex.lexsort_depth` as public method, as the message previously referred to :meth:`MultiIndex.is_lexsorted` instead (:issue:`38701`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_150.performance:
@@ -878,6 +898,7 @@ Datetimelike
 - Bug in :meth:`DatetimeIndex.resolution` incorrectly returning "day" instead of "nanosecond" for nanosecond-resolution indexes (:issue:`46903`)
 - Bug in :class:`Timestamp` with an integer or float value and ``unit="Y"`` or ``unit="M"`` giving slightly-wrong results (:issue:`47266`)
 - Bug in :class:`.DatetimeArray` construction when passed another :class:`.DatetimeArray` and ``freq=None`` incorrectly inferring the freq from the given array (:issue:`47296`)
+- Bug when adding a :class:`DateOffset` to a :class:`Series` would not add the ``nanoseconds`` field (:issue:`47856`)
 -
 
 Timedelta
@@ -912,6 +933,8 @@ Conversion
 - Bug in :meth:`DataFrame.to_dict` for ``orient="list"`` or ``orient="index"`` was not returning native types (:issue:`46751`)
 - Bug in :meth:`DataFrame.apply` that returns a :class:`DataFrame` instead of a :class:`Series` when applied to an empty :class:`DataFrame` and ``axis=1`` (:issue:`39111`)
 - Bug when inferring the dtype from an iterable that is *not* a NumPy ``ndarray`` consisting of all NumPy unsigned integer scalars did not result in an unsigned integer dtype (:issue:`47294`)
+- Bug in :meth:`DataFrame.eval` when pandas objects (e.g. ``'Timestamp'``) were column names (:issue:`44603`)
+-
 
 Strings
 ^^^^^^^
@@ -932,8 +955,7 @@ Indexing
 - Bug in setting a NA value (``None`` or ``np.nan``) into a :class:`Series` with int-based :class:`IntervalDtype` incorrectly casting to object dtype instead of a float-based :class:`IntervalDtype` (:issue:`45568`)
 - Bug in indexing setting values into an ``ExtensionDtype`` column with ``df.iloc[:, i] = values`` with ``values`` having the same dtype as ``df.iloc[:, i]`` incorrectly inserting a new array instead of setting in-place (:issue:`33457`)
 - Bug in :meth:`Series.__setitem__` with a non-integer :class:`Index` when using an integer key to set a value that cannot be set inplace where a ``ValueError`` was raised instead of casting to a common dtype (:issue:`45070`)
-- Bug in :meth:`DataFrame.loc` raising ``NotImplementedError`` when setting value into one column :class:`DataFrame` with all null slice as column indexer (:issue:`45469`)
-- Bug in :meth:`DataFrame.loc` not casting ``None`` to ``NA`` when setting value a list into :class:`DataFrame` (:issue:`47987`)
+- Bug in :meth:`DataFrame.loc` not casting ``None`` to ``NA`` when setting value as a list into :class:`DataFrame` (:issue:`47987`)
 - Bug in :meth:`Series.__setitem__` when setting incompatible values into a ``PeriodDtype`` or ``IntervalDtype`` :class:`Series` raising when indexing with a boolean mask but coercing when indexing with otherwise-equivalent indexers; these now consistently coerce, along with :meth:`Series.mask` and :meth:`Series.where` (:issue:`45768`)
 - Bug in :meth:`DataFrame.where` with multiple columns with datetime-like dtypes failing to downcast results consistent with other dtypes (:issue:`45837`)
 - Bug in :func:`isin` upcasting to ``float64`` with unsigned integer dtype and list-like argument without a dtype (:issue:`46485`)
@@ -1049,6 +1071,7 @@ Groupby/resample/rolling
 - Bug when using ``engine="numba"`` would return the same jitted function when modifying ``engine_kwargs`` (:issue:`46086`)
 - Bug in :meth:`.DataFrameGroupBy.transform` fails when ``axis=1`` and ``func`` is ``"first"`` or ``"last"`` (:issue:`45986`)
 - Bug in :meth:`DataFrameGroupBy.cumsum` with ``skipna=False`` giving incorrect results (:issue:`46216`)
+- Bug in :meth:`GroupBy.sum` with integer dtypes losing precision (:issue:`37493`)
 - Bug in :meth:`.GroupBy.cumsum` with ``timedelta64[ns]`` dtype failing to recognize ``NaT`` as a null value (:issue:`46216`)
 - Bug in :meth:`.GroupBy.cummin` and :meth:`.GroupBy.cummax` with nullable dtypes incorrectly altering the original data in place (:issue:`46220`)
 - Bug in :meth:`DataFrame.groupby` raising error when ``None`` is in first level of :class:`MultiIndex` (:issue:`47348`)
@@ -1078,6 +1101,7 @@ Reshaping
 - Bug in :func:`concat` not sorting the column names when ``None`` is included (:issue:`47331`)
 - Bug in :func:`concat` with identical key leads to error when indexing :class:`MultiIndex` (:issue:`46519`)
 - Bug in :func:`pivot_table` raising ``TypeError`` when ``dropna=True`` and aggregation column has extension array dtype (:issue:`47477`)
+- Bug in :func:`merge` raising error for ``how="cross"`` when using ``FIPS`` mode in ssl library (:issue:`48024`)
 - Bug in :meth:`DataFrame.join` with a list when using suffixes to join DataFrames with duplicate column names (:issue:`46396`)
 - Bug in :meth:`DataFrame.pivot_table` with ``sort=False`` results in sorted index (:issue:`17041`)
 - Bug in :meth:`concat` when ``axis=1`` and ``sort=False`` where the resulting Index was a :class:`Int64Index` instead of a :class:`RangeIndex` (:issue:`46675`)
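
The Groupby entry above about `GroupBy.sum` losing precision with integer dtypes can be illustrated without pandas. The sketch below is plain Python and rests on an assumption: that the loss came from accumulating integers as float64, which cannot represent every integer above 2**53. Both function names are hypothetical and are not pandas APIs.

```python
def sum_via_float64(values):
    """Accumulate in a Python float (IEEE 754 double), then convert back to int."""
    total = 0.0
    for v in values:
        total += float(v)
    return int(total)


def sum_as_int(values):
    """Accumulate with exact integer arithmetic."""
    total = 0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    values = [2**53, 1]
    print(sum_as_int(values))       # exact: 9007199254740993
    print(sum_via_float64(values))  # rounded: 9007199254740992
```

The float path drops the trailing `1` because 2**53 + 1 falls between two representable doubles and rounds to the even neighbour, 2**53.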

environment.yml

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@ dependencies:
   - scipy
   - sqlalchemy
   - tabulate
+  - tzdata>=2022a
   - xarray
   - xlrd
   - xlsxwriter

pandas/_libs/groupby.pyi

Lines changed: 4 additions & 2 deletions
@@ -51,10 +51,12 @@ def group_any_all(
     skipna: bool,
 ) -> None: ...
 def group_sum(
-    out: np.ndarray,  # complexfloating_t[:, ::1]
+    out: np.ndarray,  # complexfloatingintuint_t[:, ::1]
     counts: np.ndarray,  # int64_t[::1]
-    values: np.ndarray,  # ndarray[complexfloating_t, ndim=2]
+    values: np.ndarray,  # ndarray[complexfloatingintuint_t, ndim=2]
     labels: np.ndarray,  # const intp_t[:]
+    mask: np.ndarray | None,
+    result_mask: np.ndarray | None = ...,
     min_count: int = ...,
     is_datetimelike: bool = ...,
 ) -> None: ...

pandas/_libs/groupby.pyx

Lines changed: 44 additions & 7 deletions
@@ -513,6 +513,15 @@ ctypedef fused mean_t:
 
 ctypedef fused sum_t:
     mean_t
+    int8_t
+    int16_t
+    int32_t
+    int64_t
+
+    uint8_t
+    uint16_t
+    uint32_t
+    uint64_t
     object
 

@@ -523,6 +532,8 @@ def group_sum(
     int64_t[::1] counts,
     ndarray[sum_t, ndim=2] values,
     const intp_t[::1] labels,
+    const uint8_t[:, :] mask,
+    uint8_t[:, ::1] result_mask=None,
     Py_ssize_t min_count=0,
     bint is_datetimelike=False,
 ) -> None:
@@ -535,6 +546,8 @@ def group_sum(
         sum_t[:, ::1] sumx, compensation
         int64_t[:, ::1] nobs
         Py_ssize_t len_values = len(values), len_labels = len(labels)
+        bint uses_mask = mask is not None
+        bint isna_entry
 
     if len_values != len_labels:
         raise ValueError("len(index) != len(labels)")
@@ -572,7 +585,8 @@ def group_sum(
         for i in range(ncounts):
             for j in range(K):
                 if nobs[i, j] < min_count:
-                    out[i, j] = NAN
+                    out[i, j] = None
+
                 else:
                     out[i, j] = sumx[i, j]
     else:
@@ -590,11 +604,18 @@ def group_sum(
                     # With dt64/td64 values, values have been cast to float64
                     # instead if int64 for group_sum, but the logic
                     # is otherwise the same as in _treat_as_na
-                    if val == val and not (
-                        sum_t is float64_t
-                        and is_datetimelike
-                        and val == <float64_t>NPY_NAT
-                    ):
+                    if uses_mask:
+                        isna_entry = mask[i, j]
+                    elif (sum_t is float32_t or sum_t is float64_t
+                          or sum_t is complex64_t or sum_t is complex64_t):
+                        # avoid warnings because of equality comparison
+                        isna_entry = not val == val
+                    elif sum_t is int64_t and is_datetimelike and val == NPY_NAT:
+                        isna_entry = True
+                    else:
+                        isna_entry = False
+
+                    if not isna_entry:
                         nobs[lab, j] += 1
                         y = val - compensation[lab, j]
                         t = sumx[lab, j] + y
@@ -604,7 +625,23 @@ def group_sum(
             for i in range(ncounts):
                 for j in range(K):
                     if nobs[i, j] < min_count:
-                        out[i, j] = NAN
+                        # if we are integer dtype, not is_datetimelike, and
+                        # not uses_mask, then getting here implies that
+                        # counts[i] < min_count, which means we will
+                        # be cast to float64 and masked at the end
+                        # of WrappedCythonOp._call_cython_op. So we can safely
+                        # set a placeholder value in out[i, j].
+                        if uses_mask:
+                            result_mask[i, j] = True
+                        elif (sum_t is float32_t or sum_t is float64_t
+                              or sum_t is complex64_t or sum_t is complex64_t):
+                            out[i, j] = NAN
+                        elif sum_t is int64_t:
+                            out[i, j] = NPY_NAT
+                        else:
+                            # placeholder, see above
+                            out[i, j] = 0
+
                     else:
                         out[i, j] = sumx[i, j]

pandas/_libs/interval.pyx

Lines changed: 3 additions & 1 deletion
@@ -1,3 +1,4 @@
+import inspect
 import numbers
 from operator import (
     le,
@@ -45,6 +46,7 @@ cnp.import_array()
 import warnings
 
 from pandas._libs import lib
+
 from pandas._libs cimport util
 from pandas._libs.hashtable cimport Int64Vector
 from pandas._libs.tslibs.timedeltas cimport _Timedelta
@@ -394,7 +396,7 @@ cdef class Interval(IntervalMixin):
         warnings.warn(
             "Attribute `closed` is deprecated in favor of `inclusive`.",
             FutureWarning,
-            stacklevel=find_stack_level(),
+            stacklevel=find_stack_level(inspect.currentframe()),
         )
         return self.inclusive
