Commit 7cf208f

Merge branch 'pandas-dev:main' into raise-on-parse-int-overflow
2 parents 3f39a5b + eb226bd commit 7cf208f

271 files changed: +5029 -2578 lines changed

.github/workflows/32-bit-linux.yml

Lines changed: 2 additions & 1 deletion
@@ -39,8 +39,9 @@ jobs:
         . ~/virtualenvs/pandas-dev/bin/activate && \
         python -m pip install --no-deps -U pip wheel 'setuptools<60.0.0' && \
         pip install cython numpy python-dateutil pytz pytest pytest-xdist pytest-asyncio>=0.17 hypothesis && \
-        python setup.py build_ext -q -j2 && \
+        python setup.py build_ext -q -j1 && \
         python -m pip install --no-build-isolation --no-use-pep517 -e . && \
+        python -m pip list && \
         export PANDAS_CI=1 && \
         pytest -m 'not slow and not network and not clipboard and not single_cpu' pandas --junitxml=test-data.xml"

.github/workflows/python-dev.yml

Lines changed: 29 additions & 17 deletions
@@ -1,9 +1,21 @@
-# This file is purposely frozen(does not run). DO NOT DELETE IT
-# Unfreeze(by commentingthe if: false() condition) once the
-# next Python Dev version has released beta 1 and both Cython and numpy support it
-# After that Python has released, migrate the workflows to the
-# posix GHA workflows and "freeze" this file by
-# uncommenting the if: false() condition
+# This workflow may or may not run depending on the state of the next
+# unreleased Python version. DO NOT DELETE IT.
+#
+# In general, this file will remain frozen(present, but not running) until:
+# - The next unreleased Python version has released beta 1
+# - This version should be available on Github Actions.
+# - Our required build/runtime dependencies(numpy, pytz, Cython, python-dateutil)
+# support that unreleased Python version.
+# To unfreeze, comment out the ``if: false`` condition, and make sure you update
+# the name of the workflow and Python version in actions/setup-python to: '3.12-dev'
+#
+# After it has been unfrozen, this file should remain unfrozen(present, and running) until:
+# - The next Python version has been officially released.
+# OR
+# - Most/All of our optional dependencies support Python 3.11 AND
+# - The next Python version has released a rc(we are guaranteed a stable ABI).
+# To freeze this file, uncomment out the ``if: false`` condition, and migrate the jobs
+# to the corresponding posix/windows-macos/sdist etc. workflows.
 # Feel free to modify this comment as necessary.

 name: Python Dev
@@ -32,7 +44,7 @@ permissions:

 jobs:
   build:
-    if: false # Comment this line out to "unfreeze"
+    # if: false # Uncomment this to freeze the workflow, comment it to unfreeze
     runs-on: ${{ matrix.os }}
     strategy:
       fail-fast: false
@@ -53,27 +65,27 @@ jobs:
         fetch-depth: 0

     - name: Set up Python Dev Version
-      uses: actions/setup-python@v3
+      uses: actions/setup-python@v4
       with:
         python-version: '3.11-dev'

     - name: Install dependencies
-      shell: bash -el {0}
       run: |
-        python3 -m pip install --upgrade pip setuptools wheel
-        python3 -m pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
-        python3 -m pip install git+https://github.com/nedbat/coveragepy.git
-        python3 -m pip install cython python-dateutil pytz hypothesis pytest>=6.2.5 pytest-xdist pytest-cov pytest-asyncio>=0.17
-        python3 -m pip list
+        python --version
+        python -m pip install --upgrade pip setuptools wheel
+        python -m pip install git+https://github.com/numpy/numpy.git
+        python -m pip install git+https://github.com/nedbat/coveragepy.git
+        python -m pip install python-dateutil pytz cython hypothesis==6.52.1 pytest>=6.2.5 pytest-xdist pytest-cov pytest-asyncio>=0.17
+        python -m pip list

     - name: Build Pandas
       run: |
-        python3 setup.py build_ext -q -j2
-        python3 -m pip install -e . --no-build-isolation --no-use-pep517
+        python setup.py build_ext -q -j2
+        python -m pip install -e . --no-build-isolation --no-use-pep517

     - name: Build Version
       run: |
-        python3 -c "import pandas; pandas.show_versions();"
+        python -c "import pandas; pandas.show_versions();"

     - name: Test
       uses: ./.github/actions/run-tests

.github/workflows/ubuntu.yml

Lines changed: 6 additions & 1 deletion
@@ -52,6 +52,10 @@ jobs:
             extra_apt: "language-pack-zh-hans"
             lang: "zh_CN.utf8"
             lc_all: "zh_CN.utf8"
+          - name: "Copy-on-Write"
+            env_file: actions-310.yaml
+            pattern: "not slow and not network and not single_cpu"
+            pandas_copy_on_write: "1"
           - name: "Data Manager"
             env_file: actions-38.yaml
             pattern: "not slow and not network and not single_cpu"
@@ -64,7 +68,7 @@ jobs:
             env_file: actions-310-numpydev.yaml
             pattern: "not slow and not network and not single_cpu"
             pandas_testing_mode: "deprecate"
-            test_args: "-W error::DeprecationWarning:numpy"
+            test_args: "-W error::DeprecationWarning:numpy -W error::FutureWarning:numpy"
         exclude:
           - env_file: actions-39.yaml
             pyarrow_version: "6"
@@ -84,6 +88,7 @@ jobs:
       LC_ALL: ${{ matrix.lc_all || '' }}
       PANDAS_TESTING_MODE: ${{ matrix.pandas_testing_mode || '' }}
       PANDAS_DATA_MANAGER: ${{ matrix.pandas_data_manager || 'block' }}
+      PANDAS_COPY_ON_WRITE: ${{ matrix.pandas_copy_on_write || '0' }}
       TEST_ARGS: ${{ matrix.test_args || '' }}
       PYTEST_WORKERS: ${{ contains(matrix.pattern, 'not single_cpu') && 'auto' || '1' }}
       PYTEST_TARGET: ${{ matrix.pytest_target || 'pandas' }}
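
The new matrix entry exports `PANDAS_COPY_ON_WRITE` to the test job, falling back to `'0'` when a job does not set it. As a rough illustration only, a test harness could read such a flag along the following lines. The helper name `copy_on_write_enabled` is hypothetical, and this sketch does not claim to be how pandas itself consumes the variable.

```python
import os


def copy_on_write_enabled(environ=None):
    """Return True when PANDAS_COPY_ON_WRITE is "1" (hypothetical helper)."""
    env = os.environ if environ is None else environ
    # Mirror the workflow default: an unset variable is treated as "0".
    return env.get("PANDAS_COPY_ON_WRITE", "0") == "1"


if __name__ == "__main__":
    print(copy_on_write_enabled())
```

Passing a mapping explicitly keeps the helper easy to exercise without mutating the real process environment.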

Dockerfile

Lines changed: 1 addition & 2 deletions
@@ -1,4 +1,4 @@
-FROM quay.io/condaforge/miniforge3
+FROM quay.io/condaforge/mambaforge

 # if you forked pandas, you can pass in your own GitHub username to use your fork
 # i.e. gh_username=myname
@@ -40,7 +40,6 @@ RUN mkdir "$pandas_home" \
 # we just update the base/root one from the 'environment.yml' file instead of creating a new one.
 #
 # Set up environment
-RUN conda install -y mamba
 RUN mamba env update -n base -f "$pandas_home/environment.yml"

 # Build C extensions and pandas

asv_bench/benchmarks/hash_functions.py

Lines changed: 15 additions & 0 deletions
@@ -39,6 +39,21 @@ def time_unique(self, exponent):
         pd.unique(self.a2)


+class Unique:
+    params = ["Int64", "Float64"]
+    param_names = ["dtype"]
+
+    def setup(self, dtype):
+        self.ser = pd.Series(([1, pd.NA, 2] + list(range(100_000))) * 3, dtype=dtype)
+        self.ser_unique = pd.Series(list(range(300_000)) + [pd.NA], dtype=dtype)
+
+    def time_unique_with_duplicates(self, exponent):
+        pd.unique(self.ser)
+
+    def time_unique(self, exponent):
+        pd.unique(self.ser_unique)
+
+
 class NumericSeriesIndexing:

     params = [

asv_bench/benchmarks/reshape.py

Lines changed: 1 addition & 3 deletions
@@ -268,9 +268,7 @@ def setup(self, bins):
         self.datetime_series = pd.Series(
             np.random.randint(N, size=N), dtype="datetime64[ns]"
         )
-        self.interval_bins = pd.IntervalIndex.from_breaks(
-            np.linspace(0, N, bins), "right"
-        )
+        self.interval_bins = pd.IntervalIndex.from_breaks(np.linspace(0, N, bins))

     def time_cut_int(self, bins):
         pd.cut(self.int_series, bins)

asv_bench/benchmarks/series_methods.py

Lines changed: 10 additions & 0 deletions
@@ -144,6 +144,16 @@ def time_clip(self, n):
         self.s.clip(0, 1)


+class ClipDt:
+    def setup(self):
+        dr = date_range("20220101", periods=100_000, freq="s", tz="UTC")
+        self.clipper_dt = dr[0:1_000].repeat(100)
+        self.s = Series(dr)
+
+    def time_clip(self):
+        self.s.clip(upper=self.clipper_dt)
+
+
 class ValueCounts:

     params = [[10**3, 10**4, 10**5], ["int", "uint", "float", "object"]]

doc/redirects.csv

Lines changed: 2 additions & 3 deletions
@@ -741,11 +741,11 @@ generated/pandas.Index.values,../reference/api/pandas.Index.values
 generated/pandas.Index.view,../reference/api/pandas.Index.view
 generated/pandas.Index.where,../reference/api/pandas.Index.where
 generated/pandas.infer_freq,../reference/api/pandas.infer_freq
-generated/pandas.Interval.inclusive,../reference/api/pandas.Interval.inclusive
+generated/pandas.Interval.closed,../reference/api/pandas.Interval.closed
 generated/pandas.Interval.closed_left,../reference/api/pandas.Interval.closed_left
 generated/pandas.Interval.closed_right,../reference/api/pandas.Interval.closed_right
 generated/pandas.Interval,../reference/api/pandas.Interval
-generated/pandas.IntervalIndex.inclusive,../reference/api/pandas.IntervalIndex.inclusive
+generated/pandas.IntervalIndex.closed,../reference/api/pandas.IntervalIndex.closed
 generated/pandas.IntervalIndex.contains,../reference/api/pandas.IntervalIndex.contains
 generated/pandas.IntervalIndex.from_arrays,../reference/api/pandas.IntervalIndex.from_arrays
 generated/pandas.IntervalIndex.from_breaks,../reference/api/pandas.IntervalIndex.from_breaks
@@ -761,7 +761,6 @@ generated/pandas.IntervalIndex.mid,../reference/api/pandas.IntervalIndex.mid
 generated/pandas.IntervalIndex.overlaps,../reference/api/pandas.IntervalIndex.overlaps
 generated/pandas.IntervalIndex.right,../reference/api/pandas.IntervalIndex.right
 generated/pandas.IntervalIndex.set_closed,../reference/api/pandas.IntervalIndex.set_closed
-generated/pandas.IntervalIndex.set_inclusive,../reference/api/pandas.IntervalIndex.set_inclusive
 generated/pandas.IntervalIndex.to_tuples,../reference/api/pandas.IntervalIndex.to_tuples
 generated/pandas.IntervalIndex.values,../reference/api/pandas.IntervalIndex.values
 generated/pandas.Interval.left,../reference/api/pandas.Interval.left

doc/source/development/contributing_codebase.rst

Lines changed: 6 additions & 1 deletion
@@ -122,6 +122,7 @@ Otherwise, you need to do it manually:
 .. code-block:: python

     import warnings
+    from pandas.util._exceptions import find_stack_level


     def old_func():
@@ -130,7 +131,11 @@ Otherwise, you need to do it manually:
         .. deprecated:: 1.1.0
            Use new_func instead.
         """
-        warnings.warn('Use new_func instead.', FutureWarning, stacklevel=2)
+        warnings.warn(
+            'Use new_func instead.',
+            FutureWarning,
+            stacklevel=find_stack_level(inspect.currentframe()),
+        )
         new_func()
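
The documentation change above is about which stack frame a deprecation warning is attributed to. The sketch below uses only the standard library and a fixed `stacklevel=2` to show the effect. It deliberately does not use the pandas-internal `find_stack_level` helper, and `call_and_capture` is a hypothetical name added for the demonstration. Note that the snippet in the diff calls `inspect.currentframe()`, so it also needs an `import inspect` that the visible hunks do not add.

```python
import warnings


def new_func():
    return "new"


def old_func():
    """Deprecated wrapper around new_func (illustrative only)."""
    # stacklevel=2 attributes the warning to the caller of old_func
    # rather than to this line inside old_func.
    warnings.warn("Use new_func instead.", FutureWarning, stacklevel=2)
    return new_func()


def call_and_capture():
    """Call old_func and return its result plus any warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = old_func()
    return result, caught


if __name__ == "__main__":
    result, caught = call_and_capture()
    print(result, [str(w.message) for w in caught])
```

A fixed `stacklevel` is only right when the call depth between the user and the `warnings.warn` call is known, which is presumably why a frame-walking helper is preferred inside a large library.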

doc/source/reference/arrays.rst

Lines changed: 53 additions & 16 deletions
@@ -19,19 +19,20 @@ objects contained with a :class:`Index`, :class:`Series`, or
 For some data types, pandas extends NumPy's type system. String aliases for these types
 can be found at :ref:`basics.dtypes`.

-=================== ========================= ================== =============================
-Kind of Data        pandas Data Type          Scalar             Array
-=================== ========================= ================== =============================
-TZ-aware datetime   :class:`DatetimeTZDtype`  :class:`Timestamp` :ref:`api.arrays.datetime`
-Timedeltas          (none)                    :class:`Timedelta` :ref:`api.arrays.timedelta`
-Period (time spans) :class:`PeriodDtype`      :class:`Period`    :ref:`api.arrays.period`
-Intervals           :class:`IntervalDtype`    :class:`Interval`  :ref:`api.arrays.interval`
-Nullable Integer    :class:`Int64Dtype`, ...  (none)             :ref:`api.arrays.integer_na`
-Categorical         :class:`CategoricalDtype` (none)             :ref:`api.arrays.categorical`
-Sparse              :class:`SparseDtype`      (none)             :ref:`api.arrays.sparse`
-Strings             :class:`StringDtype`      :class:`str`       :ref:`api.arrays.string`
-Boolean (with NA)   :class:`BooleanDtype`     :class:`bool`      :ref:`api.arrays.bool`
-=================== ========================= ================== =============================
+=================== ========================= ============================= =============================
+Kind of Data        pandas Data Type          Scalar                        Array
+=================== ========================= ============================= =============================
+TZ-aware datetime   :class:`DatetimeTZDtype`  :class:`Timestamp`            :ref:`api.arrays.datetime`
+Timedeltas          (none)                    :class:`Timedelta`            :ref:`api.arrays.timedelta`
+Period (time spans) :class:`PeriodDtype`      :class:`Period`               :ref:`api.arrays.period`
+Intervals           :class:`IntervalDtype`    :class:`Interval`             :ref:`api.arrays.interval`
+Nullable Integer    :class:`Int64Dtype`, ...  (none)                        :ref:`api.arrays.integer_na`
+Categorical         :class:`CategoricalDtype` (none)                        :ref:`api.arrays.categorical`
+Sparse              :class:`SparseDtype`      (none)                        :ref:`api.arrays.sparse`
+Strings             :class:`StringDtype`      :class:`str`                  :ref:`api.arrays.string`
+Boolean (with NA)   :class:`BooleanDtype`     :class:`bool`                 :ref:`api.arrays.bool`
+PyArrow             :class:`ArrowDtype`       Python Scalars or :class:`NA` :ref:`api.arrays.arrow`
+=================== ========================= ============================= =============================

 pandas and third-party libraries can extend NumPy's type system (see :ref:`extending.extension-types`).
 The top-level :meth:`array` method can be used to create a new array, which may be
@@ -42,6 +43,44 @@ stored in a :class:`Series`, :class:`Index`, or as a column in a :class:`DataFra

    array

+.. _api.arrays.arrow:
+
+PyArrow
+-------
+
+.. warning::
+
+   This feature is experimental, and the API can change in a future release without warning.
+
+The :class:`arrays.ArrowExtensionArray` is backed by a :external+pyarrow:py:class:`pyarrow.ChunkedArray` with a
+:external+pyarrow:py:class:`pyarrow.DataType` instead of a NumPy array and data type. The ``.dtype`` of a :class:`arrays.ArrowExtensionArray`
+is an :class:`ArrowDtype`.
+
+`Pyarrow <https://arrow.apache.org/docs/python/index.html>`__ provides similar array and `data type <https://arrow.apache.org/docs/python/api/datatypes.html>`__
+support as NumPy including first-class nullability support for all data types, immutability and more.
+
+.. note::
+
+   For string types (``pyarrow.string()``, ``string[pyarrow]``), PyArrow support is still facilitated
+   by :class:`arrays.ArrowStringArray` and ``StringDtype("pyarrow")``. See the :ref:`string section <api.arrays.string>`
+   below.
+
+While individual values in an :class:`arrays.ArrowExtensionArray` are stored as a PyArrow objects, scalars are **returned**
+as Python scalars corresponding to the data type, e.g. a PyArrow int64 will be returned as Python int, or :class:`NA` for missing
+values.
+
+.. autosummary::
+   :toctree: api/
+   :template: autosummary/class_without_autosummary.rst
+
+   arrays.ArrowExtensionArray
+
+.. autosummary::
+   :toctree: api/
+   :template: autosummary/class_without_autosummary.rst
+
+   ArrowDtype
+
 .. _api.arrays.datetime:

 Datetimes
@@ -303,7 +342,6 @@ Properties
 .. autosummary::
    :toctree: api/

-   Interval.inclusive
    Interval.closed
    Interval.closed_left
    Interval.closed_right
@@ -341,7 +379,7 @@ A collection of intervals may be stored in an :class:`arrays.IntervalArray`.

    arrays.IntervalArray.left
    arrays.IntervalArray.right
-   arrays.IntervalArray.inclusive
+   arrays.IntervalArray.closed
    arrays.IntervalArray.mid
    arrays.IntervalArray.length
    arrays.IntervalArray.is_empty
@@ -352,7 +390,6 @@ A collection of intervals may be stored in an :class:`arrays.IntervalArray`.
    arrays.IntervalArray.contains
    arrays.IntervalArray.overlaps
    arrays.IntervalArray.set_closed
-   arrays.IntervalArray.set_inclusive
    arrays.IntervalArray.to_tuples

doc/source/reference/indexing.rst

Lines changed: 1 addition & 2 deletions
@@ -242,7 +242,7 @@ IntervalIndex components
    IntervalIndex.left
    IntervalIndex.right
    IntervalIndex.mid
-   IntervalIndex.inclusive
+   IntervalIndex.closed
    IntervalIndex.length
    IntervalIndex.values
    IntervalIndex.is_empty
@@ -251,7 +251,6 @@ IntervalIndex components
    IntervalIndex.get_loc
    IntervalIndex.get_indexer
    IntervalIndex.set_closed
-   IntervalIndex.set_inclusive
    IntervalIndex.contains
    IntervalIndex.overlaps
    IntervalIndex.to_tuples

doc/source/user_guide/advanced.rst

Lines changed: 5 additions & 5 deletions
@@ -1020,7 +1020,7 @@ Trying to select an ``Interval`` that is not exactly contained in the ``Interval

    In [7]: df.loc[pd.Interval(0.5, 2.5)]
    ---------------------------------------------------------------------------
-   KeyError: Interval(0.5, 2.5, inclusive='right')
+   KeyError: Interval(0.5, 2.5, closed='right')

 Selecting all ``Intervals`` that overlap a given ``Interval`` can be performed using the
 :meth:`~IntervalIndex.overlaps` method to create a boolean indexer.
@@ -1082,14 +1082,14 @@ of :ref:`frequency aliases <timeseries.offset_aliases>` with datetime-like inter

    pd.interval_range(start=pd.Timedelta("0 days"), periods=3, freq="9H")

-Additionally, the ``inclusive`` parameter can be used to specify which side(s) the intervals
-are closed on. Intervals are closed on the both side by default.
+Additionally, the ``closed`` parameter can be used to specify which side(s) the intervals
+are closed on. Intervals are closed on the right side by default.

 .. ipython:: python

-   pd.interval_range(start=0, end=4, inclusive="both")
+   pd.interval_range(start=0, end=4, closed="both")

-   pd.interval_range(start=0, end=4, inclusive="neither")
+   pd.interval_range(start=0, end=4, closed="neither")

 Specifying ``start``, ``end``, and ``periods`` will generate a range of evenly spaced
 intervals from ``start`` to ``end`` inclusively, with ``periods`` number of elements
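
The restored ``closed`` parameter decides whether an interval's endpoints belong to it. A minimal sketch, assuming a pandas release in which `pd.interval_range` accepts `closed` (the spelling this commit reverts to); the variable names are illustrative only.

```python
import pandas as pd

# Default: intervals are closed on the right, i.e. (0, 1], (1, 2], ...
right = pd.interval_range(start=0, end=4)
both = pd.interval_range(start=0, end=4, closed="both")
neither = pd.interval_range(start=0, end=4, closed="neither")

# The left endpoint 0 is a member of the first interval only when
# that interval is closed on the left side.
left_endpoint_membership = {
    "right": 0 in right[0],
    "both": 0 in both[0],
    "neither": 0 in neither[0],
}

if __name__ == "__main__":
    print(right.closed, both.closed, neither.closed)
    print(left_endpoint_membership)
```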

doc/source/user_guide/indexing.rst

Lines changed: 2 additions & 3 deletions
@@ -1723,13 +1723,12 @@ the given columns to a MultiIndex:
    frame

 Other options in ``set_index`` allow you not drop the index columns or to add
-the index in-place (without creating a new object):
+the index without creating a copy of the underlying data:

 .. ipython:: python

    data.set_index('c', drop=False)
-   data.set_index(['a', 'b'], inplace=True)
-   data
+   data.set_index(['a', 'b'], copy=False)

 Reset the index
 ~~~~~~~~~~~~~~~
