Commit 4a19678

Merge branch 'main' into fix-rolling-std-custom-weights
2 parents cce2dbe + cc58350 commit 4a19678

54 files changed, with 713 additions and 201 deletions.

.github/workflows/code-checks.yml

Lines changed: 4 additions & 4 deletions
@@ -33,7 +33,7 @@ jobs:

     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0

@@ -109,7 +109,7 @@ jobs:

     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0

@@ -143,7 +143,7 @@ jobs:
         run: docker image prune -f

       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0

@@ -164,7 +164,7 @@ jobs:

     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0

.github/workflows/codeql.yml

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ jobs:
           - python

     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
       - uses: github/codeql-action/init@v2
         with:
           languages: ${{ matrix.language }}

.github/workflows/comment-commands.yml

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ jobs:

     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0

.github/workflows/docbuild-and-upload.yml

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ jobs:

     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0

.github/workflows/package-checks.yml

Lines changed: 2 additions & 2 deletions
@@ -34,7 +34,7 @@ jobs:

     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0

@@ -62,7 +62,7 @@ jobs:
       cancel-in-progress: true
     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0

.github/workflows/unit-tests.yml

Lines changed: 3 additions & 3 deletions
@@ -136,7 +136,7 @@ jobs:

     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0

@@ -194,7 +194,7 @@ jobs:

     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0

@@ -330,7 +330,7 @@ jobs:
       PYTEST_TARGET: pandas

     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
         with:
           fetch-depth: 0

.github/workflows/wheels.yml

Lines changed: 2 additions & 2 deletions
@@ -48,7 +48,7 @@ jobs:
       sdist_file: ${{ steps.save-path.outputs.sdist_name }}
     steps:
       - name: Checkout pandas
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0

@@ -103,7 +103,7 @@ jobs:
       IS_SCHEDULE_DISPATCH: ${{ github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' }}
     steps:
       - name: Checkout pandas
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           fetch-depth: 0
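The workflow hunks above all apply the same mechanical change, moving `actions/checkout` from `v3` to `v4`. A minimal sketch of how such a bump could be scripted is shown here; the helper `bump_action` and the sample workflow text are hypothetical and are not part of this commit.

```python
import re


def bump_action(text: str, action: str, old: str, new: str) -> str:
    """Replace `uses: <action>@<old>` with `uses: <action>@<new>`.

    Hypothetical helper for illustration only. The negative lookahead keeps
    longer pins such as `@v3.5` or `@v31` untouched.
    """
    pattern = re.compile(
        rf"(uses:\s*{re.escape(action)}@){re.escape(old)}(?![\w.])"
    )
    return pattern.sub(lambda m: m.group(1) + new, text)


# Sample input modeled on the steps shown in the hunks above.
workflow = """\
steps:
  - name: Checkout
    uses: actions/checkout@v3
    with:
      fetch-depth: 0
"""

bumped = bump_action(workflow, "actions/checkout", "v3", "v4")
print(bumped)
```

Only the `uses:` line changes; the `with:` block and `fetch-depth` setting pass through unmodified, which matches the one-line-per-hunk shape of the diffs above.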

ci/deps/actions-310.yaml

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ dependencies:
   - pymysql>=1.0.2
   - pyreadstat>=1.1.5
   - pytables>=3.7.0
+  - python-calamine>=0.1.6
   - pyxlsb>=1.0.9
   - s3fs>=2022.05.0
   - scipy>=1.8.1

ci/deps/actions-311-downstream_compat.yaml

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ dependencies:
   - pymysql>=1.0.2
   - pyreadstat>=1.1.5
   - pytables>=3.7.0
+  - python-calamine>=0.1.6
   - pyxlsb>=1.0.9
   - s3fs>=2022.05.0
   - scipy>=1.8.1

ci/deps/actions-311.yaml

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ dependencies:
   - pymysql>=1.0.2
   - pyreadstat>=1.1.5
   # - pytables>=3.7.0, 3.8.0 is first version that supports 3.11
+  - python-calamine>=0.1.6
   - pyxlsb>=1.0.9
   - s3fs>=2022.05.0
   - scipy>=1.8.1

ci/deps/actions-39-minimum_versions.yaml

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@ dependencies:
   - pymysql=1.0.2
   - pyreadstat=1.1.5
   - pytables=3.7.0
+  - python-calamine=0.1.6
   - pyxlsb=1.0.9
   - s3fs=2022.05.0
   - scipy=1.8.1

ci/deps/actions-39.yaml

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ dependencies:
   - pymysql>=1.0.2
   - pyreadstat>=1.1.5
   - pytables>=3.7.0
+  - python-calamine>=0.1.6
   - pyxlsb>=1.0.9
   - s3fs>=2022.05.0
   - scipy>=1.8.1

ci/deps/circle-310-arm64.yaml

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ dependencies:
   - pymysql>=1.0.2
   # - pyreadstat>=1.1.5 not available on ARM
   - pytables>=3.7.0
+  - python-calamine>=0.1.6
   - pyxlsb>=1.0.9
   - s3fs>=2022.05.0
   - scipy>=1.8.1

doc/source/getting_started/install.rst

Lines changed: 1 addition & 0 deletions
@@ -281,6 +281,7 @@ xlrd 2.0.1 excel Reading Excel
 xlsxwriter                3.0.3              excel           Writing Excel
 openpyxl                  3.0.10             excel           Reading / writing for xlsx files
 pyxlsb                    1.0.9              excel           Reading for xlsb files
+python-calamine           0.1.6              excel           Reading for xls/xlsx/xlsb/ods files
 ========================= ================== =============== =============================================================

 HTML
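The table row above sets 0.1.6 as the minimum supported `python-calamine` version. A rough sketch of checking an environment against that minimum using only the standard library follows; the helper names are hypothetical, and the version parser is an assumption that only handles plain dotted integers (no release-candidate or dev suffixes).

```python
from importlib import metadata

# Minimum version taken from the install.rst row added in this commit.
MIN_CALAMINE = "0.1.6"


def parse_version(version: str) -> tuple:
    """Turn '0.1.6' into (0, 1, 6). Assumes plain dotted integers."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str = MIN_CALAMINE) -> bool:
    """Compare two dotted versions component by component."""
    return parse_version(installed) >= parse_version(minimum)


def calamine_status() -> str:
    """Describe the installed python-calamine, if any (hypothetical helper)."""
    try:
        installed = metadata.version("python-calamine")
    except metadata.PackageNotFoundError:
        return "not installed"
    try:
        ok = meets_minimum(installed)
    except ValueError:
        # Versions with suffixes such as 'rc1' are outside this sketch.
        return f"unparsed ({installed})"
    return "ok" if ok else f"too old ({installed})"


print(calamine_status())
```

pandas performs its own optional-dependency check at import time of the engine, so a script like this is only useful as a quick pre-flight check.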

doc/source/user_guide/io.rst

Lines changed: 21 additions & 2 deletions
@@ -3453,7 +3453,8 @@ Excel files
 The :func:`~pandas.read_excel` method can read Excel 2007+ (``.xlsx``) files
 using the ``openpyxl`` Python module. Excel 2003 (``.xls``) files
 can be read using ``xlrd``. Binary Excel (``.xlsb``)
-files can be read using ``pyxlsb``.
+files can be read using ``pyxlsb``. All formats can be read
+using the :ref:`calamine<io.calamine>` engine.
 The :meth:`~DataFrame.to_excel` instance method is used for
 saving a ``DataFrame`` to Excel. Generally the semantics are
 similar to working with :ref:`csv<io.read_csv_table>` data.
@@ -3494,6 +3495,9 @@ using internally.

 * For the engine odf, pandas is using :func:`odf.opendocument.load` to read in (``.ods``) files.

+* For the engine calamine, pandas is using :func:`python_calamine.load_workbook`
+  to read in (``.xlsx``), (``.xlsm``), (``.xls``), (``.xlsb``), (``.ods``) files.
+
 .. code-block:: python

    # Returns a DataFrame
@@ -3935,7 +3939,8 @@ The :func:`~pandas.read_excel` method can also read binary Excel files
 using the ``pyxlsb`` module. The semantics and features for reading
 binary Excel files mostly match what can be done for `Excel files`_ using
 ``engine='pyxlsb'``. ``pyxlsb`` does not recognize datetime types
-in files and will return floats instead.
+in files and will return floats instead (you can use :ref:`calamine<io.calamine>`
+if you need to recognize datetime types).

 .. code-block:: python

@@ -3947,6 +3952,20 @@ in files and will return floats instead.
 Currently pandas only supports *reading* binary Excel files. Writing
 is not implemented.

+.. _io.calamine:
+
+Calamine (Excel and ODS files)
+------------------------------
+
+The :func:`~pandas.read_excel` method can read Excel files (``.xlsx``, ``.xlsm``, ``.xls``, ``.xlsb``)
+and OpenDocument spreadsheets (``.ods``) using the ``python-calamine`` module.
+This module is a binding for the Rust library `calamine <https://crates.io/crates/calamine>`__
+and is faster than other engines in most cases. The optional dependency 'python-calamine' needs to be installed.
+
+.. code-block:: python
+
+   # Returns a DataFrame
+   pd.read_excel("path_to_file.xlsb", engine="calamine")

 .. _io.clipboard:
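The io.rst text above pairs each file format with an engine and notes that calamine covers all of them. One way to sketch that choice in code is a small lookup that prefers calamine when it is importable. The function `pick_engine` and both tables are hypothetical; mapping ``.xlsm`` to ``openpyxl`` is an assumption not stated in the hunks above.

```python
from importlib.util import find_spec
from pathlib import Path

# Per-format engines as described in the documentation text above.
FORMAT_ENGINES = {
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",  # assumption: not stated in the hunks above
    ".xls": "xlrd",
    ".xlsb": "pyxlsb",
    ".ods": "odf",
}

# Formats the new section lists for the calamine engine.
CALAMINE_FORMATS = {".xlsx", ".xlsm", ".xls", ".xlsb", ".ods"}


def pick_engine(path, prefer_calamine=None):
    """Return an engine name for `path` (hypothetical helper).

    When `prefer_calamine` is None, calamine is preferred only if the
    `python_calamine` module can be found in the current environment.
    """
    suffix = Path(path).suffix.lower()
    if prefer_calamine is None:
        prefer_calamine = find_spec("python_calamine") is not None
    if prefer_calamine and suffix in CALAMINE_FORMATS:
        return "calamine"
    if suffix not in FORMAT_ENGINES:
        raise ValueError(f"no known Excel/ODS engine for {suffix!r}")
    return FORMAT_ENGINES[suffix]


print(pick_engine("report.xlsb", prefer_calamine=False))
```

The result could then be passed along as `pd.read_excel(path, engine=pick_engine(path))`, assuming a pandas version that includes the calamine engine introduced by this change.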

doc/source/whatsnew/v0.10.0.rst

Lines changed: 30 additions & 13 deletions
@@ -180,19 +180,36 @@ labeled the aggregated group with the end of the interval: the next day).
   DataFrame constructor with no columns specified. The v0.9.0 behavior (names
   ``X0``, ``X1``, ...) can be reproduced by specifying ``prefix='X'``:

-.. ipython:: python
-   :okexcept:
-
-   import io
-
-   data = """
-   a,b,c
-   1,Yes,2
-   3,No,4
-   """
-   print(data)
-   pd.read_csv(io.StringIO(data), header=None)
-   pd.read_csv(io.StringIO(data), header=None, prefix="X")
+.. code-block:: ipython
+
+    In [6]: import io
+
+    In [7]: data = """
+       ...: a,b,c
+       ...: 1,Yes,2
+       ...: 3,No,4
+       ...: """
+       ...:
+
+    In [8]: print(data)
+
+    a,b,c
+    1,Yes,2
+    3,No,4
+
+    In [9]: pd.read_csv(io.StringIO(data), header=None)
+    Out[9]:
+       0    1  2
+    0  a    b  c
+    1  1  Yes  2
+    2  3   No  4
+
+    In [10]: pd.read_csv(io.StringIO(data), header=None, prefix="X")
+    Out[10]:
+       X0   X1  X2
+    0   a    b   c
+    1   1  Yes   2
+    2   3   No   4

 - Values like ``'Yes'`` and ``'No'`` are not interpreted as boolean by default,
   though this can be controlled by new ``true_values`` and ``false_values``
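The hunk above freezes the old `prefix="X"` call as static output rather than executing it. As far as I know, recent pandas releases no longer accept `prefix` in `read_csv`, so the following is a sketch of one way to get the same `X0`, `X1`, ... labels, assuming a pandas version without that argument. It uses `DataFrame.add_prefix` and is not taken from this commit.

```python
import io

import pandas as pd

data = "a,b,c\n1,Yes,2\n3,No,4\n"

# header=None makes pandas label the columns with integers 0, 1, 2.
df = pd.read_csv(io.StringIO(data), header=None)

# add_prefix("X") then renames those labels to X0, X1, X2.
prefixed = df.add_prefix("X")

print(list(df.columns))
print(list(prefixed.columns))
```

Because `header=None` treats the `a,b,c` line as data, it stays in row 0 of both frames, just as in the recorded output above.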

doc/source/whatsnew/v2.1.1.rst

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ Bug fixes
 ~~~~~~~~~
 - Fixed bug for :class:`ArrowDtype` raising ``NotImplementedError`` for fixed-size list (:issue:`55000`)
 - Fixed bug in :meth:`DataFrame.stack` with ``future_stack=True`` and columns a non-:class:`MultiIndex` consisting of tuples (:issue:`54948`)
+- Fixed bug in :meth:`Series.dt.tz` with :class:`ArrowDtype` where a string was returned instead of a ``tzinfo`` object (:issue:`55003`)
 - Fixed bug in :meth:`Series.pct_change` and :meth:`DataFrame.pct_change` showing unnecessary ``FutureWarning`` (:issue:`54981`)

 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v2.2.0.rst

Lines changed: 28 additions & 5 deletions
@@ -14,10 +14,27 @@ including other versions of pandas.
 Enhancements
 ~~~~~~~~~~~~

-.. _whatsnew_220.enhancements.enhancement1:
+.. _whatsnew_220.enhancements.calamine:

-enhancement1
-^^^^^^^^^^^^
+Calamine engine for :func:`read_excel`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``calamine`` engine was added to :func:`read_excel`.
+It uses ``python-calamine``, which provides Python bindings for the Rust library `calamine <https://crates.io/crates/calamine>`__.
+This engine supports Excel files (``.xlsx``, ``.xlsm``, ``.xls``, ``.xlsb``) and OpenDocument spreadsheets (``.ods``) (:issue:`50395`).
+
+There are two advantages of this engine:
+
+1. Calamine is often faster than other engines; some benchmarks show it up to 5x faster than 'openpyxl', 20x faster than 'odf', 4x faster than 'pyxlsb', and 1.5x faster than 'xlrd'.
+   However, 'openpyxl' and 'pyxlsb' are faster when reading a few rows from large files because they iterate over rows lazily.
+2. Calamine recognizes datetimes in ``.xlsb`` files, unlike 'pyxlsb', which is the only other engine in pandas that can read ``.xlsb`` files.
+
+.. code-block:: python
+
+   pd.read_excel("path_to_file.xlsb", engine="calamine")
+
+
+For more, see :ref:`io.calamine` in the user guide on IO tools.

 .. _whatsnew_220.enhancements.enhancement2:

@@ -28,7 +45,7 @@ enhancement2

 Other enhancements
 ^^^^^^^^^^^^^^^^^^
--
+- :meth:`DataFrame.apply` now allows using numba (via ``engine="numba"``) to JIT-compile the passed function, which may yield speedups (:issue:`54666`)
 -

 .. ---------------------------------------------------------------------------
@@ -158,9 +175,13 @@ Deprecations

 Performance improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~
+- Performance improvement in :func:`concat` with ``axis=1`` and objects with unaligned indexes (:issue:`55084`)
 - Performance improvement in :func:`to_dict` on converting DataFrame to dictionary (:issue:`50990`)
+- Performance improvement in :meth:`DataFrame.groupby` when aggregating pyarrow timestamp and duration dtypes (:issue:`55031`)
 - Performance improvement in :meth:`DataFrame.sort_index` and :meth:`Series.sort_index` when indexed by a :class:`MultiIndex` (:issue:`54835`)
+- Performance improvement in :meth:`Index.difference` (:issue:`55108`)
 - Performance improvement when indexing with more than 4 keys (:issue:`54550`)
+-

 .. ---------------------------------------------------------------------------
 .. _whatsnew_220.bug_fixes:
@@ -169,6 +190,7 @@ Bug fixes
 ~~~~~~~~~
 - Bug in :class:`AbstractHolidayCalendar` where timezone data was not propagated when computing holiday observances (:issue:`54580`)
 - Bug in :class:`pandas.core.window.Rolling` where duplicate datetimelike indexes are treated as consecutive rather than equal with ``closed='left'`` and ``closed='neither'`` (:issue:`20712`)
+- Bug in :meth:`DataFrame.apply` where passing ``raw=True`` ignored ``args`` passed to the applied function (:issue:`55009`)

 Categorical
 ^^^^^^^^^^^
@@ -229,6 +251,7 @@ I/O
 ^^^
 - Bug in :func:`read_csv` where ``on_bad_lines="warn"`` would write to ``stderr`` instead of raise a Python warning. This now yields a :class:`.errors.ParserWarning` (:issue:`54296`)
 - Bug in :func:`read_excel`, with ``engine="xlrd"`` (``xls`` files) erroring when file contains NaNs/Infs (:issue:`54564`)
+- Bug in :func:`to_excel`, with ``OdsWriter`` (``ods`` files) writing boolean/string value (:issue:`54994`)

 Period
 ^^^^^^
@@ -247,8 +270,8 @@ Groupby/resample/rolling

 Reshaping
 ^^^^^^^^^
+- Bug in :func:`concat` ignoring ``sort`` parameter when passed :class:`DatetimeIndex` indexes (:issue:`54769`)
 - Bug in :func:`merge` returning columns in incorrect order when left and/or right is empty (:issue:`51929`)
--

 Sparse
 ^^^^^^

environment.yml

Lines changed: 2 additions & 1 deletion
@@ -47,6 +47,7 @@ dependencies:
   - pymysql>=1.0.2
   - pyreadstat>=1.1.5
   - pytables>=3.7.0
+  - python-calamine>=0.1.6
   - pyxlsb>=1.0.9
   - s3fs>=2022.05.0
   - scipy>=1.8.1
@@ -105,7 +106,7 @@ dependencies:
   - ipykernel

   # web
-  - jinja2 # in optional dependencies, but documented here as needed
+  # - jinja2 # already listed in optional dependencies, but documented here for reference
   - markdown
   - feedparser
   - pyyaml
