These are the changes in pandas 3.1.0. See :ref:`release` for a full changelog including other versions of pandas.
{{ header }}
- :class:`Period` now supports f-string formatting via ``__format__``, e.g. ``f"{period:%Y-%m}"`` (:issue:`48536`)
- :meth:`.DataFrameGroupBy.agg` now allows for the provided ``func`` to return a NumPy array (:issue:`63957`)
- Added :meth:`ExtensionArray.count` (:issue:`64450`)
- Display formatting for float sequences in DataFrame cells now respects the ``display.precision`` option (:issue:`60503`).
- Improved the precision of float parsing in :func:`read_csv` (:issue:`64395`)
- Improved the string ``repr`` of :class:`pd.core.arrays.SparseArray` (:issue:`64547`)
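
The :class:`Period` formatting entry above can be sketched as follows. This is a minimal illustration, not the implementation: it assumes the new ``__format__`` support accepts the same directives as :meth:`Period.strftime`, and it falls back to ``strftime`` on older pandas versions, where passing a format spec to a ``Period`` raises ``TypeError``.

```python
import pandas as pd

period = pd.Period("2024-03", freq="M")

# strftime is the long-standing, explicit way to format a Period.
explicit = period.strftime("%Y-%m")

try:
    # Assumption: on pandas >= 3.1.0 the format spec is treated
    # like a strftime pattern.
    formatted = f"{period:%Y-%m}"
except TypeError:
    # Older pandas versions reject a non-empty format spec on Period.
    formatted = explicit

print(formatted)
```

Calling ``strftime`` directly remains the safer choice for code that has to run on several pandas versions.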
These are bug fixes that might have notable behavior changes.
Some minimum supported versions of dependencies were updated. If installed, we now require:
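
+---------+-----------------+----------+---------+
| Package | Minimum Version | Required | Changed |
+=========+=================+==========+=========+
|         |                 | X        | X       |
+---------+-----------------+----------+---------+
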
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
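
+---------+-----------------+---------+
| Package | Minimum Version | Changed |
+=========+=================+=========+
|         |                 | X       |
+---------+-----------------+---------+
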
See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
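
To compare an environment against the minimum versions discussed above, the installed versions can be listed with the standard library. The sketch below is only an illustration; the package names in ``packages`` are a hypothetical selection and not a list taken from these notes.

```python
from importlib import metadata

# Hypothetical selection of packages to inspect; adjust to your environment.
packages = ["pandas", "numpy", "pyarrow", "numba"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(packages)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

``pandas.show_versions()`` prints a similar report covering pandas and its dependencies.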
- APIs that accept an ``engine="numba"`` parameter with ``engine_kwargs`` will no longer pass through a ``nopython`` argument to ``numba.jit``. This argument has had no effect since numba 0.59.0 (:issue:`64483`).
- Deprecated :attr:`Timestamp.dayofweek`, :attr:`Timestamp.dayofyear`, :attr:`Timestamp.daysinmonth` in favor of :attr:`Timestamp.day_of_week`, :attr:`Timestamp.day_of_year`, :attr:`Timestamp.days_in_month`, respectively. The same deprecation applies to the corresponding attributes on :class:`Period`, :class:`DatetimeIndex`, :class:`PeriodIndex`, and :attr:`Series.dt` (:issue:`46768`)
- Deprecated :meth:`.DataFrameGroupBy.agg` and :meth:`.Resampler.agg` unpacking a scalar when the provided ``func`` returns a Series or array of length 1; in the future this will result in the Series or array being in the result. Users should unpack the scalar in ``func`` itself (:issue:`64014`)
- Deprecated :meth:`ExcelFile.parse`, use :func:`read_excel` instead (:issue:`58247`)
- Deprecated ``engine="fastparquet"`` and ``engine="auto"`` in :func:`read_parquet` and :meth:`DataFrame.to_parquet`. The ``fastparquet`` library has been retired; use ``engine="pyarrow"`` or do not pass ``engine`` to use the default (:issue:`64597`)
- Deprecated arithmetic operations between pandas objects (:class:`DataFrame`, :class:`Series`, :class:`Index`, and pandas-implemented :class:`ExtensionArray` subclasses) and list-likes other than ``list``, ``np.ndarray``, :class:`ExtensionArray`, :class:`Index`, :class:`Series`, :class:`DataFrame`. For e.g. ``tuple`` or ``range``, explicitly cast these to a supported object instead. In a future version, these will be treated as scalar-like for pointwise operation (:issue:`62423`)
- Deprecated automatic dtype promotion when reindexing with a ``fill_value`` that cannot be held by the original dtype. Explicitly cast to a common dtype instead (:issue:`53910`)
- Deprecated passing a ``dict`` to :meth:`DataFrame.from_records`, use the :class:`DataFrame` constructor or :meth:`DataFrame.from_dict` instead (:issue:`22025`)
- Deprecated passing a non-dict (e.g. a list of dicts) to :meth:`DataFrame.from_dict`. Use the :class:`DataFrame` constructor instead (:issue:`58862`)
- Deprecated passing unnecessary ``*args`` and ``**kwargs`` to :meth:`.GroupBy.cumsum`, :meth:`.GroupBy.cumprod`, :meth:`.GroupBy.cummin`, :meth:`.GroupBy.cummax`, :meth:`.SeriesGroupBy.skew`, :meth:`.DataFrameGroupBy.skew`, :meth:`.SeriesGroupBy.take`, and :meth:`.DataFrameGroupBy.take`. The ``skipna`` parameter for the cum* methods is now an explicit keyword argument (:issue:`50407`)
- Deprecated setting values with :meth:`DataFrame.at` and :meth:`Series.at` when the key does not exist in the index, which previously expanded the object. Use ``.loc`` instead (:issue:`48323`)
- Deprecated the ``.name`` property of offset objects (e.g., :class:`~pandas.tseries.offsets.Day`, :class:`~pandas.tseries.offsets.Hour`). Use ``.rule_code`` instead (:issue:`64207`)
- Deprecated the ``dropna`` keyword in :meth:`DataFrame.to_hdf`, :meth:`HDFStore.put`, :meth:`HDFStore.append`, and :meth:`HDFStore.append_to_multiple`, and the ``io.hdf.dropna_table`` option. Use :meth:`DataFrame.dropna` before writing instead (:issue:`32038`)
- Deprecated the ``float_precision`` argument in :func:`read_csv`, :func:`read_table`, and :func:`read_fwf`. All float precision modes now use the same converter (:issue:`64395`)
- Deprecated the ``weekday`` property on :class:`DatetimeIndex`, :class:`.DatetimeArray`, :class:`PeriodIndex`, :class:`.PeriodArray`, and :class:`Period`. Use ``day_of_week`` instead. ``Timestamp.weekday()`` remains a method consistent with :meth:`datetime.datetime.weekday` (:issue:`12816`)
- Deprecated the ``xlrd`` and ``pyxlsb`` engines in :func:`read_excel`. Use ``engine="calamine"`` instead (:issue:`56542`)
- Deprecated the default value of ``exact`` in :func:`assert_index_equal`; in a future version this will default to ``True`` instead of "equiv" (:issue:`57436`)
- Deprecated the default value of ``track_times`` in :meth:`HDFStore.put`. In a future version, the default will change from ``True`` to ``False`` so that HDF5 files are deterministic by default (:issue:`51456`)
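
Several of the deprecations above have replacements that already work on earlier pandas versions. The sketch below shows a few of them under that assumption; the data and variable names are made up for illustration and do not come from the linked issues.

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp("2024-03-15")

# Underscore-separated names replace dayofweek / dayofyear / daysinmonth.
dow = ts.day_of_week  # Monday == 0
doy = ts.day_of_year
dim = ts.days_in_month

# Same spelling on an index, instead of the deprecated ``weekday`` property.
idx = pd.date_range("2024-03-15", periods=3, freq="D")
idx_dow = idx.day_of_week

# Cast a tuple or range explicitly before arithmetic with a pandas object.
ser = pd.Series([1, 2, 3])
summed = ser + np.asarray(range(3))

# Build a DataFrame from a dict with the constructor, not from_records.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Use .loc rather than .at when the assignment should add a new row.
df.loc[2] = [5, 6]

# Unpack a length-1 array inside the aggregation function itself.
grouped = pd.DataFrame({"key": ["x", "x", "y"], "val": [1, 2, 3]}).groupby("key")
totals = grouped["val"].agg(lambda s: np.array([s.sum()])[0])
```

Each replacement is spelled out explicitly so that the code does not rely on the behavior being deprecated.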
- Performance improvement in casting integer and boolean dtypes to ``string[pyarrow]`` by using PyArrow's native cast instead of element-wise conversion (:issue:`56505`)
- Performance improvement in :meth:`DataFrame.__getitem__` when selecting a single column by label on a :class:`DataFrame` with duplicate column names (:issue:`64126`)
- Performance improvement in :attr:`Series.is_monotonic_increasing` and :attr:`Series.is_monotonic_decreasing` for :class:`ArrowDtype` and masked dtypes by dispatching to the :class:`ExtensionArray` (:issue:`56619`)
- Performance improvement in :class:`GroupBy` reductions and transformations for :class:`SparseDtype` columns, which now use Cython instead of falling back to slow Python aggregation (:issue:`36123`)
- Performance improvement in :func:`bdate_range` and :func:`date_range` with ``freq="B"`` or ``freq="C"`` (business day frequencies) (:issue:`16463`)
- Performance improvement in :func:`infer_freq` (:issue:`64463`)
- Performance improvement in :func:`merge` and :meth:`DataFrame.join` for many-to-many joins with ``sort=False`` (:issue:`56564`)
- Performance improvement in :func:`merge` with ``how="cross"`` (:issue:`38082`)
- Performance improvement in :func:`merge` with ``how="left"`` (:issue:`64370`)
- Performance improvement in :func:`merge` with ``sort=False`` for single-key ``how="left"``/``how="right"`` joins when the opposite join key is sorted, unique, and range-like (:issue:`64146`)
- Performance improvement in :func:`read_csv` with ``engine="c"`` when reading from binary file-like objects (e.g. PyArrow S3 file handles) by avoiding unnecessary ``TextIOWrapper`` wrapping (:issue:`46823`)
- Performance improvement in :func:`read_html` and the Python CSV parser when ``thousands`` is set, fixing catastrophic regex backtracking on cells with many comma-separated digit groups followed by non-numeric text (:issue:`52619`)
- Performance improvement in :func:`read_sas` by reading page header fields directly in Cython instead of falling back to Python (:issue:`47339`)
- Performance improvement in :func:`read_sas` for SAS7BDAT files by pre-computing date/datetime column classification once during metadata parsing instead of per chunk (:issue:`47339`)
- Performance improvement in :func:`read_sas` for compressed SAS7BDAT files by reusing the decompression buffer instead of allocating per row (:issue:`47339`)
- Performance improvement in :func:`read_sas` when decoding strings (:issue:`47339`)
- Performance improvement in :func:`util.hash_pandas_object` for PyArrow-backed string and binary types by using PyArrow's ``dictionary_encode`` instead of converting to NumPy for factorization (:issue:`48964`)
- Performance improvement in :meth:`DataFrame.corr` and :meth:`DataFrame.cov` when data contains no NaN values (:issue:`64857`)
- Performance improvement in :meth:`DataFrame.fillna` and :meth:`Series.fillna` with scalar fill value for float, object, nullable, and datetime-like dtypes (:issue:`42147`)
- Performance improvement in :meth:`DataFrame.from_records` when passing a 2D :class:`numpy.ndarray` (:issue:`22025`)
- Performance improvement in :meth:`DataFrame.insert` when the number of blocks is small (:issue:`57641`)
- Performance improvement in :meth:`DataFrame.loc` with non-unique masked index (:issue:`56759`)
- Performance improvement in :meth:`DataFrame.query` and :meth:`DataFrame.eval` when the :class:`DataFrame` contains :class:`PeriodDtype` or :class:`IntervalDtype` columns (:issue:`35247`)
- Performance improvement in :meth:`DataFrame.rank` and :meth:`Series.rank` by skipping unnecessary ``putmask`` for non-nullable dtypes (:issue:`64857`)
- Performance improvement in :meth:`DataFrame.sort_values` with multiple numeric columns by avoiding unnecessary :class:`Categorical` conversion (:issue:`15389`)
- Performance improvement in :meth:`DataFrame.to_stata` when writing object-dtype datetime columns with date formats that require year/month extraction (:issue:`64555`)
- Performance improvement in :meth:`GroupBy.any` and :meth:`GroupBy.all` for boolean-dtype columns (:issue:`37850`)
- Performance improvement in :meth:`GroupBy.first` and :meth:`GroupBy.last` for Extension Array dtypes, which no longer fall back to a slow ``apply``-based implementation (:issue:`57591`)
- Performance improvement in :meth:`GroupBy.quantile` (:issue:`64330`)
- Performance improvement in :meth:`Index.get_indexer` for large monotonic indexes, which now uses binary search instead of building a hash table when the number of targets is small (:issue:`14273`)
- Performance improvement in :meth:`Index.join` and :meth:`Index.union` for :class:`RangeIndex` by avoiding unnecessary memory allocation in the libjoin fastpath (:issue:`54646`)
- Performance improvement in :meth:`IntervalIndex.get_indexer` for monotonic non-overlapping indexes, which now uses binary search instead of the interval tree (:issue:`47614`)
- Performance improvement in :meth:`NDFrame.__finalize__`, :meth:`Series.to_numpy`, :attr:`DataFrame.dtypes`, and :meth:`DataFrame.__getitem__` by reducing overhead from metadata propagation, memory sharing checks, and attribute setting (:issue:`57431`)
- Performance improvement in :meth:`arrays.SparseArray.isna` by avoiding a dense-then-resparsify round-trip (:issue:`41023`)
- Performance improvement in datetime/timedelta unit conversion (e.g. ``datetime64[s]`` to ``datetime64[ns]``) (:issue:`35025`)
- Performance improvement in indexing a :class:`DataFrame` with a :class:`CategoricalIndex` of :class:`Interval` categories (:issue:`61928`)
- Performance improvement in indexing a :class:`MultiIndex` with a list-like indexer (:issue:`55786`)
- Performance improvement in partial-string indexing on a monotonic decreasing :class:`DatetimeIndex` or :class:`PeriodIndex` (:issue:`64811`)
- Performance improvement in plotting :class:`DatetimeIndex` with multiplied frequencies (e.g. ``"1000ms"``, ``"100s"``) (:issue:`50355`)
- Performance improvement in reading zip-compressed files (e.g. :func:`read_pickle`, :func:`read_csv`) on Python < 3.12 (:issue:`59279`)
- Performance improvement in repr of :class:`Series` and :class:`DataFrame` containing third-party array-like objects (e.g. xarray ``DataArray``) in object dtype columns (:issue:`61809`)
- Performance improvement in :meth:`DataFrame.loc` and :meth:`DataFrame.iloc` setitem with a 2D list-of-lists value by avoiding a wasteful round-trip through an intermediate object array (:issue:`64229`)
- Fixed bug in :class:`Index` repr where attributes were not wrapped to respect ``display.width`` (:issue:`11552`)
- Fixed bug in :func:`to_timedelta` and :class:`Timedelta` not accepting Day offsets (:issue:`64240`)
- Bug in :meth:`Categorical.__repr__` where the values and categories lines could exceed ``display.width`` (:issue:`12066`)
- Bug in :meth:`CategoricalIndex.union` and :meth:`CategoricalIndex.intersection` giving incorrect results when the two indexes have the same unordered categories in different orders (:issue:`55335`)
- Bug in :meth:`Index.fillna` raising ``TypeError`` when filling with a tuple value (e.g. on object-dtype or :class:`CategoricalIndex` with tuple categories) (:issue:`37681`)
- Bug in :class:`DatetimeIndex` constructor raising ``ValueError`` when passing equivalent but not equal frequencies (e.g. ``QS-FEB`` vs ``QS-MAY``) (:issue:`61086`)
- Bug in :class:`DatetimeIndex` raising ``AttributeError`` when comparing against Arrow date types (date32, date64) (:issue:`62051`)
- Bug in :class:`Timestamp` constructor where passing ``np.str_`` objects would fail in Cython string parsing (:issue:`48974`)
- Bug in :class:`Timestamp` constructor, :class:`Timedelta` constructor, :func:`to_datetime`, and :func:`to_timedelta` with non-round ``float`` input and ``unit`` failing to raise when the value is just outside the representable bounds (:issue:`57366`)
- Bug in :func:`date_range` where ``inclusive="left"`` and ``inclusive="right"`` returned a single-element result instead of empty when ``start`` equals ``end`` (:issue:`55293`)
- Bug in :func:`date_range` where ``inclusive`` parameter failed to filter endpoints when only ``start`` and ``periods`` or ``end`` and ``periods`` were specified (:issue:`46331`)
- Bug in :func:`date_range` where ``periods=1`` with offsets that disallow ``n=0`` (e.g. :class:`offsets.LastWeekOfMonth`, :class:`offsets.FY5253`) raised ``ValueError`` (:issue:`41563`)
- Bug in :func:`date_range` where calendar-based offsets (e.g. ``MS``, ``ME``, ``QS``, ``YS``) could exclude the last offset boundary when ``end``'s time-of-day was earlier than ``start``'s (:issue:`35342`)
- Bug in :func:`to_datetime` and :func:`to_timedelta` on ARM platforms where round ``float`` values outside the int64 domain (e.g. ``float(2**63)``) could silently produce incorrect results instead of raising (:issue:`64619`)
- Bug in :func:`to_datetime` and :func:`to_timedelta` where ``uint64`` values greater than ``int64`` max silently overflowed instead of raising :class:`OutOfBoundsDatetime` or :class:`OutOfBoundsTimedelta` (:issue:`60677`)
- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` raising ``AssertionError`` instead of :class:`OutOfBoundsDatetime` when replacing with a ``datetime`` value outside the ``datetime64[ns]`` range (:issue:`61671`)
- Bug in :meth:`DatetimeArray.isin` and :meth:`TimedeltaArray.isin` where mismatched resolutions could silently truncate finer-resolution values, leading to false matches (:issue:`64545`)
- Bug in adding non-nano :class:`DatetimeIndex` with non-vectorized offsets (e.g. :class:`CustomBusinessDay`, :class:`CustomBusinessMonthEnd`) having a sub-unit ``offset`` parameter incorrectly truncating the result or raising ``AttributeError`` (:issue:`56586`)
- Bug in subtracting :class:`BusinessHour` (or :class:`CustomBusinessHour`) from a :class:`Timestamp` giving incorrect results when the subtraction would land exactly on the business-hour opening time (:issue:`33682`)
- Bug in :class:`DateOffset` where ``DateOffset(1)`` and ``DateOffset(days=1)`` returned different results near daylight saving time transitions (:issue:`61862`)
- Bug in :func:`to_timedelta` where passing ``np.str_`` objects would fail in Cython string parsing (:issue:`48974`)
- Fixed regression in :meth:`Timedelta.round`, :meth:`Timedelta.floor`, and :meth:`Timedelta.ceil` raising ``ZeroDivisionError`` for sub-second ``freq`` (:issue:`64828`)
- Bug in :class:`DatetimeIndex` addition with a :class:`DateOffset` that has only timedelta components (e.g. ``DateOffset(hours=-2)``) raising ``ValueError`` near DST transitions, while scalar :class:`Timestamp` addition worked correctly (:issue:`28610`)
- Fixed bug in :func:`read_excel` where a column containing a mixture of numeric and boolean values had its values cast to the type of the first occurrence, because ``1 == True`` and ``0 == False`` (:issue:`60088`)
- Fixed bug in :meth:`Series.clip` where passing a scalar numpy array (e.g. ``np.array(0)``) would raise a ``TypeError`` (:issue:`59053`)
- Fixed bug in :meth:`Series.mean` and :meth:`Series.sum` (and their :class:`DataFrame` counterparts) overflowing for ``float16`` dtypes instead of upcasting to ``float64`` (:issue:`43929`)
- Fixed bug in :meth:`Series.skew` and :meth:`Series.kurt` (and their :class:`DataFrame` counterparts) returning ``0.0`` for degenerate distributions; these now return ``NaN`` (:issue:`62864`)
- Fixed bug where :class:`DataFrame` arithmetic operations with :class:`Series` did not support the ``fill_value`` parameter (:issue:`61581`)
- Bug in :class:`DataFrame` constructor where ``NaT`` in a :class:`TimedeltaIndex` row was incorrectly inferred as ``datetime64`` instead of ``timedelta64`` (:issue:`23985`)
- Bug in :class:`DataFrame` constructor where constructing from a list of uniform-dtype arrays (e.g. pyarrow, :class:`CategoricalDtype`, nullable dtypes) lost the dtype (:issue:`49593`)
- Bug in :func:`pd.array` silently converting NaN to a nonsensical integer when given float data containing NaN and a NumPy integer dtype (:issue:`41724`)
- Fixed :func:`pandas.array` to preserve mask information when converting NumPy masked arrays, converting masked values to missing values (:issue:`63879`).
- Fixed bug in :meth:`DataFrame.from_records` where ``exclude`` was ignored when ``data`` was an iterator and ``nrows=0`` (:issue:`63774`)
- Bug in :meth:`DataFrame.replace` with ``regex=True`` mutating the underlying :class:`StringArray` when the replacement value was not a string (:issue:`57733`)
- Bug in :meth:`DataFrame.loc` returning incorrect dtype when the column key is a ``slice`` (:issue:`63071`)
- Bug in :meth:`Index.get_indexer` where ``method="pad"``, ``"backfill"``, or ``"nearest"`` returned incorrect results when the target contained ``NaT`` or ``NaN`` instead of ``-1`` (:issue:`32572`)
- Bugs in setitem-with-expansion when adding new rows failing to keep the original dtype in some cases (:issue:`32346`, :issue:`15231`, :issue:`47503`, :issue:`6485`, :issue:`25383`, :issue:`52235`, :issue:`17026`, :issue:`56010`)
- Bug in :meth:`DataFrame.__getitem__` raising ``InvalidIndexError`` when indexing with a tuple containing a ``slice`` on a :class:`DataFrame` with :class:`MultiIndex` columns (e.g., ``df[:, "t1"]``) (:issue:`26511`)
- Bug in :meth:`DataFrame.iloc` setitem raising ``AttributeError`` when assigning a :class:`Series` or :class:`Index` with a nullable EA dtype (e.g. ``Int64``, ``Float64``, ``boolean``) into a column with a NumPy dtype (:issue:`47776`)
- Bug in :meth:`DataFrame.mask` with ``inplace=True`` where incorrect values were produced when ``other`` was a :class:`Series` with :class:`ExtensionArray` values (:issue:`64635`)
- Bug in :meth:`DataFrame.where` and :meth:`DataFrame.mask` raising ``TypeError`` when ``cond`` is a :class:`Series` and ``axis=1`` (:issue:`58190`)
- Bug in :meth:`DataFrame.xs` where ``drop_level=False`` was ignored for fully specified :class:`MultiIndex` keys when ``level`` was not explicitly provided (:issue:`6507`)
- Bug in :meth:`Index.get_level_values` mishandling boolean, NA-like (``np.nan``, ``pd.NA``, ``pd.NaT``) and integer index names (:issue:`62169`)
- Bug in :meth:`Index.get_loc` raising ``KeyError`` when looking up a tuple in an object-dtype :class:`Index` with duplicates (:issue:`37800`)
- Bug in :meth:`Index.insert` silently casting booleans to numeric when used with nullable numeric dtypes like ``Float64`` or ``Int64`` (:issue:`61709`)
- Bug in :meth:`DataFrame.fillna` with a dict value raising ``RecursionError`` when columns are a :class:`MultiIndex` with duplicate entries (:issue:`53498`)
- Bug in :meth:`DataFrame.loc` with a :class:`MultiIndex` where using a tuple indexer with a scalar and a list (e.g., ``(scalar, list)``) did not drop the scalar-indexed level (:issue:`18631`)
- Bug in :meth:`MultiIndex.sortlevel` not raising ``TypeError`` when sorting a level with incomparable types (e.g., ``Timestamp`` and ``str``) (:issue:`21136`)
- Fixed bug in :func:`read_csv` with the ``c`` engine where an embedded ``\r`` followed by a space in an unquoted field could cause an infinite re-parsing loop, producing spurious rows or a buffer overflow (:issue:`51141`)
- Fixed bug in :func:`read_excel` where usage of ``skiprows`` could lead to an infinite loop (:issue:`64027`)
- Fixed :func:`read_json` with ``lines=True`` and ``chunksize`` to respect ``nrows`` when the requested row count is not a multiple of the chunk size (:issue:`64025`)
- Bug in :meth:`DataFrame.__repr__` where horizontally truncated output could exceed the terminal width by up to 4 characters because the ``" ..."`` separator column was not accounted for in the width budget (:issue:`32461`)
- Bug in :meth:`DataFrame.to_stata` raising ``KeyError`` when column names require renaming and ``convert_dates`` is specified for a different column (:issue:`60536`)
- Fixed :func:`read_json` with ``lines=True`` and ``nrows=0`` to return an empty DataFrame (:issue:`64025`)
- Fixed bug in :meth:`HDFStore.select` where passing ``where`` as a list of conditions referencing caller-scope variables failed on Python 3.12+ due to PEP 709 inlining list comprehension stack frames (:issue:`64881`)
- Bug in :class:`Period` constructor where passing ``np.str_`` objects would fail in Cython string parsing (:issue:`48974`)
- Bug in :meth:`DataFrame.plot.hexbin` ignoring ``rcParams["image.cmap"]`` and always defaulting to ``"BuGn"`` when no colormap was specified (:issue:`31871`)
- Bug in :meth:`.DataFrameGroupBy.agg` when there are no groups, multiple keys, and ``group_keys=False`` (:issue:`51445`)
- Bug in :meth:`.DataFrameGroupBy.agg` where it would operate on the group as a whole when ``args`` or ``kwargs`` were supplied for the provided ``func``; this method now only operates on each Series of the group (:issue:`39169`)
- Bug in :meth:`.Rolling.skew` and :meth:`.Rolling.kurt` (and their :class:`GroupBy` counterparts) returning ``0.0`` and ``-3.0`` respectively for degenerate windows or groups; these now return ``NaN`` (:issue:`62864`)
- Bug in :meth:`.Rolling.skew` and :meth:`.Rolling.kurt` returning ``NaN`` for low-variance windows (:issue:`62946`)
- Bug in :meth:`DataFrame.groupby` with a :class:`Grouper` with ``freq`` raising ``AttributeError`` when all grouping keys are ``NaT`` (:issue:`43486`)
- Bug in :meth:`Series.resample` and :meth:`DataFrame.resample` where same-frequency resampling with monthly, quarterly, or annual frequencies bypassed aggregation, returning the original values instead of the aggregation result (:issue:`18553`)
- Bug in :func:`merge` where merging on a :class:`MultiIndex` containing ``NaN`` values mapped ``NaN`` keys to the last level value instead of ``NaN`` (:issue:`64492`)
- Bug in :meth:`Index.union` where the result could be unsorted when both inputs were monotonic increasing but disjoint, when ``sort`` was not ``False`` (:issue:`54646`)
- In :func:`pivot_table`, when ``values`` is empty, the aggregation will be computed on a Series of all NA values (:issue:`46475`)
- Bug in :meth:`SparseArray.astype` where converting a datetime64 :class:`SparseArray` with ``NaT`` fill value to ``"Sparse[int64]"`` silently replaced the fill value with ``0`` instead of ``iNaT`` (:issue:`49631`)
- Bug in indexing a :class:`SparseArray` with an out-of-bounds integer equal to the length of the array returning the fill value instead of raising an ``IndexError`` (:issue:`64183`).
- Fixed bug in :meth:`Series.apply` and :meth:`Series.map` where nullable integer dtypes were converted to float, causing precision loss for large integers; now the nullable dtype will be preserved (:issue:`63903`).