Commit a177975

Author: MarcoGorelli
Merge remote-tracking branch 'upstream/main' into share-datetime-parsing-format-paths
2 parents: d5c584b + ca3e0c8

43 files changed (+381 -597 lines)

Dockerfile (+1 -1)

@@ -8,6 +8,6 @@ RUN apt-get install -y build-essential
 RUN apt-get install -y libhdf5-dev
 
 RUN python -m pip install --upgrade pip
-RUN python -m pip install --use-deprecated=legacy-resolver \
+RUN python -m pip install \
     -r https://raw.githubusercontent.com/pandas-dev/pandas/main/requirements-dev.txt
 CMD ["/bin/bash"]

doc/source/whatsnew/v2.0.0.rst (+11 -4)

@@ -93,6 +93,7 @@ Other enhancements
 - :func:`timedelta_range` now supports a ``unit`` keyword ("s", "ms", "us", or "ns") to specify the desired resolution of the output index (:issue:`49824`)
 - :meth:`DataFrame.to_json` now supports a ``mode`` keyword with supported inputs 'w' and 'a'. Defaulting to 'w', 'a' can be used when lines=True and orient='records' to append record oriented json lines to an existing json file. (:issue:`35849`)
 - Added ``name`` parameter to :meth:`IntervalIndex.from_breaks`, :meth:`IntervalIndex.from_arrays` and :meth:`IntervalIndex.from_tuples` (:issue:`48911`)
+- Improve exception message when using :func:`assert_frame_equal` on a :class:`DataFrame` to include the column that is compared (:issue:`50323`)
 - Improved error message for :func:`merge_asof` when join-columns were duplicated (:issue:`50102`)
 - Added :meth:`Index.infer_objects` analogous to :meth:`Series.infer_objects` (:issue:`50034`)
 - Added ``copy`` parameter to :meth:`Series.infer_objects` and :meth:`DataFrame.infer_objects`, passing ``False`` will avoid making copies for series or columns that are already non-object or where no better dtype can be inferred (:issue:`50096`)
@@ -721,6 +722,7 @@ Removal of prior version deprecations/changes
 - When providing a list of columns of length one to :meth:`DataFrame.groupby`, the keys that are returned by iterating over the resulting :class:`DataFrameGroupBy` object will now be tuples of length one (:issue:`47761`)
 - Removed deprecated methods :meth:`ExcelWriter.write_cells`, :meth:`ExcelWriter.save`, :meth:`ExcelWriter.cur_sheet`, :meth:`ExcelWriter.handles`, :meth:`ExcelWriter.path` (:issue:`45795`)
 - The :class:`ExcelWriter` attribute ``book`` can no longer be set; it is still available to be accessed and mutated (:issue:`48943`)
+- Removed unused ``*args`` and ``**kwargs`` in :class:`Rolling`, :class:`Expanding`, and :class:`ExponentialMovingWindow` ops (:issue:`47851`)
 -
 
 .. ---------------------------------------------------------------------------
@@ -740,6 +742,7 @@ Performance improvements
 - Performance improvement in :meth:`Index.union` and :meth:`MultiIndex.union` when index contains duplicates (:issue:`48900`)
 - Performance improvement in :meth:`Series.rank` for pyarrow-backed dtypes (:issue:`50264`)
 - Performance improvement in :meth:`Series.fillna` for extension array dtypes (:issue:`49722`, :issue:`50078`)
+- Performance improvement in :meth:`Index.join`, :meth:`Index.intersection` and :meth:`Index.union` for masked dtypes when :class:`Index` is monotonic (:issue:`50310`)
 - Performance improvement for :meth:`Series.value_counts` with nullable dtype (:issue:`48338`)
 - Performance improvement for :class:`Series` constructor passing integer numpy array with nullable dtype (:issue:`48338`)
 - Performance improvement for :class:`DatetimeIndex` constructor passing a list (:issue:`48609`)
@@ -790,8 +793,7 @@ Datetimelike
 - Bug in :func:`to_datetime` was raising ``ValueError`` when parsing :class:`Timestamp`, ``datetime.datetime``, ``datetime.date``, or ``np.datetime64`` objects when non-ISO8601 ``format`` was passed (:issue:`49298`, :issue:`50036`)
 - Bug in :func:`to_datetime` was raising ``ValueError`` when parsing empty string and non-ISO8601 format was passed. Now, empty strings will be parsed as :class:`NaT`, for compatibility with how is done for ISO8601 formats (:issue:`50251`)
 - Bug in :class:`Timestamp` was showing ``UserWarning``, which was not actionable by users, when parsing non-ISO8601 delimited date strings (:issue:`50232`)
-- Bug in :class:`Timestamp` was showing ``UserWarning`` which was not actionable by users (:issue:`50232`)
-- Bug in :func:`to_datetime` was raising ``ValueError`` when parsing empty string and non-ISO8601 format was passed. Now, empty strings will be parsed as :class:`NaT`, for compatibility with how is done for ISO8601 formats (:issue:`50251`)
+- Bug in :func:`to_datetime` was showing misleading ``ValueError`` when parsing dates with format containing ISO week directive and ISO weekday directive (:issue:`50308`)
 -
 
 Timedelta
@@ -810,12 +812,14 @@ Timezones
 Numeric
 ^^^^^^^
 - Bug in :meth:`DataFrame.add` cannot apply ufunc when inputs contain mixed DataFrame type and Series type (:issue:`39853`)
+- Bug in arithmetic operations on :class:`Series` not propagating mask when combining masked dtypes and numpy dtypes (:issue:`45810`, :issue:`42630`)
 - Bug in DataFrame reduction methods (e.g. :meth:`DataFrame.sum`) with object dtype, ``axis=1`` and ``numeric_only=False`` would not be coerced to float (:issue:`49551`)
 - Bug in :meth:`DataFrame.sem` and :meth:`Series.sem` where an erroneous ``TypeError`` would always raise when using data backed by an :class:`ArrowDtype` (:issue:`49759`)
 
 Conversion
 ^^^^^^^^^^
 - Bug in constructing :class:`Series` with ``int64`` dtype from a string list raising instead of casting (:issue:`44923`)
+- Bug in constructing :class:`Series` with masked dtype and boolean values with ``NA`` raising (:issue:`42137`)
 - Bug in :meth:`DataFrame.eval` incorrectly raising an ``AttributeError`` when there are negative values in function call (:issue:`46471`)
 - Bug in :meth:`Series.convert_dtypes` not converting dtype to nullable dtype when :class:`Series` contains ``NA`` and has dtype ``object`` (:issue:`48791`)
 - Bug where any :class:`ExtensionDtype` subclass with ``kind="M"`` would be interpreted as a timezone type (:issue:`34986`)
@@ -880,15 +884,18 @@ I/O
 - Bug when a pickling a subset PyArrow-backed data that would serialize the entire data instead of the subset (:issue:`42600`)
 - Bug in :func:`read_sql_query` ignoring ``dtype`` argument when ``chunksize`` is specified and result is empty (:issue:`50245`)
 - Bug in :func:`read_csv` for a single-line csv with fewer columns than ``names`` raised :class:`.errors.ParserError` with ``engine="c"`` (:issue:`47566`)
+- Bug in :func:`read_json` raising with ``orient="table"`` and ``NA`` value (:issue:`40255`)
 - Bug in displaying ``string`` dtypes not showing storage option (:issue:`50099`)
-- Bug in :func:`DataFrame.to_string` with ``header=False`` that printed the index name on the same line as the first row of the data (:issue:`49230`)
+- Bug in :meth:`DataFrame.to_string` with ``header=False`` that printed the index name on the same line as the first row of the data (:issue:`49230`)
+- Bug in :meth:`DataFrame.to_string` ignoring float formatter for extension arrays (:issue:`39336`)
 - Fixed memory leak which stemmed from the initialization of the internal JSON module (:issue:`49222`)
 - Fixed issue where :func:`json_normalize` would incorrectly remove leading characters from column names that matched the ``sep`` argument (:issue:`49861`)
 - Bug in :meth:`DataFrame.to_json` where it would segfault when failing to encode a string (:issue:`50307`)
 
 Period
 ^^^^^^
 - Bug in :meth:`Period.strftime` and :meth:`PeriodIndex.strftime`, raising ``UnicodeDecodeError`` when a locale-specific directive was passed (:issue:`46319`)
+- Bug in adding a :class:`Period` object to an array of :class:`DateOffset` objects incorrectly raising ``TypeError`` (:issue:`50162`)
 -
 
 Plotting
@@ -935,7 +942,7 @@ ExtensionArray
 - Bug in :meth:`Series.mean` overflowing unnecessarily with nullable integers (:issue:`48378`)
 - Bug in :meth:`Series.tolist` for nullable dtypes returning numpy scalars instead of python scalars (:issue:`49890`)
 - Bug when concatenating an empty DataFrame with an ExtensionDtype to another DataFrame with the same ExtensionDtype, the resulting dtype turned into object (:issue:`48510`)
--
+- Bug in :meth:`array.PandasArray.to_numpy` raising with ``NA`` value when ``na_value`` is specified (:issue:`40638`)
 
 Styler
 ^^^^^^

pandas/_libs/tslibs/period.pyx (+16)

@@ -1741,6 +1741,11 @@ cdef class _Period(PeriodMixin):
             raise TypeError(f"unsupported operand type(s) for +: '{sname}' "
                             f"and '{oname}'")
 
+        elif util.is_array(other):
+            if other.dtype == object:
+                # GH#50162
+                return np.array([self + x for x in other], dtype=object)
+
         return NotImplemented
 
     def __radd__(self, other):
@@ -1767,11 +1772,22 @@ cdef class _Period(PeriodMixin):
         elif other is NaT:
             return NaT
 
+        elif util.is_array(other):
+            if other.dtype == object:
+                # GH#50162
+                return np.array([self - x for x in other], dtype=object)
+
         return NotImplemented
 
     def __rsub__(self, other):
         if other is NaT:
             return NaT
+
+        elif util.is_array(other):
+            if other.dtype == object:
+                # GH#50162
+                return np.array([x - self for x in other], dtype=object)
+
         return NotImplemented
 
     def asfreq(self, freq, how="E") -> "Period":
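
The three new `util.is_array` branches share one pattern: when the other operand is a NumPy array of `object` dtype (for example an array of `DateOffset` objects), the scalar operation is applied to each element and the results are collected in a new object array, instead of returning `NotImplemented`. A minimal sketch of that pattern, assuming NumPy is installed; the `Ticks` class and its integer arithmetic are hypothetical and only stand in for `Period`:

```python
import numpy as np


class Ticks:
    """Hypothetical scalar type; only here to mirror the GH#50162 dispatch pattern."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Ticks) and self.n == other.n

    def __repr__(self) -> str:
        return f"Ticks({self.n})"

    def __add__(self, other):
        if isinstance(other, int):
            return Ticks(self.n + other)
        elif isinstance(other, np.ndarray):
            if other.dtype == object:
                # Object array: apply the scalar op element-wise, keep object dtype
                return np.array([self + x for x in other], dtype=object)
        # Anything else (including non-object arrays) is left to Python/NumPy
        return NotImplemented

    def __sub__(self, other):
        if isinstance(other, int):
            return Ticks(self.n - other)
        elif isinstance(other, np.ndarray):
            if other.dtype == object:
                return np.array([self - x for x in other], dtype=object)
        return NotImplemented

    def __rsub__(self, other):
        if isinstance(other, int):
            return Ticks(other - self.n)
        elif isinstance(other, np.ndarray):
            if other.dtype == object:
                # Reflected op: the array elements are on the left-hand side
                return np.array([x - self for x in other], dtype=object)
        return NotImplemented


offsets = np.array([1, 2, 3], dtype=object)
print(Ticks(10) + offsets)
print(Ticks(10) - offsets)
print(Ticks(10).__rsub__(offsets))
```

The `__rsub__` branch is called directly here because `offsets - Ticks(10)` would typically be evaluated element by element by NumPy itself.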

pandas/_libs/tslibs/strptime.pyx (+32 -20)

@@ -127,6 +127,38 @@ def array_strptime(
                 raise ValueError("Cannot use '%W' or '%U' without day and year")
         elif "%Z" in fmt and "%z" in fmt:
             raise ValueError("Cannot parse both %Z and %z")
+        elif "%j" in fmt and "%G" in fmt:
+            raise ValueError("Day of the year directive '%j' is not "
+                             "compatible with ISO year directive '%G'. "
+                             "Use '%Y' instead.")
+        elif "%G" in fmt and (
+            "%V" not in fmt
+            or not (
+                "%A" in fmt
+                or "%a" in fmt
+                or "%w" in fmt
+                or "%u" in fmt
+            )
+        ):
+            raise ValueError("ISO year directive '%G' must be used with "
+                             "the ISO week directive '%V' and a weekday "
+                             "directive '%A', '%a', '%w', or '%u'.")
+        elif "%V" in fmt and "%Y" in fmt:
+            raise ValueError("ISO week directive '%V' is incompatible with "
+                             "the year directive '%Y'. Use the ISO year "
+                             "'%G' instead.")
+        elif "%V" in fmt and (
+            "%G" not in fmt
+            or not (
+                "%A" in fmt
+                or "%a" in fmt
+                or "%w" in fmt
+                or "%u" in fmt
+            )
+        ):
+            raise ValueError("ISO week directive '%V' must be used with "
+                             "the ISO year directive '%G' and a weekday "
+                             "directive '%A', '%a', '%w', or '%u'.")
 
     global _TimeRE_cache, _regex_cache
     with _cache_lock:
@@ -365,26 +397,6 @@ def array_strptime(
                 weekday = int(found_dict["u"])
                 weekday -= 1
 
-        # don't assume default values for ISO week/year
-        if iso_year != -1:
-            if iso_week == -1 or weekday == -1:
-                raise ValueError("ISO year directive '%G' must be used with "
-                                 "the ISO week directive '%V' and a weekday "
-                                 "directive '%A', '%a', '%w', or '%u'.")
-            if julian != -1:
-                raise ValueError("Day of the year directive '%j' is not "
-                                 "compatible with ISO year directive '%G'. "
-                                 "Use '%Y' instead.")
-        elif year != -1 and week_of_year == -1 and iso_week != -1:
-            if weekday == -1:
-                raise ValueError("ISO week directive '%V' must be used with "
-                                 "the ISO year directive '%G' and a weekday "
-                                 "directive '%A', '%a', '%w', or '%u'.")
-            else:
-                raise ValueError("ISO week directive '%V' is incompatible with "
-                                 "the year directive '%Y'. Use the ISO year "
-                                 "'%G' instead.")
-
         # If we know the wk of the year and what day of that wk, we can figure
         # out the Julian day of the year.
         if julian == -1 and weekday != -1:
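
The relocated checks look only at the format string, so they can run once before any value is parsed, rather than per value after parsing. The sketch below restates them as a standalone helper; `check_iso_directives` is hypothetical (not a pandas function), and the standard library's `datetime.strptime`, which accepts `%G`, `%V` and `%u`, is used to parse a format that passes:

```python
from datetime import datetime

WEEKDAY_DIRECTIVES = ("%A", "%a", "%w", "%u")


def check_iso_directives(fmt: str) -> None:
    """Hypothetical helper mirroring the format-level ISO checks, in the same order."""
    has_weekday = any(d in fmt for d in WEEKDAY_DIRECTIVES)
    if "%j" in fmt and "%G" in fmt:
        raise ValueError("Day of the year directive '%j' is not "
                         "compatible with ISO year directive '%G'. "
                         "Use '%Y' instead.")
    elif "%G" in fmt and ("%V" not in fmt or not has_weekday):
        raise ValueError("ISO year directive '%G' must be used with "
                         "the ISO week directive '%V' and a weekday "
                         "directive '%A', '%a', '%w', or '%u'.")
    elif "%V" in fmt and "%Y" in fmt:
        raise ValueError("ISO week directive '%V' is incompatible with "
                         "the year directive '%Y'. Use the ISO year "
                         "'%G' instead.")
    elif "%V" in fmt and ("%G" not in fmt or not has_weekday):
        raise ValueError("ISO week directive '%V' must be used with "
                         "the ISO year directive '%G' and a weekday "
                         "directive '%A', '%a', '%w', or '%u'.")


# A complete ISO year/week/weekday format passes; the stdlib can then parse it.
check_iso_directives("%G-W%V-%u")
print(datetime.strptime("2023-W01-1", "%G-W%V-%u"))  # 2023-01-02 00:00:00

# Each of these is rejected by one of the four branches, before any parsing.
for bad in ("%G-%j", "%G-W%V", "%Y-W%V-%u", "W%V-%u"):
    try:
        check_iso_directives(bad)
    except ValueError as err:
        print(f"{bad!r}: {err}")
```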

pandas/_testing/asserters.py (+18 -5)

@@ -680,6 +680,7 @@ def assert_extension_array_equal(
     check_exact: bool = False,
     rtol: float = 1.0e-5,
     atol: float = 1.0e-8,
+    obj: str = "ExtensionArray",
 ) -> None:
     """
     Check that left and right ExtensionArrays are equal.
@@ -702,6 +703,11 @@ def assert_extension_array_equal(
         Absolute tolerance. Only used when check_exact is False.
 
         .. versionadded:: 1.1.0
+    obj : str, default 'ExtensionArray'
+        Specify object name being compared, internally used to show appropriate
+        assertion message.
+
+        .. versionadded:: 2.0.0
 
     Notes
     -----
@@ -719,7 +725,7 @@ def assert_extension_array_equal(
     assert isinstance(left, ExtensionArray), "left is not an ExtensionArray"
     assert isinstance(right, ExtensionArray), "right is not an ExtensionArray"
     if check_dtype:
-        assert_attr_equal("dtype", left, right, obj="ExtensionArray")
+        assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}")
 
     if (
         isinstance(left, DatetimeLikeArrayMixin)
@@ -729,21 +735,24 @@ def assert_extension_array_equal(
         # Avoid slow object-dtype comparisons
         # np.asarray for case where we have a np.MaskedArray
         assert_numpy_array_equal(
-            np.asarray(left.asi8), np.asarray(right.asi8), index_values=index_values
+            np.asarray(left.asi8),
+            np.asarray(right.asi8),
+            index_values=index_values,
+            obj=obj,
         )
         return
 
     left_na = np.asarray(left.isna())
     right_na = np.asarray(right.isna())
     assert_numpy_array_equal(
-        left_na, right_na, obj="ExtensionArray NA mask", index_values=index_values
+        left_na, right_na, obj=f"{obj} NA mask", index_values=index_values
     )
 
     left_valid = left[~left_na].to_numpy(dtype=object)
     right_valid = right[~right_na].to_numpy(dtype=object)
     if check_exact:
         assert_numpy_array_equal(
-            left_valid, right_valid, obj="ExtensionArray", index_values=index_values
+            left_valid, right_valid, obj=obj, index_values=index_values
         )
     else:
         _testing.assert_almost_equal(
@@ -752,7 +761,7 @@ def assert_extension_array_equal(
             check_dtype=bool(check_dtype),
             rtol=rtol,
             atol=atol,
-            obj="ExtensionArray",
+            obj=obj,
             index_values=index_values,
         )
 
@@ -909,6 +918,7 @@ def assert_series_equal(
                 right_values,
                 check_dtype=check_dtype,
                 index_values=np.asarray(left.index),
+                obj=str(obj),
             )
         else:
             assert_numpy_array_equal(
@@ -955,6 +965,7 @@ def assert_series_equal(
             atol=atol,
             check_dtype=check_dtype,
             index_values=np.asarray(left.index),
+            obj=str(obj),
         )
     elif is_extension_array_dtype_and_needs_i8_conversion(
         left.dtype, right.dtype
@@ -964,6 +975,7 @@ def assert_series_equal(
             right._values,
             check_dtype=check_dtype,
             index_values=np.asarray(left.index),
+            obj=str(obj),
         )
     elif needs_i8_conversion(left.dtype) and needs_i8_conversion(right.dtype):
         # DatetimeArray or TimedeltaArray
@@ -972,6 +984,7 @@ def assert_series_equal(
             right._values,
             check_dtype=check_dtype,
             index_values=np.asarray(left.index),
+            obj=str(obj),
         )
     else:
         _testing.assert_almost_equal(
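
The asserter changes thread a caller-supplied `obj` label from `assert_series_equal` down into `assert_extension_array_equal` (and on to `assert_numpy_array_equal`), so a failure message names the object actually being compared instead of the fixed string "ExtensionArray". A minimal stdlib sketch of the same idea; both helpers and the label format are hypothetical and do not reproduce pandas' real messages:

```python
def assert_values_equal(left, right, obj="ExtensionArray"):
    """Hypothetical inner helper: element-wise comparison that names `obj` on failure."""
    if len(left) != len(right):
        raise AssertionError(f"{obj} are different: length {len(left)} != {len(right)}")
    for i, (lval, rval) in enumerate(zip(left, right)):
        if lval != rval:
            raise AssertionError(
                f"{obj} are different: at position {i}, {lval!r} != {rval!r}"
            )


def assert_column_equal(left, right, name):
    """Hypothetical outer helper: builds a descriptive label and forwards it."""
    obj = f'column "{name}"'
    # Forward the label, as assert_series_equal now does with obj=str(obj)
    assert_values_equal(left, right, obj=str(obj))


try:
    assert_column_equal([1, 2, 3], [1, 2, 4], name="b")
except AssertionError as err:
    print(err)
```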

pandas/compat/numpy/function.py (-45)

@@ -335,51 +335,6 @@ def validate_take_with_convert(convert: ndarray | bool | None, args, kwargs) ->
 )
 
 
-def validate_window_func(name, args, kwargs) -> None:
-    numpy_args = ("axis", "dtype", "out")
-    msg = (
-        f"numpy operations are not valid with window objects. "
-        f"Use .{name}() directly instead "
-    )
-
-    if len(args) > 0:
-        raise UnsupportedFunctionCall(msg)
-
-    for arg in numpy_args:
-        if arg in kwargs:
-            raise UnsupportedFunctionCall(msg)
-
-
-def validate_rolling_func(name, args, kwargs) -> None:
-    numpy_args = ("axis", "dtype", "out")
-    msg = (
-        f"numpy operations are not valid with window objects. "
-        f"Use .rolling(...).{name}() instead "
-    )
-
-    if len(args) > 0:
-        raise UnsupportedFunctionCall(msg)
-
-    for arg in numpy_args:
-        if arg in kwargs:
-            raise UnsupportedFunctionCall(msg)
-
-
-def validate_expanding_func(name, args, kwargs) -> None:
-    numpy_args = ("axis", "dtype", "out")
-    msg = (
-        f"numpy operations are not valid with window objects. "
-        f"Use .expanding(...).{name}() instead "
-    )
-
-    if len(args) > 0:
-        raise UnsupportedFunctionCall(msg)
-
-    for arg in numpy_args:
-        if arg in kwargs:
-            raise UnsupportedFunctionCall(msg)
-
-
 def validate_groupby_func(name, args, kwargs, allowed=None) -> None:
     """
     'args' and 'kwargs' should be empty, except for allowed kwargs because all
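
The three deleted helpers differed only in their message: each received the `*args`/`**kwargs` forwarded by a window method and raised `UnsupportedFunctionCall` if any positional argument or a NumPy-style keyword (`axis`, `dtype`, `out`) appeared. Their removal presumably pairs with the whatsnew entry for GH#47851, which drops those unused parameters from the window ops. A sketch contrasting the two behaviours, using hypothetical `OldWindow`/`NewWindow` classes; `UnsupportedFunctionCall` is redefined locally as a `ValueError` subclass so the example runs without pandas:

```python
class UnsupportedFunctionCall(ValueError):
    """Local stand-in for pandas.errors.UnsupportedFunctionCall."""


def validate_window_func(name, args, kwargs) -> None:
    # Same checks as the deleted helper: no positional args, no numpy-only kwargs
    numpy_args = ("axis", "dtype", "out")
    msg = (
        "numpy operations are not valid with window objects. "
        f"Use .{name}() directly instead "
    )
    if len(args) > 0:
        raise UnsupportedFunctionCall(msg)
    for arg in numpy_args:
        if arg in kwargs:
            raise UnsupportedFunctionCall(msg)


class OldWindow:
    """Hypothetical window object that still accepts *args/**kwargs."""

    def sum(self, *args, **kwargs):
        validate_window_func("sum", args, kwargs)
        return "summed"


class NewWindow:
    """Hypothetical window object after the parameters are dropped."""

    def sum(self):
        # Unexpected arguments now fail with Python's own TypeError
        return "summed"


for window in (OldWindow(), NewWindow()):
    try:
        window.sum(axis=0)
    except (UnsupportedFunctionCall, TypeError) as err:
        print(type(window).__name__, type(err).__name__)
```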

pandas/core/arrays/masked.py (+5 -1)

@@ -609,9 +609,13 @@ def _propagate_mask(
             if other is libmissing.NA:
                 # GH#45421 don't alter inplace
                 mask = mask | True
+            elif is_list_like(other) and len(other) == len(mask):
+                mask = mask | isna(other)
         else:
             mask = self._mask | mask
-        return mask
+        # Incompatible return value type (got "Optional[ndarray[Any, dtype[bool_]]]",
+        # expected "ndarray[Any, dtype[bool_]]")
+        return mask  # type: ignore[return-value]
 
     def _arith_method(self, other, op):
         op_name = op.__name__
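
The new `elif` branch extends mask propagation to list-like operands of matching length: any position where `other` is missing is now also missing in the result (the Numeric whatsnew entry for GH#45810 and GH#42630). The sketch below models a masked integer array as two plain lists (values and a boolean mask). It is a hypothetical simplification: `None` stands in for `pd.NA`, `isna` only recognises `None` and float NaN, and the final `else` branch (combining with an already-computed mask) is left out:

```python
import math


def isna(value) -> bool:
    """Simplified missing-value test: None or float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def propagate_mask(self_mask, other):
    """Hypothetical sketch of _propagate_mask when no mask is passed in."""
    mask = list(self_mask)  # copy so the caller's mask is not altered in place
    if other is None:
        # Stand-in for `other is libmissing.NA`: every result is missing
        mask = [True] * len(mask)
    elif isinstance(other, (list, tuple)) and len(other) == len(mask):
        # The new branch: also mask positions where `other` is missing
        mask = [m or isna(o) for m, o in zip(mask, other)]
    return mask


def masked_add(values, self_mask, other):
    """Add `other` to masked data; values under the result mask are set to 0."""
    mask = propagate_mask(self_mask, other)
    others = other if isinstance(other, (list, tuple)) else [other] * len(values)
    result = [0 if m else v + o for v, o, m in zip(values, others, mask)]
    return result, mask


values, mask = [1, 2, 3], [False, True, False]
print(masked_add(values, mask, [10.0, 20.0, float("nan")]))
```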

pandas/core/arrays/numeric.py (+1 -3)

@@ -172,9 +172,7 @@ def _coerce_to_data_and_mask(values, mask, dtype, copy, dtype_cls, default_dtype
     inferred_type = None
     if is_object_dtype(values.dtype) or is_string_dtype(values.dtype):
         inferred_type = lib.infer_dtype(values, skipna=True)
-        if inferred_type == "empty":
-            pass
-        elif inferred_type == "boolean":
+        if inferred_type == "boolean" and dtype is None:
             name = dtype_cls.__name__.strip("_")
             raise TypeError(f"{values.dtype} cannot be converted to {name}")
 
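
After this change, object or string input whose inferred type is "boolean" raises `TypeError` only when no `dtype` was passed; with an explicit dtype the booleans are no longer rejected at this point (the removed "empty" branch was a no-op). A hypothetical pure-Python analogue: `coerce_to_int` is not a pandas function, `None` plays the role of `NA`, and the boolean test is a simplified stand-in for `lib.infer_dtype(values, skipna=True)`:

```python
def coerce_to_int(values, dtype=None):
    """Hypothetical analogue of the boolean check in _coerce_to_data_and_mask."""
    present = [v for v in values if v is not None]  # skip missing, like skipna=True
    inferred_boolean = bool(present) and all(isinstance(v, bool) for v in present)
    if inferred_boolean and dtype is None:
        # Only reject booleans when the caller did not ask for a dtype
        raise TypeError("object cannot be converted to IntegerDtype")
    mask = [v is None for v in values]
    data = [0 if v is None else int(v) for v in values]  # True -> 1, False -> 0
    return data, mask


print(coerce_to_int([True, None, False], dtype="Int64"))
```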

pandas/core/arrays/numpy_.py (+9 -5)

@@ -389,13 +389,17 @@ def to_numpy(
         copy: bool = False,
         na_value: object = lib.no_default,
     ) -> np.ndarray:
-        result = np.asarray(self._ndarray, dtype=dtype)
+        mask = self.isna()
+        if na_value is not lib.no_default and mask.any():
+            result = self._ndarray.copy()
+            result[mask] = na_value
+        else:
+            result = self._ndarray
 
-        if (copy or na_value is not lib.no_default) and result is self._ndarray:
-            result = result.copy()
+        result = np.asarray(result, dtype=dtype)
 
-        if na_value is not lib.no_default:
-            result[self.isna()] = na_value
+        if copy and result is self._ndarray:
+            result = result.copy()
 
         return result
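
The fix reorders `to_numpy`: missing values are replaced with `na_value` on a copy first, and only then is the array converted to the requested `dtype`. In the old order the conversion ran first, so a missing-value scalar that cannot be cast to the target dtype raised before `na_value` was ever applied. A NumPy sketch of the new order, assuming NumPy is installed; `NA`, `_NO_DEFAULT` and `to_numpy_sketch` are hypothetical stand-ins for `pd.NA`, `lib.no_default` and the method itself:

```python
import numpy as np

_NO_DEFAULT = object()  # stand-in for pandas' lib.no_default sentinel


class _NAType:
    """Hypothetical stand-in for pd.NA; it deliberately has no __float__."""

    def __repr__(self) -> str:
        return "<NA>"


NA = _NAType()


def to_numpy_sketch(ndarray, dtype=None, copy=False, na_value=_NO_DEFAULT):
    mask = np.array([x is NA for x in ndarray], dtype=bool)
    if na_value is not _NO_DEFAULT and mask.any():
        # Fill missing values on a copy *before* any dtype conversion
        result = ndarray.copy()
        result[mask] = na_value
    else:
        result = ndarray

    result = np.asarray(result, dtype=dtype)

    # np.asarray returns its input unchanged when no conversion is needed
    if copy and result is ndarray:
        result = result.copy()
    return result


data = np.array([1.0, NA, 3.0], dtype=object)
print(to_numpy_sketch(data, dtype="float64", na_value=np.nan))

try:
    # Old order: converting first fails on the NA stand-in
    np.asarray(data, dtype="float64")
except (TypeError, ValueError) as err:
    print("old order fails:", err)
```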

pandas/core/arrays/string_.py (+1 -1)

@@ -266,7 +266,7 @@ class StringArray(BaseStringArray, PandasArray):
 
     See Also
     --------
-    array
+    :func:`pandas.array`
         The recommended function for creating a StringArray.
     Series.str
         The string methods are available on Series backed by
