Commit ade2efe

luke committed
Merge branch 'bug-agg-nonunique-col' of https://github.com/luke396/pandas into bug-agg-nonunique-col
2 parents e533f43 + d592fdf commit ade2efe

40 files changed, +258 -146 lines changed

doc/source/development/contributing.rst

+1-1
Original file line number | Diff line number | Diff line change
@@ -349,7 +349,7 @@ If using :ref:`mamba <contributing.mamba>`, do::
349349
If using :ref:`pip <contributing.pip>` , do::
350350

351351
# activate the virtual environment based on your platform
352-
pythom -m pip install --upgrade -r requirements-dev.txt
352+
python -m pip install --upgrade -r requirements-dev.txt
353353

354354
Tips for a successful pull request
355355
==================================

doc/source/user_guide/basics.rst

-10
@@ -329,16 +329,6 @@ You can test if a pandas object is empty, via the :attr:`~DataFrame.empty` prope
329329
df.empty
330330
pd.DataFrame(columns=list("ABC")).empty
331331
332-
To evaluate single-element pandas objects in a boolean context, use the method
333-
:meth:`~DataFrame.bool`:
334-
335-
.. ipython:: python
336-
337-
pd.Series([True]).bool()
338-
pd.Series([False]).bool()
339-
pd.DataFrame([[True]]).bool()
340-
pd.DataFrame([[False]]).bool()
341-
342332
.. warning::
343333

344334
You might be tempted to do the following:

doc/source/user_guide/gotchas.rst

-10
@@ -121,16 +121,6 @@ Below is how to check if any of the values are ``True``:
121121
if pd.Series([False, True, False]).any():
122122
print("I am any")
123123
124-
To evaluate single-element pandas objects in a boolean context, use the method
125-
:meth:`~DataFrame.bool`:
126-
127-
.. ipython:: python
128-
129-
pd.Series([True]).bool()
130-
pd.Series([False]).bool()
131-
pd.DataFrame([[True]]).bool()
132-
pd.DataFrame([[False]]).bool()
133-
134124
Bitwise boolean
135125
~~~~~~~~~~~~~~~
136126

doc/source/whatsnew/v0.13.0.rst

+9-5
@@ -153,12 +153,16 @@ API changes
153153
154154
Added the ``.bool()`` method to ``NDFrame`` objects to facilitate evaluating of single-element boolean Series:
155155

156-
.. ipython:: python
156+
.. code-block:: python
157157
158-
pd.Series([True]).bool()
159-
pd.Series([False]).bool()
160-
pd.DataFrame([[True]]).bool()
161-
pd.DataFrame([[False]]).bool()
158+
>>> pd.Series([True]).bool()
159+
True
160+
>>> pd.Series([False]).bool()
161+
False
162+
>>> pd.DataFrame([[True]]).bool()
163+
True
164+
>>> pd.DataFrame([[False]]).bool()
165+
False
162166
163167
- All non-Index NDFrames (``Series``, ``DataFrame``, ``Panel``, ``Panel4D``,
164168
``SparsePanel``, etc.), now support the entire set of arithmetic operators

doc/source/whatsnew/v2.1.0.rst

+1
@@ -125,6 +125,7 @@ Deprecations
125125
- Deprecated the 'axis' keyword in :meth:`.GroupBy.idxmax`, :meth:`.GroupBy.idxmin`, :meth:`.GroupBy.fillna`, :meth:`.GroupBy.take`, :meth:`.GroupBy.skew`, :meth:`.GroupBy.rank`, :meth:`.GroupBy.cumprod`, :meth:`.GroupBy.cumsum`, :meth:`.GroupBy.cummax`, :meth:`.GroupBy.cummin`, :meth:`.GroupBy.pct_change`, :meth:`GroupBy.diff`, :meth:`.GroupBy.shift`, and :meth:`DataFrameGroupBy.corrwith`; for ``axis=1`` operate on the underlying :class:`DataFrame` instead (:issue:`50405`, :issue:`51046`)
126126
- Deprecated passing a dictionary to :meth:`.SeriesGroupBy.agg`; pass a list of aggregations instead (:issue:`50684`)
127127
- Deprecated logical operations (``|``, ``&``, ``^``) between pandas objects and dtype-less sequences (e.g. ``list``, ``tuple``), wrap a sequence in a :class:`Series` or numpy array before operating instead (:issue:`51521`)
128+
- Deprecated the methods :meth:`Series.bool` and :meth:`DataFrame.bool` (:issue:`51749`)
128129
- Deprecated :meth:`DataFrame.swapaxes` and :meth:`Series.swapaxes`, use :meth:`DataFrame.transpose` or :meth:`Series.transpose` instead (:issue:`51946`)
129130
- Deprecated parameter ``convert_type`` in :meth:`Series.apply` (:issue:`52140`)
130131
-

pandas/_libs/tslibs/offsets.pyx

+8-8
@@ -1350,18 +1350,18 @@ class DateOffset(RelativeDeltaOffset, metaclass=OffsetMeta):
13501350
valid dates. For example, Bday(2) can be added to a date to move
13511351
it two business days forward. If the date does not start on a
13521352
valid date, first it is moved to a valid date. Thus pseudo code
1353-
is:
1353+
is::
13541354
1355-
def __add__(date):
1356-
date = rollback(date) # does nothing if date is valid
1357-
return date + <n number of periods>
1355+
def __add__(date):
1356+
date = rollback(date) # does nothing if date is valid
1357+
return date + <n number of periods>
13581358
13591359
When a date offset is created for a negative number of periods,
1360-
the date is first rolled forward. The pseudo code is:
1360+
the date is first rolled forward. The pseudo code is::
13611361
1362-
def __add__(date):
1363-
date = rollforward(date) # does nothing is date is valid
1364-
return date + <n number of periods>
1362+
def __add__(date):
1363+
date = rollforward(date) # does nothing if date is valid
1364+
return date + <n number of periods>
13651365
13661366
Zero presents a problem. Should it roll forward or back? We
13671367
arbitrarily have it rollforward:
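
The roll-then-add pseudo code in this docstring can be sketched in plain Python. This is a minimal standard-library illustration, not the pandas implementation: the helper names (`is_business_day`, `rollback`, `rollforward`, `add_business_days`) are hypothetical, and "valid date" is assumed to mean Monday through Friday with no holiday calendar.

```python
from datetime import date, timedelta


def is_business_day(d: date) -> bool:
    # Assumption: a "valid" date is any weekday (Mon=0 .. Fri=4).
    return d.weekday() < 5


def rollback(d: date) -> date:
    # Move to the previous valid date; does nothing if d is already valid.
    while not is_business_day(d):
        d -= timedelta(days=1)
    return d


def rollforward(d: date) -> date:
    # Move to the next valid date; does nothing if d is already valid.
    while not is_business_day(d):
        d += timedelta(days=1)
    return d


def add_business_days(d: date, n: int) -> date:
    # Positive n: roll back first. Negative n and zero: roll forward first.
    if n > 0:
        d, step = rollback(d), 1
    else:
        d, step = rollforward(d), -1
    remaining = abs(n)
    while remaining:
        d += timedelta(days=step)
        if is_business_day(d):
            remaining -= 1
    return d


saturday = date(2023, 4, 1)
print(add_business_days(saturday, 2))   # 2023-04-04 (Friday + 2 business days)
print(add_business_days(saturday, -1))  # 2023-03-31 (Monday - 1 business day)
print(add_business_days(saturday, 0))   # 2023-04-03 (rolled forward to Monday)
```

Rolling in the direction opposite to the step means the first period counted always crosses from a valid date, so a Saturday plus one business day lands on Monday, not Tuesday.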

pandas/_libs/tslibs/timestamps.pyx

+3
@@ -1293,6 +1293,9 @@ class Timestamp(_Timestamp):
12931293
Unit used for conversion if ts_input is of type int or float. The
12941294
valid values are 'D', 'h', 'm', 's', 'ms', 'us', and 'ns'. For
12951295
example, 's' means seconds and 'ms' means milliseconds.
1296+
1297+
For float inputs, the result will be stored in nanoseconds, and
1298+
the unit attribute will be set as ``'ns'``.
12961299
fold : {0, 1}, default None, keyword-only
12971300
Due to daylight saving time, one wall clock time can occur twice
12981301
when shifting from summer to winter time; fold describes whether the

pandas/conftest.py

+5
@@ -137,6 +137,11 @@ def pytest_collection_modifyitems(items, config) -> None:
137137
ignored_doctest_warnings = [
138138
# Docstring divides by zero to show behavior difference
139139
("missing.mask_zero_div_zero", "divide by zero encountered"),
140+
(
141+
"pandas.core.generic.NDFrame.bool",
142+
"(Series|DataFrame).bool is now deprecated and will be removed "
143+
"in future version of pandas",
144+
),
140145
]
141146

142147
for item in items:
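
The new `ignored_doctest_warnings` entry pairs a doctest path with a message pattern. As a rough illustration of how such a pattern behaves, the sketch below applies the same string through Python's own `warnings` filters, which treat `message` as a regular expression matched against the start of the warning text. Whether the pandas test suite applies the pattern in exactly this way is an assumption here, and the `emit` helper is hypothetical.

```python
import warnings

# Same pattern text as the conftest entry above.
PATTERN = (
    "(Series|DataFrame).bool is now deprecated and will be removed "
    "in future version of pandas"
)


def emit(name: str) -> None:
    # Hypothetical helper that raises a warning shaped like the new one.
    warnings.warn(
        f"{name}.bool is now deprecated and will be removed "
        "in future version of pandas",
        FutureWarning,
    )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Later filters are inserted in front, so this one takes precedence.
    warnings.filterwarnings("ignore", message=PATTERN, category=FutureWarning)
    emit("Series")     # ignored: matches the pattern
    emit("DataFrame")  # ignored: matches the pattern
    emit("Index")      # recorded: does not match

caught_messages = [str(w.message) for w in caught]
print(caught_messages)
```

Only the `Index` message is recorded, since the alternation in the pattern covers `Series` and `DataFrame` alone.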

pandas/core/algorithms.py

+3-3
@@ -838,7 +838,7 @@ def value_counts(
838838
if bins is not None:
839839
from pandas.core.reshape.tile import cut
840840

841-
values = Series(values)
841+
values = Series(values, copy=False)
842842
try:
843843
ii = cut(values, bins, include_lowest=True)
844844
except TypeError as err:
@@ -861,7 +861,7 @@ def value_counts(
861861
else:
862862
if is_extension_array_dtype(values):
863863
# handle Categorical and sparse,
864-
result = Series(values)._values.value_counts(dropna=dropna)
864+
result = Series(values, copy=False)._values.value_counts(dropna=dropna)
865865
result.name = name
866866
result.index.name = index_name
867867
counts = result._values
@@ -893,7 +893,7 @@ def value_counts(
893893
idx = idx.astype(object)
894894
idx.name = index_name
895895

896-
result = Series(counts, index=idx, name=name)
896+
result = Series(counts, index=idx, name=name, copy=False)
897897

898898
if sort:
899899
result = result.sort_values(ascending=ascending)

pandas/core/arrays/_mixins.py

+1-1
@@ -445,7 +445,7 @@ def value_counts(self, dropna: bool = True) -> Series:
445445

446446
index_arr = self._from_backing_data(np.asarray(result.index._data))
447447
index = Index(index_arr, name=result.index.name)
448-
return Series(result._values, index=index, name=result.name)
448+
return Series(result._values, index=index, name=result.name, copy=False)
449449

450450
def _quantile(
451451
self,

pandas/core/arrays/arrow/array.py

+5-1
@@ -1124,7 +1124,7 @@ def value_counts(self, dropna: bool = True) -> Series:
11241124

11251125
index = Index(type(self)(values))
11261126

1127-
return Series(counts, index=index, name="count")
1127+
return Series(counts, index=index, name="count", copy=False)
11281128

11291129
@classmethod
11301130
def _concat_same_type(cls, to_concat) -> Self:
@@ -1961,6 +1961,10 @@ def _str_wrap(self, width: int, **kwargs):
19611961
"str.wrap not supported with pd.ArrowDtype(pa.string())."
19621962
)
19631963

1964+
@property
1965+
def _dt_year(self):
1966+
return type(self)(pc.year(self._pa_array))
1967+
19641968
@property
19651969
def _dt_day(self):
19661970
return type(self)(pc.day(self._pa_array))

pandas/core/arrays/categorical.py

+6-2
@@ -1535,7 +1535,9 @@ def value_counts(self, dropna: bool = True) -> Series:
15351535
ix = coerce_indexer_dtype(ix, self.dtype.categories)
15361536
ix = self._from_backing_data(ix)
15371537

1538-
return Series(count, index=CategoricalIndex(ix), dtype="int64", name="count")
1538+
return Series(
1539+
count, index=CategoricalIndex(ix), dtype="int64", name="count", copy=False
1540+
)
15391541

15401542
# error: Argument 2 of "_empty" is incompatible with supertype
15411543
# "NDArrayBackedExtensionArray"; supertype defines the argument type as
@@ -1793,7 +1795,9 @@ def _values_for_rank(self):
17931795
# reorder the categories (so rank can use the float codes)
17941796
# instead of passing an object array to rank
17951797
values = np.array(
1796-
self.rename_categories(Series(self.categories).rank().values)
1798+
self.rename_categories(
1799+
Series(self.categories, copy=False).rank().values
1800+
)
17971801
)
17981802
return values
17991803

pandas/core/arrays/masked.py

+2-2
@@ -995,7 +995,7 @@ def value_counts(self, dropna: bool = True) -> Series:
995995
)
996996

997997
if dropna:
998-
res = Series(value_counts, index=keys, name="count")
998+
res = Series(value_counts, index=keys, name="count", copy=False)
999999
res.index = res.index.astype(self.dtype)
10001000
res = res.astype("Int64")
10011001
return res
@@ -1011,7 +1011,7 @@ def value_counts(self, dropna: bool = True) -> Series:
10111011
mask = np.zeros(len(counts), dtype="bool")
10121012
counts_array = IntegerArray(counts, mask)
10131013

1014-
return Series(counts_array, index=index, name="count")
1014+
return Series(counts_array, index=index, name="count", copy=False)
10151015

10161016
@doc(ExtensionArray.equals)
10171017
def equals(self, other) -> bool:

pandas/core/arrays/sparse/accessor.py

+1
@@ -219,6 +219,7 @@ def to_dense(self) -> Series:
219219
self._parent.array.to_dense(),
220220
index=self._parent.index,
221221
name=self._parent.name,
222+
copy=False,
222223
)
223224

224225

pandas/core/arrays/sparse/array.py

+1-1
@@ -886,7 +886,7 @@ def value_counts(self, dropna: bool = True) -> Series:
886886
index = Index(keys)
887887
else:
888888
index = keys
889-
return Series(counts, index=index)
889+
return Series(counts, index=index, copy=False)
890890

891891
# --------
892892
# Indexing

pandas/core/arrays/sparse/scipy_sparse.py

+1-1
@@ -195,7 +195,7 @@ def coo_to_sparse_series(
195195
from pandas import SparseDtype
196196

197197
try:
198-
ser = Series(A.data, MultiIndex.from_arrays((A.row, A.col)))
198+
ser = Series(A.data, MultiIndex.from_arrays((A.row, A.col)), copy=False)
199199
except AttributeError as err:
200200
raise TypeError(
201201
f"Expected coo_matrix. Got {type(A).__name__} instead."

pandas/core/base.py

+2-7
@@ -907,17 +907,12 @@ def _map_values(self, mapper, na_action=None, convert: bool = True):
907907
If the function returns a tuple with more than one element
908908
a MultiIndex will be returned.
909909
"""
910-
arr = extract_array(self, extract_numpy=True, extract_range=True)
910+
arr = self._values
911911

912912
if isinstance(arr, ExtensionArray):
913913
return arr.map(mapper, na_action=na_action)
914914

915-
# Argument 1 to "map_array" has incompatible type
916-
# "Union[IndexOpsMixin, ndarray[Any, Any]]";
917-
# expected "Union[ExtensionArray, ndarray[Any, Any]]
918-
return algorithms.map_array(
919-
arr, mapper, na_action=na_action, convert=convert # type: ignore[arg-type]
920-
)
915+
return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
921916

922917
@final
923918
def value_counts(

pandas/core/construction.py

+28-12
@@ -55,8 +55,6 @@
5555
ABCDataFrame,
5656
ABCExtensionArray,
5757
ABCIndex,
58-
ABCPandasArray,
59-
ABCRangeIndex,
6058
ABCSeries,
6159
)
6260
from pandas.core.dtypes.missing import isna
@@ -379,6 +377,21 @@ def array(
379377
return PandasArray._from_sequence(data, dtype=dtype, copy=copy)
380378

381379

380+
_typs = frozenset(
381+
{
382+
"index",
383+
"rangeindex",
384+
"multiindex",
385+
"datetimeindex",
386+
"timedeltaindex",
387+
"periodindex",
388+
"categoricalindex",
389+
"intervalindex",
390+
"series",
391+
}
392+
)
393+
394+
382395
@overload
383396
def extract_array(
384397
obj: Series | Index, extract_numpy: bool = ..., extract_range: bool = ...
@@ -438,19 +451,22 @@ def extract_array(
438451
>>> extract_array(pd.Series([1, 2, 3]), extract_numpy=True)
439452
array([1, 2, 3])
440453
"""
441-
if isinstance(obj, (ABCIndex, ABCSeries)):
442-
if isinstance(obj, ABCRangeIndex):
454+
typ = getattr(obj, "_typ", None)
455+
if typ in _typs:
456+
# i.e. isinstance(obj, (ABCIndex, ABCSeries))
457+
if typ == "rangeindex":
443458
if extract_range:
444-
return obj._values
445-
# https://github.com/python/mypy/issues/1081
446-
# error: Incompatible return value type (got "RangeIndex", expected
447-
# "Union[T, Union[ExtensionArray, ndarray[Any, Any]]]")
448-
return obj # type: ignore[return-value]
459+
# error: "T" has no attribute "_values"
460+
return obj._values # type: ignore[attr-defined]
461+
return obj
449462

450-
return obj._values
463+
# error: "T" has no attribute "_values"
464+
return obj._values # type: ignore[attr-defined]
451465

452-
elif extract_numpy and isinstance(obj, ABCPandasArray):
453-
return obj.to_numpy()
466+
elif extract_numpy and typ == "npy_extension":
467+
# i.e. isinstance(obj, ABCPandasArray)
468+
# error: "T" has no attribute "to_numpy"
469+
return obj.to_numpy() # type: ignore[attr-defined]
454470

455471
return obj
456472
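
The `extract_array` change above replaces `isinstance` checks with a lookup of the object's `_typ` tag in a `frozenset`. The dispatch pattern can be sketched without pandas; the classes `FakeSeries` and `FakeRangeIndex` and the function `extract_values` are hypothetical stand-ins written for this illustration, and only the control flow mirrors the diff.

```python
# Tags that should be unwrapped to their underlying values (a subset, for brevity).
_TYPS = frozenset({"index", "rangeindex", "series"})


class FakeSeries:
    _typ = "series"

    def __init__(self, values):
        self._values = values


class FakeRangeIndex:
    _typ = "rangeindex"

    def __init__(self, start, stop):
        self._range = range(start, stop)

    @property
    def _values(self):
        # Materialize the lazy range only when asked.
        return list(self._range)


def extract_values(obj, extract_range=False):
    # Objects without a _typ attribute fall through unchanged.
    typ = getattr(obj, "_typ", None)
    if typ in _TYPS:
        if typ == "rangeindex" and not extract_range:
            return obj
        return obj._values
    return obj


ri = FakeRangeIndex(0, 3)
print(extract_values(FakeSeries([1, 2])))      # [1, 2]
print(extract_values(ri) is ri)                # True
print(extract_values(ri, extract_range=True))  # [0, 1, 2]
print(extract_values("plain"))                 # plain
```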

pandas/core/generic.py

+19-4
@@ -103,6 +103,7 @@
103103
validate_inclusive,
104104
)
105105

106+
from pandas.core.dtypes.astype import astype_is_view
106107
from pandas.core.dtypes.common import (
107108
ensure_object,
108109
ensure_platform_int,
@@ -1513,6 +1514,13 @@ def bool(self) -> bool_t:
15131514
>>> pd.DataFrame({'col': [False]}).bool()
15141515
False
15151516
"""
1517+
1518+
warnings.warn(
1519+
f"{type(self).__name__}.bool is now deprecated and will be removed "
1520+
"in future version of pandas",
1521+
FutureWarning,
1522+
stacklevel=find_stack_level(),
1523+
)
15161524
v = self.squeeze()
15171525
if isinstance(v, (bool, np.bool_)):
15181526
return bool(v)
@@ -2005,10 +2013,17 @@ def empty(self) -> bool_t:
20052013
def __array__(self, dtype: npt.DTypeLike | None = None) -> np.ndarray:
20062014
values = self._values
20072015
arr = np.asarray(values, dtype=dtype)
2008-
if arr is values and using_copy_on_write():
2009-
# TODO(CoW) also properly handle extension dtypes
2010-
arr = arr.view()
2011-
arr.flags.writeable = False
2016+
if (
2017+
astype_is_view(values.dtype, arr.dtype)
2018+
and using_copy_on_write()
2019+
and self._mgr.is_single_block
2020+
):
2021+
# Check if both conversions can be done without a copy
2022+
if astype_is_view(self.dtypes.iloc[0], values.dtype) and astype_is_view(
2023+
values.dtype, arr.dtype
2024+
):
2025+
arr = arr.view()
2026+
arr.flags.writeable = False
20122027
return arr
20132028

20142029
@final
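
The deprecation added to `bool` in this file builds its message from `type(self).__name__`, so each subclass names itself in the warning. A minimal standard-library sketch of that pattern follows. The classes `Base` and `Child` are hypothetical, the message wording is shortened, and `stacklevel=2` stands in for the `find_stack_level()` helper used in the diff.

```python
import warnings


class Base:
    def __init__(self, value):
        self._value = value

    def bool(self):
        # The class name is resolved at call time, so subclasses report their own name.
        warnings.warn(
            f"{type(self).__name__}.bool is deprecated",
            FutureWarning,
            stacklevel=2,  # assumption: point the warning at the direct caller
        )
        return bool(self._value)


class Child(Base):
    pass


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = Child(True).bool()

message = str(caught[0].message)
print(result, caught[0].category.__name__, message)
```

The method still returns its usual result; the warning only signals that callers should migrate before the method is removed.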

pandas/core/groupby/generic.py

+1-1
@@ -720,7 +720,7 @@ def value_counts(
720720
llab = lambda lab, inc: lab[inc]
721721
else:
722722
# lab is a Categorical with categories an IntervalIndex
723-
cat_ser = cut(Series(val), bins, include_lowest=True)
723+
cat_ser = cut(Series(val, copy=False), bins, include_lowest=True)
724724
cat_obj = cast("Categorical", cat_ser._values)
725725
lev = cat_obj.categories
726726
lab = lev.take(

pandas/core/groupby/groupby.py

+1-1
@@ -1570,7 +1570,7 @@ def _agg_py_fallback(
15701570

15711571
if values.ndim == 1:
15721572
# For DataFrameGroupBy we only get here with ExtensionArray
1573-
ser = Series(values)
1573+
ser = Series(values, copy=False)
15741574
else:
15751575
# We only get here with values.dtype == object
15761576
# TODO: special case not needed with ArrayManager

pandas/core/indexes/accessors.py

+1-1
@@ -105,7 +105,7 @@ def _delegate_property_get(self, name: str): # type: ignore[override]
105105
index = self.orig.index
106106
else:
107107
index = self._parent.index
108-
# return the result as a Series, which is by definition a copy
108+
# return the result as a Series
109109
result = Series(result, index=index, name=self.name).__finalize__(self._parent)
110110

111111
# setting this object will show a SettingWithCopyWarning/Error
