Commit 3f5fa5f

author
MomIsBestFriend
committed
Merge remote-tracking branch 'upstream/master' into STY-repr-batch-5
2 parents 8047860 + 83812e1 commit 3f5fa5f

26 files changed

+320
-212
lines changed

doc/source/user_guide/integer_na.rst

+27-9
Original file line number | Diff line number | Diff line change
@@ -25,8 +25,7 @@ numbers.
2525

2626
Pandas can represent integer data with possibly missing values using
2727
:class:`arrays.IntegerArray`. This is an :ref:`extension types <extending.extension-types>`
28-
implemented within pandas. It is not the default dtype for integers, and will not be inferred;
29-
you must explicitly pass the dtype into :meth:`array` or :class:`Series`:
28+
implemented within pandas.
3029

3130
.. ipython:: python
3231
@@ -50,24 +49,43 @@ NumPy array.
5049
You can also pass the list-like object to the :class:`Series` constructor
5150
with the dtype.
5251

53-
.. ipython:: python
52+
.. warning::
5453

55-
s = pd.Series([1, 2, np.nan], dtype="Int64")
56-
s
54+
Currently :meth:`pandas.array` and :meth:`pandas.Series` use different
55+
rules for dtype inference. :meth:`pandas.array` will infer a nullable-
56+
integer dtype
5757

58-
By default (if you don't specify ``dtype``), NumPy is used, and you'll end
59-
up with a ``float64`` dtype Series:
58+
.. ipython:: python
6059
61-
.. ipython:: python
60+
pd.array([1, None])
61+
pd.array([1, 2])
62+
63+
For backwards-compatibility, :class:`Series` infers these as either
64+
integer or float dtype
65+
66+
.. ipython:: python
67+
68+
pd.Series([1, None])
69+
pd.Series([1, 2])
6270
63-
pd.Series([1, 2, np.nan])
71+
We recommend explicitly providing the dtype to avoid confusion.
72+
73+
.. ipython:: python
74+
75+
pd.array([1, None], dtype="Int64")
76+
pd.Series([1, None], dtype="Int64")
77+
78+
In the future, we may provide an option for :class:`Series` to infer a
79+
nullable-integer dtype.
6480

6581
Operations involving an integer array will behave similar to NumPy arrays.
6682
Missing values will be propagated, and the data will be coerced to another
6783
dtype if needed.
6884

6985
.. ipython:: python
7086
87+
s = pd.Series([1, 2, None], dtype="Int64")
88+
7189
# arithmetic
7290
s + 1
7391

doc/source/user_guide/style.ipynb

+1-1
@@ -677,7 +677,7 @@
677677
"cell_type": "markdown",
678678
"metadata": {},
679679
"source": [
680-
"Notice that you're able share the styles even though they're data aware. The styles are re-evaluated on the new DataFrame they've been `use`d upon."
680+
"Notice that you're able to share the styles even though they're data aware. The styles are re-evaluated on the new DataFrame they've been `use`d upon."
681681
]
682682
},
683683
{

doc/source/whatsnew/v1.0.0.rst

+54-1
@@ -303,6 +303,58 @@ The following methods now also correctly output values for unobserved categories
303303
304304
df.groupby(["cat_1", "cat_2"], observed=False)["value"].count()
305305
306+
:meth:`pandas.array` inference changes
307+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
308+
309+
:meth:`pandas.array` now infers pandas' new extension types in several cases (:issue:`29791`):
310+
311+
1. String data (including missing values) now returns a :class:`arrays.StringArray`.
312+
2. Integer data (including missing values) now returns a :class:`arrays.IntegerArray`.
313+
3. Boolean data (including missing values) now returns the new :class:`arrays.BooleanArray`
314+
315+
*pandas 0.25.x*
316+
317+
.. code-block:: python
318+
319+
>>> pd.array(["a", None])
320+
<PandasArray>
321+
['a', None]
322+
Length: 2, dtype: object
323+
324+
>>> pd.array([1, None])
325+
<PandasArray>
326+
[1, None]
327+
Length: 2, dtype: object
328+
329+
330+
*pandas 1.0.0*
331+
332+
.. ipython:: python
333+
334+
pd.array(["a", None])
335+
pd.array([1, None])
336+
337+
As a reminder, you can specify the ``dtype`` to disable all inference.
338+
339+
By default :meth:`Categorical.min` now returns the minimum instead of np.nan
340+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
341+
342+
When :class:`Categorical` contains ``np.nan``,
343+
:meth:`Categorical.min` no longer returns ``np.nan`` by default (skipna=True) (:issue:`25303`)
344+
345+
*pandas 0.25.x*
346+
347+
.. code-block:: ipython
348+
349+
In [1]: pd.Categorical([1, 2, np.nan], ordered=True).min()
350+
Out[1]: nan
351+
352+
353+
*pandas 1.0.0*
354+
355+
.. ipython:: python
356+
357+
pd.Categorical([1, 2, np.nan], ordered=True).min()
306358
307359
.. _whatsnew_1000.api_breaking.deps:
308360

@@ -388,7 +440,6 @@ Other API changes
388440
- :meth:`Series.dropna` has dropped its ``**kwargs`` argument in favor of a single ``how`` parameter.
389441
Supplying anything else than ``how`` to ``**kwargs`` raised a ``TypeError`` previously (:issue:`29388`)
390442
- When testing pandas, the new minimum required version of pytest is 5.0.1 (:issue:`29664`)
391-
-
392443

393444

394445
.. _whatsnew_1000.api.documentation:
@@ -410,6 +461,8 @@ Deprecations
410461
- :func:`is_extension_type` is deprecated, :func:`is_extension_array_dtype` should be used instead (:issue:`29457`)
411462
- :func:`eval` keyword argument "truediv" is deprecated and will be removed in a future version (:issue:`29812`)
412463
- :meth:`Categorical.take_nd` is deprecated, use :meth:`Categorical.take` instead (:issue:`27745`)
464+
- The parameter ``numeric_only`` of :meth:`Categorical.min` and :meth:`Categorical.max` is deprecated and replaced with ``skipna`` (:issue:`25303`)
465+
-
413466

414467
.. _whatsnew_1000.prior_deprecations:
415468

pandas/_libs/lib.pyx

+1-1
@@ -1313,7 +1313,7 @@ def infer_dtype(value: object, skipna: bool = True) -> str:
13131313

13141314
elif isinstance(val, str):
13151315
if is_string_array(values, skipna=skipna):
1316-
return 'string'
1316+
return "string"
13171317

13181318
elif isinstance(val, bytes):
13191319
if is_bytes_array(values, skipna=skipna):

pandas/core/arrays/base.py

+7-6
@@ -451,7 +451,9 @@ def _values_for_argsort(self) -> np.ndarray:
451451
# Note: this is used in `ExtensionArray.argsort`.
452452
return np.array(self)
453453

454-
def argsort(self, ascending=True, kind="quicksort", *args, **kwargs):
454+
def argsort(
455+
self, ascending: bool = True, kind: str = "quicksort", *args, **kwargs
456+
) -> np.ndarray:
455457
"""
456458
Return the indices that would sort this array.
457459
@@ -467,7 +469,7 @@ def argsort(self, ascending=True, kind="quicksort", *args, **kwargs):
467469
468470
Returns
469471
-------
470-
index_array : ndarray
472+
ndarray
471473
Array of indices that sort ``self``. If NaN values are contained,
472474
NaN values are placed at the end.
473475
@@ -1198,10 +1200,9 @@ def _maybe_convert(arr):
11981200

11991201
if op.__name__ in {"divmod", "rdivmod"}:
12001202
a, b = zip(*res)
1201-
res = _maybe_convert(a), _maybe_convert(b)
1202-
else:
1203-
res = _maybe_convert(res)
1204-
return res
1203+
return _maybe_convert(a), _maybe_convert(b)
1204+
1205+
return _maybe_convert(res)
12051206

12061207
op_name = ops._get_op_name(op, True)
12071208
return set_function_name(_binop, op_name, cls)
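
The early-return refactor in the ``divmod`` branch relies on ``zip(*res)`` to split a sequence of (quotient, remainder) pairs into two sequences. Here is a plain-Python sketch of that control flow. ``convert`` and ``combine`` are hypothetical stand-ins for ``_maybe_convert`` and the enclosing ``_binop``; they are not the pandas implementation.

```python
from typing import Any, Iterable, List


def convert(values: Iterable[Any]) -> List[Any]:
    # Hypothetical stand-in for _maybe_convert: just materialize a list.
    return list(values)


def combine(op_name: str, res: List[Any]):
    if op_name in {"divmod", "rdivmod"}:
        # Each element of res is a (quotient, remainder) pair; zip(*res)
        # transposes the pairs into one sequence per component.
        a, b = zip(*res)
        return convert(a), convert(b)

    # Every other operation yields a single sequence of results.
    return convert(res)


pairs = [divmod(x, 3) for x in [7, 8, 9]]
quotients, remainders = combine("divmod", pairs)
sums = combine("add", [x + 1 for x in [7, 8, 9]])
print(quotients, remainders, sums)
```

Returning from each branch directly removes the need for the intermediate ``res`` reassignment and the trailing ``else``.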

pandas/core/arrays/categorical.py

+20-18
@@ -2123,7 +2123,8 @@ def _reduce(self, name, axis=0, **kwargs):
21232123
raise TypeError(f"Categorical cannot perform the operation {name}")
21242124
return func(**kwargs)
21252125

2126-
def min(self, numeric_only=None, **kwargs):
2126+
@deprecate_kwarg(old_arg_name="numeric_only", new_arg_name="skipna")
2127+
def min(self, skipna=True):
21272128
"""
21282129
The minimum value of the object.
21292130
@@ -2139,17 +2140,18 @@ def min(self, numeric_only=None, **kwargs):
21392140
min : the minimum of this `Categorical`
21402141
"""
21412142
self.check_for_ordered("min")
2142-
if numeric_only:
2143-
good = self._codes != -1
2144-
pointer = self._codes[good].min(**kwargs)
2145-
else:
2146-
pointer = self._codes.min(**kwargs)
2147-
if pointer == -1:
2148-
return np.nan
2143+
good = self._codes != -1
2144+
if not good.all():
2145+
if skipna:
2146+
pointer = self._codes[good].min()
2147+
else:
2148+
return np.nan
21492149
else:
2150-
return self.categories[pointer]
2150+
pointer = self._codes.min()
2151+
return self.categories[pointer]
21512152

2152-
def max(self, numeric_only=None, **kwargs):
2153+
@deprecate_kwarg(old_arg_name="numeric_only", new_arg_name="skipna")
2154+
def max(self, skipna=True):
21532155
"""
21542156
The maximum value of the object.
21552157
@@ -2165,15 +2167,15 @@ def max(self, numeric_only=None, **kwargs):
21652167
max : the maximum of this `Categorical`
21662168
"""
21672169
self.check_for_ordered("max")
2168-
if numeric_only:
2169-
good = self._codes != -1
2170-
pointer = self._codes[good].max(**kwargs)
2171-
else:
2172-
pointer = self._codes.max(**kwargs)
2173-
if pointer == -1:
2174-
return np.nan
2170+
good = self._codes != -1
2171+
if not good.all():
2172+
if skipna:
2173+
pointer = self._codes[good].max()
2174+
else:
2175+
return np.nan
21752176
else:
2176-
return self.categories[pointer]
2177+
pointer = self._codes.max()
2178+
return self.categories[pointer]
21772179

21782180
def mode(self, dropna=True):
21792181
"""

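The new ``skipna`` branching in ``Categorical.min`` and ``Categorical.max`` can be modelled without pandas. This is a sketch only: ``categorical_min`` is a hypothetical helper that uses plain lists in place of ``self._codes`` and ``self.categories``, with ``-1`` as the missing-value code, as in the diff.

```python
import math
from typing import Any, Sequence


def categorical_min(
    codes: Sequence[int], categories: Sequence[Any], skipna: bool = True
) -> Any:
    # A code of -1 marks a missing value.
    good = [code for code in codes if code != -1]
    if len(good) != len(codes):
        # At least one value is missing.
        if not skipna:
            return math.nan
        pointer = min(good)
    else:
        pointer = min(codes)
    # Codes index into the ordered categories.
    return categories[pointer]


categories = ["low", "mid", "high"]
with_missing = [2, 0, -1]

skipped = categorical_min(with_missing, categories)
not_skipped = categorical_min(with_missing, categories, skipna=False)
complete = categorical_min([2, 1], categories)
print(skipped, not_skipped, complete)
```

Like the diff, the sketch does not special-case input whose values are all missing; with ``skipna=True`` the call to ``min`` on an empty list raises ``ValueError``.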