Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
In [1]: import pandas as pd
...: df = pd.DataFrame({
...: 'int': [1,2,3],
...: 'flt': [3.14,1.0,2.0],
...: 'str': ['z', 'a', 'b'],
...: 'bool': [True, False, True],
...: }, index=['a','a','b'])
...: df
Out[1]:
int flt str bool
a 1 3.14 z True
a 2 1.00 a False
b 3 2.00 b True
In [2]: df.dtypes
Out[2]:
int int64
flt float64
str object
bool bool
dtype: object
In [3]: df.sum(numeric_only=True) # works as intended without the level= arg
Out[3]:
int 6.00
flt 6.14
bool 2.00
dtype: float64
In [4]: df.sum()
Out[4]:
int 6
flt 6.14
str zab
bool 2
dtype: object
In [5]: df.sum(level=0, numeric_only=None) # behaves as if numeric_only is True no matter the input
Out[5]:
int flt bool
a 3 4.14 1
b 3 2.00 1
In [6]: df.sum(level=0, numeric_only=True)
Out[6]:
int flt bool
a 3 4.14 1
b 3 2.00 1
In [8]: df.max(level=0, numeric_only=None) # behaves as if numeric_only is None no matter the input
Out[8]:
int flt str bool
a 2 3.14 z True
b 3 2.00 b True
In [9]: df.max(level=0, numeric_only=True)
Out[9]:
int flt str bool
a 2 3.14 z True
b 3 2.00 b True
Problem description
The documentation for the numeric aggregation methods on DataFrame (mean, median, sum, max, ...) indicates that numeric_only=True drops all but float, int, and boolean columns. This works as expected for a plain aggregation, but the argument seems to be ignored when doing a grouped aggregation via the level= arg. Some methods (e.g. sum) behave as if numeric_only is always True; others (e.g. max) behave as if it's always None.
Expected Output
I'd expect the str column to be dropped if and only if numeric_only=True; all other examples would include the str column.
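For reference, here is a minimal sketch of a workaround that produces the behaviour described above by going through groupby(level=...) instead of the level= argument. The helper name agg_by_level is hypothetical (it is not a pandas API), and the sketch assumes that selecting columns with select_dtypes before grouping is an acceptable stand-in for numeric_only=True. I believe later pandas releases steer users towards groupby for this kind of aggregation anyway, but treat that as an assumption to check against the release notes.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "int": [1, 2, 3],
        "flt": [3.14, 1.0, 2.0],
        "str": ["z", "a", "b"],
        "bool": [True, False, True],
    },
    index=["a", "a", "b"],
)


def agg_by_level(frame, how, level=0, numeric_only=False):
    """Hypothetical helper (not a pandas API): aggregate by index level."""
    if numeric_only:
        # Drop non-numeric columns up front instead of relying on how
        # each reduction interprets numeric_only.
        frame = frame.select_dtypes(include=["number", "bool"])
    grouped = frame.groupby(level=level)
    # Pass numeric_only=False explicitly so every remaining column is kept.
    return getattr(grouped, how)(numeric_only=False)


with_str = agg_by_level(df, "max")                        # keeps the str column
without_str = agg_by_level(df, "max", numeric_only=True)  # drops the str column
numeric_sum = agg_by_level(df, "sum", numeric_only=True)
```

With this helper the str column appears only when numeric_only is left False, which is the same rule I'd expect df.max(level=0, ...) and df.sum(level=0, ...) to follow.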
Output of pd.show_versions()
pandas : 1.2.3
numpy : 1.19.5
pytz : 2019.3
dateutil : 2.8.1
pip : 20.2.1
setuptools : 49.2.1
Cython : 0.29.13
pytest : 4.6.11
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.20
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None