BUG: DataFrame aggregation methods ignore numeric_only= when level= is specified #40788

Closed
@TheNeuralBit

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

In [1]: import pandas as pd                                                                                         
   ...: df = pd.DataFrame({                                                                                         
   ...:     'int': [1,2,3],                               
   ...:     'flt': [3.14,1.0,2.0],                        
   ...:     'str': ['z', 'a', 'b'],                       
   ...:     'bool': [True, False, True],                  
   ...: }, index=['a','a','b'])                           
   ...: df                                                
Out[1]:                                                   
   int   flt str   bool                                   
a    1  3.14   z   True                                   
a    2  1.00   a  False                                   
b    3  2.00   b   True                                   
                                                          
In [2]: df.dtypes                                         
Out[2]:                                                   
int       int64                                           
flt     float64                                           
str      object                                           
bool       bool                                           
dtype: object                                             

In [3]: df.sum(numeric_only=True) # works as intended without the level= arg
Out[3]: 
int     6.00
flt     6.14
bool    2.00
dtype: float64

In [4]: df.sum()
Out[4]: 
int        6
flt     6.14
str      zab
bool       2
dtype: object

In [5]: df.sum(level=0, numeric_only=None) # behaves as if numeric_only is True no matter the input
Out[5]: 
   int   flt  bool
a    3  4.14     1
b    3  2.00     1

In [6]: df.sum(level=0, numeric_only=True)
Out[6]: 
   int   flt  bool
a    3  4.14     1
b    3  2.00     1

In [8]: df.max(level=0, numeric_only=None) # behaves as if numeric_only is None no matter the input
Out[8]: 
   int   flt str  bool
a    2  3.14   z  True
b    3  2.00   b  True

In [9]: df.max(level=0, numeric_only=True)
Out[9]: 
   int   flt str  bool
a    2  3.14   z  True
b    3  2.00   b  True

Problem description

The documentation for the numeric aggregation methods on DataFrame (mean, median, sum, max, ...) indicates that numeric_only=True drops all but float, int, and boolean columns. This works as expected when level= is not passed, but the argument seems to be ignored when doing a grouped aggregation via the level= arg. Some methods (e.g. sum) behave as if numeric_only is always True; others (e.g. max) behave as if it's always None.

Expected Output

I'd expect str to be dropped iff numeric_only=True; every other example should include the str column.
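
To make the expectation concrete, here is a minimal sketch of the behaviour I have in mind, written with an explicit groupby on the index level instead of the level= arg. The helper name max_by_level is hypothetical (it is not a pandas API), and treating groupby(level=0) as the equivalent grouping is my assumption about what level=0 is meant to do.

```python
import pandas as pd

df = pd.DataFrame({
    'int': [1, 2, 3],
    'flt': [3.14, 1.0, 2.0],
    'str': ['z', 'a', 'b'],
    'bool': [True, False, True],
}, index=['a', 'a', 'b'])


def max_by_level(frame, level, numeric_only=False):
    """Hypothetical helper: max per index-level group, honouring numeric_only."""
    if numeric_only:
        # Keep only int, float and bool columns, as the docs describe.
        frame = frame.select_dtypes(include=['number', 'bool'])
    return frame.groupby(level=level).max()


with_str = max_by_level(df, level=0)
without_str = max_by_level(df, level=0, numeric_only=True)

print(list(with_str.columns))     # ['int', 'flt', 'str', 'bool']
print(list(without_str.columns))  # ['int', 'flt', 'bool']
```

The column filter runs before grouping, so the str column is dropped only when numeric_only=True is requested, which is the contract described above.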

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : f2c8480
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.7.17-1rodete5-amd64
Version : #1 SMP Debian 5.7.17-1rodete5 (2021-01-08)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.3
numpy : 1.19.5
pytz : 2019.3
dateutil : 2.8.1
pip : 20.2.1
setuptools : 49.2.1
Cython : 0.29.13
pytest : 4.6.11
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.20
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
