BUG: DataFrameGroupBy.sum() drops column names when applied to an empty dataframe #46375
Comments
Can confirm this issue exists on main. The issue also arises with
It looks like this has something to do with the note: I left off
In [1]: import pandas as pd, numpy as np
In [2]: df = pd.DataFrame(columns=['a', 'b', 'c'])
In [3]: df.groupby('a').sum()
Out[3]:
Empty DataFrame
Columns: []
Index: []
In [4]: df.groupby('a').min()
Out[4]:
Empty DataFrame
Columns: [b, c]
Index: []
In [5]: df.groupby('a').min(numeric_only=True)
Out[5]:
Empty DataFrame
Columns: []
Index: []
In [6]: df.groupby('a').sum(numeric_only=False)
Out[6]:
Empty DataFrame
Columns: [b, c]
Index: []
I am not sure if the default behavior of these methods is intended to be different, so I'll leave it to the maintainers to direct closing this issue.
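A rough sketch of why the columns may disappear, assuming dtype is the deciding factor: a frame built with only `columns=` holds object-dtype columns, and those are what `numeric_only=True` prunes. The `typed` frame below is a hypothetical variant introduced for illustration and is not part of the original report.

```python
import pandas as pd

# Same frame as in the session above: no data, so every column is object dtype.
df = pd.DataFrame(columns=["a", "b", "c"])
object_dtypes = df.dtypes.astype(str).tolist()

# Hypothetical variant (an assumption for illustration): declare numeric dtypes up front.
typed = pd.DataFrame(
    {
        "a": pd.Series(dtype="int64"),
        "b": pd.Series(dtype="float64"),
        "c": pd.Series(dtype="int64"),
    }
)

# With numeric dtypes declared, numeric_only=True has nothing to prune.
typed_result = typed.groupby("a").sum(numeric_only=True)
typed_columns = list(typed_result.columns)
```

If the empty frame comes from a pipeline that normally carries numeric data, declaring dtypes this way keeps the result's column set stable whether or not any rows arrive.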
take
take
@ryansdowning's examples are very helpful! I definitely agree that this is caused by
In [0]: df = pd.DataFrame(
    {
        "a": [0, 0, 1, 1],
        "b": [1, "x", 2, "y"],
        "c": [1, 1, 2, 2],
    }
)
Out[0]:
   a  b  c
0  0  1  1
1  0  x  1
2  1  2  2
3  1  y  2
In [1]: df.groupby('a').first(numeric_only=False)
Out[1]:
   b  c
a
0  1  1
1  2  2
In [2]: df.groupby('a').first(numeric_only=None)
Out[2]:
   b  c
a
0  1  1
1  2  2
while
In [3]: df.groupby('a').first(numeric_only=True)
Out[3]:
   c
a
0  1
1  2
It looks like any columns whose elements aren't strictly numeric are getting pruned. Edit: actually, I'm not sure if that's the correct behavior for
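The pruning described in this comment can be checked by inspecting the dtype of the mixed column. This is a minimal sketch that reuses the frame from the comment; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [0, 0, 1, 1],
        "b": [1, "x", 2, "y"],
        "c": [1, 1, 2, 2],
    }
)

# Column "b" mixes ints and strings, so pandas stores it as object dtype.
b_dtype = str(df["b"].dtype)

# Explicit numeric_only=False keeps the object column.
kept_all = df.groupby("a").first(numeric_only=False)

# Explicit numeric_only=True drops it, leaving only the integer column "c".
kept_numeric = df.groupby("a").first(numeric_only=True)
```

Note that the first element of each group in "b" happens to be an integer, but the column as a whole is still object dtype, which appears to be what the filter looks at.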
Will this be fixed automatically in 2.0 when the numeric_only default/behavior changes?
@rhshadrach is this closed by the numeric_only deprecation?
Yes - closing.
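For code that has to run on more than one pandas version, passing `numeric_only` explicitly avoids depending on the default. A minimal sketch, reusing the empty frame from the earlier comment (the variable names are illustrative):

```python
import pandas as pd

# Empty frame with column labels only, as in the report.
df = pd.DataFrame(columns=["a", "b", "c"])

# Stating numeric_only=False explicitly keeps the non-key columns,
# independent of which default the installed pandas version uses.
result = df.groupby("a").sum(numeric_only=False)
result_columns = list(result.columns)
```

This matches the `sum(numeric_only=False)` output shown earlier in the thread, where the result keeps columns `b` and `c`.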
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Only first column (groupby key) is preserved:
Expected Behavior
All columns of original dataframe should be preserved:
Installed Versions
pandas : 1.1.4
numpy : 1.19.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 47.3.1.post20210215
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.30.0
sphinx : 3.0.3
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : 0.10.1
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.17.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.3
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 2.0.0
pytables : None
pyxlsb : 1.0.9
s3fs : None
scipy : 1.5.4
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2