Groupby on multiindex with missing data in group keys raises IndexError #20519

Closed
pluspku opened this issue Mar 28, 2018 · 3 comments · Fixed by #28097
Labels
Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate MultiIndex
Comments

pluspku commented Mar 28, 2018

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

foo = pd.DataFrame([['x', np.nan, 1]], columns=['A', 'B', 'C']).set_index(['A', 'B'])
foo.groupby(level=['A', 'B']).C.sum()
# IndexError: cannot do a non-empty take from an empty axes.

# while this is expected, as a comparison (grouping by columns instead of index levels)
bar = pd.DataFrame([['x', np.nan, 1]], columns=['A', 'B', 'C'])
bar.groupby(['A', 'B']).C.sum()
# Series([], Name: C, dtype: int64)

Problem description

It would be better if pandas either returned the same result as in the 'bar' case, or raised a more meaningful exception and documented this case somewhere (e.g. here).
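Because grouping by columns does not hit this error (the 'bar' comparison), one possible workaround on affected versions is to move the index levels back into columns before grouping. This is a minimal sketch, assuming it is acceptable to temporarily reset the index; it is not an official fix, just the column-based path applied to the same data.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame([['x', np.nan, 1]], columns=['A', 'B', 'C']).set_index(['A', 'B'])

# Workaround sketch: turn the 'A' and 'B' index levels back into columns so the
# column-based groupby path (the 'bar' case) is used instead of level-based grouping.
# Rows whose group key contains NaN are dropped, so no groups remain here.
result = foo.reset_index().groupby(['A', 'B'])['C'].sum()
print(len(result))  # 0
```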

Expected Output

Series([], Name: C, dtype: int64)

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-229.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: None
setuptools: None
Cython: None
numpy: 1.10.4
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: None
numexpr: 2.4.6
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: 0.9.2
apiclient: 1.2
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: 2.22.1
pandas_datareader: None

@TomAugspurger TomAugspurger added Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Mar 28, 2018
@TomAugspurger

Yeah, I think this should return an empty series.
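The behavior proposed here can be pinned down as a small regression check. This sketch assumes a pandas release that includes the fix referenced above (#28097); the function name is hypothetical, and it only asserts that the result is empty and keeps the column name, not the exact index dtype.

```python
import numpy as np
import pandas as pd


def check_all_na_level_groupby():
    # Hypothetical regression check: the only row has a missing 'B' key,
    # so once NA keys are dropped there are no groups left to aggregate.
    foo = pd.DataFrame([['x', np.nan, 1]], columns=['A', 'B', 'C']).set_index(['A', 'B'])
    result = foo.groupby(level=['A', 'B'])['C'].sum()
    assert result.empty  # an empty Series instead of an IndexError
    return result


result = check_all_na_level_groupby()
```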

@jreback jreback added this to the Next Major Release milestone Mar 30, 2018
proost commented Aug 15, 2019

May I take a look?

minhoryang commented Aug 21, 2019

The error "IndexError: cannot do a non-empty take from an empty axes." is raised in two places in the codebase:

Why don't we create a new exception, such as EmptyAxesTakeFailedIndexError, as a subclass of this IndexError, and catch only that in the groupers? (The same strategy as #27945, but narrowed down.)
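The proposal above can be sketched in plain Python. Only the name EmptyAxesTakeFailedIndexError comes from the comment; take and grouper_labels are hypothetical stand-ins for the internal take routine and the grouper step, not pandas functions. Because the new class subclasses IndexError, existing handlers keep working, while the grouper catches only the narrow "empty axis" case.

```python
class EmptyAxesTakeFailedIndexError(IndexError):
    """Raised when a non-empty take is attempted on an empty axis.

    Subclassing IndexError keeps existing ``except IndexError`` handlers working.
    """


def take(values, positions):
    """Hypothetical stand-in for the internal take routine.

    Positions are assumed to be valid; -1 is not treated specially here.
    """
    if len(values) == 0 and len(positions) > 0:
        raise EmptyAxesTakeFailedIndexError(
            "cannot do a non-empty take from an empty axes."
        )
    return [values[p] for p in positions]


def grouper_labels(level_values, codes):
    """Hypothetical grouper step: map group codes to level values.

    Only the narrow "empty axis" failure is caught; any other IndexError
    still propagates to the caller.
    """
    try:
        return take(level_values, codes)
    except EmptyAxesTakeFailedIndexError:
        # The level has no values (every key was missing), so there are no groups.
        return []


print(grouper_labels(["x"], [0]))  # ['x']
print(grouper_labels([], [-1]))    # []
```

Catching the subclass rather than IndexError itself is the "narrow down" part: an out-of-range position on a non-empty level still surfaces as an ordinary IndexError.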

proost pushed a commit to proost/pandas that referenced this issue Aug 22, 2019
proost added a commit to proost/pandas that referenced this issue Sep 10, 2019
…exError (pandas-dev#20519)

* If all index values in some level are NA, fill with NaN
proost added a commit to proost/pandas that referenced this issue Sep 14, 2019
IndexError (pandas-dev#20519)

* if all the values in a level of a MultiIndex were missing, fill with
numpy nan
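The idea in these commit messages (if a MultiIndex level has no values because every entry is missing, fill with NaN instead of taking from an empty axis) can be illustrated with NumPy. This is a minimal sketch, not the code from the fix: level_values_for_codes is a hypothetical helper, and it assumes the usual MultiIndex encoding where a code of -1 marks a missing entry.

```python
import numpy as np


def level_values_for_codes(level, codes):
    """Hypothetical helper: materialize one MultiIndex level from its codes.

    A code of -1 marks a missing entry. If the level has no values at all
    (every entry was missing), return an all-NaN array instead of attempting
    a take from an empty axis.
    """
    codes = np.asarray(codes, dtype=np.intp)
    level = np.asarray(level)
    missing = codes == -1
    if level.size == 0:
        # All values in this level are missing: fill with NaN.
        return np.full(codes.shape, np.nan)
    # Take with placeholder position 0 for missing codes, then overwrite with NaN.
    out = level.astype(object).take(np.where(missing, 0, codes))
    out[missing] = np.nan
    return out


print(level_values_for_codes([], [-1]).tolist())               # [nan]
print(level_values_for_codes(["x", "y"], [1, -1, 0]).tolist())  # ['y', nan, 'x']
```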
proost added a commit to proost/pandas that referenced this issue Sep 14, 2019
…exError (pandas-dev#20519)

* if all the values in a level of a MultiIndex were missing, fill with numpy nan
proost added a commit to proost/pandas that referenced this issue Sep 18, 2019
…exError (pandas-dev#20519)

* if all the values in a level of a MultiIndex were missing, fill with numpy nan
proost added a commit to proost/pandas that referenced this issue Oct 22, 2019
…exError (pandas-dev#20519)

* if all the values in a level of a MultiIndex were missing, fill with numpy nan
proost added a commit to proost/pandas that referenced this issue Oct 24, 2019
…exError (pandas-dev#20519)

* if all the values in a level of a MultiIndex were missing, fill with numpy nan
@jreback jreback modified the milestones: Contributions Welcome, 1.0 Oct 25, 2019
6 participants