Bug: GroupBy raising error with None in first level of MultiIndex #47351

phofl · 2022-06-14T18:20:28Z

Closes BUG: Series.groupby fails when grouping on MultiIndex with nulls in first level #47348
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@rhshadrach Would you mind having a look? That key=None matches seems to be an accident to me.

rhshadrach

A couple of thoughts below; I'm going to take another look in the next day or two.

rhshadrach · 2022-06-14T23:37:27Z

pandas/core/groupby/grouper.py

+        if key is None:
+            return False


With this change the following (which I think is technically valid) now raises:

df = DataFrame({None: [1, 1, 2, 2], 'b': [1, 1, 2, 3], 'c': [4, 5, 6, 7]})
print(df.groupby(by=[None]).sum())
# Without this change:
#    b   c
# 1  2   9
# 2  5  13

A few other thoughts...

It looks like this method is the only use of _is_label_like which explicitly excludes None from being "label-like".
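To make the "label-like" point concrete, here is a minimal pure-Python sketch of such a check. It is an approximation written for illustration, not the pandas source: the name `is_label_like` and the `numbers.Number`/`bytes` test used as a stand-in for a scalar check are assumptions.

```python
import numbers


def is_label_like(val) -> bool:
    """Hypothetical, simplified stand-in for the private helper discussed above."""
    # Strings and tuples are always treated as labels.
    if isinstance(val, (str, tuple)):
        return True
    # Other scalars count as labels, but None is excluded explicitly.
    # (Assumption: numbers and bytes approximate "scalar" for this sketch.)
    return val is not None and isinstance(val, (numbers.Number, bytes))


for candidate in ["b", ("a", 1), 3, None, [None]]:
    print(repr(candidate), is_label_like(candidate))
```

Because `None` is not label-like under this kind of check, a key of `None` falls through to whatever lookup the caller performs next, which is where the behavior discussed in this thread comes from.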

Also, in our tests the items.get_loc(key) is never successful in the case of SeriesGroupBy (obj.ndim == 1). In fact, I'm not sure why you'd be looking in the index for the key except maybe axis=1. But axis=1 seems useless when working with a Series.

Converted to draft for now. I can't think of a reason either why we would check the index for a Series.

I think the right thing to do here is to have if obj.ndim == 1: return False inside the not-label-like block. The current behavior of finding something in the Series index leads to a scalar grouper that will always raise because it's not one dimensional:

class A:
    def __str__(self):
        return 'cA'

a = A()
ser = pd.DataFrame({'a': [1, 1, a], 'b': [3, 4, 5]}).set_index('a')['b']
gb = ser.groupby([a])

raises ValueError: Grouper for 'cA' not 1-dimensional, the grouper in this case being the numpy int 5.
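The suggestion above can be sketched without pandas. The following is a toy model, not the code in pandas/core/groupby/grouper.py: `ToyObj`, `is_label_like`, and this standalone `is_in_axis` are hypothetical names, and a plain membership test stands in for `items.get_loc(key)`.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class ToyObj:
    """Hypothetical stand-in for a Series (ndim=1) or DataFrame (ndim=2)."""

    ndim: int
    last_axis_labels: Sequence[Any]  # index for a Series, columns for a DataFrame


def is_label_like(val) -> bool:
    # Simplified assumption: strings, tuples, and non-None numbers are labels.
    return isinstance(val, (str, tuple)) or (
        val is not None and isinstance(val, (int, float, complex, bytes))
    )


def is_in_axis(obj: ToyObj, key) -> bool:
    if not is_label_like(key):
        if obj.ndim == 1:
            # Proposed early exit: a Series has no columns to select, so a
            # non-label-like key is never looked up in its index.
            return False
        # For a frame, look the key up among the column labels.
        return any(label is key or label == key for label in obj.last_axis_labels)
    return True


frame = ToyObj(ndim=2, last_axis_labels=[None, "b", "c"])
series = ToyObj(ndim=1, last_axis_labels=[None, "x"])
print(is_in_axis(frame, None))   # None is a column label of the frame
print(is_in_axis(series, None))  # the Series index is never consulted
```

In this model the frame case keeps working with a `None` column label, while the Series case no longer matches a key against the index, so it cannot produce the scalar grouper described above.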

Added this and the testcase you posted above

rhshadrach

lgtm, just a nit.

rhshadrach · 2022-06-24T18:36:57Z

doc/source/whatsnew/v1.5.0.rst

@@ -917,6 +917,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrameGroupby.cumsum` with ``skipna=False`` giving incorrect results (:issue:`46216`)
 - Bug in :meth:`.GroupBy.cumsum` with ``timedelta64[ns]`` dtype failing to recognize ``NaT`` as a null value (:issue:`46216`)
 - Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` with nullable dtypes incorrectly altering the original data in place (:issue:`46220`)
+- Bug in :meth:`DataFrame.GroupBy` raising error when ``None`` is in first level of :class:`MultiIndex` (:issue:`47348`)


Thx for your help here. Changed whatsnew

# Conflicts: # doc/source/whatsnew/v1.5.0.rst

rhshadrach

lgtm

mroeschke · 2022-06-27T22:55:45Z

Thanks @phofl and @rhshadrach

…ndas-dev#47351) * Bug: GroupBy raising error with None in first level of MultiIndex * Add test * Change whatsnew

Bug: GroupBy raising error with None in first level of MultiIndex

66ea20e

rhshadrach reviewed Jun 14, 2022

phofl marked this pull request as draft June 15, 2022 07:28

phofl added 2 commits June 24, 2022 16:44

Merge remote-tracking branch 'upstream/main' into 47348

f644ecb

Add test

c48478d

rhshadrach requested changes Jun 24, 2022

Change whatsnew

0fae4f6

phofl marked this pull request as ready for review June 25, 2022 14:27

Merge remote-tracking branch 'upstream/main' into 47348

4bdb382

# Conflicts: # doc/source/whatsnew/v1.5.0.rst

rhshadrach approved these changes Jun 25, 2022

mroeschke added the Groupby label Jun 27, 2022

mroeschke added this to the 1.5 milestone Jun 27, 2022

mroeschke approved these changes Jun 27, 2022

mroeschke merged commit 2bcbd25 into pandas-dev:main Jun 27, 2022

phofl deleted the 47348 branch June 28, 2022 14:13
