-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Bug: GroupBy raising error with None in first level of MultiIndex #47351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of thoughts below, I'm going to take another look in the next day or two.
pandas/core/groupby/grouper.py
Outdated
if key is None: | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this change the following (which I think is technically valid) now raises:
df = DataFrame({None: [1, 1, 2, 2], 'b': [1, 1, 2, 3], 'c': [4, 5, 6, 7]})
print(df.groupby(by=[None]).sum())
# Without this change:
# b c
# 1 2 9
# 2 5 13
A few other thoughts...
It looks like this method is the only use of _is_label_like
which explicitly excludes None
from being "label-like".
Also, in our tests the items.get_loc(key)
is never successful in the case of SeriesGroupBy (obj.ndim == 1). In fact, I'm not sure why you'd be looking in the index for the key except maybe axis=1. But axis=1 seems useless when working with a Series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conerted to draft for now. Can not think of a reason either why we would check the index for a Series
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the right thing to do here is to have if obj.ndim == 1: return False
inside the not-label-like block. The current behavior of finding something in the Series index leads to a scalar grouper that will always raise because it's not one dimensional:
class A:
def __str__(self):
return 'cA'
a = A()
ser = pd.DataFrame({'a': [1, 1, a], 'b': [3, 4, 5]}).set_index('a')['b']
gb = ser.groupby([a])
raises ValueError: Grouper for 'cA' not 1-dimensional
, the grouper in this case being the numpy int 5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this and the testcase you posted above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just a nit.
doc/source/whatsnew/v1.5.0.rst
Outdated
@@ -917,6 +917,7 @@ Groupby/resample/rolling | |||
- Bug in :meth:`DataFrameGroupby.cumsum` with ``skipna=False`` giving incorrect results (:issue:`46216`) | |||
- Bug in :meth:`.GroupBy.cumsum` with ``timedelta64[ns]`` dtype failing to recognize ``NaT`` as a null value (:issue:`46216`) | |||
- Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` with nullable dtypes incorrectly altering the original data in place (:issue:`46220`) | |||
- Bug in :meth:`DataFrame.GroupBy` raising error when ``None`` is in first level of :class:`MultiIndex` (:issue:`47348`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.groupby
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx for your help here. Changed whatsnew
# Conflicts: # doc/source/whatsnew/v1.5.0.rst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks @phofl and @rhshadrach |
…ndas-dev#47351) * Bug: GroupBy raising error with None in first level of MultiIndex * Add test * Change whatsnew
Series.groupby
fails when grouping onMultiIndex
with nulls in first level #47348 (Replace xxxx with the Github issue number)doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.@rhshadrach Would you mind having a look? That
key=None
matches seems to be an accident to me.