Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.0.rst
@@ -904,6 +904,7 @@ Groupby/resample/rolling
- Bug in :meth:`DataFrameGroupby.cumsum` with ``skipna=False`` giving incorrect results (:issue:`46216`)
- Bug in :meth:`.GroupBy.cumsum` with ``timedelta64[ns]`` dtype failing to recognize ``NaT`` as a null value (:issue:`46216`)
- Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` with nullable dtypes incorrectly altering the original data in place (:issue:`46220`)
- Bug in :meth:`DataFrame.GroupBy` raising error when ``None`` is in first level of :class:`MultiIndex` (:issue:`47348`)
Member

.groupby

Member Author

Thanks for your help here. Changed the whatsnew entry.

- Bug in :meth:`GroupBy.cummax` with ``int64`` dtype with leading value being the smallest possible int64 (:issue:`46382`)
- Bug in :meth:`GroupBy.max` with empty groups and ``uint64`` dtype incorrectly raising ``RuntimeError`` (:issue:`46408`)
- Bug in :meth:`.GroupBy.apply` would fail when ``func`` was a string and args or kwargs were supplied (:issue:`46479`)
3 changes: 3 additions & 0 deletions pandas/core/groupby/grouper.py
@@ -835,6 +835,9 @@ def get_grouper(

    # if the actual grouper should be obj[key]
    def is_in_axis(key) -> bool:
        if key is None:
            return False
Member
@rhshadrach, Jun 14, 2022

With this change the following (which I think is technically valid) now raises:

from pandas import DataFrame

# A column labeled None is unusual but allowed.
df = DataFrame({None: [1, 1, 2, 2], 'b': [1, 1, 2, 3], 'c': [4, 5, 6, 7]})
print(df.groupby(by=[None]).sum())

# Without this change:
#    b   c
# 1  2   9
# 2  5  13

A few other thoughts...

It looks like this method is the only use of _is_label_like, which explicitly excludes None from being "label-like".

Also, in our tests the items.get_loc(key) is never successful in the case of SeriesGroupBy (obj.ndim == 1). In fact, I'm not sure why you'd be looking in the index for the key except maybe axis=1. But axis=1 seems useless when working with a Series.

Member Author

Converted to draft for now. I can't think of a reason either why we would check the index for a Series.

Member

I think the right thing to do here is to have if obj.ndim == 1: return False inside the not-label-like block. The current behavior of finding something in the Series index leads to a scalar grouper that will always raise because it's not one dimensional:

import pandas as pd


class A:
    def __str__(self):
        return 'cA'


a = A()
# The object `a` ends up as a label in the Series index.
ser = pd.DataFrame({'a': [1, 1, a], 'b': [3, 4, 5]}).set_index('a')['b']
gb = ser.groupby([a])

raises ValueError: Grouper for 'cA' not 1-dimensional, the grouper in this case being the numpy int 5.
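
A minimal sketch of this suggestion, in plain Python rather than pandas, may make the control flow easier to follow. Everything here is a simplified model: `is_label_like` is a stand-in for the private `_is_label_like` helper (its scalar check is approximated with `numbers.Number`), `items` stands in for `obj.axes[-1]`, `ndim` stands in for `obj.ndim`, and a membership test replaces `items.get_loc(key)`. It is not the actual pandas implementation.

```python
from numbers import Number


def is_label_like(val) -> bool:
    """Simplified stand-in for pandas' private _is_label_like: None is never a label."""
    return isinstance(val, (str, tuple)) or (
        val is not None and isinstance(val, (Number, bytes))
    )


def is_in_axis(key, items, ndim: int) -> bool:
    """Model of the nested is_in_axis helper with the proposed ndim guard."""
    if not is_label_like(key):
        if ndim == 1:
            # Proposed guard: never look a key up in a Series index.
            return False
        # Membership stands in for items.get_loc(key) succeeding.
        return key in items
    return True


columns = [None, "b", "c"]
print(is_in_axis(None, columns, ndim=2))     # None is a real column label
print(is_in_axis(None, ["x", "y"], ndim=1))  # Series: the guard short-circuits
print(is_in_axis("b", columns, ndim=2))      # label-like keys skip the lookup
```

Under this model, a `None` column in a DataFrame is still found, while a non-label-like key is never matched against a Series index, so no scalar grouper is produced.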

Member Author

Added this and the test case you posted above.


        if not _is_label_like(key):
            # items -> .columns for DataFrame, .index for Series
            items = obj.axes[-1]
11 changes: 11 additions & 0 deletions pandas/tests/groupby/test_groupby.py
@@ -2776,3 +2776,14 @@ def test_by_column_values_with_same_starting_value():
    ).set_index("Name")

    tm.assert_frame_equal(result, expected_result)


def test_groupby_none_in_first_mi_level():
    # GH#47348
    arr = [[None, 1, 0, 1], [2, 3, 2, 3]]
    ser = Series(1, index=MultiIndex.from_arrays(arr, names=["a", "b"]))
    result = ser.groupby(level=[0, 1]).sum()
    expected = Series(
        [1, 2], MultiIndex.from_tuples([(0.0, 2), (1.0, 3)], names=["a", "b"])
    )
    tm.assert_series_equal(result, expected)
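
The expected result in the test above reflects groupby dropping rows whose group key contains a null value. The following plain-Python sketch models only that grouping arithmetic on the same data; `group_sum_dropping_null_keys` is a hypothetical helper, not a pandas function, and it keeps integer keys where the test expects the floats 0.0 and 1.0.

```python
from collections import defaultdict


def group_sum_dropping_null_keys(keys, values):
    """Sum values per key tuple, skipping any key that contains None."""
    totals = defaultdict(int)
    for key, value in zip(keys, values):
        if any(part is None for part in key):
            # Models the null group being dropped from the result.
            continue
        totals[key] += value
    # Sort by key so the result order is deterministic.
    return dict(sorted(totals.items()))


# Same two index levels as in the test, each row carrying the value 1.
keys = list(zip([None, 1, 0, 1], [2, 3, 2, 3]))
result = group_sum_dropping_null_keys(keys, [1, 1, 1, 1])
print(result)  # {(0, 2): 1, (1, 3): 2}
```

The `(None, 2)` row contributes nothing, and the two `(1, 3)` rows are summed to 2, which matches the values `[1, 2]` in `expected`.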