Groupby dictionary aggregation with renaming does not insert 'by' columns when 'as_index=False' #2543

Closed
@dchigarev

Description

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Any
  • Modin version (modin.__version__): f98a7b3
  • Python version: 3.7.5
  • Code we can use to reproduce:
import pandas
import modin.pandas as pd
from modin.pandas.test.utils import test_data, df_equals, create_test_dfs
import numpy as np

md_df, pd_df = create_test_dfs(test_data["int_data"])

by = [md_df.columns[0], md_df.columns[1]]
agg_dict = {
    "max": (md_df.columns[2], "max"),
    "sum": (md_df.columns[3], "sum"),
}

as_index = False
md_res = md_df.groupby(by, as_index=as_index).agg(**agg_dict)
pd_res = pd_df.groupby(by, as_index=as_index).agg(**agg_dict)
df_equals(md_res, pd_res)  # AssertionError: DataFrame shape mismatch
Output
Traceback (most recent call last):
  File "../rofl.py", line 18, in <module>
    df_equals(md_res, pd_res)
  File "/localdisk/dchigare/repos/modin_bp/modin/pandas/test/utils.py", line 527, in df_equals
    check_categorical=False,
  File "/localdisk/dchigare/miniconda3/envs/modin_tests/lib/python3.7/site-packages/pandas/_testing.py", line 1562, in assert_frame_equal
    obj, f"{obj} shape mismatch", f"{repr(left.shape)}", f"{repr(right.shape)}",
  File "/localdisk/dchigare/miniconda3/envs/modin_tests/lib/python3.7/site-packages/pandas/_testing.py", line 1036, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame are different

DataFrame shape mismatch
[left]:  (251, 2)
[right]: (251, 4)
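
For comparison, the four-column shape on the pandas side can be checked without Modin. The following is a minimal sketch on a small made-up frame; the column names `a`, `b`, `c`, `d` are hypothetical and stand in for the first four columns of `test_data["int_data"]`.

```python
import pandas as pd

# Hypothetical stand-in data: "a" and "b" are the 'by' columns,
# "c" and "d" are the columns being aggregated.
df = pd.DataFrame(
    {
        "a": [1, 1, 2, 2],
        "b": [0, 0, 1, 1],
        "c": [5, 7, 3, 9],
        "d": [1, 2, 3, 4],
    }
)

# Named aggregation with as_index=False: pandas keeps the 'by' columns
# as regular columns in front of the renamed aggregation results.
expected = df.groupby(["a", "b"], as_index=False).agg(
    max=("c", "max"),
    sum=("d", "sum"),
)

print(list(expected.columns))
print(expected.shape)
```

With `as_index=False`, the 'by' columns appear alongside the two renamed aggregations, which is why the pandas result above has four columns while the Modin result has only two.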

Describe the problem

This seems to be caused by the renaming logic: when relabeling, we select only the aggregated columns and discard the newly inserted 'by' columns:

if relabeling_required:
    result = result.iloc[:, order]
    result.columns = new_columns
return result
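
The effect of that relabeling step can be illustrated in plain pandas. This is only a sketch under assumptions, not Modin's actual code or its eventual fix: it assumes the intermediate `result` already has the 'by' columns in front, and that `order` holds the positions of the aggregated columns only. The names `by_cols` and `by_positions` are hypothetical.

```python
import pandas as pd

# Assumed layout of the intermediate result: 'by' columns ("a", "b")
# inserted in front of the aggregated columns ("c", "d").
result = pd.DataFrame({"a": [1, 2], "b": [0, 1], "c": [7, 9], "d": [3, 7]})
by_cols = ["a", "b"]          # hypothetical: names of the inserted 'by' columns
order = [2, 3]                # assumed: positions of the aggregated columns
new_columns = ["max", "sum"]  # names requested via named aggregation

# Relabeling as in the snippet above: selecting by `order` alone
# leaves only the aggregated columns, so the 'by' columns are lost.
dropped = result.iloc[:, order]
dropped.columns = new_columns

# One possible adjustment: put the 'by' positions ahead of `order`
# before selecting, and extend the new labels accordingly.
by_positions = [result.columns.get_loc(c) for c in by_cols]
kept = result.iloc[:, by_positions + order]
kept.columns = by_cols + new_columns

print(dropped.shape, kept.shape)
```

Under these assumptions, `dropped` reproduces the two-column shape from the report, while `kept` has the four columns pandas returns.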

Labels

bug 🦗Something isn't working