Closed
Description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Any
- Modin version (modin.__version__): f98a7b3
- Python version: 3.7.5
- Code we can use to reproduce:
import pandas
import modin.pandas as pd
from modin.pandas.test.utils import test_data, df_equals, create_test_dfs
import numpy as np
md_df, pd_df = create_test_dfs(test_data["int_data"])
by = [md_df.columns[0], md_df.columns[1]]
agg_dict = {
    "max": (md_df.columns[2], "max"),
    "sum": (md_df.columns[3], "sum"),
}
as_index = False
md_res = md_df.groupby(by, as_index=as_index).agg(**agg_dict)
pd_res = pd_df.groupby(by, as_index=as_index).agg(**agg_dict)
df_equals(md_res, pd_res)  # AssertionError: shape mismatch
Output
Traceback (most recent call last):
  File "../rofl.py", line 18, in <module>
    df_equals(md_res, pd_res)
  File "/localdisk/dchigare/repos/modin_bp/modin/pandas/test/utils.py", line 527, in df_equals
    check_categorical=False,
  File "/localdisk/dchigare/miniconda3/envs/modin_tests/lib/python3.7/site-packages/pandas/_testing.py", line 1562, in assert_frame_equal
    obj, f"{obj} shape mismatch", f"{repr(left.shape)}", f"{repr(right.shape)}",
  File "/localdisk/dchigare/miniconda3/envs/modin_tests/lib/python3.7/site-packages/pandas/_testing.py", line 1036, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame are different
DataFrame shape mismatch
[left]: (251, 2)
[right]: (251, 4)
Describe the problem
This seems to be caused by the renaming logic: when renaming the result, we select only the aggregated columns and discard the newly inserted 'by' columns:
Lines 426 to 429 in f98a7b3
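For reference, the expected behaviour can be reproduced with plain pandas on a tiny frame. This is only a sketch: the frame and its column names are made up for illustration, and the final column selection merely imitates the suspected symptom (keeping only the renamed aggregation columns). It is not Modin's actual code from the lines referenced above.

```python
import pandas as pd

# Hypothetical small frame standing in for test_data["int_data"].
df = pd.DataFrame(
    {
        "a": [1, 1, 2, 2],
        "b": [0, 0, 1, 1],
        "c": [5, 7, 3, 9],
        "d": [1, 2, 3, 4],
    }
)
by = ["a", "b"]
agg_dict = {"max": ("c", "max"), "sum": ("d", "sum")}

# With as_index=False, pandas keeps the 'by' columns as regular columns
# next to the renamed aggregation columns.
expected = df.groupby(by, as_index=False).agg(**agg_dict)
print(list(expected.columns), expected.shape)

# Imitation of the symptom (an assumption, not Modin's implementation):
# selecting only the aggregation columns drops the 'by' columns,
# which gives a frame with the same rows but fewer columns.
only_agg = expected[list(agg_dict)]
print(list(only_agg.columns), only_agg.shape)
```

The second frame has the same number of rows but two fewer columns, matching the (251, 2) versus (251, 4) mismatch reported in the traceback.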