Fix GroupBy apply, filter, and head to ignore temp columns when ops from different DataFrames. #1488
    agg_columns = [
        kdf._kser_for(label)
        for label in kdf._internal.column_labels
        if label not in self._ignore_column_labels
    ]
head has another issue. I'll submit a follow-up PR and add tests there.
I submitted #1490.
    self._groupkeys = by
    self._groupkeys_scols = [s.spark_column for s in self._groupkeys]
    self._as_index = as_index
    self._ignore_column_labels = ignore_column_labels
No big deal, but what about renaming it column_labels_to_exclude or tmp_column_labels?
I'd rename it in #1491.
I will merge this to unblock, but I think we should have a separate class that contains the grouping-related columns. It's getting very, very complicated to read.

Sorry, I wasn't clear. I agree with the structure and the change here. I was talking about a different idea: we have many grouping-related attributes that might need to be grouped together separately.
Fixing GroupBy.apply, filter, and head to ignore temp columns when the ops come from different DataFrames; otherwise the result includes the temp columns.
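The filtering idea from the diff can be sketched in plain Python, without Koalas or Spark. This is only an illustration under assumptions: the helper name `visible_column_labels` and the label `__tmp_groupkey_0__` are hypothetical, and the real code works on `kdf._internal.column_labels` and `self._ignore_column_labels` rather than on bare lists.

```python
from typing import Iterable, List, Set, Tuple

# Column labels are modelled as tuples of strings (an assumption for this sketch).
Label = Tuple[str, ...]


def visible_column_labels(
    column_labels: Iterable[Label], ignore_column_labels: Set[Label]
) -> List[Label]:
    """Return the labels that should appear in the result, preserving order.

    Hypothetical helper: it mirrors the `if label not in ...` condition
    from the diff, nothing more.
    """
    return [label for label in column_labels if label not in ignore_column_labels]


# Hypothetical example: one temp column was added to hold a group key
# that came from a different DataFrame.
column_labels = [("a",), ("b",), ("__tmp_groupkey_0__",)]
ignore_column_labels = {("__tmp_groupkey_0__",)}

result = visible_column_labels(column_labels, ignore_column_labels)
print(result)
```

Keeping the ignored labels in a set makes each membership check cheap, and the list comprehension keeps the original column order, which matters for the shape of the returned frame.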