@sllynn sllynn commented Aug 4, 2020

(supersedes previous #1697)

This PR attempts to address an issue I see with column selection on Koalas DataFrames. Pandas allows a Pandas Index object to be used for selecting columns, e.g.

column_mask = entity.df.columns.isin([variable.id for variable in entity.variables])
index_cols = entity.df.columns[column_mask] 
dtypes = entity.df[index_cols].dtypes.astype(str).to_dict() 

This works fine in Pandas but fails in Koalas, because the type check in the DataFrame's `__getitem__` method does not include the Pandas Index type.
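
For reference, here is a self-contained, Pandas-only version of the snippet above. The small frame `df` stands in for `entity.df` and the hard-coded `wanted` list stands in for the variable ids; both are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for entity.df
df = pd.DataFrame({"id": [1, 2], "value": [0.5, 1.5], "label": ["a", "b"]})

# Hypothetical stand-in for [variable.id for variable in entity.variables]
wanted = ["id", "value"]

column_mask = df.columns.isin(wanted)   # boolean array, one entry per column
index_cols = df.columns[column_mask]    # a pandas Index, not a plain list
dtypes = df[index_cols].dtypes.astype(str).to_dict()

print(type(index_cols).__name__)
print(sorted(dtypes))
```

The key point is that `index_cols` is an Index object, so any `__getitem__` that only accepts plain lists will reject it.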

The changes here extend column indexing to all list-like objects.

I've extended the test_dataframe test to reflect this.
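
As a rough illustration of what "list-like" means here, the sketch below uses Pandas' own `is_list_like` helper. The function `as_column_list` is a hypothetical name used only for this example and is not the code in this PR.

```python
import pandas as pd
from pandas.api.types import is_list_like


def as_column_list(key):
    """Hypothetical helper: turn a column key into a plain list of labels."""
    if is_list_like(key):
        return list(key)
    # Scalars such as a single string label are wrapped on their own.
    return [key]


cols = pd.Index(["id", "value"])

print(is_list_like(cols))      # True: an Index counts as list-like
print(is_list_like(["id"]))    # True
print(is_list_like("id"))      # False: strings are excluded
print(as_column_list(cols))    # ['id', 'value']
```

Note that `is_list_like` also returns True for tuples, which matters when a tuple is meant to be a single column label.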

sllynn commented Aug 4, 2020

OK, this breaks the Koalas DataFrame groupby logic: when the tuple representing a grouping column name is passed to `__getitem__`, it is now wrapped in a list, so `__getitem__` returns a DataFrame rather than the Series that groupby expects.
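
One possible way to avoid this kind of regression is to check for tuples before the general list-like branch. The sketch below uses a toy class, `ToyFrame`, which is a made-up stand-in and not Koalas code; a plain list plays the role of a Series.

```python
from collections.abc import Iterable


class ToyFrame:
    """Hypothetical stand-in for a DataFrame: maps column label -> values."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def __getitem__(self, key):
        # A tuple is treated as ONE (multi-level) column label,
        # so it returns a single column rather than a frame.
        if isinstance(key, tuple):
            return self.columns[key]
        # Any other non-string iterable selects several columns
        # and returns a new frame.
        if isinstance(key, Iterable) and not isinstance(key, (str, bytes)):
            return ToyFrame({k: self.columns[k] for k in key})
        # Everything else is a single scalar label.
        return self.columns[key]


frame = ToyFrame({("x", "a"): [1, 2], ("x", "b"): [3, 4]})

single = frame[("x", "a")]                  # one column (Series stand-in)
several = frame[[("x", "a"), ("x", "b")]]   # a frame with two columns
```

The order of the checks is what matters: if the list-like branch ran first, the tuple label would be treated as a collection of labels.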

@sllynn sllynn self-assigned this Aug 4, 2020
@sllynn sllynn requested a review from ueshin August 4, 2020 16:49

@ueshin ueshin left a comment

Otherwise, LGTM.

ueshin commented Aug 4, 2020

Thanks! merging.

@ueshin ueshin merged commit 8c840d2 into databricks:master Aug 4, 2020