Rename data columns prior to filter to make sure the column names are as expected. #1283

ueshin · 2020-02-14T22:07:08Z

This is a follow-up to #1273.
Spark column names are not always the same as the corresponding column labels.
This PR renames the data columns prior to filtering to make sure the column names are as expected.

codecov-io · 2020-02-14T22:29:09Z

Codecov Report

Merging #1283 into master will increase coverage by <.01%.
The diff coverage is 97.33%.

@@            Coverage Diff             @@
##           master    #1283      +/-   ##
==========================================
+ Coverage   95.11%   95.11%   +<.01%     
==========================================
  Files          34       34              
  Lines        7220     7272      +52     
==========================================
+ Hits         6867     6917      +50     
- Misses        353      355       +2

Impacted Files	Coverage Δ
databricks/koalas/series.py	`96.39% <100%> (+0.02%)`	⬆️
databricks/koalas/indexing.py	`95.96% <100%> (ø)`	⬆️
databricks/koalas/groupby.py	`91.43% <100%> (ø)`	⬆️
databricks/koalas/utils.py	`95.45% <100%> (+0.1%)`	⬆️
databricks/koalas/plot.py	`94.28% <100%> (ø)`	⬆️
databricks/koalas/indexes.py	`95.9% <100%> (ø)`	⬆️
databricks/koalas/internal.py	`96.07% <100%> (+0.08%)`	⬆️
databricks/koalas/frame.py	`96.51% <96%> (-0.05%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e4c87b7...24dfef9.

ueshin · 2020-02-15T00:28:11Z

databricks/koalas/frame.py


-        sdf = self._sdf.filter(expr)
-        internal = self._internal.copy(sdf=sdf)
+        data_columns = [label[0] for label in self._internal.column_labels]


If we support multi-index columns later, we will need to rename them to fit pandas' requirements.
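
As a rough illustration of the point above, here is a minimal pure-Python sketch of what `label[0]` yields and why it only works for single-level labels. The helper name `data_column_names` and the guard are hypothetical and are not taken from the Koalas code base; only the `label[0]` expression comes from the diff.

```python
def data_column_names(column_labels):
    """Derive flat column names from label tuples (hypothetical helper).

    Taking label[0] identifies a column only when every label has exactly
    one level, so this sketch rejects multi-level labels instead of guessing.
    """
    if any(len(label) != 1 for label in column_labels):
        raise ValueError("multi-level column labels are not handled in this sketch")
    return [label[0] for label in column_labels]


# Single-level labels, as shown for `df._internal.column_labels` in this thread.
names = data_column_names([("name",), ("class",), ("max_speed",)])
print(names)
```

With multi-level labels such as `("x", "a")`, taking only the first level would drop the `"a"` part, which is why a different renaming scheme would be needed in that case.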

ueshin · 2020-02-18T23:09:17Z

also cc @itholic

itholic · 2020-02-19T01:46:11Z

@ueshin

Thanks for cc'ing me!

> The Spark column names are not always the same as their column labels.

Does it mean that there are cases where data_columns and column_labels are different?

ueshin · 2020-02-19T02:45:39Z

Yes. I guess you've seen such cases several times?

itholic · 2020-02-19T03:07:42Z

@ueshin

Yeah, I think I have seen this before, so I'm trying to reproduce such a case, but I haven't managed to yet (even after renaming a column):

>>> df
     name   class  max_speed
0  falcon    bird      389.0
2  parrot    bird       24.0
3    lion  mammal       80.5
1  monkey  mammal        NaN

>>> df._internal.data_columns
['name', 'class', 'max_speed']
>>> df._internal.column_labels
[('name',), ('class',), ('max_speed',)]

>>> df.rename(columns={'name': 'renamed'}, inplace=True)
>>> df
  renamed   class  max_speed
0  falcon    bird      389.0
2  parrot    bird       24.0
3    lion  mammal       80.5
1  monkey  mammal        NaN

>>> df._internal.data_columns
['renamed', 'class', 'max_speed']
>>> df._internal.column_labels
[('renamed',), ('class',), ('max_speed',)]

Could you show me a simple example when you're available?

itholic · 2020-02-19T03:29:30Z

Anyway, LGTM if such cases can happen!

ueshin · 2020-02-19T22:25:37Z

For example:

>>> kdf = ks.DataFrame({('x','a'): [1,2,3], ('x','b'): [4,5,6], ('y','c'): [7,8,9]})
>>> kdf['x']
   a  b
0  1  4
1  2  5
2  3  6
>>> kdf['x'].query('a > 1')
Traceback (most recent call last):
...
pyspark.sql.utils.AnalysisException: "cannot resolve '`a`' given input columns: [__index_level_0__, (x, a), (x, b), __natural_order__]; ...
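
The failure above can be mimicked without Spark. The following is only a sketch under stated assumptions: rows are modeled as plain dicts keyed by the underlying column names from the error message, the labels of `kdf['x']` are assumed to be `('a',)` and `('b',)`, and `rename_columns` and `query_a_gt_1` are hypothetical stand-ins rather than Koalas functions.

```python
# Rows keyed by the underlying column names reported in the error message.
rows = [
    {"(x, a)": 1, "(x, b)": 4},
    {"(x, a)": 2, "(x, b)": 5},
    {"(x, a)": 3, "(x, b)": 6},
]
# Assumed single-level labels of kdf['x'], in the same order as the columns.
labels = [("a",), ("b",)]


def rename_columns(rows, labels):
    """Return new rows re-keyed by the first level of each label."""
    names = [label[0] for label in labels]
    return [dict(zip(names, row.values())) for row in rows]


def query_a_gt_1(rows):
    """Stand-in for query('a > 1'); raises KeyError if 'a' cannot be resolved."""
    return [row for row in rows if row["a"] > 1]


# Without renaming, the name 'a' cannot be resolved.
try:
    query_a_gt_1(rows)
    failed_without_rename = False
except KeyError:
    failed_without_rename = True

# Renaming prior to the filter makes the expression resolvable.
filtered = query_a_gt_1(rename_columns(rows, labels))
print(failed_without_rename, filtered)
```

The point of the sketch is only the ordering: the rename happens before the filter, so the expression is evaluated against the names the user sees.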

ueshin · 2020-02-19T22:38:21Z

Thanks! I'll merge this now. Please feel free to leave comments if you have any.

itholic · 2020-02-19T23:33:51Z

Thanks!!

Rename data columns prior to filter to make sure the column names are as expected.

bb1134a

ueshin requested a review from HyukjinKwon February 14, 2020 22:07

Fix.

e4c87b7

Merge branch 'master' into query

24dfef9

ueshin merged commit a45e484 into databricks:master Feb 19, 2020

ueshin deleted the query branch February 19, 2020 22:43
