@itholic
Contributor

@itholic itholic commented Feb 12, 2020

This PR proposes to implement DataFrame.query.

>>> df = ks.DataFrame({'A': range(1, 6),
...                    'B': range(10, 0, -2),
...                    'C C': range(10, 5, -1)})
>>> df
   A   B  C C
0  1  10   10
1  2   8    9
2  3   6    8
3  4   4    7
4  5   2    6

>>> df.query('A > B')
   A  B  C C
4  5  2    6

The previous expression is equivalent to

>>> df[df.A > df.B]
   A  B  C C
4  5  2    6

For columns with spaces in their name, you can use backtick quoting.

>>> df.query('B == `C C`')
   A   B  C C
0  1  10   10

The previous expression is equivalent to

>>> df[df.B == df['C C']]
   A   B  C C
0  1  10   10
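
Because Koalas mirrors the pandas API, the two equivalences above can be sanity-checked with plain pandas. The following is a minimal sketch, assuming pandas 0.25 or later (needed for backtick quoting in `query`); it uses pandas rather than Koalas so that it runs without a Spark session, and it is not the implementation added in this PR.

```python
import pandas as pd

# Same data as in the PR description, built with pandas instead of Koalas.
df = pd.DataFrame({'A': range(1, 6),
                   'B': range(10, 0, -2),
                   'C C': range(10, 5, -1)})

# query() with a plain comparison versus the boolean-mask form.
by_query = df.query('A > B')
by_mask = df[df.A > df.B]

# Backtick quoting for a column whose name contains a space.
quoted = df.query('B == `C C`')
quoted_mask = df[df.B == df['C C']]

# Both pairs should select exactly the same rows.
print(by_query.equals(by_mask))
print(quoted.equals(quoted_mask))
```

The string form is mostly a readability convenience; the mask form remains useful when the condition is built programmatically.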

@codecov-io

codecov-io commented Feb 13, 2020

Codecov Report

Merging #1273 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1273      +/-   ##
==========================================
- Coverage   95.14%   95.11%   -0.04%     
==========================================
  Files          35       34       -1     
  Lines        7202     7219      +17     
==========================================
+ Hits         6852     6866      +14     
- Misses        350      353       +3
Impacted Files                                    Coverage Δ
databricks/koalas/missing/frame.py                100% <ø> (ø) ⬆️
databricks/koalas/frame.py                        96.55% <100%> (-0.14%) ⬇️
databricks/koalas/usage_logging/__init__.py       97.29% <0%> (ø) ⬆️
databricks/koalas/usage_logging/usage_logger.py   100% <0%> (ø) ⬆️
databricks/conftest.py                            96.22% <0%> (ø) ⬆️
databricks/koalas/__init__.py                     85.1% <0%> (ø) ⬆️
databricks/koalas/namespace.py                    87.87% <0%> (ø) ⬆️
databricks/koalas/plot.py                         94.28% <0%> (ø) ⬆️
databricks/koalas/testing/utils.py                78.51% <0%> (ø) ⬆️
... and 4 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8556443...8300045.

Member

@HyukjinKwon HyukjinKwon left a comment

Looks good otherwise.

@itholic
Contributor Author

itholic commented Feb 14, 2020

@HyukjinKwon Thanks for the review!

HyukjinKwon pushed a commit that referenced this pull request Feb 18, 2020
As requested at #1273 (comment), and since `map_in_pandas` (#1276) has been merged, this uncomments the existing doctest for DataFrame.query.
ueshin added a commit that referenced this pull request Feb 19, 2020
… as expected. (#1283)

This is a follow-up of #1273.
A Spark column name is not always the same as its column label.
This PR renames the data columns before filtering so that the column names are as expected.
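
The rename-then-filter idea from this follow-up can be illustrated with a small pandas sketch. The internal names `__col_0__` and `__col_1__` and the `labels` mapping are hypothetical stand-ins for Spark column names that differ from the user-facing labels; this is not the actual Koalas code from #1283.

```python
import pandas as pd

# Hypothetical frame whose stored column names differ from the labels
# the user sees (stand-ins for Spark column names; an assumption).
internal = pd.DataFrame({'__col_0__': [1, 2, 3],
                         '__col_1__': [3, 2, 1]})

# Hypothetical mapping from stored name to user-facing label.
labels = {'__col_0__': 'A', '__col_1__': 'B'}

# Rename first so the query expression can refer to the labels,
# then filter with the user's expression.
renamed = internal.rename(columns=labels)
result = renamed.query('A > B')

print(result)
```

Without the rename step, an expression written against the labels `A` and `B` would not resolve, because the stored columns carry different names.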
@ueshin ueshin mentioned this pull request Apr 9, 2020
@itholic itholic deleted the f_query branch September 10, 2020 11:48
rising-star92 added a commit to rising-star92/databricks-koalas that referenced this pull request Jan 27, 2023