
Allow querying DataFrame directly in sql method #285

@floscha


While I really like the idea of @rxin's recent #256 PR, he uses an (in my opinion) over-simplistic example of ks.sql("select * from range(10) where id > 7"). I believe that the ability to query actual Koalas DataFrames through SQL can prove really valuable to many users. However, when trying to use ks.sql with a Koalas DataFrame, the following exception occurs:

>>> kdf = ks.DataFrame({'A': [1, 2, 3]})
>>> ks.sql("select * from kdf")
...
org.apache.spark.sql.AnalysisException: Table or view not found: kdf; line 1 pos 14
...

This is not surprising to someone with PySpark knowledge, who knows that kdf has to be registered as a temporary table before it can be used with Spark SQL. Unfortunately, (as I understand it) the target group of the Koalas library should not be expected to be Spark experts. To get the above example working, the following workaround is needed, which requires using the lower-level (Py)Spark API and thus somewhat defeats the purpose of Koalas.

>>> from pyspark import SparkContext
>>> from pyspark.sql import SQLContext
>>> sc = SparkContext.getOrCreate()
>>> sql = SQLContext(sc)
>>> sql.registerDataFrameAsTable(kdf._sdf, "kdf")
>>> ks.sql("select * from kdf")
   __index_level_0__  A                                                         
0                  0  1
1                  1  2
2                  2  3
# Optionally clean-up by dropping the temporary table
>>> sql.dropTempTable("kdf")

Wouldn't it be much more convenient if this "temporary table magic" were instead handled by Koalas behind the scenes? Or are there design objections to such an approach?
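To make the idea concrete, here is a minimal stdlib-only sketch of how such magic could work: inspect the caller's namespace, find variables whose names appear in the query and are DataFrames, and register those as temporary views before running the query. This is purely illustrative; `_resolve_dataframes` and `sql` below are hypothetical names, and a fake DataFrame class stands in for ks.DataFrame so the snippet runs without Spark. In real Koalas code, the commented-out step would call `createOrReplaceTempView` on the underlying Spark DataFrame and drop the views afterwards.

```python
import inspect
import re


def _resolve_dataframes(query, df_type):
    """Map table names in `query` to DataFrame objects in the caller's scope.

    Looks two frames up (past the public sql() wrapper) and returns every
    variable whose name occurs as an identifier in the query and whose type
    matches df_type.
    """
    frame = inspect.currentframe().f_back.f_back  # frame of the sql() caller
    try:
        namespace = {**frame.f_globals, **frame.f_locals}
    finally:
        del frame  # break the reference cycle created by holding a frame
    identifiers = set(re.findall(r"[A-Za-z_]\w*", query))
    return {
        name: obj
        for name, obj in namespace.items()
        if name in identifiers and isinstance(obj, df_type)
    }


def sql(query, df_type):
    """Toy stand-in for ks.sql: resolve referenced DataFrames automatically.

    A real implementation would register each resolved DataFrame as a
    temporary view (e.g. sdf.createOrReplaceTempView(name)), execute the
    query via Spark SQL, and clean the views up again. Here we just return
    the names that would be registered, to show the resolution step.
    """
    tables = _resolve_dataframes(query, df_type)
    return sorted(tables)
```

With this in place, a user-facing call like `ks.sql("select * from kdf")` would find `kdf` in the calling scope on its own, with no explicit registration step.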
