-
Notifications
You must be signed in to change notification settings - Fork 367
Exclude Index columns for exposed Spark DataFrame and disallow Koalas DataFrame with no index #655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f361f81 to
4ca989e
Compare
cf533d2 to
9740e70
Compare
9740e70 to
6f3c98e
Compare
| kdf = DataFrame(pdf) | ||
| return_schema = kdf._sdf.schema | ||
| if len(pdf) <= limit: | ||
| return kdf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly just a refactoring to disallow _InternalDataFrame(sdf=sdf, index_map=index_map).
… DataFrame with no index
dc131cc to
ad958b8
Compare
| index_name if index_name is not None else rename(index_name))) | ||
| index_map.remove(info) | ||
|
|
||
| new_data_columns = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These codes are needed because previously it removed the index when reset_index. Now it sets the default index.
Codecov Report
@@ Coverage Diff @@
## master #655 +/- ##
==========================================
+ Coverage 93.19% 93.43% +0.23%
==========================================
Files 32 32
Lines 5294 5314 +20
==========================================
+ Hits 4934 4965 +31
+ Misses 360 349 -11
Continue to review full report at Codecov.
|
Softagram Impact Report for pull/655 (head commit: c4bfbce)⭐ Change Overview
⭐ Details of Dependency Changes
📄 Full report
Give feedback on this report to [email protected] |
ueshin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
|
Thanks Takuya!! |
|
Why disallow the KDFs with no index? |
|
Biggest reason was: #633 introduced operations on different Koalas DataFrame which is based upon index. So, if no index is set, operations cannot be performed on different Koalas DataFrames (because this feature is based upon a join with index column). Namely, previously the cases below ks.range(10) + ks.range(10)# This was an actual usecase reported
spark_df = spark.read.redshift("...")
ks.DataFrame(spark_df) + ks.DataFrame(spark_df)were disallowed because it has no index ^. Now, it has a default index - #639. Secondly, it makes much easier implementing other APIs. Currently, many APIs such as cumulative sum, max, etc. don't work when index does not exist because the implementation is dependent on index column. Lastly, since Koalas DataFrame, form now on, always sets the default index (as pandas does), we can allow those cases now, which makes more sense and closer to pandas' https://github.com/databricks/koalas/pull/655/files#diff-ebfd51978d1aa348373b958a1b27aa47R936 |
|
cc @rxin, BTW, this is the PR we disabled no-index case. |


This PR is a followup of and proposes two things:
So, for instance,
to_spark()now shows:and
index_mapis not expected to be empty in Koalas DataFrame, always. It sets the default index explicitly per #639