
@itholic (Contributor) commented Sep 11, 2019

Related to #765: I've added `index_col` for Spark I/O reads.

If we already know the index column, we can avoid creating a default index by passing that column explicitly as a function argument.

For example, `read_table` can now be used as follows:

```python
>>> ks.read_table('test_table1', index_col='i32')
     i64     f  bhello
i32
0      0   0.0  people
2      1  11.0   hello
0      2  12.0  people
1      0  10.0   hello
0      0  15.0   hello
2      0   5.0   hello
0      3   3.0      yo
1      4   4.0  people
0      1   6.0  people
1      2   7.0      yo
0      3  18.0      yo
1      4  19.0   hello
1      1   1.0   hello
2      2   2.0      yo
1      1  16.0   hello
2      2  17.0      yo
2      3   8.0      yo
0      4   9.0   hello
1      3  13.0      yo
2      4  14.0   hello
>>> ks.read_table('test_table1', index_col=['i32', 'i64'])
            f  bhello
i32 i64
0   0     0.0  people
2   1    11.0   hello
0   2    12.0  people
1   0    10.0   hello
0   0    15.0   hello
2   0     5.0   hello
0   3     3.0      yo
1   4     4.0  people
0   1     6.0  people
1   2     7.0      yo
0   3    18.0      yo
1   4    19.0   hello
    1     1.0   hello
2   2     2.0      yo
1   1    16.0   hello
2   2    17.0      yo
    3     8.0      yo
0   4     9.0   hello
1   3    13.0      yo
2   4    14.0   hello
```
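The effect of `index_col` in the examples above can be pictured with a minimal pure-Python sketch. This is not Koalas code: `split_index`, the row-as-dict representation, and the 0..n-1 fallback (a stand-in for whatever default index Koalas would otherwise attach) are all assumptions made for illustration. The sample rows are the first two rows of `test_table1` shown above.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union

Row = Dict[str, Any]


def split_index(
    rows: List[Row],
    index_col: Optional[Union[str, Sequence[str]]] = None,
) -> Tuple[List[Tuple[Any, ...]], List[Row]]:
    """Hypothetical illustration of index_col semantics (not the Koalas implementation).

    Returns (index, data): one index tuple per row, plus the remaining columns.
    """
    if index_col is None:
        # No index column given: fall back to a generated 0..n-1 index,
        # standing in for the default index the PR wants to avoid.
        return [(i,) for i in range(len(rows))], [dict(r) for r in rows]
    # Accept a single column name or a list of names, as in the examples above.
    names = [index_col] if isinstance(index_col, str) else list(index_col)
    # The named columns become the index; they are dropped from the data columns.
    index = [tuple(row[name] for name in names) for row in rows]
    data = [{k: v for k, v in row.items() if k not in names} for row in rows]
    return index, data


rows = [
    {"i32": 0, "i64": 0, "f": 0.0, "bhello": "people"},
    {"i32": 2, "i64": 1, "f": 11.0, "bhello": "hello"},
]
print(split_index(rows)[0])                   # [(0,), (1,)]
print(split_index(rows, index_col="i32")[0])  # [(0,), (2,)]
print(split_index(rows, index_col=["i32", "i64"]))
# ([(0, 0), (2, 1)], [{'f': 0.0, 'bhello': 'people'}, {'f': 11.0, 'bhello': 'hello'}])
```

Passing a list yields tuple keys, mirroring the two-level `i32`/`i64` index in the second example, while the generated 0..n-1 index only appears when no `index_col` is given.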

Currently, this is only added to `read_table`.

If this approach looks okay, I'll open another PR that applies it to all the other read functions.

@itholic (Contributor, Author) commented Sep 11, 2019

@ueshin, @HyukjinKwon, could you take a look at this when you're available? :)

@codecov-io commented Sep 11, 2019

Codecov Report

Merging #769 into master will decrease coverage by <.01%.
The diff coverage is 92.3%.

```diff
@@            Coverage Diff             @@
##           master     #769      +/-   ##
==========================================
- Coverage   93.83%   93.82%   -0.01%
==========================================
  Files          32       32
  Lines        5744     5753       +9
==========================================
+ Hits         5390     5398       +8
- Misses        354      355       +1
```

| Impacted Files | Coverage Δ |
|---|---|
| databricks/koalas/namespace.py | 81.22% <92.3%> (+0.22%) ⬆️ |
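A quick consistency check on the report, using only the counts in the coverage diff and assuming coverage here is hits divided by tracked lines (hits plus misses equals lines in both columns):

$$
\frac{5390}{5744} \approx 93.837\%, \qquad \frac{5398}{5753} \approx 93.829\%, \qquad 93.837\% - 93.829\% \approx 0.008 \text{ percentage points}
$$

The actual drop is therefore below 0.01 points, which matches "decrease coverage by <.01%". The 93.83% / 93.82% / -0.01% figures appear to be the same values cut (not rounded) to two decimals; that is an inference from the numbers, not something the report states.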

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a1125f9...0c05081.

@softagram-bot

Softagram Impact Report for pull/769 (head commit: 0c05081)


@ueshin (Collaborator) left a comment

LGTM, pending tests.

@ueshin (Collaborator) commented Sep 13, 2019

Thanks! Merging.

@ueshin merged commit b2cfd3f into databricks:master on Sep 13, 2019
@itholic deleted the add_index_col branch on September 13, 2019 06:59
@HyukjinKwon pushed a commit that referenced this pull request on Sep 16, 2019
Resolves #765.

I applied the same logic (as in #769) to all of the functions mentioned in the issue above.

So when reading data through Spark I/O and the index column name is already known, we can now pass it to these functions via `index_col` and avoid creating a default index:

```python
>>> ks.read_parquet(path, index_col=['i32', 'i64'])
           f  bhello
i32 i64
0   1    6.0  people
1   2    7.0      yo
```
