@itholic
Contributor

@itholic itholic commented Dec 12, 2019

Implement sort_values for Index/MultiIndex
(https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.sort_values.html#pandas.Index.sort_values)

>>> idx = ks.Index([10, 100, 1, 1000])
>>> idx
Int64Index([10, 100, 1, 1000], dtype='int64')

>>> idx.sort_values()
Int64Index([1, 10, 100, 1000], dtype='int64')

>>> idx.sort_values(ascending=False)
Int64Index([1000, 100, 10, 1], dtype='int64')

MultiIndex is supported as well:

>>> kidx = ks.MultiIndex.from_tuples([('a', 'x', 1), ('c', 'y', 2), ('b', 'z', 3)])
>>> kidx
MultiIndex([('a', 'x', 1),
            ('c', 'y', 2),
            ('b', 'z', 3)],
           )

>>> kidx.sort_values()
MultiIndex([('a', 'x', 1),
            ('b', 'z', 3),
            ('c', 'y', 2)],
           )

>>> kidx.sort_values(ascending=False)
MultiIndex([('c', 'y', 2),
            ('b', 'z', 3),
            ('a', 'x', 1)],
           )
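
As a rough illustration of the ordering shown above, the MultiIndex results are consistent with sorting the tuples lexicographically (first level, then second, then third). The sketch below uses only plain Python tuples, not Koalas or pandas, and assumes that tuple comparison is an adequate stand-in for the index ordering in this example.

```python
# Plain-Python sketch; no Koalas or pandas involved.
tuples = [("a", "x", 1), ("c", "y", 2), ("b", "z", 3)]

# Tuples compare element by element, left to right,
# so the first element decides the order here.
ascending = sorted(tuples)
descending = sorted(tuples, reverse=True)

print(ascending)
print(descending)
```

The two lists line up with the `sort_values()` and `sort_values(ascending=False)` outputs in the description.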

@codecov-io

codecov-io commented Dec 12, 2019

Codecov Report

Merging #1120 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1120      +/-   ##
==========================================
+ Coverage   95.19%   95.19%   +<.01%     
==========================================
  Files          35       35              
  Lines        7071     7075       +4     
==========================================
+ Hits         6731     6735       +4     
  Misses        340      340

Impacted Files                          Coverage Δ
databricks/koalas/missing/indexes.py    100% <ø> (ø) ⬆️
databricks/koalas/indexes.py            96.45% <100%> (+0.07%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 468bf3a...a8dda7d.

Comment on lines 784 to 787
if isinstance(self, MultiIndex):
    result.names = self.names
else:
    result.name = self.name
Collaborator

Do we need this?

Contributor Author

Without this, I think we can't keep the names in the case below.

>>> kidx = ks.MultiIndex.from_tuples([('a', 'x', 1), ('c', 'y', 2), ('b', 'z', 3)])
>>> kidx.names = ['A', 'B', 'C']
>>> kidx.sort_values()
MultiIndex([('a', 'x', 1),
            ('b', 'z', 3),
            ('c', 'y', 2)],
           )

Collaborator

ah, I see. @names.setter seems wrong.

Contributor Author

@itholic itholic Dec 13, 2019

Ah, I got it.

In @names.setter, the new index_map is written to self._kdf._internal, not to self._internal, like below:

self._kdf._internal = internal.copy(index_map=list(zip(internal.index_columns, names)))

At this point, I'm curious why we overwrite self._kdf._internal rather than simply self._internal.

For now, I've fixed it as in the current implementation.
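
To make the point about the setter concrete, here is a minimal toy model. It is only a sketch: `ToyFrame`, `ToyIndex`, `sort_without_copy`, and `sort_with_copy` are hypothetical names invented for illustration, and plain dicts stand in for the internal metadata. It assumes the index keeps its own snapshot of the metadata while the setter replaces only the frame's copy, which is how the discussion above describes the behaviour; the real Koalas classes are more involved.

```python
class ToyFrame:
    """Hypothetical stand-in for a Koalas DataFrame (not the real class)."""

    def __init__(self, index_names):
        self._internal = {"index_names": list(index_names)}


class ToyIndex:
    """Hypothetical stand-in for an index that wraps a ToyFrame."""

    def __init__(self, kdf):
        self._kdf = kdf
        # Assumption: the index keeps its own snapshot of the metadata.
        self._internal = dict(kdf._internal)

    @property
    def names(self):
        return self._kdf._internal["index_names"]

    @names.setter
    def names(self, value):
        # Mirrors the setter quoted above: only the frame's metadata is replaced.
        self._kdf._internal = {**self._kdf._internal, "index_names": list(value)}

    def sort_without_copy(self):
        # Rebuilds the result from the stale snapshot, so renamed levels are lost.
        return ToyIndex(ToyFrame(self._internal["index_names"]))

    def sort_with_copy(self):
        # Same rebuild, followed by the explicit copy from the reviewed lines.
        result = self.sort_without_copy()
        result.names = self.names
        return result


idx = ToyIndex(ToyFrame([None, None, None]))
idx.names = ["A", "B", "C"]
print(idx.sort_without_copy().names)
print(idx.sort_with_copy().names)
```

In this model the first result still carries the original names, while the second carries the renamed ones, which is the effect the explicit `result.names = self.names` assignment is meant to achieve.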

Member

@HyukjinKwon HyukjinKwon left a comment

Looks fine but let me leave it to @ueshin

@softagram-bot

Softagram Impact Report for pull/1120 (head commit: 8ae7722)

⚠️ Copy paste found

ℹ️ test_indexes.py: Copy paste fragment on line 30 shared with ../test_dataframe.py, ../test_numpy_compat.py:


    @property
    def pdf(self):
        return pd.DataFrame({
            'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
            'b': [4, 5, 6, 3, 2, 1, ...(truncated 160 chars)

ℹ️ indexes.py: Copy paste fragment inside the same file on lines 720, 1163:

            raise NotImplementedError(
                "Doesn't support symmetric_difference between Index & MultiIndex for now")

        sdf_self = self._kdf._s...(truncated 477 chars)

Now that you are on the file, it would be easier to pay back some tech. debt.

@HyukjinKwon
Member

@itholic can you resolve conflicts?

@itholic
Contributor Author

itholic commented Dec 19, 2019

@HyukjinKwon resolved :)

Collaborator

@ueshin ueshin left a comment

LGTM.

@HyukjinKwon HyukjinKwon merged commit c03b3a6 into databricks:master Dec 19, 2019
@itholic itholic deleted the i_sort_values branch December 20, 2019 04:17