Commit b787bf8

BUG: don't sort unique values from categoricals
This should resolve the inconsistency mwaskom reported in GH9148. CC jreback TomAugspurger JanSchulz
1 parent 0efd4b3 commit b787bf8

3 files changed: +12 -8 lines changed

doc/source/whatsnew/v0.16.0.txt

+2
@@ -193,6 +193,8 @@ Bug Fixes
   SQLAlchemy type (:issue:`9083`).


+- Items in ``Categorical.unique()`` (and ``s.unique()`` if ``s`` is of dtype ``category``) now appear in the order in which they are originally found, not in sorted order (:issue:`9331`). This is now consistent with the behavior for other dtypes in pandas.
+

 - Fixed bug on bug endian platforms which produced incorrect results in ``StataReader`` (:issue:`8688`).
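
As a rough illustration of the ordering change this entry describes, the following pure-Python sketch contrasts sorted unique values with first-seen unique values. It does not call pandas; `unique_sorted` and `unique_in_order` are hypothetical helper names used only for this example.

```python
def unique_sorted(values):
    """Unique values in sorted order (the ordering the entry says no longer applies)."""
    return sorted(set(values))


def unique_in_order(values):
    """Unique values in order of first appearance."""
    # dict preserves insertion order (Python 3.7+), so the keys come out first-seen.
    return list(dict.fromkeys(values))


values = ["b", "b", "c", "a", "c"]
print(unique_sorted(values))    # ['a', 'b', 'c']
print(unique_in_order(values))  # ['b', 'c', 'a']
```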

pandas/core/categorical.py

+5 -6
@@ -1386,17 +1386,16 @@ def unique(self):
         """
         Return the unique values.

-        Unused categories are NOT returned.
+        Unused categories are NOT returned. Unique values are returned in order
+        of appearance.

         Returns
         -------
         unique values : array
         """
-        unique_codes = np.unique(self.codes)
-        # for compatibility with normal unique, which has nan last
-        if unique_codes[0] == -1:
-            unique_codes[0:-1] = unique_codes[1:]
-            unique_codes[-1] = -1
+        from pandas.core.nanops import unique1d
+        # unlike np.unique, unique1d does not sort
+        unique_codes = unique1d(self.codes)
         return take_1d(self.categories.values, unique_codes)

     def equals(self, other):
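
The effect of this hunk can be approximated in plain Python. This is a sketch under stated assumptions, not the pandas implementation: codes are modeled as a list of ints in which -1 marks a missing value (as the removed `unique_codes[0] == -1` check suggests), `take` is a hypothetical stand-in for `take_1d`, and `None` stands in for NaN.

```python
def old_unique_codes(codes):
    """Mimic the removed logic: sorted unique codes, with -1 (missing) moved last."""
    unique_codes = sorted(set(codes))
    # The emptiness guard is an addition for this sketch, not part of the diff.
    if unique_codes and unique_codes[0] == -1:
        unique_codes = unique_codes[1:] + [-1]
    return unique_codes


def new_unique_codes(codes):
    """Mimic the new logic: unique codes in order of first appearance."""
    seen = set()
    result = []
    for code in codes:
        if code not in seen:
            seen.add(code)
            result.append(code)
    return result


def take(categories, codes, fill_value=None):
    """Hypothetical stand-in for take_1d: map codes to categories, -1 to fill_value."""
    return [fill_value if c == -1 else categories[c] for c in codes]


categories = ["a", "b", "c"]
codes = [1, 1, -1, 0]  # encodes ["b", "b", <missing>, "a"]

print(take(categories, old_unique_codes(codes)))  # ['a', 'b', None]
print(take(categories, new_unique_codes(codes)))  # ['b', None, 'a']
```

In both cases the unused category "c" never appears, because only codes that actually occur are looked up.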

pandas/tests/test_categorical.py

+5 -2
@@ -774,12 +774,15 @@ def test_unique(self):
         exp = np.asarray(["a","b"])
         res = cat.unique()
         self.assert_numpy_array_equal(res, exp)
+
         cat = Categorical(["a","b","a","a"], categories=["a","b","c"])
         res = cat.unique()
         self.assert_numpy_array_equal(res, exp)
-        cat = Categorical(["a","b","a", np.nan], categories=["a","b","c"])
+
+        # unique should not sort
+        cat = Categorical(["b", "b", np.nan, "a"], categories=["a","b","c"])
         res = cat.unique()
-        exp = np.asarray(["a","b", np.nan], dtype=object)
+        exp = np.asarray(["b", np.nan, "a"], dtype=object)
         self.assert_numpy_array_equal(res, exp)

     def test_mode(self):
