-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
TST/CLN: correctly skip in indexes/common; add test for duplicated #21902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,10 +35,6 @@ def verify_pickle(self, indices): | |
assert indices.equals(unpickled) | ||
|
||
def test_pickle_compat_construction(self): | ||
# this is testing for pickle compat | ||
if self._holder is None: | ||
return | ||
|
||
# need an object to create with | ||
pytest.raises(TypeError, self._holder) | ||
|
||
|
@@ -236,7 +232,7 @@ def test_set_name_methods(self, indices): | |
|
||
# don't tests a MultiIndex here (as its tested separated) | ||
if isinstance(indices, MultiIndex): | ||
return | ||
pytest.skip('Skip check for MultiIndex') | ||
original_name = indices.name | ||
new_ind = indices.set_names([new_name]) | ||
assert new_ind.name == new_name | ||
|
@@ -333,7 +329,8 @@ def test_copy_and_deepcopy(self, indices): | |
from copy import copy, deepcopy | ||
|
||
if isinstance(indices, MultiIndex): | ||
return | ||
pytest.skip('Skip check for MultiIndex') | ||
|
||
for func in (copy, deepcopy): | ||
idx_copy = func(indices) | ||
assert idx_copy is not indices | ||
|
@@ -342,20 +339,50 @@ def test_copy_and_deepcopy(self, indices): | |
new_copy = indices.copy(deep=True, name="banana") | ||
assert new_copy.name == "banana" | ||
|
||
def test_duplicates(self, indices): | ||
def test_has_duplicates(self, indices): | ||
if type(indices) is not self._holder: | ||
return | ||
pytest.skip('Can only check if we have the correct type') | ||
if not len(indices) or isinstance(indices, MultiIndex): | ||
return | ||
# MultiIndex tested separately in: | ||
# tests/indexes/multi/test_unique_and_duplicates | ||
pytest.skip('Skip check for empty Index and MultiIndex') | ||
|
||
idx = self._holder([indices[0]] * 5) | ||
assert not idx.is_unique | ||
assert idx.has_duplicates | ||
|
||
@pytest.mark.parametrize('keep', ['first', 'last', False]) | ||
def test_duplicated(self, indices, keep): | ||
if type(indices) is not self._holder: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you use isinstance here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jreback There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. isinstance of self._holder, is this what we do elsewhere? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jreback If I change
to
I get 9 failures (instead of skips), all of which are from
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jreback Seem to have found something: in my normal workbook, I get
but for some reason, within
Same happens for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Opened a follow-up: #22211 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nevermind, I had gotten confused between instances and classes. So In any case, under these circumstances, I'm even more convinced that it's best to just stay with the |
||
pytest.skip('Can only check if we know the index type') | ||
if not len(indices) or isinstance(indices, MultiIndex): | ||
# MultiIndex tested separately in: | ||
# tests/indexes/multi/test_unique_and_duplicates | ||
pytest.skip('Skip check for empty Index and MultiIndex') | ||
|
||
idx = self._holder(indices) | ||
if idx.has_duplicates: | ||
# We are testing the duplicated-method here, so we need to know | ||
# exactly which indices are duplicate and how (for the result). | ||
# This is not possible if "idx" has duplicates already, which we | ||
# therefore remove. This is seemingly circular, as drop_duplicates | ||
# invokes duplicated, but in the end, it all works out because we | ||
# cross-check with Series.duplicated, which is tested separately. | ||
idx = idx.drop_duplicates() | ||
|
||
n, k = len(idx), 10 | ||
duplicated_selection = np.random.choice(n, k * n) | ||
expected = pd.Series(duplicated_selection).duplicated(keep=keep).values | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I hate to check as a numpy array, much prefer to check the type and use assert_index_equal or assert_series_equal. is this how the other tests are? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Um, All the manual |
||
idx = self._holder(idx.values[duplicated_selection]) | ||
|
||
result = idx.duplicated(keep=keep) | ||
tm.assert_numpy_array_equal(result, expected) | ||
|
||
def test_unique(self, indices): | ||
# don't test a MultiIndex here (as its tested separated) | ||
# don't test a CategoricalIndex because categories change (GH 18291) | ||
if isinstance(indices, (MultiIndex, CategoricalIndex)): | ||
return | ||
pytest.skip('Skip check for MultiIndex/CategoricalIndex') | ||
|
||
# GH 17896 | ||
expected = indices.drop_duplicates() | ||
|
@@ -375,7 +402,7 @@ def test_unique_na(self): | |
def test_get_unique_index(self, indices): | ||
# MultiIndex tested separately | ||
if not len(indices) or isinstance(indices, MultiIndex): | ||
return | ||
pytest.skip('Skip check for empty Index and MultiIndex') | ||
|
||
idx = indices[[0] * 5] | ||
idx_unique = indices[[0]] | ||
|
@@ -394,7 +421,7 @@ def test_get_unique_index(self, indices): | |
|
||
# nans: | ||
if not indices._can_hold_na: | ||
return | ||
pytest.skip('Skip na-check if index cannot hold na') | ||
|
||
if needs_i8_conversion(indices): | ||
vals = indices.asi8[[0] * 5] | ||
|
@@ -423,7 +450,7 @@ def test_sort(self, indices): | |
|
||
def test_mutability(self, indices): | ||
if not len(indices): | ||
return | ||
pytest.skip('Skip check for empty Index') | ||
pytest.raises(TypeError, indices.__setitem__, 0, indices[0]) | ||
|
||
def test_view(self, indices): | ||
|
@@ -761,7 +788,7 @@ def test_equals_op(self): | |
# GH9947, GH10637 | ||
index_a = self.create_index() | ||
if isinstance(index_a, PeriodIndex): | ||
return | ||
pytest.skip('Skip check for PeriodIndex') | ||
|
||
n = len(index_a) | ||
index_b = index_a[0:-1] | ||
|
@@ -989,11 +1016,11 @@ def test_searchsorted_monotonic(self, indices): | |
# not implemented for tuple searches in MultiIndex | ||
# or Intervals searches in IntervalIndex | ||
if isinstance(indices, (MultiIndex, IntervalIndex)): | ||
return | ||
pytest.skip('Skip check for MultiIndex/IntervalIndex') | ||
|
||
# nothing to test if the index is empty | ||
if indices.empty: | ||
return | ||
pytest.skip('Skip check for empty Index') | ||
value = indices[0] | ||
|
||
# determine the expected results (handle dupes for 'right') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a duplicate these (pun intended) or are not testing indices duplicated currently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback
.duplicated
is hardly ever tested directly, only indirectly for stuff like.drop_duplicates
. Regardless of the changes to.duplicated
in #21645, I think duplicated should be tested separately.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so these are not relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback No,
test_duplicates
(at least forpandas/tests/indexes/common.py
, which is what this PR is about) tests.is_unique
and.has_duplicates
, but not the.duplicated
-method itself.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you point to the coverage that shows this is NOT tested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I never said that it is not tested, just that it is only tested implicitly. Any call to
.drop_duplicates
will invokeduplicated
, so obviously the coverage works out.