-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
TST: #15752 Add drop_duplicates tests for uint, float and bool for Series #17974
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @tmnhat2001! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on October 31, 2017 at 02:40 Hours UTC |
Hi, this is my first attempt at issue 15752. Please let me know if the test cases are OK. If they are, I'll do the same to the remaining tests. |
Codecov Report
@@ Coverage Diff @@
## master #17974 +/- ##
==========================================
- Coverage 91.23% 91.22% -0.02%
==========================================
Files 163 163
Lines 50113 50113
==========================================
- Hits 45723 45714 -9
- Misses 4390 4399 +9
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #17974 +/- ##
==========================================
- Coverage 91.23% 91.23% -0.01%
==========================================
Files 163 163
Lines 50113 50114 +1
==========================================
- Hits 45723 45720 -3
- Misses 4390 4394 +4
Continue to review full report at Codecov.
|
sc = s.copy() | ||
sc.drop_duplicates(keep=False, inplace=True) | ||
assert_series_equal(sc, s[~expected]) | ||
@pytest.mark.parametrize('dtype', ['int_', 'uint', 'float_']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the parametrization is good here, just pull all dtypes to here to avoid repeating all of this code multiple times.
@pytest.mark.parametrize('arg',
[
(Series([1, 2, 3,3 ], dtype='int_'),
(Series([1, 2, 3,3 ], dtype='uint'),
.....
(Series(['1', '2', '3', '4',])
])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and if you have different expected values, just add antother parameter (and expand the tuple to multiple values)
Thanks for the suggestion.I also parametrized the test according to your suggestion. I removed some duplicate test cases to reduce the number of parameters to the test function, otherwise the parametrized block would get very big. |
Series([True, False, False]), | ||
Series([True, False, True, False]) | ||
] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would pull the bool tests out and do them separately (as they are simpler and don't need as much checking);
also just in-ilne the expected results, don't use an if for that. the way to do this is to pass in things as a pair
e.g.
@pytest.mark.parametrize("tc1, tc2",
[
(Series(1,2, 3, 3, dtype='int_'), Series([1, 2, 3, 5,3 ,2, 4])),
.....
])
def test_drop_duplicates(self, tc1, tc2)
…and fix test parameterization
Thanks for the input. When you said that bools does not need as much checking, you meant that we can just use 1 test case for it right? |
thanks @tmnhat2001 ! I think to close the original issue we need some categorical ones for various dtypes. (can be directly in test_categorical). |
Thanks! I'll keep working on the remaining test cases. |
git diff upstream/master -u -- "*.py" | flake8 --diff