Check array index fix #320
Thanks for your pull request. It looks really nice!
Apart from my minor comments, I would suggest keeping only the `convert_inputs` function and also using it for the encoders that do not rely on a target `y`. In my opinion this makes the code more streamlined and hence easier to understand across different encoders. Do you agree, or do you see any downside to this approach?
Regarding Hacktoberfest, I don't have the rights to add topics to the project. I'd definitely consider the requirements for a quality commit fulfilled. Maybe @wdm0006 could add us to Hacktoberfest.
category_encoders/utils.py
Outdated
    if any(X.index != y.index):
        raise ValueError("`X` and `y` both have indexes, but they do not match.")
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"The length of X is {X.shape[0]} but length of y is {y.shape[0]}.")
We still support Python 3.5 (although we should probably change that), so we cannot use f-strings at the moment.
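For reference, a minimal sketch of the same length-mismatch message built with `str.format`, which works on Python 3.5. The helper name `length_mismatch_message` is hypothetical and only serves the illustration; it is not part of category_encoders.

```python
def length_mismatch_message(n_x, n_y):
    # Hypothetical helper, for illustration only.
    # str.format is available on Python 3.5, whereas f-strings need 3.6+.
    return "The length of X is {} but length of y is {}.".format(n_x, n_y)


message = length_mismatch_message(3, 4)
```

In the diff itself this would presumably amount to swapping the f-string for a `.format(X.shape[0], y.shape[0])` call on the same text.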
tests/test_utils.py
Outdated
    self.assertEqual(3, len(y))
    self.assertTrue(list(X.index) == list(y.index) == bindex)

    self.assertRaises(ValueError, convert_input_vector, barray, aseries)
Why are you testing `convert_input_vector` here? Shouldn't you rather test whether an error is thrown by `convert_inputs` if the indices of `X` and `y` differ?
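A rough sketch of what such a test could look like. `convert_inputs_stub` is a hypothetical stand-in reduced to the index check quoted in the diff, so the example runs on its own; the real test would call `convert_inputs` from `category_encoders/utils.py` instead.

```python
import unittest

import pandas as pd


def convert_inputs_stub(X, y):
    # Hypothetical stand-in for convert_inputs, reduced to the index check.
    if not X.index.equals(y.index):
        raise ValueError("`X` and `y` both have indexes, but they do not match.")
    return X, y


class TestConvertInputs(unittest.TestCase):
    def test_mismatched_indexes_raise(self):
        # X and y have the same length but different index labels.
        X = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
        y = pd.Series([0, 1, 0], index=[5, 6, 7])
        self.assertRaises(ValueError, convert_inputs_stub, X, y)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestConvertInputs)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```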
oh, oops, absolutely
I started working toward this and found one downside: to do this, we need to allow ...
Oh, and I need to check the two wrapper classes: they use ...and the failed test seems to be a required cast to float, though I'm not sure how I missed that locally.
Ok I see the problem with using ...
I think there's going to be a bit to discuss around setting up base class(es) and attributes. I'm happy to give it a go, but I'd rather merge this fix and take my time with that sort of refactoring. I'll get a fix into the wrappers and check out the py3.5 issue as soon as I can.
Sounds like a good plan! Let's get this merged as soon as the python 3.5 tests succeed and proceed from there.
Great work! Thank you. Looks good to me and merging :)
Closes #280.
Fixes #272, probably also #290, and supersedes #304.
Proposed Changes
Replaces consecutive calls to `convert_input` (on `X`) and `convert_input_vector` (on `y`) by a single `convert_inputs` to ensure that the indexes of the results match. This is necessary for proper functioning of encoders that group `y` by values of `X`, and convenient otherwise.

I don't like that `convert_inputs` is one character away from `convert_input`; other suggestions welcomed. One could convert all remaining `convert_input` calls to `convert_inputs` with the default `y=None`, so that `convert_input` would join `convert_input_vector` as only used inside `convert_inputs`.

I've also reduced the places where `y` gets cast to float, including that casting only when needed (in glmm where `statsmodels` would complain otherwise, and quantile where `numpy.quantile` would complain otherwise).

And since `convert_input` has a deep-copy option, I've consolidated a few of the copies into `convert_inputs`; there are others that I've not consolidated, mostly because the copy happens further away in the code.

I'm not sure what needs to be done for a repository to "participate" in Hacktoberfest, but if it's as simple as a maintainer adding a label `hacktoberfest-approved` to the PR, I'd appreciate that.
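To illustrate the idea behind the proposed changes, here is a minimal sketch of a joint conversion that keeps the indexes of `X` and `y` in step. `convert_inputs_sketch` is a hypothetical stand-in, not the code in this PR: the rule that the side without an index adopts the other side's index, and the name `"target"` for the resulting series, are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def convert_inputs_sketch(X, y=None):
    """Hypothetical stand-in for convert_inputs; not the library code."""
    X_has_index = isinstance(X, (pd.DataFrame, pd.Series))
    y_has_index = isinstance(y, pd.Series)

    # Copy or wrap X so the caller's object is never modified.
    X = X.copy() if isinstance(X, pd.DataFrame) else pd.DataFrame(X)
    if y is None:
        return X, None

    if len(X) != len(y):
        raise ValueError(
            "The length of X is {} but length of y is {}.".format(len(X), len(y))
        )

    if X_has_index and y_has_index:
        if not X.index.equals(y.index):
            raise ValueError("`X` and `y` both have indexes, but they do not match.")
        y = y.copy()
    elif y_has_index:
        # Assumption: only y carries an index, so X adopts it.
        y = y.copy()
        X.index = y.index
    else:
        # Assumption: y has no index of its own, so it adopts X's index.
        y = pd.Series(np.asarray(y), index=X.index, name="target")
    return X, y


X = pd.DataFrame({"city": ["a", "b", "a"]}, index=[10, 20, 30])
y = np.array([1.0, 0.0, 3.0])

X_out, y_out = convert_inputs_sketch(X, y)
# Grouping y by a column of X aligns on index labels, so both must share them.
means = y_out.groupby(X_out["city"]).mean()
```

Had `y` been wrapped in a series with a default `0..2` index while `X` kept `10, 20, 30`, the `groupby` would have aligned on labels that do not overlap, which is the kind of mismatch the single conversion step is meant to rule out.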