Encoders not compatible with sklearn pipelines now #265
Comments
Which encoders are we talking about?
LeaveOneOut for sure, but I think all of them have the same issue. The only way out is by calling cvwrapper, which does apply fit-transform.
Are there any updates on this? I have no experience solving issues, but is there any way I can contribute to fixing this one? I'm having a hard time combining pipelines with TargetEncoder() and I would like to get this fixed. Any guidelines would be deeply appreciated.
The transform method in LeaveOneOut is supposed to behave differently on training data and testing data, and this is known to cause issues with sklearn pipelines. Nevertheless, unsupervised encoders can behave like any other encoder in sklearn, and possibly some supervised encoders can as well (I do not know which ones, if any). What can be done to make the situation better:
For contributing, check |
I think this is handled by #246, which adds a custom |
Closing, as suggested by @bmreiniger.
Hi, while looking at the code I realized that the encoders use the variable 'y' to signal, when transforming, whether to use the 'train' behaviour or the 'test' behaviour. This does not seem correct, since calling fit_transform on a sklearn pipeline first calls fit and then transform without the 'y' parameter. https://github.com/scikit-learn/scikit-learn/blob/0fb307bf39bbdacd6ed713c00724f8f871d60370/sklearn/pipeline.py#L742
An easy fix would be to define fit_transform directly and use 'y' there to get the 'train' behaviour, leaving only the 'test' behaviour in transform.
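To make the proposal concrete, here is a rough sketch of what I have in mind. It is not the library's code: `LeaveOneOutSketch` is a hypothetical, dependency-free class for a single categorical column, and the fallback to the global mean for unseen or singleton categories is my own assumption. It only follows the fit / transform / fit_transform method convention that sklearn transformers use.

```python
from collections import defaultdict


class LeaveOneOutSketch:
    """Hypothetical leave-one-out target encoder for one categorical column."""

    def fit(self, X, y):
        # Store per-category target sums and counts, plus the global mean.
        self.sums_ = defaultdict(float)
        self.counts_ = defaultdict(int)
        for category, target in zip(X, y):
            self.sums_[category] += target
            self.counts_[category] += 1
        self.global_mean_ = sum(y) / len(y)
        return self

    def transform(self, X):
        # 'Test' behaviour only: plain category mean, so no y is needed.
        # Unseen categories fall back to the global mean (an assumption).
        return [
            self.sums_[c] / self.counts_[c] if c in self.counts_ else self.global_mean_
            for c in X
        ]

    def fit_transform(self, X, y):
        # 'Train' behaviour: leave each row's own target out of its category mean.
        self.fit(X, y)
        encoded = []
        for c, t in zip(X, y):
            n = self.counts_[c]
            if n > 1:
                encoded.append((self.sums_[c] - t) / (n - 1))
            else:
                # A category seen once has nothing left after leaving the row out.
                encoded.append(self.global_mean_)
        return encoded


X_train = ["a", "a", "b", "b", "b"]
y_train = [1, 0, 1, 1, 0]

encoder = LeaveOneOutSketch()
train_encoded = encoder.fit_transform(X_train, y_train)  # uses y: 'train' behaviour
test_encoded = encoder.transform(["a", "b", "c"])        # no y: 'test' behaviour
```

With this split, transform never needs 'y', so any caller that invokes fit_transform(X, y) during fitting and transform(X) at prediction time gets the right behaviour in each phase.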