What's going on?
Auto-Sklearn has recently been under-maintained, and we appreciate that this has caused many users to face dependency issues as pinned dependencies slowly go out of date. While we support this project primarily through academic means, we are still proud of the community that has formed around it and are dedicated to pushing it forward.
Will Auto-Sklearn still be maintained?
Yes, auto-sklearn will be maintained and updated moving forward! We initially attempted some of these updates, e.g. #1611 and #1618, but there were larger issues at play. To address this, we are currently working on a major refactor of the tool, introducing more flexibility and long-wanted features, including pipeline export, flexible pipelines, and a modular design. We expect the first prototype to be available within the next 1-2 months.
Why the refactor?
Auto-Sklearn was initially built in the days of Python 2 and the earlier days of scikit-learn. Machine learning libraries and their ecosystem were still developing, and a lot has changed since then. There were also many lessons learned which, while easy in concept, are truly difficult to integrate into the current design.
Doing research with Auto-Sklearn has also become harder: as it grew into a robust and well-performing tool, performing novel research on top of it became more difficult.
What to expect?
... Not that much. It's a refactor to get back to where we were, but with the goal of making the tool more extensible.
We will still maintain the front-facing AutoSklearnClassifier and AutoSklearnRegressor, which will act primarily as they did before, staying very scikit-learn-like with their simple interface.
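To make that concrete, here is a minimal sketch of the interface as it exists today. The constructor arguments shown (`time_left_for_this_task` and `per_run_time_limit`) come from the current release and may shift slightly with the refactor, but the fit/predict workflow is intended to stay the same:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from autosklearn.classification import AutoSklearnClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# The familiar scikit-learn-style estimator: construct, fit, predict.
automl = AutoSklearnClassifier(
    time_left_for_this_task=120,  # total optimization budget, in seconds
    per_run_time_limit=30,        # budget per candidate pipeline, in seconds
)
automl.fit(X_train, y_train)
predictions = automl.predict(X_test)
```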
This refactor will allow us to solve some long-standing issues that have arisen. We looked through all the issues and tried to categorize what this new refactor will enable. Not all of these issues will be solved upon release, but it will provide a tangible road towards them.
- We will have a new, flexible scheduling system, allowing users to hook into events as they happen (see the hypothetical sketch after this list), hopefully handling issues like:
- Error running "fit" with many cores. #1236
- [Question] Modify Stopping Criterion to Accuracy #1624
- callback function error! #1569
- Dask has stopped support for 3.7 #1522
- S3 support for auto-sklearn to store and load models and configurations for each run #986
- Making auto-sklearn fit methods be interactively stoppable and still give a fitted model #397
- A more flexible pipeline definition, allowing you to create your own or just modify the default, solving:
- When adding NoPreprocessing component to auto-sklearn, the lassoregression can run successfully, while the abess regression crashed #1661
- Feature Request: AutoSklearnOutlierDetector #578
- How to apply a custom preprocessor to only specified features #1110
- [Question] Is it possible to change the hyperparameter space of an algorithm? #1587
- [Question] How can I make sure AutoSklearn is always using StandardScaler for feature preprocessing? #1548
- Can Autosklearn handle Multi-Class/Multi-Label Classification and which classifiers will it use? #1429
- [Question] Are there any alternatives to One-hot encoding? #1268
- Is that possible set the initial value of hyperparams when use auto sklearn to search #577
- Enhancement: Make the Ordinal Encoder a encoder choice #1150
- Custom pipelines #379
- Auto-Sklearn will allow you to optimize your own custom sklearn pipelines and try its darn best to return pure, functioning sklearn pipelines (no auto-sklearn custom parts attached); see the sketch after this list. This means you will be able to use any library that supports sklearn pipelines. This should allow great strides towards:
- convert to scikit learn code. #388
- Allow autosklearn to export ONNX model #1006
- [Question] How to know the data and feature preprocessing used in the ensemble? #1633
- [Question] Are their any methods to get all models, not only used in ensemble? #1667
- [Question] Rebuilding Auto Sklearn pipelines with the parameter dictionary returned by .cv_results_ #1663
- [Question] Is to_sklearn() available now? #1641
- Can Autosklearn be used with SHAP? #1272
- [Question] Integration with sklearn-evaluation #1640
- [Question] How to get values of categorical variables from a fit model? #1634
- Third Party Components not shared with spawned child processes when n_jobs > 1 #1607
- [Question] Is it possible to use Scikit-learn version >=1.1.1 #1597
- [Question] Viewable Preprocessors and Regressors (internal mechanisms)? #1600
- How can I get/export a production model from a trained model (after refit) with autosklearn? #1467
- How to see the selected features when ensemble size = 1? #1102
- [Question] Is there any straight forward way to retrieve the solution and prediction vector during CV? #1448
- Is it possible to integrate a metric of imblearn as a scorer? #786
- By refactoring, we can also use newer features of sklearn that we previously tried to bolt on but which were never first-class citizens at the time of Auto-Sklearn's conception (see the short sketch after this list):
- How to give sample weights? #288
- [Maint] Specify `encoded_missing_value` to `OrdinalEncoder` #1615
- [Research] Use grouping of infrequent categories in `OneHotEncoder` #1614
- [Research] Use parameter `quantile` in `HistGradientBoostingRegressor` #1613
- How to weight a given class? [class balancing] #1596
- Transformer should accept `y` argument in the `transform` method #1494
- Update SGD regressor loss values for scikit learn 1.0 #1334
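On the scheduling system mentioned in the first group above: nothing is finalized, and all of the names below (`Scheduler`, `on_trial_end`, `report_trial`, `stop`) are purely illustrative stand-ins rather than any released API. The sketch only shows the kind of event hook we have in mind, e.g. stopping the search once a target accuracy is reached (#1624).

```python
# Purely hypothetical sketch: none of these names exist in auto-sklearn today.
# It only illustrates what "hooking into events" could look like.


class Scheduler:
    """Minimal stand-in for the kind of scheduling system we have in mind."""

    def __init__(self):
        self.stopped = False
        self._on_trial_end = []

    def on_trial_end(self, callback):
        self._on_trial_end.append(callback)

    def report_trial(self, accuracy):
        for callback in self._on_trial_end:
            callback(self, accuracy)

    def stop(self):
        self.stopped = True


def stop_at_target_accuracy(scheduler, accuracy, target=0.95):
    """User callback: stop the whole search once the target is reached (#1624)."""
    if accuracy >= target:
        scheduler.stop()


scheduler = Scheduler()
scheduler.on_trial_end(stop_at_target_accuracy)
scheduler.report_trial(accuracy=0.97)
assert scheduler.stopped
```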
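On pipeline export: the goal is that what you get back is an ordinary scikit-learn `Pipeline` that you can persist, inspect, or hand to other tools. The export API itself is not finalized; the sketch below is only an example of the kind of plain-sklearn object we aim to return, built by hand (with made-up column names) for illustration.

```python
# Illustration of the target output only: a plain scikit-learn Pipeline with no
# auto-sklearn components attached. The export mechanism that would produce it
# is still being designed; this object and its column names are made up by hand.
import joblib
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

exported = Pipeline(
    steps=[
        (
            "preprocess",
            ColumnTransformer(
                transformers=[
                    ("num", StandardScaler(), ["age", "income"]),
                    ("cat", OneHotEncoder(handle_unknown="ignore"), ["country"]),
                ]
            ),
        ),
        ("model", RandomForestClassifier(n_estimators=100)),
    ]
)

# Because it is pure sklearn, anything that understands sklearn pipelines
# (joblib, ONNX converters, SHAP, sklearn-evaluation, ...) can work with it.
joblib.dump(exported, "exported_pipeline.joblib")
```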
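Finally, on the newer scikit-learn features in the last group: these already exist upstream (roughly scikit-learn >= 1.1); the refactor is about adopting them as first-class options rather than bolting them on. A short sketch of the relevant parameters:

```python
# These parameters ship with recent scikit-learn releases (roughly >= 1.1);
# the refactor should let auto-sklearn expose them directly.
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# OrdinalEncoder: explicit encoding for missing values (#1615).
ordinal = OrdinalEncoder(
    handle_unknown="use_encoded_value",
    unknown_value=-1,
    encoded_missing_value=-2,
)

# OneHotEncoder: group infrequent categories into a single bucket (#1614).
onehot = OneHotEncoder(handle_unknown="infrequent_if_exist", min_frequency=10)

# HistGradientBoostingRegressor: quantile loss via the `quantile` parameter (#1613).
quantile_regressor = HistGradientBoostingRegressor(loss="quantile", quantile=0.9)
```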
What can I do?
Please let us know what you think and what you'd like to see from this rebuild!