
Commit 10df650

[doc] Add the workflow of the Auto-Pytorch (#285)

* [doc] Add workflow of the AutoPytorch
* [doc] Address Ravin's comment

1 parent 7e67e56 commit 10df650

File tree

2 files changed: +30 -2 lines changed

README.md

Lines changed: 30 additions & 2 deletions
@@ -6,10 +6,38 @@ While early AutoML frameworks focused on optimizing traditional ML pipelines and
Auto-PyTorch is mainly developed to support tabular data (classification, regression).
The newest features in Auto-PyTorch for tabular data are described in the paper ["Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL"](https://arxiv.org/abs/2006.13799) (see below for the bibtex reference).
The documentation can be found [here](https://automl.github.io/Auto-PyTorch/development).
***From v0.1.0, Auto-PyTorch has been updated to further improve usability, robustness and efficiency by using SMAC as the underlying optimization package and by changing the code structure. Therefore, moving from v0.0.2 to v0.1.0 will break compatibility.
In case you would like to use the old API, you can find it at [`master_old`](https://github.com/automl/Auto-PyTorch/tree/master-old).***
## Workflow
The figure below gives a rough overview of the Auto-PyTorch workflow.
<img src="figs/apt_workflow.png" width="500">
In the figure, **Data** is provided by the user, and
**Portfolio** is a set of neural network configurations that work well on a diverse set of datasets.
The current version only supports the *greedy portfolio* described in the paper *Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL*.
This portfolio is used to warm-start the SMAC optimization;
in other words, the portfolio configurations are evaluated on the provided data as the initial configurations.
The API then runs the following procedure:
1. **Validate input data**: Process each data type (e.g. encode categorical data) so that Auto-PyTorch can handle it.
2. **Create dataset**: Create a dataset that can be handled by this API, with a choice of cross-validation or holdout splits.
3. **Evaluate baselines** *1: Train each algorithm in the predefined pool with a fixed hyperparameter configuration, plus a dummy model from `sklearn.dummy` that represents the worst possible performance.
4. **Search by [SMAC](https://github.com/automl/SMAC3)**:\
   a. Determine budget and cut-off rules by [Hyperband](https://jmlr.org/papers/volume18/16-558/16-558.pdf) (see the budget sketch after the footnotes)\
   b. Sample a pipeline hyperparameter configuration *2 by SMAC\
   c. Update the observations with the obtained results\
   d. Repeat a. -- c. until the budget runs out
5. Build the best ensemble for the provided dataset from the observed results via [ensemble selection](https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml04.icdm06long.pdf) (a sketch of this greedy selection also follows the footnotes).
*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machines, used to solve either a regression or a classification task on the provided dataset.
*2: A pipeline hyperparameter configuration specifies the choice of components in each step (e.g. the target algorithm, the shape of the neural network) and their corresponding hyperparameters.
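
To make step 4a concrete, here is a minimal sketch of how Hyperband-style budgets can be spaced geometrically between a minimum and a maximum budget with a halving factor `eta`; the function name and the example numbers are illustrative assumptions, not Auto-PyTorch internals.

```python
# Illustrative sketch of a Hyperband-style budget schedule; the function
# name and the example budgets are assumptions, not Auto-PyTorch internals.
import math

def hyperband_budgets(min_budget: float, max_budget: float, eta: int = 3) -> list:
    """Return the geometrically spaced budgets (e.g. training epochs)
    that successive halving iterates over, up to max_budget."""
    s_max = int(math.log(max_budget / min_budget, eta))
    return [max_budget * eta ** (-i) for i in range(s_max, -1, -1)]

# Example: epoch budgets between 5 and 50 with eta=3.
print(hyperband_budgets(5, 50))  # [5.55..., 16.66..., 50.0]
```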
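
For step 5, the linked paper by Caruana et al. describes greedy ensemble selection: repeatedly add, with replacement, the model whose inclusion most improves the validation score of the averaged ensemble. A minimal sketch, assuming each model's validation class probabilities are available and accuracy is the metric:

```python
# Minimal sketch of greedy ensemble selection (Caruana et al.);
# illustrative only, not Auto-PyTorch's actual implementation.
import numpy as np

def ensemble_selection(val_probs, y_val, ensemble_size=10):
    """val_probs: list of (n_samples, n_classes) probability arrays,
    one per trained model. Returns model indices chosen with replacement;
    a model's weight in the ensemble is its selection count."""
    chosen = []
    running_sum = np.zeros_like(val_probs[0])
    for _ in range(ensemble_size):
        # Add the model that maximizes accuracy of the averaged ensemble.
        accs = [
            np.mean(((running_sum + p) / (len(chosen) + 1)).argmax(axis=1) == y_val)
            for p in val_probs
        ]
        best = int(np.argmax(accs))
        chosen.append(best)
        running_sum += val_probs[best]
    return chosen
```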
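End to end, the workflow above is driven through the task API. The sketch below follows the usage pattern from the documentation linked above; treat the exact argument names (e.g. `total_walltime_limit`, `portfolio_selection`) as version-dependent assumptions.

```python
# Usage sketch of the tabular classification API; argument names follow
# the linked documentation but may differ across versions.
import sklearn.datasets
import sklearn.model_selection
from autoPyTorch.api.tabular_classification import TabularClassificationTask

X, y = sklearn.datasets.fetch_openml(data_id=40981, return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, random_state=1)

api = TabularClassificationTask()
api.search(
    X_train=X_train, y_train=y_train,
    X_test=X_test, y_test=y_test,
    optimize_metric='accuracy',
    total_walltime_limit=300,       # overall search budget in seconds
    func_eval_time_limit_secs=50,   # per-configuration evaluation limit
    # portfolio_selection='greedy' would warm-start SMAC from the greedy
    # portfolio, if supported by the installed version.
)
y_pred = api.predict(X_test)
print(api.score(y_pred, y_test))
```
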
## Installation
### Manual Installation
@@ -25,8 +53,8 @@ We recommend using Anaconda for developing as follows:
```diff
 git submodule update --init --recursive

 # Create the environment
-conda create -n autopytorch python=3.8
-conda activate autopytorch
+conda create -n auto-pytorch python=3.8
+conda activate auto-pytorch
 conda install swig
 cat requirements.txt | xargs -n 1 -L 1 pip install
 python setup.py install
```

figs/apt_workflow.png (120 KB)