|
15 | 15 | "cell_type": "markdown",
|
16 | 16 | "metadata": {},
|
17 | 17 | "source": [
|
18 |
| - "\n# Tabular Classification with Custom Configuration Space\n\nThe following example shows how adjust the configuration space of\nthe search. Currently, there are two changes that can be made to the space:-\n1. Adjust individual hyperparameters in the pipeline\n2. Include or exclude components:\n a) include: Dictionary containing components to include. Key is the node\n name and Value is an Iterable of the names of the components\n to include. Only these components will be present in the\n search space.\n b) exclude: Dictionary containing components to exclude. Key is the node\n name and Value is an Iterable of the names of the components\n to exclude. All except these components will be present in\n the search space.\n" |
| 18 | + "\n# Tabular Classification with Custom Configuration Space\n\nThe following example shows how adjust the configuration space of\nthe search. Currently, there are two changes that can be made to the space:-\n\n1. Adjust individual hyperparameters in the pipeline\n2. Include or exclude components:\n a) include: Dictionary containing components to include. Key is the node\n name and Value is an Iterable of the names of the components\n to include. Only these components will be present in the\n search space.\n b) exclude: Dictionary containing components to exclude. Key is the node\n name and Value is an Iterable of the names of the components\n to exclude. All except these components will be present in\n the search space.\n" |
19 | 19 | ]
|
20 | 20 | },
|
21 | 21 | {
|
|
26 | 26 | },
|
27 | 27 | "outputs": [],
|
28 | 28 | "source": [
|
29 |
| - "import os\nimport tempfile as tmp\nimport warnings\n\nos.environ['JOBLIB_TEMP_FOLDER'] = tmp.gettempdir()\nos.environ['OMP_NUM_THREADS'] = '1'\nos.environ['OPENBLAS_NUM_THREADS'] = '1'\nos.environ['MKL_NUM_THREADS'] = '1'\n\nwarnings.simplefilter(action='ignore', category=UserWarning)\nwarnings.simplefilter(action='ignore', category=FutureWarning)\n\nimport sklearn.datasets\nimport sklearn.model_selection\n\nfrom autoPyTorch.api.tabular_classification import TabularClassificationTask\nfrom autoPyTorch.utils.hyperparameter_search_space_update import HyperparameterSearchSpaceUpdates\n\n\ndef get_search_space_updates():\n \"\"\"\n Search space updates to the task can be added using HyperparameterSearchSpaceUpdates\n Returns:\n HyperparameterSearchSpaceUpdates\n \"\"\"\n updates = HyperparameterSearchSpaceUpdates()\n updates.append(node_name=\"data_loader\",\n hyperparameter=\"batch_size\",\n value_range=[16, 512],\n default_value=32)\n updates.append(node_name=\"lr_scheduler\",\n hyperparameter=\"CosineAnnealingLR:T_max\",\n value_range=[50, 60],\n default_value=55)\n updates.append(node_name='network_backbone',\n hyperparameter='ResNetBackbone:dropout',\n value_range=[0, 0.5],\n default_value=0.2)\n return updates\n\n\nif __name__ == '__main__':\n\n ############################################################################\n # Data Loading\n # ============\n X, y = sklearn.datasets.fetch_openml(data_id=40981, return_X_y=True, as_frame=True)\n X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(\n X,\n y,\n random_state=1,\n )\n\n ############################################################################\n # Build and fit a classifier with include components\n # ==================================================\n api = TabularClassificationTask(\n search_space_updates=get_search_space_updates(),\n include_components={'network_backbone': ['MLPBackbone', 'ResNetBackbone'],\n 'encoder': ['OneHotEncoder']}\n )\n\n ############################################################################\n # Search for an ensemble of machine learning algorithms\n # =====================================================\n api.search(\n X_train=X_train.copy(),\n y_train=y_train.copy(),\n X_test=X_test.copy(),\n y_test=y_test.copy(),\n optimize_metric='accuracy',\n total_walltime_limit=150,\n func_eval_time_limit_secs=30\n )\n\n ############################################################################\n # Print the final ensemble performance\n # ====================================\n y_pred = api.predict(X_test)\n score = api.score(y_pred, y_test)\n print(score)\n print(api.show_models())\n\n # Print statistics from search\n print(api.sprint_statistics())\n\n ############################################################################\n # Build and fit a classifier with exclude components\n # ==================================================\n api = TabularClassificationTask(\n search_space_updates=get_search_space_updates(),\n exclude_components={'network_backbone': ['MLPBackbone'],\n 'encoder': ['OneHotEncoder']}\n )\n\n ############################################################################\n # Search for an ensemble of machine learning algorithms\n # =====================================================\n api.search(\n X_train=X_train,\n y_train=y_train,\n X_test=X_test.copy(),\n y_test=y_test.copy(),\n optimize_metric='accuracy',\n total_walltime_limit=150,\n func_eval_time_limit_secs=30\n )\n\n 
############################################################################\n # Print the final ensemble performance\n # ====================================\n y_pred = api.predict(X_test)\n score = api.score(y_pred, y_test)\n print(score)\n print(api.show_models())\n\n # Print statistics from search\n print(api.sprint_statistics())" |
| 29 | + "import os\nimport tempfile as tmp\nimport warnings\n\nos.environ['JOBLIB_TEMP_FOLDER'] = tmp.gettempdir()\nos.environ['OMP_NUM_THREADS'] = '1'\nos.environ['OPENBLAS_NUM_THREADS'] = '1'\nos.environ['MKL_NUM_THREADS'] = '1'\n\nwarnings.simplefilter(action='ignore', category=UserWarning)\nwarnings.simplefilter(action='ignore', category=FutureWarning)\n\nimport sklearn.datasets\nimport sklearn.model_selection\n\nfrom autoPyTorch.api.tabular_classification import TabularClassificationTask\nfrom autoPyTorch.utils.hyperparameter_search_space_update import HyperparameterSearchSpaceUpdates\n\n\ndef get_search_space_updates():\n \"\"\"\n Search space updates to the task can be added using HyperparameterSearchSpaceUpdates\n Returns:\n HyperparameterSearchSpaceUpdates\n \"\"\"\n updates = HyperparameterSearchSpaceUpdates()\n updates.append(node_name=\"data_loader\",\n hyperparameter=\"batch_size\",\n value_range=[16, 512],\n default_value=32)\n updates.append(node_name=\"lr_scheduler\",\n hyperparameter=\"CosineAnnealingLR:T_max\",\n value_range=[50, 60],\n default_value=55)\n updates.append(node_name='network_backbone',\n hyperparameter='ResNetBackbone:dropout',\n value_range=[0, 0.5],\n default_value=0.2)\n return updates" |
| 30 | + ] |
| 31 | + }, |
| 32 | + { |
| 33 | + "cell_type": "markdown", |
| 34 | + "metadata": {}, |
| 35 | + "source": [ |
| 36 | + "## Data Loading\n\n" |
| 37 | + ] |
| 38 | + }, |
| 39 | + { |
| 40 | + "cell_type": "code", |
| 41 | + "execution_count": null, |
| 42 | + "metadata": { |
| 43 | + "collapsed": false |
| 44 | + }, |
| 45 | + "outputs": [], |
| 46 | + "source": [ |
| 47 | + "X, y = sklearn.datasets.fetch_openml(data_id=40981, return_X_y=True, as_frame=True)\nX_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(\n X,\n y,\n random_state=1,\n)" |
| 48 | + ] |
| 49 | + }, |
| 50 | + { |
| 51 | + "cell_type": "markdown", |
| 52 | + "metadata": {}, |
| 53 | + "source": [ |
| 54 | + "## Build and fit a classifier with include components\n\n" |
| 55 | + ] |
| 56 | + }, |
| 57 | + { |
| 58 | + "cell_type": "code", |
| 59 | + "execution_count": null, |
| 60 | + "metadata": { |
| 61 | + "collapsed": false |
| 62 | + }, |
| 63 | + "outputs": [], |
| 64 | + "source": [ |
| 65 | + "api = TabularClassificationTask(\n search_space_updates=get_search_space_updates(),\n include_components={'network_backbone': ['MLPBackbone', 'ResNetBackbone'],\n 'encoder': ['OneHotEncoder']}\n)" |
| 66 | + ] |
| 67 | + }, |
| 68 | + { |
| 69 | + "cell_type": "markdown", |
| 70 | + "metadata": {}, |
| 71 | + "source": [ |
| 72 | + "## Search for an ensemble of machine learning algorithms\n\n" |
| 73 | + ] |
| 74 | + }, |
| 75 | + { |
| 76 | + "cell_type": "code", |
| 77 | + "execution_count": null, |
| 78 | + "metadata": { |
| 79 | + "collapsed": false |
| 80 | + }, |
| 81 | + "outputs": [], |
| 82 | + "source": [ |
| 83 | + "api.search(\n X_train=X_train.copy(),\n y_train=y_train.copy(),\n X_test=X_test.copy(),\n y_test=y_test.copy(),\n optimize_metric='accuracy',\n total_walltime_limit=150,\n func_eval_time_limit_secs=30\n)" |
| 84 | + ] |
| 85 | + }, |
| 86 | + { |
| 87 | + "cell_type": "markdown", |
| 88 | + "metadata": {}, |
| 89 | + "source": [ |
| 90 | + "## Print the final ensemble performance\n\n" |
| 91 | + ] |
| 92 | + }, |
| 93 | + { |
| 94 | + "cell_type": "code", |
| 95 | + "execution_count": null, |
| 96 | + "metadata": { |
| 97 | + "collapsed": false |
| 98 | + }, |
| 99 | + "outputs": [], |
| 100 | + "source": [ |
| 101 | + "y_pred = api.predict(X_test)\nscore = api.score(y_pred, y_test)\nprint(score)\nprint(api.show_models())\n\n# Print statistics from search\nprint(api.sprint_statistics())" |
| 102 | + ] |
| 103 | + }, |
| 104 | + { |
| 105 | + "cell_type": "markdown", |
| 106 | + "metadata": {}, |
| 107 | + "source": [ |
| 108 | + "## Build and fit a classifier with exclude components\n\n" |
| 109 | + ] |
| 110 | + }, |
| 111 | + { |
| 112 | + "cell_type": "code", |
| 113 | + "execution_count": null, |
| 114 | + "metadata": { |
| 115 | + "collapsed": false |
| 116 | + }, |
| 117 | + "outputs": [], |
| 118 | + "source": [ |
| 119 | + "api = TabularClassificationTask(\n search_space_updates=get_search_space_updates(),\n exclude_components={'network_backbone': ['MLPBackbone'],\n 'encoder': ['OneHotEncoder']}\n)" |
| 120 | + ] |
| 121 | + }, |
| 122 | + { |
| 123 | + "cell_type": "markdown", |
| 124 | + "metadata": {}, |
| 125 | + "source": [ |
| 126 | + "## Search for an ensemble of machine learning algorithms\n\n" |
| 127 | + ] |
| 128 | + }, |
| 129 | + { |
| 130 | + "cell_type": "code", |
| 131 | + "execution_count": null, |
| 132 | + "metadata": { |
| 133 | + "collapsed": false |
| 134 | + }, |
| 135 | + "outputs": [], |
| 136 | + "source": [ |
| 137 | + "api.search(\n X_train=X_train,\n y_train=y_train,\n X_test=X_test.copy(),\n y_test=y_test.copy(),\n optimize_metric='accuracy',\n total_walltime_limit=150,\n func_eval_time_limit_secs=30\n)" |
| 138 | + ] |
| 139 | + }, |
| 140 | + { |
| 141 | + "cell_type": "markdown", |
| 142 | + "metadata": {}, |
| 143 | + "source": [ |
| 144 | + "## Print the final ensemble performance\n\n" |
| 145 | + ] |
| 146 | + }, |
| 147 | + { |
| 148 | + "cell_type": "code", |
| 149 | + "execution_count": null, |
| 150 | + "metadata": { |
| 151 | + "collapsed": false |
| 152 | + }, |
| 153 | + "outputs": [], |
| 154 | + "source": [ |
| 155 | + "y_pred = api.predict(X_test)\nscore = api.score(y_pred, y_test)\nprint(score)\nprint(api.show_models())\n\n# Print statistics from search\nprint(api.sprint_statistics())" |
30 | 156 | ]
|
31 | 157 | }
|
32 | 158 | ],
|
|