README.md (2 additions, 2 deletions)
@@ -7,9 +7,9 @@ which allows you to build, train, deploy and manage machine learning solutions u
 allows you the choice of using local or cloud compute resources, while managing
 and maintaining the complete data science workflow from the cloud.
 
-You can find instructions on setting up notebooks [here](./NBSETUP.md)
+* Read [instructions on setting up notebooks](./NBSETUP.md) to run these notebooks.
 
-You can find full documentation for Azure Machine Learning [here](https://aka.ms/aml-docs)
+* Find quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).
how-to-use-azureml/README.md (1 addition, 0 deletions)
@@ -13,3 +13,4 @@ As a pre-requisite, run the [configuration Notebook](../configuration.ipynb) not
 * [enable-data-collection-for-models-in-aks](./deployment/enable-data-collection-for-models-in-aks) Learn about data collection APIs for deployed model.
 * [enable-app-insights-in-production-service](./deployment/enable-app-insights-in-production-service) Learn how to use App Insights with production web service.
 
+Find quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).
@@ -34,7 +34,8 @@ Below are the three execution environments supported by AutoML.
 **NOTE**: Please create your Azure Databricks cluster as v4.x (high concurrency preferred) with **Python 3** (dropdown).
 **NOTE**: You should at least have contributor access to your Azure subscription to run the notebook.
 - Please remove the previous SDK version if there is any, and install the latest SDK by installing **azureml-sdk[automl_databricks]** as a PyPI library in the Azure Databricks workspace.
-- Download the sample notebook 16a.auto-ml-classification-local-azuredatabricks from [GitHub](https://github.com/Azure/MachineLearningNotebooks) and import it into the Azure Databricks workspace.
+- You can find the detailed Readme instructions at [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks).
+- Download the sample notebook AutoML_Databricks_local_06.ipynb from [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks) and import it into the Azure Databricks workspace.
 - Attach the notebook to the cluster.
 
 <a name="localconda"></a>
@@ -57,7 +58,7 @@ jupyter notebook
 ```
 
 
-### 1. Install mini-conda from [here](https://conda.io/miniconda.html), choose Python 3.7 or higher.
+### 1. Install mini-conda from [here](https://conda.io/miniconda.html), choose 64-bit Python 3.7 or higher.
 - **Note**: if you already have conda installed, you can keep using it, but it should be version 4.4.10 or later (as shown by: conda -V). If you have a previous version installed, you can update it using the command: conda update conda.
 There's no need to install mini-conda specifically.
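The "4.4.10 or later" check in the note above can be scripted; a minimal sketch, assuming GNU `sort -V` is available (the `ver_ge` helper and the hard-coded `have` value are illustrative, not part of the setup scripts):

```shell
#!/bin/sh
# True when $1 >= $2 in dotted-version order (relies on GNU sort -V).
ver_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

required="4.4.10"
have="4.6.14"   # in practice: have=$(conda -V | awk '{print $2}')
if ver_ge "$have" "$required"; then
  echo "conda $have is recent enough"
else
  echo "please run: conda update conda"
fi
```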
 - Dataset: [Dominick's grocery sales of orange juice](forecasting-b/dominicks_OJ.csv)
 - Example of training an AutoML forecasting model on multiple time-series
 
 <a name="documentation"></a>
-# Documentation
-## Table of Contents
-1. [Automated ML Settings](#automlsettings)
-1. [Cross validation split options](#cvsplits)
-1. [Get Data Syntax](#getdata)
-1. [Data pre-processing and featurization](#preprocessing)
-
-<a name="automlsettings"></a>
-## Automated ML Settings
-
-|Property|Description|Default|
-|-|-|-|
-|**primary_metric**|This is the metric that you want to optimize.<br><br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i><br><br> Regression supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i><br><i>normalized_root_mean_squared_log_error</i>|Classification: accuracy <br><br> Regression: spearman_correlation|
-|**iteration_timeout_minutes**|Time limit in minutes for each iteration|None|
-|**iterations**|Number of iterations. Each iteration trains the data with a specific pipeline. To get the best result, use at least 100.|100|
-|**n_cross_validations**|Number of cross validation splits|None|
-|**validation_size**|Size of validation set as percentage of all training samples|None|
-|**max_concurrent_iterations**|Max number of iterations that would be executed in parallel|1|
-|**preprocess**|*True/False* <br>Setting this to *True* enables preprocessing <br>on the input to handle missing data, and perform some common feature extraction<br>*Note: If input data is Sparse you cannot use preprocess=True*|False|
-|**max_cores_per_iteration**|Indicates how many cores on the compute target would be used to train a single pipeline.<br> You can set it to *-1* to use all cores|1|
-|**experiment_exit_score**|*double* value indicating the target for *primary_metric*. <br> Once the target is surpassed the run terminates|None|
-|**blacklist_models**|*Array* of *strings* indicating models to ignore for Auto ML from the list of models.|None|
-|**whitelist_models**|*Array* of *strings* indicating the only models to use for Auto ML from the list of models.|None|
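The settings in the table above are passed to `AutoMLConfig` as keyword arguments; a minimal sketch of such a settings dictionary (values are illustrative, and it is shown as a plain dict so it does not require the azureml SDK to be installed):

```python
# Illustrative values for the settings documented in the table above.
automl_settings = {
    "primary_metric": "AUC_weighted",   # one of the supported classification metrics
    "iteration_timeout_minutes": 15,    # per-iteration time limit
    "iterations": 100,                  # at least 100 recommended for best results
    "n_cross_validations": 5,           # number of cross validation splits
    "max_concurrent_iterations": 1,     # iterations executed in parallel
    "preprocess": True,                 # not usable with sparse input data
    "max_cores_per_iteration": -1,      # -1 uses all cores on the compute target
    "experiment_exit_score": 0.995,     # stop once primary_metric surpasses this
}

# These would typically be splatted into AutoMLConfig(**automl_settings, ...).
```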
-<a name="cvsplits"></a>
-## List of models for white list/blacklist
-**Classification**
-<br><i>LogisticRegression</i>
-<br><i>SGD</i>
-<br><i>MultinomialNaiveBayes</i>
-<br><i>BernoulliNaiveBayes</i>
-<br><i>SVM</i>
-<br><i>LinearSVM</i>
-<br><i>KNN</i>
-<br><i>DecisionTree</i>
-<br><i>RandomForest</i>
-<br><i>ExtremeRandomTrees</i>
-<br><i>LightGBM</i>
-<br><i>GradientBoosting</i>
-<br><i>TensorFlowDNN</i>
-<br><i>TensorFlowLinearClassifier</i>
-<br><br>**Regression**
-<br><i>ElasticNet</i>
-<br><i>GradientBoosting</i>
-<br><i>DecisionTree</i>
-<br><i>KNN</i>
-<br><i>LassoLars</i>
-<br><i>SGD</i>
-<br><i>RandomForest</i>
-<br><i>ExtremeRandomTrees</i>
-<br><i>LightGBM</i>
-<br><i>TensorFlowLinearRegressor</i>
-<br><i>TensorFlowDNN</i>
-
-## Cross validation split options
-### K-Folds Cross Validation
-Use the *n_cross_validations* setting to specify the number of cross validations. The training data set will be randomly split into *n_cross_validations* folds of equal size. During each cross validation round, one of the folds will be used for validation of the model trained on the remaining folds. This process repeats for *n_cross_validations* rounds until each fold is used once as a validation set. Finally, the average scores across all *n_cross_validations* rounds will be reported, and the corresponding model will be retrained on the whole training data set.
-
-### Monte Carlo Cross Validation (a.k.a. Repeated Random Sub-Sampling)
-Use *validation_size* to specify the percentage of the training data set that should be used for validation, and use *n_cross_validations* to specify the number of cross validations. During each cross validation round, a subset of size *validation_size* will be randomly selected for validation of the model trained on the remaining data. Finally, the average scores across all *n_cross_validations* rounds will be reported, and the corresponding model will be retrained on the whole training data set.
-
-### Custom train and validation set
-You can specify separate train and validation sets either through get_data() or directly to the fit method.
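The k-fold procedure described above can be sketched in plain numpy; this is a hedged illustration of the splitting and score-averaging logic only, not the AutoML implementation (`fake_score` stands in for training and scoring a pipeline):

```python
import numpy as np

def kfold_scores(n_samples, n_cross_validations, score_fn, seed=0):
    """Randomly split indices into equal folds; score each held-out fold."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, n_cross_validations)
    scores = []
    for i, valid_idx in enumerate(folds):
        # Train on all folds except the i-th, validate on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(score_fn(train_idx, valid_idx))
    return float(np.mean(scores))  # average score across all rounds

# fake_score stands in for "train on train_idx, evaluate on valid_idx".
def fake_score(train_idx, valid_idx):
    return len(valid_idx) / (len(train_idx) + len(valid_idx))

avg = kfold_scores(100, 5, fake_score)  # each round holds out 20 of 100 samples
```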
-
-<a name="getdata"></a>
-## get_data() syntax
-The *get_data()* function can be used to return a dictionary with these values:
-
-|Data|Type|Use with|Mutually exclusive with|Description|
-|-|-|-|-|-|
-|X|Pandas Dataframe or Numpy Array|y|data_train, label, columns|All features to train with|
-|y|Pandas Dataframe or Numpy Array|X|label|Label data to train with. For classification, this should be an array of integers.|
-|X_valid|Pandas Dataframe or Numpy Array|X, y, y_valid|data_train, label|*Optional* All features to validate with. If this is not specified, X is split between train and validate|
-|y_valid|Pandas Dataframe or Numpy Array|X, y, X_valid|data_train, label|*Optional* The label data to validate with. If this is not specified, y is split between train and validate|
-|sample_weight|Pandas Dataframe or Numpy Array|y|data_train, label, columns|*Optional* A weight value for each label. Higher values indicate that the sample is more important.|
-|sample_weight_valid|Pandas Dataframe or Numpy Array|y_valid|data_train, label, columns|*Optional* A weight value for each validation label. Higher values indicate that the sample is more important. If this is not specified, sample_weight is split between train and validate|
-|data_train|Pandas Dataframe|label|X, y, X_valid, y_valid|All data (features+label) to train with|
-|label|string|data_train|X, y, X_valid, y_valid|Which column in data_train represents the label|
-|columns|Array of strings|data_train||*Optional* Whitelist of columns to use for features|
-|cv_splits_indices|Array of integers|data_train||*Optional* List of indexes to split the data for cross validation|
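A minimal sketch of a `get_data()` implementation returning the `X`/`y` form from the table above (the data here is synthetic and purely illustrative):

```python
import numpy as np

def get_data():
    # All features to train with, plus integer labels for classification.
    rng = np.random.default_rng(42)
    X = rng.random((100, 4))
    y = rng.integers(0, 2, size=100)
    return {"X": X, "y": y}
```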
-
-<a name="preprocessing"></a>
-## Data pre-processing and featurization
-If you use `preprocess=True`, the following data preprocessing steps are performed automatically for you:
-
-1. Dropping high cardinality or no variance features
-    - Features with no useful information are dropped from training and validation sets. These include features with all values missing, the same value across all rows, or with extremely high cardinality (e.g., hashes, IDs or GUIDs).
-2. Missing value imputation
-    - For numerical features, missing values are imputed with the average of the values in the column.
-    - For categorical features, missing values are imputed with the most frequent value.
-3. Generating additional features
-    - For DateTime features: Year, Month, Day, Day of week, Day of year, Quarter, Week of the year, Hour, Minute, Second.
-    - For Text features: Term frequency based on bi-grams and tri-grams, Count vectorizer.
-4. Transformations and encodings
-    - Numeric features with very few unique values are transformed into categorical features.
+See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn more about the settings and features available for automated machine learning experiments.
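The imputation rules in step 2 above can be sketched with pandas; a hedged illustration of the rule only (column mean for numeric, most frequent value for categorical), not the SDK's actual featurizer:

```python
import pandas as pd

def impute(df):
    """Fill numeric NaNs with the column mean, categorical NaNs with the mode."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out

# Tiny synthetic frame with one missing value per column.
df = pd.DataFrame({"age": [20.0, None, 40.0], "color": ["red", None, "red"]})
clean = impute(df)
```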
 
 <a name="pythoncommand"></a>
 # Running using python command
@@ -302,8 +210,9 @@ The main code of the file must be indented so that it is under this condition.
 # Troubleshooting
 ## automl_setup fails
 1. On Windows, make sure that you are running automl_setup from an Anaconda Prompt window rather than a regular cmd window. You can launch the "Anaconda Prompt" window by hitting the Start button and typing "Anaconda Prompt". If you don't see the application "Anaconda Prompt", you might not have conda or mini-conda installed. In that case, you can install it [here](https://conda.io/miniconda.html)
-2. Check that you have conda 4.4.10 or later. You can check the version with the command `conda -V`. If you have a previous version installed, you can update it using the command: `conda update conda`.
-3. Pass a new name as the first parameter to automl_setup so that it creates a new conda environment. You can view existing conda environments using `conda env list` and remove them with `conda env remove -n <environmentname>`.
+2. Check that you have 64-bit conda installed rather than 32-bit. You can check this with the command `conda info`. The `platform` should be `win-64` for Windows or `osx-64` for Mac.
+3. Check that you have conda 4.4.10 or later. You can check the version with the command `conda -V`. If you have a previous version installed, you can update it using the command: `conda update conda`.
+4. Pass a new name as the first parameter to automl_setup so that it creates a new conda environment. You can view existing conda environments using `conda env list` and remove them with `conda env remove -n <environmentname>`.
 
 ## configuration.ipynb fails
 1) For local conda, make sure that you have successfully run automl_setup first.