
Commit 2ac62ae

Merge pull request #144 from rastala/master
Version 1.0.6
2 parents: 4a2d6d6 + cad5d5c

File tree: 69 files changed, +5218 / -1626 lines

Note: large commits have some content hidden by default; only a subset of the 69 changed files is shown below.

README.md
Lines changed: 2 additions & 2 deletions

@@ -7,9 +7,9 @@ which allows you to build, train, deploy and manage machine learning solutions u
 allows you the choice of using local or cloud compute resources, while managing
 and maintaining the complete data science workflow from the cloud.

-You can find instructions on setting up notebooks [here](./NBSETUP.md)
+* Read [instructions on setting up notebooks](./NBSETUP.md) to run these notebooks.

-You can find full documentation for Azure Machine Learning [here](https://aka.ms/aml-docs)
+* Find quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).

 ## Getting Started
configuration.ipynb
Lines changed: 2 additions & 2 deletions

@@ -96,7 +96,7 @@
 "source": [
 "import azureml.core\n",
 "\n",
-"print(\"This notebook was created using version 1.0.2 of the Azure ML SDK\")\n",
+"print(\"This notebook was created using version 1.0.6 of the Azure ML SDK\")\n",
 "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
 ]
 },
@@ -368,7 +368,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.7"
+"version": "3.6.5"
 }
 },
 "nbformat": 4,

how-to-use-azureml/README.md
Lines changed: 1 addition & 0 deletions

@@ -13,3 +13,4 @@ As a pre-requisite, run the [configuration Notebook](../configuration.ipynb) not
 * [enable-data-collection-for-models-in-aks](./deployment/enable-data-collection-for-models-in-aks) Learn about data collection APIs for deployed model.
 * [enable-app-insights-in-production-service](./deployment/enable-app-insights-in-production-service) Learn how to use App Insights with production web service.

+Find quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).

how-to-use-azureml/automated-machine-learning/README.md
Lines changed: 12 additions & 103 deletions

@@ -34,7 +34,8 @@ Below are the three execution environments supported by AutoML.
 **NOTE**: Please create your Azure Databricks cluster as v4.x (high concurrency preferred) with **Python 3** (dropdown).
 **NOTE**: You should have at least contributor access to your Azure subscription to run the notebook.
 - Please remove the previous SDK version if there is any, and install the latest SDK by installing **azureml-sdk[automl_databricks]** as a PyPI library in the Azure Databricks workspace.
-- Download the sample notebook 16a.auto-ml-classification-local-azuredatabricks from [GitHub](https://github.com/Azure/MachineLearningNotebooks) and import it into the Azure Databricks workspace.
+- You can find the detailed README instructions at [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks).
+- Download the sample notebook AutoML_Databricks_local_06.ipynb from [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks) and import it into the Azure Databricks workspace.
 - Attach the notebook to the cluster.

 <a name="localconda"></a>
@@ -57,7 +58,7 @@ jupyter notebook
 ```


-### 1. Install mini-conda from [here](https://conda.io/miniconda.html), choose Python 3.7 or higher.
+### 1. Install mini-conda from [here](https://conda.io/miniconda.html), choose 64-bit Python 3.7 or higher.
 - **Note**: if you already have conda installed, you can keep using it but it should be version 4.4.10 or later (as shown by: conda -V). If you have a previous version installed, you can update it using the command: conda update conda.
 There's no need to install mini-conda specifically.

@@ -123,7 +124,7 @@ bash automl_setup_linux.sh

 - [auto-ml-remote-batchai.ipynb](remote-batchai/auto-ml-remote-batchai.ipynb)
   - Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
-  - Example of using automated ML for classification using a remote Batch AI compute for training
+  - Example of using automated ML for classification using remote AmlCompute for training
   - Parallel execution of iterations
   - Async tracking of progress
   - Cancelling individual iterations or entire run
@@ -178,114 +179,21 @@ bash automl_setup_linux.sh
   - Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
   - Example of using AutoML for classification using Azure Databricks as the platform for training

-- [auto-ml-classification_with_tensorflow.ipynb](classification_with_tensorflow/auto-ml-classification_with_tensorflow.ipynb)
+- [auto-ml-classification-with-whitelisting.ipynb](classification-with-whitelisting/auto-ml-classification-with-whitelisting.ipynb)
   - Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
-  - Simple example of using Auto ML for classification with whitelisting tensorflow models.checkout
+  - Simple example of using Auto ML for classification with whitelisting tensorflow models.
   - Uses local compute for training

-- [auto-ml-forecasting-a.ipynb](forecasting-a/auto-ml-forecasting-a.ipynb)
+- [auto-ml-forecasting-energy-demand.ipynb](forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb)
   - Dataset: [NYC energy demand data](forecasting-a/nyc_energy.csv)
   - Example of using AutoML for training a forecasting model

-- [auto-ml-forecasting-b.ipynb](forecasting-b/auto-ml-forecasting-b.ipynb)
+- [auto-ml-forecasting-orange-juice-sales.ipynb](forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb)
   - Dataset: [Dominick's grocery sales of orange juice](forecasting-b/dominicks_OJ.csv)
   - Example of training an AutoML forecasting model on multiple time-series

 <a name="documentation"></a>
-# Documentation
-## Table of Contents
-1. [Automated ML Settings](#automlsettings)
-1. [Cross validation split options](#cvsplits)
-1. [Get Data Syntax](#getdata)
-1. [Data pre-processing and featurization](#preprocessing)
-
-<a name="automlsettings"></a>
-## Automated ML Settings
-
-|Property|Description|Default|
-|-|-|-|
-|**primary_metric**|This is the metric that you want to optimize.<br><br> Classification supports the following primary metrics <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i><br><br> Regression supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i><br><i>normalized_root_mean_squared_log_error</i>|Classification: accuracy <br><br> Regression: spearman_correlation|
-|**iteration_timeout_minutes**|Time limit in minutes for each iteration|None|
-|**iterations**|Number of iterations. Each iteration trains the data with a specific pipeline. To get the best result, use at least 100.|100|
-|**n_cross_validations**|Number of cross validation splits|None|
-|**validation_size**|Size of validation set as percentage of all training samples|None|
-|**max_concurrent_iterations**|Max number of iterations that would be executed in parallel|1|
-|**preprocess**|*True/False* <br>Setting this to *True* enables preprocessing <br>on the input to handle missing data, and perform some common feature extraction<br>*Note: If input data is sparse you cannot use preprocess=True*|False|
-|**max_cores_per_iteration**|Indicates how many cores on the compute target would be used to train a single pipeline.<br> You can set it to *-1* to use all cores|1|
-|**experiment_exit_score**|*double* value indicating the target for *primary_metric*. <br> Once the target is surpassed the run terminates|None|
-|**blacklist_models**|*Array* of *strings* indicating models to ignore for Auto ML from the list of models.|None|
-|**whitelist_models**|*Array* of *strings*: use only the models listed for Auto ML from the list of models.|None|
-<a name="cvsplits"></a>
-## List of models for whitelist/blacklist
-**Classification**
-<br><i>LogisticRegression</i>
-<br><i>SGD</i>
-<br><i>MultinomialNaiveBayes</i>
-<br><i>BernoulliNaiveBayes</i>
-<br><i>SVM</i>
-<br><i>LinearSVM</i>
-<br><i>KNN</i>
-<br><i>DecisionTree</i>
-<br><i>RandomForest</i>
-<br><i>ExtremeRandomTrees</i>
-<br><i>LightGBM</i>
-<br><i>GradientBoosting</i>
-<br><i>TensorFlowDNN</i>
-<br><i>TensorFlowLinearClassifier</i>
-<br><br>**Regression**
-<br><i>ElasticNet</i>
-<br><i>GradientBoosting</i>
-<br><i>DecisionTree</i>
-<br><i>KNN</i>
-<br><i>LassoLars</i>
-<br><i>SGD</i>
-<br><i>RandomForest</i>
-<br><i>ExtremeRandomTrees</i>
-<br><i>LightGBM</i>
-<br><i>TensorFlowLinearRegressor</i>
-<br><i>TensorFlowDNN</i>
-
-## Cross validation split options
-### K-Folds Cross Validation
-Use the *n_cross_validations* setting to specify the number of cross validations. The training data set will be randomly split into *n_cross_validations* folds of equal size. During each cross validation round, one of the folds will be used for validation of the model trained on the remaining folds. This process repeats for *n_cross_validations* rounds until each fold is used once as the validation set. Finally, the average scores across all *n_cross_validations* rounds will be reported, and the corresponding model will be retrained on the whole training data set.
-
-### Monte Carlo Cross Validation (a.k.a. Repeated Random Sub-Sampling)
-Use *validation_size* to specify the percentage of the training data set that should be used for validation, and use *n_cross_validations* to specify the number of cross validations. During each cross validation round, a subset of size *validation_size* will be randomly selected for validation of the model trained on the remaining data. Finally, the average scores across all *n_cross_validations* rounds will be reported, and the corresponding model will be retrained on the whole training data set.
-
-### Custom train and validation set
-You can specify separate train and validation sets either through get_data() or directly to the fit method.
-
-<a name="getdata"></a>
-## get_data() syntax
-The *get_data()* function can be used to return a dictionary with these values:
-
-|Key|Type|Dependency|Mutually Exclusive with|Description|
-|:-|:-|:-|:-|:-|
-|X|Pandas Dataframe or Numpy Array|y|data_train, label, columns|All features to train with|
-|y|Pandas Dataframe or Numpy Array|X|label|Label data to train with. For classification, this should be an array of integers.|
-|X_valid|Pandas Dataframe or Numpy Array|X, y, y_valid|data_train, label|*Optional* All features to validate with. If this is not specified, X is split between train and validate|
-|y_valid|Pandas Dataframe or Numpy Array|X, y, X_valid|data_train, label|*Optional* The label data to validate with. If this is not specified, y is split between train and validate|
-|sample_weight|Pandas Dataframe or Numpy Array|y|data_train, label, columns|*Optional* A weight value for each label. Higher values indicate that the sample is more important.|
-|sample_weight_valid|Pandas Dataframe or Numpy Array|y_valid|data_train, label, columns|*Optional* A weight value for each validation label. Higher values indicate that the sample is more important. If this is not specified, sample_weight is split between train and validate|
-|data_train|Pandas Dataframe|label|X, y, X_valid, y_valid|All data (features+label) to train with|
-|label|string|data_train|X, y, X_valid, y_valid|Which column in data_train represents the label|
-|columns|Array of strings|data_train||*Optional* Whitelist of columns to use for features|
-|cv_splits_indices|Array of integers|data_train||*Optional* List of indexes to split the data for cross validation|
-
-<a name="preprocessing"></a>
-## Data pre-processing and featurization
-If you use `preprocess=True`, the following data preprocessing steps are performed automatically for you:
-
-1. Dropping high cardinality or no variance features
-   - Features with no useful information are dropped from training and validation sets. These include features with all values missing, the same value across all rows, or extremely high cardinality (e.g., hashes, IDs or GUIDs).
-2. Missing value imputation
-   - For numerical features, missing values are imputed with the average of values in the column.
-   - For categorical features, missing values are imputed with the most frequent value.
-3. Generating additional features
-   - For DateTime features: Year, Month, Day, Day of week, Day of year, Quarter, Week of the year, Hour, Minute, Second.
-   - For Text features: Term frequency based on bi-grams and tri-grams, Count vectorizer.
-4. Transformations and encodings
-   - Numeric features with very few unique values are transformed into categorical features.
+See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn more about the settings and features available for automated machine learning experiments.

 <a name="pythoncommand"></a>
 # Running using python command
@@ -302,8 +210,9 @@ The main code of the file must be indented so that it is under this condition.
 # Troubleshooting
 ## automl_setup fails
 1. On Windows, make sure that you are running automl_setup from an Anaconda Prompt window rather than a regular cmd window. You can launch the "Anaconda Prompt" window by hitting the Start button and typing "Anaconda Prompt". If you don't see the application "Anaconda Prompt", you might not have conda or miniconda installed. In that case, you can install it [here](https://conda.io/miniconda.html)
-2. Check that you have conda 4.4.10 or later. You can check the version with the command `conda -V`. If you have a previous version installed, you can update it using the command: `conda update conda`.
-3. Pass a new name as the first parameter to automl_setup so that it creates a new conda environment. You can view existing conda environments using `conda env list` and remove them with `conda env remove -n <environmentname>`.
+2. Check that you have 64-bit conda installed rather than 32-bit. You can check this with the command `conda info`. The `platform` should be `win-64` for Windows or `osx-64` for Mac.
+3. Check that you have conda 4.4.10 or later. You can check the version with the command `conda -V`. If you have a previous version installed, you can update it using the command: `conda update conda`.
+4. Pass a new name as the first parameter to automl_setup so that it creates a new conda environment. You can view existing conda environments using `conda env list` and remove them with `conda env remove -n <environmentname>`.

 ## configuration.ipynb fails
 1) For local conda, make sure that you have successfully run automl_setup first.
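
The settings table and get_data() reference deleted above survive as keyword arguments of `AutoMLConfig` in the Python SDK. Below is a minimal sketch of a local classification run against the digit dataset using several of those settings; it assumes azureml-sdk[automl] 1.0.6, scikit-learn, and a workspace config.json written by configuration.ipynb, and is not the notebooks' exact code:

```python
# Hedged sketch: configure and submit a local automated ML classification run
# using settings from the table formerly documented in this README.
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from sklearn.datasets import load_digits

ws = Workspace.from_config()        # reads config.json from the working directory
experiment = Experiment(ws, "automl-local-classification")

digits = load_digits()              # the digit dataset used by several samples above

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="AUC_weighted",  # one of the listed classification metrics
    iterations=100,                 # "use at least 100" per the settings table
    iteration_timeout_minutes=10,
    n_cross_validations=5,          # k-fold cross validation
    preprocess=True,                # automatic imputation and featurization
    X=digits.data,
    y=digits.target,
)

run = experiment.submit(automl_config, show_output=True)
```

Per the deleted cross-validation notes, supplying `validation_size` alongside `n_cross_validations` switches the run from k-fold splitting to repeated random sub-sampling.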

how-to-use-azureml/automated-machine-learning/automl_setup.cmd
Lines changed: 1 addition & 9 deletions

@@ -21,16 +21,8 @@ if not errorlevel 1 (
 call conda activate %conda_env_name% 2>nul:
 if errorlevel 1 goto ErrorExit

-call pip install psutil
-
 call python -m ipykernel install --user --name %conda_env_name% --display-name "Python (%conda_env_name%)"

-call jupyter nbextension install --py azureml.widgets --user
-if errorlevel 1 goto ErrorExit
-
-call jupyter nbextension enable --py azureml.widgets --user
-if errorlevel 1 goto ErrorExit
-
 echo.
 echo.
 echo ***************************************
@@ -39,7 +31,7 @@ echo ***************************************
 echo.
 echo Starting jupyter notebook - please run the configuration notebook
 echo.
-jupyter notebook --log-level=50
+jupyter notebook --log-level=50 --notebook-dir='..\..'

 goto End

how-to-use-azureml/automated-machine-learning/automl_setup_linux.sh
Lines changed: 1 addition & 3 deletions

@@ -27,8 +27,6 @@ else
 conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
 source activate $CONDA_ENV_NAME &&
 python -m ipykernel install --user --name $CONDA_ENV_NAME --display-name "Python ($CONDA_ENV_NAME)" &&
-jupyter nbextension install --py azureml.widgets --user &&
-jupyter nbextension enable --py azureml.widgets --user &&
 echo "" &&
 echo "" &&
 echo "***************************************" &&
@@ -37,7 +35,7 @@ else
 echo "" &&
 echo "Starting jupyter notebook - please run the configuration notebook" &&
 echo "" &&
-jupyter notebook --log-level=50
+jupyter notebook --log-level=50 --notebook-dir '../..'
 fi

 if [ $? -gt 0 ]

how-to-use-azureml/automated-machine-learning/automl_setup_mac.sh
Lines changed: 1 addition & 3 deletions

@@ -28,8 +28,6 @@ else
 source activate $CONDA_ENV_NAME &&
 conda install lightgbm -c conda-forge -y &&
 python -m ipykernel install --user --name $CONDA_ENV_NAME --display-name "Python ($CONDA_ENV_NAME)" &&
-jupyter nbextension install --py azureml.widgets --user &&
-jupyter nbextension enable --py azureml.widgets --user &&
 pip install numpy==1.15.3
 echo "" &&
 echo "" &&
@@ -39,7 +37,7 @@ else
 echo "" &&
 echo "Starting jupyter notebook - please run the configuration notebook" &&
 echo "" &&
-jupyter notebook --log-level=50
+jupyter notebook --log-level=50 --notebook-dir '../..'
 fi

 if [ $? -gt 0 ]
