[PYTHON] Notes on running Azure Machine Learning locally

Introduction

When I tried to run Azure Machine Learning on a local machine, I got stuck several times and it took a long time to get anything running, even though official documentation and other information exist. I am writing the steps down here as a memo.

This note covers everything from setting up the Ubuntu environment that runs Azure ML to actually running Azure ML.

Preparing the environment

I set up an Ubuntu 18.04 environment using a Docker image on a Mac. Pulling the Docker image and starting the container are omitted here.

You also need an Azure account and a workspace to run Azure ML. Creating them is omitted as well.

Ubuntu image setup

apt-get update
apt-get upgrade

The Docker image lacks several basic tools, so install the ones you need, referring to this article (https://qiita.com/manabuishiirb/items/26de8c9740a1d2c7cfdd).

apt-get install -y iputils-ping net-tools wget curl vim build-essential

Installation of Anaconda

This time I install Anaconda from the command line. Referring to this article (https://www.virment.com/setup-anaconda-python-jupyter-ubuntu/), download the installer as follows.

wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh

Install as follows.


bash Anaconda3-2019.10-Linux-x86_64.sh

Run conda init to enable the conda command. Since Anaconda was installed under /root/, run the following commands:

/root/anaconda3/bin/conda init
source /root/.bashrc 

Install Azure Python SDK

Install the Azure ML SDK by following the official documentation (https://docs.microsoft.com/ja-jp/azure/machine-learning/service/how-to-configure-environment#local). First, create an Anaconda virtual environment.

conda create -n myenv python=3.6.5
conda activate myenv
conda install notebook ipykernel
ipython kernel install --user --name myenv --display-name "Python (myenv)"

Next, install the Azure CLI, which is needed for authentication and other tasks. I referred to this page (https://docs.microsoft.com/ja-jp/cli/azure/install-azure-cli-apt?view=azure-cli-latest).

curl -sL https://aka.ms/InstallAzureCLIDeb | bash

Finally, install the Azure ML SDK.

pip install azureml-sdk[notebooks,automl]

The following error appears during installation, but it did not cause any problems in my case.

ERROR: azureml-automl-runtime 1.0.81 has requirement azureml-automl-core==1.0.81, but you'll have azureml-automl-core 1.0.81.1 which is incompatible.
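
pip prints this message because azureml-automl-runtime pins azureml-automl-core to exactly 1.0.81, while the version that ended up installed is 1.0.81.1. The following is a minimal sketch of that exact-match comparison. It assumes plain dotted numeric versions only, and the helper names are made up for illustration; pip itself applies the full PEP 440 rules.

```python
def parse_release(version):
    """Split a plain dotted version such as '1.0.81.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_exact_pin(installed, pinned):
    """Return True if `installed` matches an `==pinned` requirement.

    Shorter versions are padded with zeros, so 1.0.81 equals 1.0.81.0.
    Only plain numeric releases are handled (an assumption of this sketch).
    """
    a, b = parse_release(installed), parse_release(pinned)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a == b


print(satisfies_exact_pin("1.0.81.1", "1.0.81"))  # False
print(satisfies_exact_pin("1.0.81.0", "1.0.81"))  # True
```

The extra fourth segment is what makes the pin fail, which is why pip reports the two packages as incompatible even though they belong to the same 1.0.81 release line.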

Performing automatic machine learning

Authentication by `az login`

First, authenticate with the `az login` command. Open the URL printed by the command in a web browser and enter the code.

az login
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code GPVMUVTKF to authenticate.

Creating a file to connect to the workspace

Create a Python program (auth.py) that writes the workspace connection information to a file.

auth.py


from azureml.core import Workspace

subscription_id = '<Subscription id>'
resource_group  = '<Resource group name>'
workspace_name  = '<Workspace name>'

try:
    ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)
    ws.write_config()
    print('Library configuration succeeded')
except Exception as e:
    # Show the underlying cause (authentication failure, wrong IDs, etc.) instead of hiding it
    print('Workspace not found:', e)

When executed, it creates a config file for connecting to the workspace at .azureml/config.json in the current directory.
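
Before moving on, it can help to confirm that the config file really exists and contains what you expect. The sketch below is only an illustration: the function name is hypothetical, and the three key names are my assumption about what write_config() stores, so check them against your own file. The demo part writes a throwaway file with placeholder values instead of touching a real workspace.

```python
import json
import tempfile
from pathlib import Path

# Keys that this sketch assumes write_config() stores in config.json.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(base_dir="."):
    """Read .azureml/config.json under base_dir and check the expected keys."""
    path = Path(base_dir) / ".azureml" / "config.json"
    if not path.is_file():
        raise FileNotFoundError("{} not found; run auth.py first".format(path))
    with path.open() as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config.json is missing keys: {}".format(missing))
    return config


# Demo with a throwaway directory and placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp) / ".azureml"
    config_dir.mkdir()
    sample = {
        "subscription_id": "<Subscription id>",
        "resource_group": "<Resource group name>",
        "workspace_name": "<Workspace name>",
    }
    (config_dir / "config.json").write_text(json.dumps(sample))
    demo_config = load_workspace_config(tmp)
    print(sorted(demo_config))
```

In the real setup you would call load_workspace_config() with no argument from the directory where auth.py was run.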

Run

Create a Python program (run.py) to perform machine learning. For the data, I use the breast cancer dataset provided by scikit-learn. For more information on the dataset, see here (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html#sklearn.datasets.load_breast_cancer).

run.py


import logging

from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.experiment import Experiment

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

# Load the workspace settings from .azureml/config.json
ws = Workspace.from_config()

# Load the data and hold out 20% of the rows as a test set
data = load_breast_cancer()
df_X = pd.DataFrame(data.data, columns=data.feature_names)
df_y = pd.DataFrame(data.target, columns=['target'])
x_train, x_test, y_train, y_test = train_test_split(df_X, df_y, test_size=0.2, random_state=100)

# Automated machine learning settings
automl_settings = {
    "iteration_timeout_minutes": 2,
    "experiment_timeout_minutes": 20,
    "enable_early_stopping": True,
    "primary_metric": 'AUC_weighted',
    "featurization": 'auto',
    "verbosity": logging.INFO,
    "n_cross_validations": 5
}


automl_config = AutoMLConfig(task='classification',
                             debug_log='automated_ml_errors.log',
                             X=x_train.values,
                             y=y_train.values.flatten(),
                             **automl_settings)

# Submit the experiment and stream its progress to the console
experiment = Experiment(ws, "my-experiment")
local_run = experiment.submit(automl_config, show_output=True)

The values in `automl_settings` are chosen to suit the data and the problem. Since this is a binary classification problem, the primary metric is set to AUC (AUC_weighted), and the task of `AutoMLConfig` is set to classification. See here for details (https://docs.microsoft.com/ja-jp/azure/machine-learning/service/how-to-configure-auto-train).

When executed, it performs simple feature engineering, then builds a number of models and finally ensembles them.
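
As a reminder of what the metric measures, here is a minimal pure-Python sketch of ordinary binary AUC: the fraction of (positive, negative) pairs in which the positive example receives the higher score. The function name is made up for illustration, and this is not how Azure ML computes the metric internally. My understanding is that AUC_weighted averages the per-class AUC weighted by class size, which for a two-class problem should reduce to this value, but treat that as an assumption.

```python
def binary_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. O(n_pos * n_neg), for illustration only.
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(binary_auc(labels, scores))  # 0.75
```

In the toy example, three of the four positive/negative pairs are ordered correctly, hence 0.75. A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.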

python run.py 

(omitted)

Current status: DatasetFeaturization. Beginning to featurize the dataset.
Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturizationCompleted. Completed featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Classes are balanced in the training data.

TYPE:         Missing values imputation
STATUS:       PASSED
DESCRIPTION:  There were no missing values found in the training data.

TYPE:         High cardinality feature detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and no high cardinality features were detected.

****************************************************************************************************
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
ITERATION: The iteration being evaluated.
PIPELINE: A summary description of the pipeline being evaluated.
DURATION: Time taken for the current iteration.
METRIC: The result of computing score on the fitted pipeline.
BEST: The best observed score thus far.
****************************************************************************************************

 ITERATION   PIPELINE                                       DURATION      METRIC      BEST
         0   StandardScalerWrapper SGD                      0:00:13       0.9940    0.9940
         1   StandardScalerWrapper SGD                      0:00:12       0.9958    0.9958
         2   MinMaxScaler LightGBM                          0:00:12       0.9888    0.9958
         3   StandardScalerWrapper SGD                      0:00:11       0.9936    0.9958
         4   StandardScalerWrapper ExtremeRandomTrees       0:00:14       0.9908    0.9958
         5   StandardScalerWrapper LightGBM                 0:00:11       0.9887    0.9958
         6   StandardScalerWrapper SGD                      0:00:11       0.9956    0.9958
         7   MinMaxScaler RandomForest                      0:00:13       0.9814    0.9958
         8   StandardScalerWrapper SGD                      0:00:11       0.9851    0.9958
         9   MinMaxScaler SGD                               0:00:11       0.9441    0.9958
        10   MinMaxScaler RandomForest                      0:00:11       0.9802    0.9958
        11   MaxAbsScaler LightGBM                          0:00:11       0.9780    0.9958
        12   MinMaxScaler LightGBM                          0:00:12       0.9886    0.9958
        13   MinMaxScaler ExtremeRandomTrees                0:00:11       0.9816    0.9958
        14   MinMaxScaler LightGBM                          0:00:11       0.9731    0.9958
        15   StandardScalerWrapper BernoulliNaiveBayes      0:00:11       0.9705    0.9958
        16   StandardScalerWrapper LogisticRegression       0:00:13       0.9959    0.9959
        17   MaxAbsScaler ExtremeRandomTrees                0:00:28       0.9906    0.9959
        18   RobustScaler LogisticRegression                0:00:13       0.9853    0.9959
        19   RobustScaler LightGBM                          0:00:12       0.9904    0.9959
        20   StandardScalerWrapper LogisticRegression       0:00:11       0.5000    0.9959
        21   MaxAbsScaler LinearSVM                         0:00:12       0.9871    0.9959
        22   StandardScalerWrapper SVM                      0:00:12       0.9873    0.9959
        23   RobustScaler LogisticRegression                0:00:14       0.9909    0.9959
        24   MaxAbsScaler LightGBM                          0:00:15       0.9901    0.9959
        25   RobustScaler LogisticRegression                0:00:29       0.9894    0.9959
        26   MaxAbsScaler LightGBM                          0:00:13       0.9897    0.9959
        27   MaxAbsScaler LightGBM                          0:00:15       0.9907    0.9959
        28   RobustScaler KNN                               0:00:12       0.9887    0.9959
        29   MaxAbsScaler LogisticRegression                0:00:13       0.9940    0.9959
        30   VotingEnsemble                                 0:00:31       0.9965    0.9965
        31   StackEnsemble                                  0:00:36       0.9960    0.9965
Stopping criteria reached at iteration 31. Ending experiment.
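
If you save this console output to a file, the iteration table can be parsed to pick out the best pipeline without reading it by eye. The following is a rough sketch, assuming the rows keep exactly the column layout shown above; the regular expression and function name are my own and are not part of the Azure ML SDK. The sample text reuses a few rows from the log above.

```python
import re

# Matches table rows: iteration number, pipeline name, duration, metric, best.
ROW = re.compile(
    r"^\s*(\d+)\s+(.+?)\s+(\d+:\d{2}:\d{2})\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$"
)


def parse_iterations(log_text):
    """Return (iteration, pipeline, metric) tuples for every table row found."""
    rows = []
    for line in log_text.splitlines():
        match = ROW.match(line)
        if match:
            iteration, pipeline, _duration, metric, _best = match.groups()
            rows.append((int(iteration), pipeline, float(metric)))
    return rows


sample_log = """
 ITERATION   PIPELINE                                       DURATION      METRIC      BEST
        16   StandardScalerWrapper LogisticRegression       0:00:13       0.9959    0.9959
        20   StandardScalerWrapper LogisticRegression       0:00:11       0.5000    0.9959
        30   VotingEnsemble                                 0:00:31       0.9965    0.9965
        31   StackEnsemble                                  0:00:36       0.9960    0.9965
"""

rows = parse_iterations(sample_log)
best = max(rows, key=lambda row: row[2])
print(best)  # (30, 'VotingEnsemble', 0.9965)
```

Header and status lines are skipped because they do not start with an iteration number followed by the expected columns.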

The AUC is very high at about 0.99, which makes me suspect that something is leaking, but I will set that aside for now.

Summary

I summarized the steps for running Azure ML in a local environment. Azure ML is convenient, but I wish the official Azure documentation were a little easier to follow...
