[PYTHON] Introduction to ClearML: easy management of machine learning experiments

Introduction

Machine learning experiments are hard to manage because you have to track not only the training code but also the dataset, the artifacts produced by preprocessing, the model, and so on, all as one set. Proper experiment management also matters when you bring code that worked at the experimental stage into production and need to reproduce similar prediction results.

MLflow is the best-known tool for machine learning experiment management, but I came across another experiment management tool called ClearML (formerly Allegro Trains), so this article gives a simple walkthrough of how to use it.

ClearML: https://github.com/allegroai/clearml (Apache-2.0 License)

Official documentation: https://allegro.ai/clearml/docs/index.html#

The following articles are also very helpful for understanding the concept of experiment management: "Thinking about experiment management" and "Re: ML life starting from zero".

Summary

ClearML is a tool that provides machine learning experiment management and MLOps functions. It supports time-consuming and error-prone tasks related to development and version tracking in the machine learning life cycle.

ClearML has the following three main functions.

--Experiment management: automatic experiment management, including the environment and training results

Of the three functions, this article mainly explains how to use experiment management. The ClearML architecture is also described briefly at the end.

After trying ClearML, I confirmed that it can track the following information for experiment management.

--Code version
  --Logs the commit ID of the training code and the versions of the libraries used
--Data version
  --Provides a function to manage generated intermediate artifacts and models
--Hyperparameters
  --Automatically logs Python argparse parameters
--Metrics
  --Common metrics such as loss, accuracy, and confusion matrices can be recorded
--Environment
  --Logs the working directory location on the machine used for training
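To make the "Hyperparameters" item concrete, here is a minimal offline sketch of the kind of argparse definition whose parsed values ClearML records. The option names and defaults are illustrative assumptions, not copied from the tutorial script, and the ClearML call itself is left out so the snippet runs without a server.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical training options; names and defaults are illustrative only."""
    parser = argparse.ArgumentParser(description="Toy training script")
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--log-interval", type=int, default=10)
    return parser


# In a real script, Task.init(...) is called in the same process; it is omitted
# here so the sketch stays offline. An explicit argument list is passed so the
# example does not depend on the command line.
args = build_parser().parse_args(["--epochs", "3", "--lr", "0.1"])
hyperparameters = vars(args)  # plain dict of option name -> parsed value
print(hyperparameters)
```

A dictionary like this is what ends up under the hyperparameter section of an experiment, without any extra logging code.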

Tested environment

Please note that the new version of ClearML may not work as described in this article.

Set up the free ClearML hosted service

This article uses the free, externally hosted ClearML server. The setup follows the document below. https://allegro.ai/clearml/docs/docs/getting_started/getting_started_clearml_hosted_service.html

You can apparently also set up your own ClearML server on-premises or on AWS or GCP, so if you have security requirements, follow the procedure in the document below. https://allegro.ai/clearml/docs/rst/deploying_clearml/index.html

--Sign up at the following site to register your account.
  --It seems that you can also register with a Google, Bitbucket, or GitHub account.

https://app.community.clear.ml/login?redirect=%2F


--Enter your name, email, interests, etc. and click "SIGN UP" to register your account.

--Run the following command to install clearml.

pip install clearml

--Execute the following command to start the ClearML setup wizard.

clearml-init

--A message asks you to create account credentials, so obtain them: click User Account > Profile in the upper right corner of the hosted service's web screen.


--Click Create new credentials > Copy to clipboard.

--When you paste the copied credentials into the terminal, a message confirms that they were detected, as shown below.

Detected credentials key="********************" secret="*******"

--Specify the URL of the web server. This time, press Enter to accept the default.

WEB Host configured to: [https://app.community.clear.ml]

--Next, specify the URL of the API server. Keep the defaults and press Enter.

API Host configured to: [https://api.community.clear.ml] 

--The following message will be displayed, and the setup is complete.

CLEARML Hosts configuration:
Web App: https://app.community.clear.ml
API: https://api.community.clear.ml
File Store: https://files.community.clear.ml

Verifying credentials ...
Credentials verified!

New configuration stored in /home/<username>/clearml.conf
CLEARML setup completed successfully.
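If you want to confirm from Python that the wizard wrote its configuration, a check along the following lines can help. This is a minimal sketch that assumes the file is named clearml.conf and sits directly in the home directory, as in the output above; the helper name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def locate_clearml_conf(home: Optional[Path] = None) -> Optional[Path]:
    """Return the path to clearml.conf under `home`, or None if it is absent.

    Hypothetical helper: it only checks the default location shown in the
    wizard output and does not parse or validate the file.
    """
    home = home if home is not None else Path.home()
    candidate = home / "clearml.conf"
    return candidate if candidate.is_file() else None


conf = locate_clearml_conf()
if conf is None:
    print("clearml.conf not found; run clearml-init first")
else:
    print(f"clearml.conf found at {conf}")
```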

Try the Reporting Tutorial

--The ClearML repository contains tutorial code, so clone it.

cd ~
git clone https://github.com/allegroai/clearml.git
cd ~/clearml/examples/frameworks/pytorch
pip install -r requirements.txt
pip install pandas scikit-learn

--The script for the Reporting Tutorial is pytorch_mnist.py, so make a copy of it under a new file name.

cp pytorch_mnist.py pytorch_mnist_tutorial.py

Set the directory where model checkpoints are saved

--The directory where model checkpoints are written can be set by specifying output_uri in Task.init.
--Change the following part.

task = Task.init(project_name='examples', task_name='pytorch mnist train')

--Checkpoints will be saved in ./clearml if you make the following changes.

model_snapshots_path = './clearml'
os.makedirs(model_snapshots_path, exist_ok=True)  # no-op if the directory already exists

task = Task.init(project_name='examples', 
    task_name='extending automagical ClearML example', 
    output_uri=model_snapshots_path)

--When you run the script, ClearML will create the following directory structure.

+ - <output destination name>
|   +-- <project name>
|       +-- <task name>.<Task Id>
|           +-- models
|           +-- artifacts
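The layout above can be turned into a small path helper, which is handy when you want to locate the saved models from another script. This is a sketch based only on the directory structure shown here; the function name and the task ID value are hypothetical placeholders.

```python
from pathlib import Path


def expected_models_dir(output_uri: str, project_name: str,
                        task_name: str, task_id: str) -> Path:
    """Build <output destination>/<project name>/<task name>.<Task Id>/models."""
    return Path(output_uri) / project_name / f"{task_name}.{task_id}" / "models"


# "0123abcd" is a made-up task ID used purely for illustration.
path = expected_models_dir(
    "./clearml", "examples", "extending automagical ClearML example", "0123abcd"
)
print(path)
```

Swapping the last component for "artifacts" gives the sibling directory in the same tree.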

Set up Logger

In addition to automatic logging, ClearML supports explicit reporting of plots, log text, tables, and more. https://allegro.ai/clearml/docs/docs/tutorials/tutorial_explicit_reporting.html#step-2-logger-class-reporting-methods

--The logger can be obtained from Task as follows.

logger = task.get_logger()

or

logger = Logger.current_logger()

--Use the Logger.report_scalar method to log scalar metrics as follows:

def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            Logger.current_logger().report_scalar(
                "train", "loss", iteration=(epoch * len(train_loader) + batch_idx), value=loss.item())
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
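The iteration value passed to report_scalar above is a global step computed from the epoch and the batch index. The sketch below reproduces that arithmetic offline with a hypothetical stand-in logger that accepts the same keyword arguments; in the real script you would pass Logger.current_logger() instead. It assumes epochs are counted from 1, in which case the first reported point lands at iteration len(train_loader) rather than 0.

```python
from typing import List, Tuple


class RecordingLogger:
    """Hypothetical offline stand-in that mimics the report_scalar keywords used above."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, str, int, float]] = []

    def report_scalar(self, title: str, series: str, value: float, iteration: int) -> None:
        # Store the call instead of sending it to a server.
        self.records.append((title, series, iteration, value))


def global_step(epoch: int, batches_per_epoch: int, batch_idx: int) -> int:
    """Same arithmetic as the train() function: epoch * len(train_loader) + batch_idx."""
    return epoch * batches_per_epoch + batch_idx


logger = RecordingLogger()
batches_per_epoch = 4  # stands in for len(train_loader)
log_interval = 2       # stands in for args.log_interval

for epoch in (1, 2):
    for batch_idx in range(batches_per_epoch):
        if batch_idx % log_interval == 0:
            fake_loss = 1.0 / (epoch + batch_idx)  # placeholder value, not a real loss
            logger.report_scalar(
                "train", "loss",
                iteration=global_step(epoch, batches_per_epoch, batch_idx),
                value=fake_loss,
            )

print([record[2] for record in logger.records])
```

Because the step keeps increasing across epochs, the points of one series form a single continuous curve in the plot.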

--In addition to scalar values, metrics such as histograms and confusion matrices can be reported as follows.

import numpy as np  # add these two imports at the top of the script
from sklearn.metrics import confusion_matrix

def test(args, model, device, test_loader, epoch):
    save_test_loss = []
    save_correct = []
    preds = []
    targets = []
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

            preds.append(pred.cpu().detach().numpy())
            targets.append(target.cpu().detach().numpy())

            save_test_loss.append(test_loss)
            save_correct.append(correct)

    test_loss /= len(test_loader.dataset)

    Logger.current_logger().report_scalar(
        "test", "loss", iteration=epoch, value=test_loss)
    Logger.current_logger().report_scalar(
        "test", "accuracy", iteration=epoch, value=(correct / len(test_loader.dataset)))
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    
    preds = np.concatenate(preds)
    targets = np.concatenate(targets)
    matrix = confusion_matrix(targets, preds)  # scikit-learn: rows are true labels, columns are predictions
    Logger.current_logger().report_confusion_matrix(title='Confusion matrix example',
        series='Test loss / correct', matrix=matrix, iteration=1,
        xaxis='pred', yaxis='correct', yaxis_reversed=True)
    
    Logger.current_logger().report_histogram(title='Histogram example', series='correct',
        iteration=1, values=save_correct, xaxis='Test', yaxis='Correct')
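To read the confusion matrix plot correctly, it helps to know which axis holds the true labels. The pure-Python sketch below builds a matrix with the same convention as scikit-learn's confusion_matrix(y_true, y_pred), where rows are true labels and columns are predictions. The labels are made-up toy data, and the helper is an illustration rather than the tutorial's own code.

```python
from typing import List, Sequence


def confusion_matrix_rows_true(targets: Sequence[int], preds: Sequence[int],
                               num_classes: int) -> List[List[int]]:
    """Count (true, predicted) pairs; matrix[t][p] is the number of samples
    whose true label is t and whose predicted label is p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(targets, preds):
        matrix[t][p] += 1
    return matrix


# Toy labels for three classes (illustrative only).
targets = [0, 0, 1, 2, 2, 2]
preds = [0, 1, 1, 2, 2, 0]

matrix = confusion_matrix_rows_true(targets, preds, num_classes=3)
correct = sum(matrix[i][i] for i in range(3))  # the diagonal holds correct predictions
accuracy = correct / len(targets)

for row in matrix:
    print(row)
print(f"accuracy = {accuracy:.3f}")
```

With this convention, off-diagonal entries in a row show which classes a given true label is mistaken for.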

--You can also use Logger.report_text to output a text message at the log level given by the level argument.

import logging  # add at the top of the script if it is not already imported

Logger.current_logger().report_text('The default output destination for model snapshots and artifacts is: {}'.format(model_snapshots_path), level=logging.DEBUG)

Register artifacts

Artifacts can also be uploaded to ClearML Server by registering them while the script runs. If a registered artifact changes, ClearML Server logs the change. However, as of December 29, 2020, only Pandas DataFrames are supported. https://allegro.ai/clearml/docs/docs/tutorials/tutorial_explicit_reporting.html#step-3-registering-artifacts

--To register an artifact, add the following code to the test function.

import pandas as pd  # add at the top of the script

# Create the Pandas DataFrame
test_loss_correct = {
    'test loss': save_test_loss,
    'correct': save_correct
}
df = pd.DataFrame(test_loss_correct, columns=['test loss', 'correct'])

# Register the test loss and correct as a Pandas DataFrame artifact
Task.current_task().register_artifact('Test_Loss_Correct', df, metadata={'metadata string': 'apple', 
    'metadata int': 100, 'metadata dict': {'dict string': 'pear', 'dict int': 200}})

--A registered artifact can be retrieved from Python code as follows and used in later processing.

# Once the artifact is registered, we can get it and work with it. Here, we sample it.
sample = Task.current_task().get_registered_artifacts()['Test_Loss_Correct'].sample(frac=0.5, 
    replace=True, random_state=1)

Upload artifacts

You can upload artifacts generated by the script to ClearML with the Task.upload_artifact method. Unlike the registration above, however, an uploaded artifact is not tracked for changes.

--Add the following code to the test function to upload an artifact named Predictions.

# Upload test loss as an artifact. Here, the artifact is numpy array
Task.current_task().upload_artifact('Predictions', artifact_object=np.array(save_test_loss),
    metadata={'metadata string': 'banana', 'metadata integer': 300,
    'metadata dictionary': {'dict string': 'orange', 'dict int': 400}})

Run the Reporting script

--Run the script with the following command. While it runs, the ClearML log and the model's training loss are printed.

python3 pytorch_mnist_tutorial.py

--In this case, the model is saved as follows.

ls clearml/examples/extending\ automagical\ ClearML\ example.13e46b70da274fa085e772ed700df028/models/
mnist_cnn.pt  test.pt  training.pt

Checking training results on the web screen

--You can check the training results on the web screen. Since project_name='examples' is passed to Task.init, click the examples project on the web screen.

--Since task_name='extending automagical ClearML example' is passed to Task.init, click the experiment displayed as extending automagical ClearML example, which corresponds to this training run.

--In the EXECUTION tab of the experiment, you can check information about the source code that was run for training. The file name of the executed script and the commit ID are logged so that the experiment can be reproduced.

--In CONFIGURATION, you can check the hyperparameters logged during training. I found this useful because I don't need to write my own logging code for hyperparameters.

--In ARTIFACTS, you can check information about the output models and artifacts.

--RESULTS shows the logs of scalar values and plots, including the curves of loss and accuracy during training and the confusion matrix plot.

That's it for the Reporting Tutorial, which logs metrics and artifacts.

ClearML Architecture

ClearML consists of the following components.

[Figure: ClearML architecture diagram. Source: https://allegro.ai/clearml/docs/rst/architecture/index.html]

The ClearML Server shown above is the free externally hosted one in this article. As noted earlier, you can also run your own ClearML Server on-premises or in the cloud, such as on AWS or GCP.

Another advantage appears to be that ClearML can be used by adding the same few lines of code in both the DATA SCIENTIST ENVIRONMENT and the GPU MACHINES (on-premises or cloud) shown on the left of the above figure.

What I liked about ClearML

--Easy to use: just pip install and add a few lines of code
--Easy to get started with the free externally hosted server
--By setting up your own server, you can use it both on-premises and in the cloud
--The example code is extensive
  --https://github.com/allegroai/clearml/tree/master/examples
--Supports various frameworks such as PyTorch, PyTorch Lightning, TensorFlow, Keras, and AutoKeras
  --https://allegro.ai/clearml/docs/rst/integrations/index.html
--The web UI looks beautiful
--There seem to be MLOps-style functions, for example iterative hyperparameter tuning

Disclaimer

The author has taken great care with the content and functions described in this article, but does not guarantee that the content is accurate or safe and accepts no responsibility for it. The author and the organization to which the author belongs (NS Solutions Corporation) shall not be liable for any inconvenience or damage caused to the user by using the contents of this article.
