[PYTHON] Prepare a machine learning project format and run it on SageMaker

Hello, this is Ninomiya of LIFULL CO., LTD.

In a machine learning project, once the analysis and the evaluation of model accuracy have succeeded, the result still has to be put to work in existing systems. At that stage, our team found it difficult to divide roles with the engineers in charge of implementation.

Aiming for a state where "if a data scientist builds it in this format, it can be incorporated easily!", we wrapped Amazon SageMaker and prepared a development format and tools that are general-purpose to some extent.

What is Amazon SageMaker?

Amazon SageMaker provides all developers and data scientists with the means to build, train, and deploy machine learning models. Amazon SageMaker is a fully managed service that covers the entire machine learning workflow. Label and prepare your data, select algorithms, train your model, tune and optimize for deployment, make predictions, and execute. You can put your model into production with less effort and cost.

If you prepare a Docker image that meets a specific specification, you can use its main functions, such as the training jobs, inference API endpoints, and batch transform jobs that appear later in this article.

For the specification of preparing your own Docker image for SageMaker, read the official documentation and the article by @taniyam (a member of my team).

Machine learning project format

First, we asked data scientists to prepare the following directory structure.

.
├── README.md
├── Dockerfile
├── config.yml
├── pyproject.toml (poetry config file)
├── script
│   └── __init__.py
└── tests
    └── __init__.py

The main processing is written in script/__init__.py, and the script looks like the following. simple_sagemaker_manager is the library we prepared for this purpose.

import pandas as pd
from typing import List
from pathlib import Path
from sklearn import tree
from simple_sagemaker_manager.image_utils import AbstractModel


def train(training_path: Path) -> AbstractModel:
    """Train the model.

    Args:
        training_path (Path): Directory containing the CSV files.

    Returns:
        Model: A model object that inherits from AbstractModel.

    """
    train_data = pd.concat([pd.read_csv(fname, header=None) for fname in training_path.iterdir()])
    train_y = train_data.iloc[:, 0]
    train_X = train_data.iloc[:, 1:] 

    # Now use scikit-learn's decision tree classifier to train the model.
    clf = tree.DecisionTreeClassifier(max_leaf_nodes=None)
    clf = clf.fit(train_X, train_y)
    return Model(clf)


class Model(AbstractModel):
    """The method of serialization is described in AbstractModel.
    """

    def predict(self, matrix: List[List[float]]) -> List[List[str]]:
        """Inference processing.

        Args:
            matrix (List[List[float]]):Table data

        Returns:
            list:Inference result

        """
        # The result returned here becomes the response of the inference API.
        return [[x] for x in self.model.predict(pd.DataFrame(matrix))]

AbstractModel is defined as follows. When the training batch runs, the save method is called and its result (the model serialized with pickle) is used as the model artifact, which SageMaker stores in S3. The serialization method can be switched by overriding save and load.

import pickle
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class AbstractModel(ABC):
    model: object

    @classmethod
    def load(cls, model_path):
        # Load the model during inference
        with open(model_path / 'model.pkl', 'rb') as f:
            model = pickle.load(f)
        return cls(model)

    def save(self, model_path):
        # Save the model during the training batch
        with open(model_path / 'model.pkl', 'wb') as f:
            pickle.dump(self.model, f)

    @abstractmethod
    def predict(self, json):
        pass
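
As a minimal sketch of switching the serialization method, the subclass below overrides save and load to store the model as JSON instead of pickle. AbstractModel is re-declared here (lightly adapted from the definition above) so that the example runs on its own. ThresholdModel, its threshold field, the file name model.json, and the round_trip helper are hypothetical names chosen only for illustration.

```python
import json
import pickle
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class AbstractModel(ABC):
    """Re-declared from the article so that this sketch is self-contained."""
    model: object

    @classmethod
    def load(cls, model_path):
        with open(model_path / 'model.pkl', 'rb') as f:
            return cls(pickle.load(f))

    def save(self, model_path):
        with open(model_path / 'model.pkl', 'wb') as f:
            pickle.dump(self.model, f)

    @abstractmethod
    def predict(self, payload):
        pass


class ThresholdModel(AbstractModel):
    """Hypothetical model whose state is a plain dict, stored as JSON."""

    @classmethod
    def load(cls, model_path):
        with open(model_path / 'model.json', 'r') as f:
            return cls(json.load(f))

    def save(self, model_path):
        with open(model_path / 'model.json', 'w') as f:
            json.dump(self.model, f)

    def predict(self, matrix: List[List[float]]) -> List[List[str]]:
        threshold = self.model['threshold']
        return [['high' if row[0] >= threshold else 'low'] for row in matrix]


def round_trip(model: AbstractModel, model_dir: Path) -> AbstractModel:
    """Save the model, then load it back with the same class."""
    model.save(model_dir)
    return type(model).load(model_dir)


with tempfile.TemporaryDirectory() as tmp:
    restored = round_trip(ThresholdModel({'threshold': 0.5}), Path(tmp))
    print(restored.predict([[0.2], [0.9]]))
```

Because only save and load change, the calling side does not need to know which serialization format a given project uses.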

I designed the tooling to be operated through a CLI, taking projects such as Python's poetry as a reference, and it covers the development flow of the Docker image for SageMaker.

Also, some machine learning libraries can only be installed with Anaconda, and we received requests to use a base image other than the official Python 3 image, so I made the Dockerfile editable.
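
To make the format more concrete, here is a rough sketch of how an entry point inside the image could connect it to SageMaker. A custom training container receives its input under /opt/ml/input/data/<channel name> and is expected to write model artifacts to /opt/ml/model. The run_training function, the Saveable protocol, and the channel name "training" are assumptions for this sketch, not the actual implementation in simple_sagemaker_manager.

```python
from pathlib import Path
from typing import Callable, Protocol


class Saveable(Protocol):
    """Anything with a save(model_path) method, such as an AbstractModel."""

    def save(self, model_path: Path) -> None: ...


# Directories SageMaker provides inside a custom training container.
# The channel name "training" is an assumption for this sketch.
DEFAULT_INPUT_DIR = Path("/opt/ml/input/data/training")
DEFAULT_MODEL_DIR = Path("/opt/ml/model")


def run_training(
    train: Callable[[Path], Saveable],
    input_dir: Path = DEFAULT_INPUT_DIR,
    model_dir: Path = DEFAULT_MODEL_DIR,
) -> Saveable:
    """Call the project's train() and persist the model it returns."""
    model = train(input_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    model.save(model_dir)
    return model
```

With this shape, the data scientist only writes train and the model class, and the directory handling stays inside the tooling.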

Managing SageMaker runs

Calling boto3 directly is cumbersome, so I also prepared a wrapper library. SageMaker offers many operations, but in most projects there are three things we want to do: train a model, set up an inference API, or run a batch transform job. The interface is built around these so that it is easy to understand.

from simple_sagemaker_manager.executor import SageMakerExecutor
from simple_sagemaker_manager.executor.classes import TrainInstance, TrainSpotInstance, Image


client = SageMakerExecutor()

# Training on a regular instance
model = client.execute_batch_training(
    instance=TrainInstance(
        instance_type='ml.m4.xlarge',
        instance_count=1,
        volume_size_in_gb=10,
        max_run=100
    ),
    image=Image(
        name="decision-trees-sample",
        uri="xxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/decision-trees-sample:latest"
    ),
    input_path="s3://xxxxxxxxxx/DEMO-scikit-byo-iris",
    output_path="s3://xxxxxxxxxx/output",
    role="arn:aws:iam::xxxxxxxxxx"
)


# Training on Spot Instances
model = client.execute_batch_training(
    instance=TrainSpotInstance(
        instance_type='ml.m4.xlarge',
        instance_count=1,
        volume_size_in_gb=10,
        max_run=100,
        max_wait=1000
    ),
    image=Image(
        name="decision-trees-sample",
        uri="xxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/decision-trees-sample:latest"
    ),
    input_path="s3://xxxxxxxxxx/DEMO-scikit-byo-iris",
    output_path="s3://xxxxxxxxxxx/output",
    role="arn:aws:iam::xxxxxxxxxxxxx"
)
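
A wrapper like this presumably ends in a boto3 create_training_job call. The sketch below shows one way the instance settings above could be mapped to that request. The request keys (ResourceConfig, StoppingCondition, MaxWaitTimeInSeconds, EnableManagedSpotTraining) are boto3's, but the class internals, the to_request method, and the reading of max_run and max_wait as seconds are assumptions. The real simple_sagemaker_manager classes may look different.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class TrainInstance:
    """Hypothetical re-creation of the on-demand instance settings."""
    instance_type: str
    instance_count: int
    volume_size_in_gb: int
    max_run: int  # assumed to be seconds

    def to_request(self) -> Dict[str, Any]:
        """Return the instance-related part of a create_training_job request."""
        return {
            "ResourceConfig": {
                "InstanceType": self.instance_type,
                "InstanceCount": self.instance_count,
                "VolumeSizeInGB": self.volume_size_in_gb,
            },
            "StoppingCondition": {"MaxRuntimeInSeconds": self.max_run},
            "EnableManagedSpotTraining": False,
        }


@dataclass
class TrainSpotInstance(TrainInstance):
    """Same settings plus the total time to wait for Spot capacity."""
    max_wait: int  # assumed to be seconds; must not be smaller than max_run

    def to_request(self) -> Dict[str, Any]:
        request = super().to_request()
        request["StoppingCondition"]["MaxWaitTimeInSeconds"] = self.max_wait
        request["EnableManagedSpotTraining"] = True
        return request
```

The returned dictionary would then be merged with the remaining request fields (job name, image, input and output locations, role) before the call, so switching between regular and Spot training only changes which class is passed in.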

The inference API is created as follows. The points I devised are described after the code.

from simple_sagemaker_manager.executor import SageMakerExecutor
from simple_sagemaker_manager.executor.classes import EndpointInstance, Model

client = SageMakerExecutor()


# Deploying a specific model
# If you specify multiple models in models, a pipeline model is created and used.
client.deploy_endpoint(
    instance=EndpointInstance(
        instance_type='ml.m4.xlarge',
        initial_count=1,
        initial_variant_wright=1
    ),
    models=[
        Model(
            name='decision-trees-sample-191028-111309-538454',
            model_arn='arn:aws:sagemaker:ap-northeast-1:xxxxxxxxxx',
            image_uri='xxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/decision-trees-sample:latest',
            model_data_url='s3://xxxxxxxxxx/model.tar.gz'
        )
    ],
    name='sample-endpoint',
    role="arn:aws:iam::xxxxxxxxxx"
)

# You can also pass the result of execute_batch_training
model = client.execute_batch_training(
    #Arguments omitted
) 

client.deploy_endpoint(
    instance=EndpointInstance(
        instance_type='ml.m4.xlarge',
        initial_count=1,
        initial_variant_wright=1
    ),
    models=[model],
    name='sample-endpoint',
    role="arn:aws:iam::xxxxxxxxxx"
)
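
The switch between a single model and a pipeline model can be sketched as follows. In boto3's create_model request, a single container goes into PrimaryContainer, while a list under Containers defines an inference pipeline. ModelArtifact and build_create_model_request are hypothetical names for this sketch, and the wrapper's actual logic is not shown in this article.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ModelArtifact:
    """Hypothetical stand-in for the Model class used above."""
    image_uri: str
    model_data_url: str


def build_create_model_request(
    name: str, role: str, models: List[ModelArtifact]
) -> Dict[str, Any]:
    """Build create_model arguments for one model or a pipeline of models."""
    if not models:
        raise ValueError("at least one model is required")
    containers = [
        {"Image": m.image_uri, "ModelDataUrl": m.model_data_url} for m in models
    ]
    request: Dict[str, Any] = {"ModelName": name, "ExecutionRoleArn": role}
    if len(containers) == 1:
        request["PrimaryContainer"] = containers[0]
    else:
        # Containers are invoked in list order as an inference pipeline.
        request["Containers"] = containers
    return request
```

Keeping this decision in one function means the caller always passes a list of models and never has to think about which request shape applies.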

Resources other than endpoints (training jobs and so on) automatically get the current time appended to their names to avoid duplicates. Only endpoints behave differently: for convenience, if an endpoint with the same name already exists, it is updated.
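
Both naming behaviors can be sketched in a few lines. The timestamp format is inferred from the model name in the example above (decision-trees-sample-191028-111309-538454), and the boto3 SageMaker client calls used are list_endpoints, update_endpoint, and create_endpoint. The function names timestamped_name and create_or_update_endpoint are hypothetical, and this is not necessarily how the wrapper itself is written.

```python
from datetime import datetime
from typing import Any, Optional


def timestamped_name(base: str, now: Optional[datetime] = None) -> str:
    """Append yymmdd-HHMMSS-microseconds so that repeated runs do not collide."""
    now = now or datetime.now()
    return f"{base}-{now:%y%m%d-%H%M%S-%f}"


def create_or_update_endpoint(client: Any, name: str, config_name: str) -> str:
    """Update the endpoint if one with this exact name exists, else create it.

    `client` is expected to behave like boto3.client("sagemaker").
    """
    # NameContains is a substring filter, so check for an exact match.
    # Pagination is ignored here for brevity.
    listed = client.list_endpoints(NameContains=name)
    exists = any(e["EndpointName"] == name for e in listed["Endpoints"])
    if exists:
        client.update_endpoint(EndpointName=name, EndpointConfigName=config_name)
        return "updated"
    client.create_endpoint(EndpointName=name, EndpointConfigName=config_name)
    return "created"
```

In this arrangement the endpoint keeps a stable name for its callers, while the endpoint configuration behind it can use a timestamped name on every deployment.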

Although omitted here, the method for batch transform jobs is implemented in the same way.

Future issues

This is how I implemented it, and it is now in actual use in several projects. However, some features are not implemented yet, and other issues remain within the team.

Also, actually using it within the team has revealed parts that are hard to use, so I will work on those problems and make our machine learning projects more efficient.
