[PYTHON] I tried creating a model with the Amazon SageMaker Autopilot sample

What is SageMaker Autopilot?

SageMaker Autopilot is AutoML provided by AWS that runs on SageMaker: it automatically preprocesses the data, selects algorithms, and optimizes hyperparameters. AWS provides a sample notebook for Autopilot, so this time I will actually run it. → Autopilot sample

Running the sample

First, import the necessary libraries and create the sessions.

jupyter


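import sagemaker
import boto3
from sagemaker import get_execution_role

# Region of the current AWS session (e.g. the notebook instance's region)
region = boto3.Session().region_name

# SageMaker session; default_bucket() returns the default S3 bucket, creating it if needed
session = sagemaker.Session()
bucket = session.default_bucket()
prefix = 'sagemaker/autopilot-dm'  # S3 key prefix for this example's data and output

# IAM role that Autopilot uses to access S3 and run its jobs
role = get_execution_role()

# Low-level boto3 client for the SageMaker API (used for the AutoML calls below)
sm = boto3.Session().client(service_name='sagemaker', region_name=region)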

Next, download the dataset. This time we use the Bank Marketing Data Set, which comes from a bank's direct marketing campaigns; the target indicates whether the client subscribed to a term deposit.

jupyter


!wget -N https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip
!unzip -o bank-additional.zip

local_data_path = './bank-additional/bank-additional-full.csv'
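The !wget and !unzip lines are Jupyter shell escapes. If you want to do the same from plain Python (for example in a script), the standard library is enough. The following is a minimal sketch, not part of the AWS sample: extract_zip_bytes and download_and_extract are hypothetical helper names, and the URL is the one used above, which may have moved since this article was written.

```python
import io
import urllib.request
import zipfile

# URL taken from the article; the UCI archive may have moved since then.
DATA_URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
            "00222/bank-additional.zip")


def extract_zip_bytes(zip_bytes, dest="."):
    """Extract all members of an in-memory ZIP archive into dest, overwriting
    existing files (like unzip -o), and return the member names."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        zf.extractall(dest)
        return zf.namelist()


def download_and_extract(url=DATA_URL, dest="."):
    """Download the ZIP archive at url (like wget) and extract it into dest."""
    with urllib.request.urlopen(url) as resp:
        return extract_zip_bytes(resp.read(), dest)
```

Calling download_and_extract() leaves the CSV at ./bank-additional/bank-additional-full.csv, the same local_data_path as above. Unlike wget -N, this sketch always downloads the file instead of checking timestamps.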

Next, split the downloaded data into training data (80%) and test data (the remaining 20%), and remove the target variable "y" from the test data.

jupyter


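import pandas as pd

# The CSV file is semicolon-delimited
data = pd.read_csv(local_data_path, sep=';')

# Randomly take 80% of the rows for training (fixed seed for reproducibility)
train_data = data.sample(frac=0.8, random_state=200)

# The remaining 20% of the rows become the test data
test_data = data.drop(train_data.index)

# Remove the target column "y" from the test data
test_data_no_target = test_data.drop(columns=['y'])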
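To make the three pandas steps concrete (semicolon-delimited parsing, an 80/20 random split, dropping the target), here is a standard-library sketch of the same idea. It is an illustration only, not the sample's code: the helper names are hypothetical, and although it uses the same seed, it will not pick the same rows as pandas.

```python
import csv
import io
import random


def read_semicolon_csv(text):
    """Parse semicolon-delimited CSV text into dicts (compare pd.read_csv(..., sep=';'))."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def split_train_test(rows, frac=0.8, seed=200):
    """Randomly assign round(len(rows) * frac) rows to training and the rest to test.

    Same idea as data.sample(frac=0.8, random_state=200) followed by
    data.drop(train_data.index), but it will NOT select the same rows as pandas
    for a given seed, and it keeps the original row order.
    """
    rng = random.Random(seed)
    n_train = round(len(rows) * frac)
    train_idx = set(rng.sample(range(len(rows)), n_train))
    train = [row for i, row in enumerate(rows) if i in train_idx]
    test = [row for i, row in enumerate(rows) if i not in train_idx]
    return train, test


def drop_column(rows, column):
    """Return copies of the rows without the given column (compare DataFrame.drop(columns=[...]))."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]
```

Every row lands in exactly one of the two splits, which is what data.drop(train_data.index) guarantees in the pandas version.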

Then upload each split to S3.

jupyter


# Training data keeps the header row and the target column "y"
train_file = 'train_data.csv'
train_data.to_csv(train_file, index=False, header=True)
train_data_s3_path = session.upload_data(path=train_file, key_prefix=prefix + "/train")
print('Train data uploaded to: ' + train_data_s3_path)

# Test data is written without the header row and without "y"
test_file = 'test_data.csv'
test_data_no_target.to_csv(test_file, index=False, header=False)
test_data_s3_path = session.upload_data(path=test_file, key_prefix=prefix + "/test")
print('Test data uploaded to: ' + test_data_s3_path)
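The job configured next reads every object under s3://{bucket}/{prefix}/train because it uses the S3DataType 'S3Prefix'. The sketch below, which does not touch AWS, spells out that relationship. It assumes that upload_data returns s3://<bucket>/<key_prefix>/<file name> for a single file; the helper names and the bucket name my-bucket are hypothetical. One thing it highlights: an S3 prefix is a plain string prefix, not a directory, so .../train would also match objects under .../training.

```python
import os


def expected_upload_uri(bucket, key_prefix, local_path):
    """S3 URI that Session.upload_data is assumed to return for a single file:
    s3://<bucket>/<key_prefix>/<file name>."""
    return "s3://{}/{}/{}".format(bucket, key_prefix, os.path.basename(local_path))


def under_s3_prefix(object_uri, prefix_uri):
    """True if object_uri would be picked up by an S3Prefix data source at prefix_uri.
    S3 prefixes are compared as plain strings, not as directories."""
    return object_uri.startswith(prefix_uri)
```

For example, with bucket my-bucket and the prefix used here, the training file would be s3://my-bucket/sagemaker/autopilot-dm/train/train_data.csv, which falls under the job's train prefix, while the test file does not.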

Next, configure Autopilot. This sample sets only the minimum: where the training data is, which column is the target ("y"), and where to write the output. Many other settings are available; they are described in the documentation, so please check it and try them.

jupyter



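# Training data location (every object under the train prefix) and the target column
input_data_config = [{
    'DataSource': {
        'S3DataSource': {
            'S3DataType': 'S3Prefix',
            'S3Uri': 's3://{}/{}/train'.format(bucket, prefix)
        }
    },
    'TargetAttributeName': 'y'
}]

# Where Autopilot writes the job's output artifacts
output_data_config = {
    'S3OutputPath': 's3://{}/{}/output'.format(bucket, prefix)
}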
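As an example of the other settings mentioned above, create_auto_ml_job also accepts ProblemType, AutoMLJobObjective, and AutoMLJobConfig, whose CompletionCriteria can cap the number of candidates with MaxCandidates. The hypothetical helper below only assembles the keyword arguments, so it runs without AWS. My understanding is that ProblemType and AutoMLJobObjective must be given together or not at all, so the sketch enforces that; please confirm against the API reference before relying on it.

```python
def build_auto_ml_job_request(job_name, input_data_config, output_data_config, role_arn,
                              problem_type=None, objective_metric=None, max_candidates=None):
    """Build keyword arguments for sm.create_auto_ml_job (hypothetical helper).

    Optional settings: ProblemType plus AutoMLJobObjective (assumed to be
    required together) and a MaxCandidates completion criterion.
    """
    if (problem_type is None) != (objective_metric is None):
        raise ValueError("Specify problem_type and objective_metric together, or neither.")
    request = {
        "AutoMLJobName": job_name,
        "InputDataConfig": input_data_config,
        "OutputDataConfig": output_data_config,
        "RoleArn": role_arn,
    }
    if problem_type is not None:
        request["ProblemType"] = problem_type
        request["AutoMLJobObjective"] = {"MetricName": objective_metric}
    if max_candidates is not None:
        # Stop after this many candidate models have been tried
        request["AutoMLJobConfig"] = {"CompletionCriteria": {"MaxCandidates": max_candidates}}
    return request
```

When starting the job in the next step, you could then call sm.create_auto_ml_job(**build_auto_ml_job_request(auto_ml_job_name, input_data_config, output_data_config, role, problem_type='BinaryClassification', objective_metric='F1', max_candidates=10)). Capping MaxCandidates should shorten the run, at the cost of exploring fewer candidates.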

Now that the configuration is complete, let's start the job.

jupyter


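from time import gmtime, strftime, sleep

# Timestamp suffix (day-hour-minute-second, UTC) helps keep the job name unique
timestamp_suffix = strftime('%d-%H-%M-%S', gmtime())

auto_ml_job_name = 'automl-banking-' + timestamp_suffix
print('AutoMLJobName: ' + auto_ml_job_name)

# Start the Autopilot job; the call returns immediately and the job runs in the background
sm.create_auto_ml_job(AutoMLJobName=auto_ml_job_name,
                      InputDataConfig=input_data_config,
                      OutputDataConfig=output_data_config,
                      RoleArn=role)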
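The short timestamp suffix also matters because, as far as I know, the CreateAutoMLJob API limits AutoMLJobName to 32 characters of letters, digits, and hyphens (treat the exact limit and pattern below as an assumption and check the API reference). 'automl-banking-' plus the 11-character suffix gives 26 characters. A hypothetical helper that builds and checks the name might look like this:

```python
import re
from time import gmtime, strftime

# Assumed constraint from the CreateAutoMLJob API reference: at most 32 characters,
# matching ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,31}
MAX_JOB_NAME_LENGTH = 32
JOB_NAME_PATTERN = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,31}")


def make_job_name(base="automl-banking-", t=None):
    """Append a day-hour-minute-second (UTC) suffix and check the result is a valid name.

    t is an optional time.struct_time; the current UTC time is used if omitted.
    """
    name = base + strftime("%d-%H-%M-%S", t if t is not None else gmtime())
    # The pattern alone allows long runs of hyphens, so check the length separately
    if len(name) > MAX_JOB_NAME_LENGTH or not JOB_NAME_PATTERN.fullmatch(name):
        raise ValueError("invalid AutoML job name: " + name)
    return name
```

Because the suffix contains only the day of the month, two runs started at the same time on the same day in different months would get the same name.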

The following code polls the job every 30 seconds and prints its status and secondary status.

jupyter


print('JobStatus - Secondary Status')
print('------------------------------')

# Print the current status once
describe_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)
print(describe_response['AutoMLJobStatus'] + " - " + describe_response['AutoMLJobSecondaryStatus'])
job_run_status = describe_response['AutoMLJobStatus']

# Poll every 30 seconds until the job reaches a terminal status
while job_run_status not in ('Failed', 'Completed', 'Stopped'):
    describe_response = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)
    job_run_status = describe_response['AutoMLJobStatus']

    print(describe_response['AutoMLJobStatus'] + " - " + describe_response['AutoMLJobSecondaryStatus'])
    sleep(30)
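The polling loop above can also be wrapped in a function. This is a sketch, not part of the AWS sample: wait_for_auto_ml_job is a hypothetical name, describe is meant to be sm.describe_auto_ml_job but is passed in so the loop can be exercised without AWS, and the optional timeout is my own addition. Unlike the loop above, it does not sleep once more after a terminal status has been printed.

```python
import time

TERMINAL_STATUSES = ("Failed", "Completed", "Stopped")


def wait_for_auto_ml_job(describe, job_name, poll_seconds=30, timeout_seconds=None,
                         sleep=time.sleep, log=print):
    """Poll describe(AutoMLJobName=job_name) until the job reaches a terminal status.

    Returns the last describe response. Raises TimeoutError if timeout_seconds
    (approximate, counted in poll intervals) elapses first.
    """
    log("JobStatus - Secondary Status")
    log("------------------------------")
    waited = 0
    while True:
        response = describe(AutoMLJobName=job_name)
        log(response["AutoMLJobStatus"] + " - " + response["AutoMLJobSecondaryStatus"])
        if response["AutoMLJobStatus"] in TERMINAL_STATUSES:
            return response
        if timeout_seconds is not None and waited >= timeout_seconds:
            raise TimeoutError("AutoML job {} still running after {} s".format(job_name, waited))
        sleep(poll_seconds)
        waited += poll_seconds
```

In the notebook this would be called as wait_for_auto_ml_job(sm.describe_auto_ml_job, auto_ml_job_name).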

Model creation is complete when the status shows "Completed". In my case, it took a little over two hours.
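Once the job has finished, the describe_auto_ml_job response should, as far as I know, include a BestCandidate entry with the candidate's name and its final objective metric (FinalAutoMLJobObjectiveMetric). The hypothetical helper below just pulls those fields out of the response dictionary; the values in any test data for it are made up.

```python
def summarize_best_candidate(describe_response):
    """Return (candidate name, metric name, metric value) for the best candidate in a
    describe_auto_ml_job response, or None if no BestCandidate is reported yet."""
    best = describe_response.get("BestCandidate")
    if best is None:
        return None
    metric = best["FinalAutoMLJobObjectiveMetric"]
    return best["CandidateName"], metric["MetricName"], metric["Value"]
```

For example: summarize_best_candidate(sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)).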

Summary

This time, I used SageMaker Autopilot to create a model automatically. It reminded me how impressive AutoML is: you only have to prepare the data, and a model is built for you. I hope tools like this lower the barrier to building models and help ML become more widely used.
