[PYTHON] The procedure for generating and saving a machine learning model, serving it as an API server, and communicating with it via JSON from a browser

Introduction

I turned a model generated by machine learning into an API server, sent data to it from the browser via JSON, and got predicted values back. This machine learning API server consists of three main programs. First, perform machine learning with XGBoost to generate and save the model. Next, implement the API server that serves the model in Flask. Finally, write a form in an HTML file so that the data entered into it can be sent as JSON via Ajax in JavaScript. With these three programs, you can send data from the browser to the API server and get a predicted value back.

Environment required to run this program

This program requires the following libraries to be installed: Anaconda, XGBoost, joblib, Flask, and flask-cors.

Main process

This machine learning API can be implemented through the following steps.

- Create a learning model with machine learning
- Create an API server in Flask
- API communication from the browser

Create a learning model with machine learning

Here, we use XGBoost to generate the learning model. For training data, we use Kaggle's Titanic dataset.

Data set preprocessing

Before training with XGBoost, we do some preprocessing. Kaggle's Titanic dataset is split into train and test files, so we concatenate them with concat and preprocess them together. Preprocessing consists of handling missing values, replacing categorical data with numbers, and deleting unnecessary features.

Loading libraries and datasets

Load the libraries required for preprocessing, and load the datasets as pandas dataframes.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
train_df = pd.read_csv('titanic/train.csv')
test_df = pd.read_csv('titanic/test.csv')
train_df.head()
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
test_df.head()
PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 892 3 Kelly, Mr. James male 34.5 0 0 330911 7.8292 NaN Q
1 893 3 Wilkes, Mrs. James (Ellen Needs) female 47.0 1 0 363272 7.0000 NaN S
2 894 2 Myles, Mr. Thomas Francis male 62.0 0 0 240276 9.6875 NaN Q
3 895 3 Wirz, Mr. Albert male 27.0 0 0 315154 8.6625 NaN S
4 896 3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female 22.0 1 1 3101298 12.2875 NaN S

Dataset concatenation

Since we want to preprocess both at once, the train and test data are concatenated with concat.

all_df = pd.concat((train_df.loc[:, 'Pclass' : 'Embarked'], test_df.loc[:, 'Pclass' : 'Embarked']))
all_df.info()
<class 'pandas.core.frame.DataFrame'>  
Int64Index: 1309 entries, 0 to 417  
Data columns (total 10 columns):  
Pclass      1309 non-null int64  
Name        1309 non-null object  
Sex         1309 non-null object  
Age         1046 non-null float64  
SibSp       1309 non-null int64  
Parch       1309 non-null int64  
Ticket      1309 non-null object  
Fare        1308 non-null float64  
Cabin       295 non-null object  
Embarked    1307 non-null object  
dtypes: float64(2), int64(3), object(5)  
memory usage: 112.5+ KB  

Handling of missing values

Age, Fare, and Embarked have missing values. Age and Fare are filled with the mean, and Embarked with the mode.

all_df['Age'] = all_df['Age'].fillna(all_df['Age'].mean())
all_df['Fare'] = all_df['Fare'].fillna(all_df['Fare'].mean())
all_df['Embarked'] = all_df['Embarked'].fillna(all_df['Embarked'].mode()[0])
all_df.info()
<class 'pandas.core.frame.DataFrame'>  
Int64Index: 1309 entries, 0 to 417  
Data columns (total 10 columns):  
Pclass      1309 non-null int64  
Name        1309 non-null object  
Sex         1309 non-null object  
Age         1309 non-null float64  
SibSp       1309 non-null int64  
Parch       1309 non-null int64  
Ticket      1309 non-null object  
Fare        1309 non-null float64  
Cabin       295 non-null object  
Embarked    1309 non-null object  
dtypes: float64(2), int64(3), object(5)  
memory usage: 112.5+ KB
all_df.head()
Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S

Replace categorical data with numbers

Sex and Embarked are categorical data, so we replace them with numbers using LabelEncoder. LabelEncoder assigns integers in alphabetical order, so Sex becomes female = 0, male = 1, and Embarked becomes C = 0, Q = 1, S = 2, as the output below confirms.

cat_features = ['Sex', 'Embarked']

for col in cat_features:
    lbl = LabelEncoder()
    all_df[col] = lbl.fit_transform(list(all_df[col].values))
all_df.head()
Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 3 Braund, Mr. Owen Harris 1 22.0 1 0 A/5 21171 7.2500 NaN 2
1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... 0 38.0 1 0 PC 17599 71.2833 C85 0
2 3 Heikkinen, Miss. Laina 0 26.0 0 0 STON/O2. 3101282 7.9250 NaN 2
3 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) 0 35.0 1 0 113803 53.1000 C123 2
4 3 Allen, Mr. William Henry 1 35.0 0 0 373450 8.0500 NaN 2

Delete unnecessary features

Name and Ticket are categorical features whose values are mostly unique, so we delete them. Cabin has many missing values, so we delete it as well.

all_df = all_df.drop(columns = ['Name', 'Ticket', 'Cabin'])
all_df.head()
Pclass Sex Age SibSp Parch Fare Embarked
0 3 1 22.0 1 0 7.2500 2
1 1 0 38.0 1 0 71.2833 0
2 3 0 26.0 0 0 7.9250 2
3 1 0 35.0 1 0 53.1000 2
4 3 1 35.0 0 0 8.0500 2

Separate train and test as before

Since train and test were concatenated, we now split them back apart so that the train part can be used as training data. The split point is given by the number of rows in train_df, i.e. train_df.shape[0].

train = all_df[:train_df.shape[0]]
test = all_df[train_df.shape[0]:]
train.info()
<class 'pandas.core.frame.DataFrame'>  
Int64Index: 891 entries, 0 to 890  
Data columns (total 7 columns):  
Pclass      891 non-null int64  
Sex         891 non-null int64  
Age         891 non-null float64  
SibSp       891 non-null int64  
Parch       891 non-null int64  
Fare        891 non-null float64  
Embarked    891 non-null int64  
dtypes: float64(2), int64(5)  
memory usage: 55.7 KB  

Generate a learning model with XGBoost

Now that preprocessing is complete, we move on to machine learning with XGBoost. This time the goal is to build an API server from the model rather than to maximize its accuracy, so the parameters are left at mostly default values.

y = train_df['Survived']
X_train, X_test, y_train, y_test = train_test_split(train, y, random_state = 0)
import xgboost as xgb
params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "eta": 0.1,
    "max_depth": 6,
    "subsample": 1,
    "colsample_bytree": 1,
    "silent": 1  # deprecated in newer XGBoost; "verbosity": 0 is the modern equivalent
}

dtrain = xgb.DMatrix(X_train, label = y_train)
dtest = xgb.DMatrix(X_test, label = y_test)

model = xgb.train(params = params,
                 dtrain = dtrain,
                 num_boost_round = 100,
                 early_stopping_rounds = 10,
                 evals = [(dtest, 'test')])
[0]	test-auc:0.886905  
Will train until test-auc hasn't improved in 10 rounds.  
[1]	test-auc:0.89624  
[2]	test-auc:0.893243  
[3]	test-auc:0.889603  
[4]	test-auc:0.892857  
[5]	test-auc:0.886005  
[6]	test-auc:0.890673  
[7]	test-auc:0.894741  
[8]	test-auc:0.889603  
[9]	test-auc:0.888832  
[10]	test-auc:0.889431  
[11]	test-auc:0.89153  
Stopping. Best iteration:  
[1]	test-auc:0.89624  

Save the learning model with joblib

There are several ways to save a trained model; here we use joblib to save it as a pkl file. The pkl file is written to the current working directory, so copy it into the API server's folder afterwards, since the server will load it from there.

from sklearn.externals import joblib  # note: on scikit-learn >= 0.23, use "import joblib" instead
joblib.dump(model, 'titanic_model.pkl')

['titanic_model.pkl']
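
As a quick sanity check, the saved model can be loaded back and used for a prediction. This is a minimal sketch, assuming it runs in the same notebook session where the preprocessed test dataframe is still defined:

from sklearn.externals import joblib  # or simply "import joblib" on newer scikit-learn
import xgboost as xgb

# load the saved model back and predict on the first row of the preprocessed test data
loaded_model = joblib.load('titanic_model.pkl')
print(loaded_model.predict(xgb.DMatrix(test[:1])))  # prints a survival probability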

Create an API server in Flask

Here, we turn the model generated by machine learning into an API server. Flask, a Python microframework, is used for the API server development. The flow of development is to build a virtual environment with conda, test a simple API server, and then put the trained XGBoost model on it.

Build a virtual environment with conda

The virtual environment uses Anaconda's conda. In the terminal, create a folder for application development (titanic_api in this case) and move into it. Then create the virtual environment with conda create and activate it with conda activate.

mkdir titanic_api
cd titanic_api
conda create -n titanictenv
conda activate titanictenv
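
Inside the activated environment, the required libraries also need to be installed. A possible set of install commands (the exact package names and channels may vary by environment) is:

conda install pandas scikit-learn flask
pip install xgboost flask-cors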

Develop API in Flask

To develop an API server in Flask, let's first create and test a simple API server. Create the following folders and files inside the folder you created earlier. If you can write the code below into each file, start the API server, and get a response with curl, the simple API server test is successful.

Generate the required folders and files in the terminal.

Create folders and files with the following hierarchy. The touch command is convenient for creating empty files (see the example after the tree).

titanic_api
├── api
│   ├── __init__.py
│   └── views
│       └── user.py
├── titanic_app.py
└── titanic_model.pkl
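
For example, assuming you are inside titanic_api, the tree above could be created like this:

mkdir -p api/views
touch api/__init__.py api/views/user.py titanic_app.py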

Write the code in the created file

Write the following code into the files you just created. Three files are needed to test the simple API server: api/views/user.py, api/__init__.py, and titanic_app.py. vim is convenient for writing in the terminal, and an editor such as Atom for writing in a GUI.

api/views/user.py


from flask import Blueprint, request, make_response, jsonify

#Routing settings
user_router = Blueprint('user_router', __name__)

#Specify path and HTTP method
@user_router.route('/users', methods=['GET'])
def get_user_list():

  return make_response(jsonify({
    'users': [
       {
         'id': 1,
         'name': 'John'
       }
     ]
  }))

api/__init__.py


from flask import Flask, make_response, jsonify
from .views.user import user_router

def create_app():

  app = Flask(__name__)
  app.register_blueprint(user_router, url_prefix='/api')

  return app

app = create_app()

titanic_app.py


from api import app

if __name__ == '__main__':
  app.run()

Access the API server with curl.

After writing the code above, start the server with python titanic_app.py. If it starts successfully, open another terminal and test the communication with a curl command as below. If the communication is successful, the following data is returned.

curl http://127.0.0.1:5000/api/users
{
  "users": [
    {
      "id": 1, 
      "name": "John"
    }
  ]
}

Make the XGBoost model an API server

Rewrite titanic_app.py, the startup file of the simple API server, as follows. Note that the saved model file must be placed directly under titanic_api.

titanic_app.py


import json

from flask import Flask
from flask import request
from flask import abort

import pandas as pd
from sklearn.externals import joblib  # note: on scikit-learn >= 0.23, use "import joblib" instead
import xgboost as xgb

model = joblib.load("titanic_model.pkl")

app = Flask(__name__)

# Get headers for payload
headers = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']

@app.route('/titanic', methods=['POST'])
def titanic():
    if not request.json:
        abort(400)
    payload = request.json['data']
    values = [float(i) for i in payload.split(',')]
    data1 = pd.DataFrame([values], columns=headers, dtype=float)
    predict = model.predict(xgb.DMatrix(data1))
    return json.dumps(str(predict[0]))


if __name__ == "__main__":
    app.run(debug=True, port=5000)

API communication test with curl

After rewriting the code, start the API server again with python titanic_app.py. Once it is up, test the communication with the curl command below. If a decimal value between 0 and 1 (the predicted survival probability) is returned for the JSON data you sent, it is successful. You now have the machine-learned model running as an API server.

curl http://localhost:5000/titanic -s -X POST -H "Content-Type: application/json" -d '{"data": "3, 1, 22.0, 1, 0, 7.2500, 2"}'
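
The same test can also be done from Python instead of curl. A minimal sketch using the requests library (assumed to be installed):

import requests

# send the same feature string the curl example uses
res = requests.post('http://localhost:5000/titanic',
                    json={'data': '3, 1, 22.0, 1, 0, 7.2500, 2'})
print(res.json())  # the predicted survival probability, returned as a string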

API communication from the browser

Finally, in addition to communicating with curl from the terminal, we create a page that returns the predicted value when you enter values in the browser. What we do here is enable Ajax communication with the API server created above, and write an HTML file that communicates from the browser to the API server.

Allow JSON to be POSTed to the API server via Ajax

Here, in order to communicate from HTML via JavaScript's Ajax, the Flask API server startup file needs the following additions. We add the flask_cors library and related code; flask_cors must be installed in advance.

titanic_app.py


import json

from flask import Flask
from flask import request
from flask import abort
from flask_cors import CORS  # added; CORS(app) could be used instead of the manual headers below

import pandas as pd
from sklearn.externals import joblib  # note: on scikit-learn >= 0.23, use "import joblib" instead
import xgboost as xgb

model = joblib.load("titanic_model.pkl")

app = Flask(__name__)

# added: set CORS headers so the browser can call the API
@app.after_request
def after_request(response):
  response.headers.add('Access-Control-Allow-Origin', '*')
  response.headers.add('Access-Control-Allow-Headers', 'Content-Type,Authorization')
  response.headers.add('Access-Control-Allow-Methods', 'GET,PUT,POST,DELETE,OPTIONS')
  return response
# end of added code

# Get headers for payload
headers = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']

@app.route('/titanic', methods=['POST'])
def titanic():
    if not request.json:
        abort(400)
    payload = request.json['data']
    values = [float(i) for i in payload.split(',')]
    data1 = pd.DataFrame([values], columns=headers, dtype=float)
    predict = model.predict(xgb.DMatrix(data1))
    return json.dumps(str(predict[0]))


if __name__ == "__main__":
    app.run(debug=True, port=5000)

Send JSON data by POST from HTML file

In the HTML file, input tags inside the body tag create the data entry form, which serves as the interface. The entered data is received by JavaScript, formatted, converted to JSON, and POSTed via Ajax. When the communication succeeds, the predicted value from the API server is received and displayed in the textarea.

index.html


<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Send JSON data by POST from HTML file</title>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js"></script>

<script type="text/javascript">
    $(function(){
        $("#response").html("Response Values");

        $("#button").click( function(){
            var url = $("#url_post").val();
            var feature1 =
                $("#value1").val() + "," +
                $("#value2").val() + "," +
                $("#value3").val() + "," +
                $("#value4").val() + "," +
                $("#value5").val() + "," +
                $("#value6").val() + "," +
                $("#value7").val();

            var JSONdata = {
                    data: feature1
                };

            alert(JSON.stringify(JSONdata));

            $.ajax({
                type: 'POST',
                url: url,
                data: JSON.stringify(JSONdata),
                contentType: 'application/JSON',
                dataType: 'JSON',
                scriptCharset: 'utf-8',
                success : function(data) {

                    // Success
                    alert("success");
                    alert(JSON.stringify(JSONdata));
                    $("#response").html(JSON.stringify(data));
                },
                error : function(data) {

                    // Error
                    alert("error");
                    alert(JSON.stringify(JSONdata));
                    $("#response").html(JSON.stringify(data));
                }
            });
        })
    })
</script>

</head>
<body>
    <h1>Send JSON data by POST from HTML file</h1>
    <p>URL: <input type="text" id="url_post" name="url" size="100" value="http://localhost:5000/titanic"></p>
    <p>Pclass: <input type="number" id="value1" size="30" value=3></p>
    <p>Sex: <input type="number" id="value2" size="30" value=1></p>
    <p>Age: <input type="number" id="value3" size="30" value="22.0"></p>
    <p>SibSp: <input type="number" id="value4" size="30" value="1"></p>
    <p>Parch: <input type="number" id="value5" size="30" value="0"></p>
    <p>Fare: <input type="number" id="value6" size="30" value="7.2500"></p>
    <p>Embarked: <input type="number" id="value7" size="30" value="2"></p>
    <p><button id="button" type="button">submit</button></p>
    <textarea id="response" cols=120 rows=10 disabled></textarea>
</body>
</html>

References:
- Build a virtual environment with conda: https://code-graffiti.com/how-to-build-a-virtual-environment-with-conda/
- Develop an API with Flask: https://swallow-incubate.com/archives/blog/20190819
- Make an XGBoost model an API server: https://towardsdatascience.com/publishing-machine-learning-api-with-python-flask-98be46fb2440
- Allow JSON to be POSTed to the API server via Ajax: https://www.hands-lab.com/tech/entry/3716.html
- Send JSON data by POST from an HTML file: https://qiita.com/kidatti/items/21cc5c5154dbbb1aa27f
