[PYTHON] The procedure for generating and saving a machine learning model, serving it as an API server, and communicating with it via JSON from a browser

Introduction

I turned a model generated by machine learning into an API server, sent data to it from the browser via JSON, and got predicted values back. This machine learning API server consists of three main programs. First, perform machine learning with XGBoost to generate and save the model. Next, implement the API server that serves the model in Flask. Finally, write a form in an HTML file so that the data entered into it can be sent as JSON via Ajax in JavaScript. With these three programs, you can send data from the browser to the API server and get a predicted value back.

Environment required to run this program

This program requires the following libraries to be installed: Anaconda, XGBoost, joblib, Flask, and flask-cors.

Main process

This machine learning API can be implemented through the following steps.

- Create a learning model with machine learning
- Create an API server in Flask
- API communication from the browser

Create a learning model with machine learning

Here, we use XGBoost to generate the learning model. For training data, we use Kaggle's Titanic dataset.

Data set preprocessing

Before training with XGBoost, we do some preprocessing. Kaggle's Titanic dataset is split into train and test files, so we concatenate them with concat and preprocess them together. Preprocessing consists of handling missing values, replacing categorical data with numbers, and deleting unnecessary features.

Loading libraries and datasets

Load the libraries required for preprocessing, and load the datasets as pandas dataframes.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
train_df = pd.read_csv('titanic/train.csv')
test_df = pd.read_csv('titanic/test.csv')
train_df.head()
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S
test_df.head()
PassengerId Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 892 3 Kelly, Mr. James male 34.5 0 0 330911 7.8292 NaN Q
1 893 3 Wilkes, Mrs. James (Ellen Needs) female 47.0 1 0 363272 7.0000 NaN S
2 894 2 Myles, Mr. Thomas Francis male 62.0 0 0 240276 9.6875 NaN Q
3 895 3 Wirz, Mr. Albert male 27.0 0 0 315154 8.6625 NaN S
4 896 3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female 22.0 1 1 3101298 12.2875 NaN S

Dataset concatenation

Since we want to preprocess both at once, the train and test data are concatenated with concat.

all_df = pd.concat((train_df.loc[:, 'Pclass' : 'Embarked'], test_df.loc[:, 'Pclass' : 'Embarked']))
all_df.info()
<class 'pandas.core.frame.DataFrame'>  
Int64Index: 1309 entries, 0 to 417  
Data columns (total 10 columns):  
Pclass      1309 non-null int64  
Name        1309 non-null object  
Sex         1309 non-null object  
Age         1046 non-null float64  
SibSp       1309 non-null int64  
Parch       1309 non-null int64  
Ticket      1309 non-null object  
Fare        1308 non-null float64  
Cabin       295 non-null object  
Embarked    1307 non-null object  
dtypes: float64(2), int64(3), object(5)  
memory usage: 112.5+ KB  

Handling of missing values

Age, Fare, and Embarked have missing values. Age and Fare are filled with the mean, and Embarked with the mode.

all_df['Age'] = all_df['Age'].fillna(all_df['Age'].mean())
all_df['Fare'] = all_df['Fare'].fillna(all_df['Fare'].mean())
all_df['Embarked'] = all_df['Embarked'].fillna(all_df['Embarked'].mode()[0])
all_df.info()
<class 'pandas.core.frame.DataFrame'>  
Int64Index: 1309 entries, 0 to 417  
Data columns (total 10 columns):  
Pclass      1309 non-null int64  
Name        1309 non-null object  
Sex         1309 non-null object  
Age         1309 non-null float64  
SibSp       1309 non-null int64  
Parch       1309 non-null int64  
Ticket      1309 non-null object  
Fare        1309 non-null float64  
Cabin       295 non-null object  
Embarked    1309 non-null object  
dtypes: float64(2), int64(3), object(5)  
memory usage: 112.5+ KB
all_df.head()
Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S
1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C
2 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S
3 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S
4 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S

Replace categorical data with numbers

Sex and Embarked are categorical data, so we replace them with numbers using LabelEncoder. LabelEncoder assigns integers in alphabetical order, so Sex becomes female = 0, male = 1, and Embarked becomes C = 0, Q = 1, S = 2, as the output below confirms.

cat_features = ['Sex', 'Embarked']

for col in cat_features:
    lbl = LabelEncoder()
    all_df[col] = lbl.fit_transform(list(all_df[col].values))
all_df.head()
Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked
0 3 Braund, Mr. Owen Harris 1 22.0 1 0 A/5 21171 7.2500 NaN 2
1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... 0 38.0 1 0 PC 17599 71.2833 C85 0
2 3 Heikkinen, Miss. Laina 0 26.0 0 0 STON/O2. 3101282 7.9250 NaN 2
3 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) 0 35.0 1 0 113803 53.1000 C123 2
4 3 Allen, Mr. William Henry 1 35.0 0 0 373450 8.0500 NaN 2

Delete unnecessary features

Name and Ticket are categorical features whose values are mostly unique, so we delete them. Cabin has many missing values, so we delete it as well.

all_df = all_df.drop(columns = ['Name', 'Ticket', 'Cabin'])
all_df.head()
Pclass Sex Age SibSp Parch Fare Embarked
0 3 1 22.0 1 0 7.2500 2
1 1 0 38.0 1 0 71.2833 0
2 3 0 26.0 0 0 7.9250 2
3 1 0 35.0 1 0 53.1000 2
4 3 1 35.0 0 0 8.0500 2

Separate train and test as before

Since train and test were concatenated, we now split them back apart so that the train part can be used as training data. The split point is given by the number of rows in train_df, i.e. train_df.shape[0].

train = all_df[:train_df.shape[0]]
test = all_df[train_df.shape[0]:]
train.info()
<class 'pandas.core.frame.DataFrame'>  
Int64Index: 891 entries, 0 to 890  
Data columns (total 7 columns):  
Pclass      891 non-null int64  
Sex         891 non-null int64  
Age         891 non-null float64  
SibSp       891 non-null int64  
Parch       891 non-null int64  
Fare        891 non-null float64  
Embarked    891 non-null int64  
dtypes: float64(2), int64(5)  
memory usage: 55.7 KB  

Generate a learning model with XGBoost

Now that preprocessing is complete, we move on to machine learning with XGBoost. This time the goal is to build an API server from the model rather than to maximize its accuracy, so the parameters are left at mostly default values.

y = train_df['Survived']
X_train, X_test, y_train, y_test = train_test_split(train, y, random_state = 0)
import xgboost as xgb
params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "eta": 0.1,
    "max_depth": 6,
    "subsample": 1,
    "colsample_bytree": 1,
    "silent": 1  # deprecated in newer XGBoost; "verbosity": 0 is the modern equivalent
}

dtrain = xgb.DMatrix(X_train, label = y_train)
dtest = xgb.DMatrix(X_test, label = y_test)

model = xgb.train(params = params,
                 dtrain = dtrain,
                 num_boost_round = 100,
                 early_stopping_rounds = 10,
                 evals = [(dtest, 'test')])
[0]	test-auc:0.886905  
Will train until test-auc hasn't improved in 10 rounds.  
[1]	test-auc:0.89624  
[2]	test-auc:0.893243  
[3]	test-auc:0.889603  
[4]	test-auc:0.892857  
[5]	test-auc:0.886005  
[6]	test-auc:0.890673  
[7]	test-auc:0.894741  
[8]	test-auc:0.889603  
[9]	test-auc:0.888832  
[10]	test-auc:0.889431  
[11]	test-auc:0.89153  
Stopping. Best iteration:  
[1]	test-auc:0.89624  

Save the learning model with joblib

There are several ways to save a trained model; here we use joblib to save it as a pkl file. The pkl file is written to the current working directory, so copy it into the API server's folder afterwards, since the server will load it from there.

from sklearn.externals import joblib  # note: on scikit-learn >= 0.23, use "import joblib" instead
joblib.dump(model, 'titanic_model.pkl')

['titanic_model.pkl']
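
As a quick sanity check, the saved model can be loaded back and used for a prediction. This is a minimal sketch, assuming it runs in the same notebook session where the preprocessed test dataframe is still defined:

from sklearn.externals import joblib  # or simply "import joblib" on newer scikit-learn
import xgboost as xgb

# load the saved model back and predict on the first row of the preprocessed test data
loaded_model = joblib.load('titanic_model.pkl')
print(loaded_model.predict(xgb.DMatrix(test[:1])))  # prints a survival probability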

Create an API server in Flask

Here, we turn the model generated by machine learning into an API server. Flask, a Python microframework, is used for the API server development. The flow of development is to build a virtual environment with conda, test a simple API server, and then put the trained XGBoost model on it.

Build a virtual environment with conda

The virtual environment uses Anaconda's conda. In the terminal, create a folder for application development (titanic_api in this case) and move into it. Then create the virtual environment with conda create and activate it with conda activate.

mkdir titanic_api
cd titanic_api
conda create -n titanictenv
conda activate titanictenv
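
Inside the activated environment, the required libraries also need to be installed. A possible set of install commands (the exact package names and channels may vary by environment) is:

conda install pandas scikit-learn flask
pip install xgboost flask-cors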

Develop API in Flask

To develop an API server in Flask, let's first create and test a simple API server. Create the following folders and files inside the folder you created earlier. If you can write the code below into each file, start the API server, and get a response with curl, the simple API server test is successful.

Generate the required folders and files in the terminal.

Create folders and files with the following hierarchy. The touch command is convenient for creating empty files (see the example after the tree).

titanic_api
├── api
│   ├── __init__.py
│   └── views
│       └── user.py
├── titanic_app.py
└── titanic_model.pkl
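
For example, assuming you are inside titanic_api, the tree above could be created like this:

mkdir -p api/views
touch api/__init__.py api/views/user.py titanic_app.py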

Write the code in the created file

Write the following code into the files you just created. Three files are needed to test the simple API server: api/views/user.py, api/__init__.py, and titanic_app.py. vim is convenient for writing in the terminal, and an editor such as Atom for writing in a GUI.

api/views/user.py


from flask import Blueprint, request, make_response, jsonify

#Routing settings
user_router = Blueprint('user_router', __name__)

#Specify path and HTTP method
@user_router.route('/users', methods=['GET'])
def get_user_list():

  return make_response(jsonify({
    'users': [
       {
         'id': 1,
         'name': 'John'
       }
     ]
  }))

api/__init__.py


from flask import Flask, make_response, jsonify
from .views.user import user_router

def create_app():

  app = Flask(__name__)
  app.register_blueprint(user_router, url_prefix='/api')

  return app

app = create_app()

titanic_app.py


from api import app

if __name__ == '__main__':
  app.run()

Access the API server with curl.

After writing the code above, start the server with python titanic_app.py. If it starts successfully, open another terminal and test the communication with a curl command as below. If the communication is successful, the following data is returned.

curl http://127.0.0.1:5000/api/users
{
  "users": [
    {
      "id": 1, 
      "name": "John"
    }
  ]
}

Make the XGBoost model an API server

Rewrite titanic_app.py, the startup file of the simple API server, as follows. Note that the saved model file must be placed directly under titanic_api.

titanic_app.py


import json

from flask import Flask
from flask import request
from flask import abort

import pandas as pd
from sklearn.externals import joblib  # note: on scikit-learn >= 0.23, use "import joblib" instead
import xgboost as xgb

model = joblib.load("titanic_model.pkl")

app = Flask(__name__)

# Get headers for payload
headers = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']

@app.route('/titanic', methods=['POST'])
def titanic():
    if not request.json:
        abort(400)
    payload = request.json['data']
    values = [float(i) for i in payload.split(',')]
    data1 = pd.DataFrame([values], columns=headers, dtype=float)
    predict = model.predict(xgb.DMatrix(data1))
    return json.dumps(str(predict[0]))


if __name__ == "__main__":
    app.run(debug=True, port=5000)

API communication test with curl

After rewriting the code, start the API server again with python titanic_app.py. Once it is up, test the communication with the curl command below. If a decimal value between 0 and 1 (the predicted survival probability) is returned for the JSON data you sent, it is successful. You now have the machine-learned model running as an API server.

curl http://localhost:5000/titanic -s -X POST -H "Content-Type: application/json" -d '{"data": "3, 1, 22.0, 1, 0, 7.2500, 2"}'
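
The same test can also be done from Python instead of curl. A minimal sketch using the requests library (assumed to be installed):

import requests

# send the same feature string the curl example uses
res = requests.post('http://localhost:5000/titanic',
                    json={'data': '3, 1, 22.0, 1, 0, 7.2500, 2'})
print(res.json())  # the predicted survival probability, returned as a string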

API communication from the browser

Finally, in addition to communicating with curl from the terminal, we create a page that returns the predicted value when you enter values in the browser. What we do here is enable Ajax communication with the API server created above, and write an HTML file that communicates from the browser to the API server.

Allow JSON to be POSTed to the API server via Ajax

Here, in order to communicate from HTML via JavaScript's Ajax, the Flask API server startup file needs the following additions. We add the flask_cors library and related code; flask_cors must be installed in advance.

titanic_app.py


import json

from flask import Flask
from flask import request
from flask import abort
from flask_cors import CORS  # added; CORS(app) could be used instead of the manual headers below

import pandas as pd
from sklearn.externals import joblib  # note: on scikit-learn >= 0.23, use "import joblib" instead
import xgboost as xgb

model = joblib.load("titanic_model.pkl")

app = Flask(__name__)

# added: set CORS headers so the browser can call the API
@app.after_request
def after_request(response):
  response.headers.add('Access-Control-Allow-Origin', '*')
  response.headers.add('Access-Control-Allow-Headers', 'Content-Type,Authorization')
  response.headers.add('Access-Control-Allow-Methods', 'GET,PUT,POST,DELETE,OPTIONS')
  return response
# end of added code

# Get headers for payload
headers = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']

@app.route('/titanic', methods=['POST'])
def titanic():
    if not request.json:
        abort(400)
    payload = request.json['data']
    values = [float(i) for i in payload.split(',')]
    data1 = pd.DataFrame([values], columns=headers, dtype=float)
    predict = model.predict(xgb.DMatrix(data1))
    return json.dumps(str(predict[0]))


if __name__ == "__main__":
    app.run(debug=True, port=5000)

Send JSON data by POST from HTML file

In the HTML file, input tags inside the body tag create the data entry form, which serves as the interface. The entered data is received by JavaScript, formatted, converted to JSON, and POSTed via Ajax. When the communication succeeds, the predicted value from the API server is received and displayed in the textarea.

index.html


<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Send JSON data by POST from HTML file</title>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js"></script>

<script type="text/javascript">
    $(function(){
        $("#response").html("Response Values");

        $("#button").click( function(){
            var url = $("#url_post").val();
            var feature1 =
                $("#value1").val() + "," +
                $("#value2").val() + "," +
                $("#value3").val() + "," +
                $("#value4").val() + "," +
                $("#value5").val() + "," +
                $("#value6").val() + "," +
                $("#value7").val();

            var JSONdata = {
                    data: feature1
                };

            alert(JSON.stringify(JSONdata));

            $.ajax({
                type: 'POST',
                url: url,
                data: JSON.stringify(JSONdata),
                contentType: 'application/JSON',
                dataType: 'JSON',
                scriptCharset: 'utf-8',
                success : function(data) {

                    // Success
                    alert("success");
                    alert(JSON.stringify(JSONdata));
                    $("#response").html(JSON.stringify(data));
                },
                error : function(data) {

                    // Error
                    alert("error");
                    alert(JSON.stringify(JSONdata));
                    $("#response").html(JSON.stringify(data));
                }
            });
        })
    })
</script>

</head>
<body>
    <h1>Send JSON data by POST from HTML file</h1>
    <p>URL: <input type="text" id="url_post" name="url" size="100" value="http://localhost:5000/titanic"></p>
    <p>Pclass: <input type="number" id="value1" size="30" value=3></p>
    <p>Sex: <input type="number" id="value2" size="30" value=1></p>
    <p>Age: <input type="number" id="value3" size="30" value="22.0"></p>
    <p>SibSp: <input type="number" id="value4" size="30" value="1"></p>
    <p>Parch: <input type="number" id="value5" size="30" value="0"></p>
    <p>Fare: <input type="number" id="value6" size="30" value="7.2500"></p>
    <p>Embarked: <input type="number" id="value7" size="30" value="2"></p>
    <p><button id="button" type="button">submit</button></p>
    <textarea id="response" cols=120 rows=10 disabled></textarea>
</body>
</html>

References:
- Build a virtual environment with conda: https://code-graffiti.com/how-to-build-a-virtual-environment-with-conda/
- Develop an API with Flask: https://swallow-incubate.com/archives/blog/20190819
- Make an XGBoost model an API server: https://towardsdatascience.com/publishing-machine-learning-api-with-python-flask-98be46fb2440
- Allow JSON to be POSTed to the API server via Ajax: https://www.hands-lab.com/tech/entry/3716.html
- Send JSON data by POST from an HTML file: https://qiita.com/kidatti/items/21cc5c5154dbbb1aa27f
