I tried hosting a TensorFlow deep learning model using TensorFlow Serving

Introduction

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. With TensorFlow Serving, you can easily host a model created in TensorFlow and expose it as an API.

See the TensorFlow Serving Documentation (https://www.tensorflow.org/tfx/guide/serving) for more information.

In this article, I use TensorFlow Serving on an AWS EC2 instance to host a TensorFlow deep learning model. At the end of the article, I also try the same thing with Docker.

Procedure

EC2 instance creation

Enter "Deep Learning AMI" in the AMI search bar to search for the AMI you want to use. This time, I used "Deep Learning AMI (Ubuntu 18.04) Version 30.0 --ami-0b1b56cbf0f8fcea3". I used "p2.xlarge" as the instance type. The security group is set up so that ssh and http can be connected from the development environment, and all other settings are left as default.

Environment

Log in to EC2 and build the environment.

~$ ls
LICENSE                README     examples  tools
Nvidia_Cloud_EULA.pdf  anaconda3  src       tutorials

The installation procedure is described on the official site.

First, add the TensorFlow Serving URI to sources.list.d.

~$ echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -

deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2943  100  2943    0     0  18166      0 --:--:-- --:--:-- --:--:-- 18166
OK

Perform the installation.

~$ sudo apt-get update && sudo apt-get install tensorflow-model-server
~$ tensorflow_model_server --version
TensorFlow ModelServer: 1.15.0-rc2+dev.sha.1ab7d59
TensorFlow Library: 1.15.2

This completes the installation.

Model building

From here, we will create a model to deploy. First, prepare a working directory.

~$ mkdir tfexample
~$ cd tfexample

Start jupyter-lab and build the model.

~/tfexample$ jupyter-lab --no-browser --port=8888 --ip=0.0.0.0 --allow-root

...
http://127.0.0.1:8888/?token=b92a7ceefb20c7ab3e475474dbde66a771870de1d8f5bd70
...

The URL is printed to standard output; replace the 127.0.0.1 part with the instance's IP address and access it from your browser.

When JupyterLab starts, select the conda_tensorflow2_py36 kernel and open a notebook. Rename it to tfmodel.ipynb.

This time I will build a model with Fashion MNIST.

tfmodel.ipynb


import sys
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import os
import tempfile

print('TensorFlow version: {}'.format(tf.__version__))
# TensorFlow version: 2.1.0

tfmodel.ipynb


fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# scale the values to 0.0 to 1.0
train_images = train_images / 255.0
test_images = test_images / 255.0

# reshape for feeding into the model
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1)
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1)

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

print('\ntrain_images.shape: {}, of {}'.format(train_images.shape, train_images.dtype))
print('test_images.shape: {}, of {}'.format(test_images.shape, test_images.dtype))
# train_images.shape: (60000, 28, 28, 1), of float64
# test_images.shape: (10000, 28, 28, 1), of float64

tfmodel.ipynb


model = keras.Sequential([
  keras.layers.Conv2D(input_shape=(28,28,1), filters=8, kernel_size=3, 
                      strides=2, activation='relu', name='Conv1'),
  keras.layers.Flatten(),
  keras.layers.Dense(10, activation=tf.nn.softmax, name='Softmax')
])
model.summary()

testing = False
epochs = 5

model.compile(optimizer='adam', 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=epochs)

test_loss, test_acc = model.evaluate(test_images, test_labels)
print('\nTest accuracy: {}'.format(test_acc))

# Model: "sequential"
# _________________________________________________________________
# Layer (type)                 Output Shape              Param #   
# =================================================================
# Conv1 (Conv2D)               (None, 13, 13, 8)         80        
# _________________________________________________________________
# flatten (Flatten)            (None, 1352)              0         
# _________________________________________________________________
# Softmax (Dense)              (None, 10)                13530     
# =================================================================
# Total params: 13,610
# Trainable params: 13,610
# Non-trainable params: 0
# _________________________________________________________________
# Train on 60000 samples
# Epoch 1/5
# 60000/60000 [==============================] - 46s 770us/sample - loss: 0.5398 - accuracy: 0.8182
# Epoch 2/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3849 - accuracy: 0.8643
# Epoch 3/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3513 - accuracy: 0.8751
# Epoch 4/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3329 - accuracy: 0.8820
# Epoch 5/5
# 60000/60000 [==============================] - 5s 76us/sample - loss: 0.3204 - accuracy: 0.8847
# 10000/10000 [==============================] - 1s 78us/sample - loss: 0.3475 - accuracy: 0.8779

# Test accuracy: 0.8779000043869019

tfmodel.ipynb


MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
print('export_path = {}\n'.format(export_path))

tf.keras.models.save_model(
    model,
    export_path,
    overwrite=True,
    include_optimizer=True,
    save_format=None,
    signatures=None,
    options=None
)

print('\nSaved model:')
!ls -l {export_path}

# export_path = /tmp/1

# WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
# Instructions for updating:
# If using Keras pass *_constraint arguments to layers.
# INFO:tensorflow:Assets written to: /tmp/1/assets

# Saved model:
# total 84
# drwxr-xr-x 2 ubuntu ubuntu  4096 Jul 17 10:49 assets
# -rw-rw-r-- 1 ubuntu ubuntu 74970 Jul 17 10:49 saved_model.pb
# drwxr-xr-x 2 ubuntu ubuntu  4096 Jul 17 10:49 variables

The save destination for the model was created with the tempfile module; this time the model is stored in /tmp/1.
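
Before hosting it, you can verify that the exported SavedModel actually exposes a serving_default signature. A minimal check, run in the same notebook (the saved_model_cli command that ships with TensorFlow can show the same information):

# Load the exported SavedModel and list its serving signatures
loaded = tf.saved_model.load(export_path)
print(list(loaded.signatures.keys()))    # expect: ['serving_default']

infer = loaded.signatures['serving_default']
print(infer.structured_input_signature)  # input tensor name and shape
print(infer.structured_outputs)          # output tensor name and shape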

Model hosting

Open another terminal, log in to your instance, and start the server.

~$ export MODEL_DIR=/tmp
~$ tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=fashion_model \
  --model_base_path="${MODEL_DIR}"
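
Once the server is up, you can confirm that the model loaded correctly through the REST model-status endpoint before sending any predictions; a quick check from the notebook:

import requests

# GET /v1/models/<model_name> reports the load state of each version
resp = requests.get('http://localhost:8501/v1/models/fashion_model')
print(resp.json())
# A healthy server reports a version with state "AVAILABLE"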

The expected layout is a version-numbered directory under model_base_path, with the model saved inside it (a sketch of exporting a second version follows the tree):

model_base_path/
 ├ 1/
 │ ├ assets/
 │ ├ variables/
 │ └ saved_model.pb
 ├ 2/
 │ ├ (and so on)
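
For example, publishing a second version is just a matter of saving under a new version directory. A hypothetical sketch, run in the notebook and reusing MODEL_DIR from above:

# Hypothetical: export the same model again as version 2.
# TensorFlow Serving watches model_base_path and, with the default
# version policy, switches to serving the newest version it finds.
version = 2
export_path_v2 = os.path.join(MODEL_DIR, str(version))
tf.keras.models.save_model(model, export_path_v2, overwrite=True)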

Let's send a request to check it. Go back to the notebook and make a request.

tfmodel.ipynb


def show(idx, title):
    plt.figure()
    plt.imshow(test_images[idx].reshape(28,28), cmap = "gray")
    plt.axis('off')
    plt.title('\n\n{}'.format(title), fontdict={'size': 16})

tfmodel.ipynb


import json

data = json.dumps({"signature_name": "serving_default", "instances": test_images[0:3].tolist()})
print('Data: {} ... {}'.format(data[:50], data[len(data)-52:]))
# Data: {"signature_name": "serving_default", "instances": ...  [0.0], [0.0], [0.0], [0.0], [0.0], [0.0], [0.0]]]]}

tfmodel.ipynb


import requests

headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/fashion_model:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)['predictions']

show(0, 'The model thought this was a {} (class {}), and it was actually a {} (class {})'.format(
  class_names[np.argmax(predictions[0])], np.argmax(predictions[0]), class_names[test_labels[0]], test_labels[0]))
(Screenshot: the test image rendered by show(), with the predicted and actual class in the title.)

Send the data as JSON via POST. The data goes under the `instances` key; since the model predicts in batches, be careful about the shape, as in the sketch below.
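
For example, a request for a single image still has to keep the batch dimension; a minimal sketch:

# Shape matters: the payload must be (batch, 28, 28, 1), so slice with
# [0:1] to keep the batch dimension rather than indexing with [0]
single = json.dumps({
    "signature_name": "serving_default",
    "instances": test_images[0:1].tolist()
})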

By the way, the contents of predictions are as follows.

predictions[0]

# [7.71279588e-07,
#  4.52205953e-08,
#  5.55571035e-07,
#  1.59779923e-08,
#  2.27421737e-07,
#  0.00600787532,
#  8.29056205e-07,
#  0.0466650613,
#  0.00145569211,
#  0.945868969]

The probabilities for each class are stored in the list. This is the same output as the following code.

model.predict(test_images[0:3]).tolist()[0]
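
As a sanity check, the server output and local inference can be compared directly; minor floating-point differences are possible, hence the tolerance in this sketch:

# Compare server predictions with local model.predict on the same batch
local = model.predict(test_images[0:3])
print(np.argmax(local, axis=1))                   # local class predictions
print(np.argmax(np.array(predictions), axis=1))   # server class predictions
np.testing.assert_allclose(local, np.array(predictions), rtol=1e-5, atol=1e-5)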

Host with Docker

~$ docker --version
Docker version 19.03.11, build 42e35e61f3
~$ docker pull tensorflow/serving
~$ docker run -d -t --rm -p 8501:8501 -v "/tmp:/models/fashion_model" -e MODEL_NAME=fashion_model tensorflow/serving

The entry point is as follows. The RESTful API port is 8501, the gRPC port is 8500, and model_base_path is ${MODEL_BASE_PATH}/${MODEL_NAME}.

tensorflow_model_server --port=8500 --rest_api_port=8501 \
  --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME}

The entry point file is stored in /usr/bin/tf_serving_entrypoint.sh and actually contains the following code:

#!/bin/bash 

tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} "$@"

Therefore, when using Docker, all you have to do is mount the host path where the model is stored onto the container's model_base_path, i.e. /models/${MODEL_NAME} (MODEL_BASE_PATH defaults to /models in the image).

Other notes

- A gRPC interface is also supported (a client sketch follows this list).
- The model path, maximum batch size, number of threads, and timeouts can be specified in a config file.
- It seems you can customize the model's input and output format, called a Signature (https://qiita.com/t_shimmura/items/1ebd2414310f827ed608).
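
As a reference for the gRPC interface, here is a client sketch under assumptions: it requires the tensorflow-serving-api pip package and port 8500 to be exposed, and the 'Conv1_input' and 'Softmax' keys are guesses at this Keras model's signature, so verify them with the signature inspection shown earlier.

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Open a channel to the gRPC port and build a PredictionService stub
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'fashion_model'
request.model_spec.signature_name = 'serving_default'
# 'Conv1_input' is an assumed input key; check it against the signature
request.inputs['Conv1_input'].CopyFrom(
    tf.make_tensor_proto(test_images[0:3], dtype=tf.float32))

result = stub.Predict(request, timeout=10.0)
print(result.outputs['Softmax'])  # 'Softmax' output key is also an assumption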
