The first step in speeding up inference with TensorFlow 2.X & TensorRT

Introduction

Using a model from Keras Applications, I briefly walk through converting a model for TensorRT and running inference with it. Installation of TensorRT and the like is out of scope, since I use an NVIDIA GPU Cloud container that already includes it.

Preparation

Prerequisites

- Docker is installed
- GPU containers are available

Execution environment startup

Start the execution environment as follows. Jupyter is already installed in the container, so you can try the code below on Jupyter.

docker run -it --rm --gpus all -p 8888:8888 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tensorflow:20.02-tf2-py3 bash

# (Optional)
jupyter lab

- nvcr.io/nvidia/tensorflow is a container registered in NVIDIA GPU Cloud.
- TensorFlow 2.1, TensorRT 7.0, Jupyter, and other tools are already installed in this container.
- (Optional) shm-size, ulimit: converting the model uses a lot of main memory, so these are set as a countermeasure against memory allocation failures.
- (Optional) Port 8888 is for Jupyter.
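Once inside the container, you can quickly check that TensorFlow sees the GPU. A minimal check (the version string is simply whatever this container ships):

import tensorflow as tf

print(tf.__version__)  # 2.1.0 in this container
print(tf.config.experimental.list_physical_devices('GPU'))  # should list at least one GPU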

Reference: memory growth setting

A memory setting that works as a countermeasure against the memory allocation errors that often occur when using Keras.

Memory growth example


import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at once
for dev in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(dev, True)

Conversion

First, save the target model. The point is the **save format**: specify `tf` for `save_format` to save it as a TensorFlow SavedModel. (This is not strictly necessary, since it is the default in TensorFlow 2.X.)

Saving the model in Keras


from tensorflow.keras.applications.vgg16 import VGG16

model = VGG16(weights='imagenet')
model.save('./vgg16', save_format='tf')
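As a quick sanity check (just one way to do it), you can load the SavedModel back and list its serving signatures; the default signature key is the one used later at inference time:

loaded = tf.saved_model.load('./vgg16')
print(list(loaded.signatures.keys()))  # e.g. ['serving_default']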

Then convert the model.

TensorRT conversion: single-precision floating-point (Float32) version


from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(input_saved_model_dir='./vgg16',
                                    conversion_params=trt.DEFAULT_TRT_CONVERSION_PARAMS)
converter.convert()
converter.save('./vgg16-tensorrt')

Half-precision floating-point (Float16) version

If you want to convert with Float16, change the parameters of the converter.

converter = trt.TrtGraphConverterV2(input_saved_model_dir='./vgg16',
                                    conversion_params=trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode=trt.TrtPrecisionMode.FP16))
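The remaining steps are the same as in the Float32 version; for completeness, a minimal sketch (the output directory name is just an example):

converter.convert()
converter.save('./vgg16-tensorrt-fp16')  # example output directory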

8-bit integer (INT8) version

Calibration is required for 8-bit integers. Normally you should use the data that was used for training.

With this VGG16 configuration, the calibration data is passed with shape $(N, 224, 224, 3)$.

import numpy as np

def calibration_input_fn():  # Calibration data generation function
    yield np.random.uniform(size=(5, 224, 224, 3)).astype(np.float32),  # Don't forget the trailing comma (a tuple must be yielded)

converter = trt.TrtGraphConverterV2(input_saved_model_dir='./vgg16',
                                    conversion_params=trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode=trt.TrtPrecisionMode.INT8, use_calibration=True))
converter.convert(calibration_input_fn=calibration_input_fn)
converter.save('./vgg16-tensorrt')
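As a sketch of calibrating with real data instead of random values, assuming a hypothetical NumPy array train_images of preprocessed training images with shape (M, 224, 224, 3):

def calibration_input_fn():
    # Feed a few small batches of real, preprocessed training images
    for i in range(0, 50, 5):
        yield train_images[i:i + 5].astype(np.float32),  # trailing comma again: yield a tuple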

Inference

Load the converted model and retrieve the object used for inference. Inference then runs by calling this object as a function.

model = tf.saved_model.load('./vgg16-tensorrt', tags=[tf.saved_model.SERVING])
infer = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]

# Dummy input
x = np.random.uniform(size=(3, 224, 224, 3)).astype(np.float32)
# Inference
y = infer(tf.convert_to_tensor(x))['predictions']

Reference: the input shape can be obtained as follows.

infer.inputs[0].shape
>>> TensorShape([None, 224, 224, 3])
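As a side note, since this is the ImageNet-trained VGG16, the output can be sanity-checked with Keras's decode_predictions (with the random dummy input above, the predicted classes are of course meaningless):

from tensorflow.keras.applications.vgg16 import decode_predictions

# Top-3 ImageNet classes for each input in the batch
print(decode_predictions(y.numpy(), top=3))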

Execution result comparison

Finally, a rough comparison of execution results (TensorRT on top). Even with a casual setup like this, the execution speed improves. Memory usage also decreases, which opens up possibilities such as running multiple models at once, and the degree of benefit will likely vary greatly depending on the GPU architecture of the execution environment.

Execution environment: GeForce GTX 1080 Ti

[Figure: TensorRT comparison]
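The figure itself is not reproduced here, but a rough way to take such measurements is to time repeated calls. A minimal sketch, reusing infer and the dummy input from the inference section (a warm-up call is excluded because TensorRT builds its engines on the first call):

import time

def benchmark(fn, x, n=100):
    fn(x)['predictions'].numpy()  # warm-up; the first call also builds the TensorRT engines
    start = time.perf_counter()
    for _ in range(n):
        fn(x)['predictions'].numpy()  # .numpy() forces the result to be materialized
    return (time.perf_counter() - start) / n

x = tf.convert_to_tensor(np.random.uniform(size=(3, 224, 224, 3)).astype(np.float32))
print('mean latency [s]:', benchmark(infer, x))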
