[PYTHON] Tutorial on running inference on a TensorFlow-trained model at high speed with C++/OpenVINO

Introduction

__OpenVINO is a deep learning inference engine library provided by Intel.__ With OpenVINO, you can load a model trained in TensorFlow or PyTorch and run inference at high speed, with execution times several times shorter than running the same inference in TensorFlow.

As you would expect from Intel, the official OpenVINO documentation is thorough and well maintained: https://docs.openvinotoolkit.org/latest/index.html

The official forum is also active: https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/bd-p/distribution-openvino-toolkit

OpenVINO itself is not hard to use, but I think a tutorial that runs all the way from training in TensorFlow (Keras) to inference with C++/OpenVINO would lower the hurdle to trying it, which is why I am writing this article.

This tutorial is available in runnable form in the GitHub repository: https://github.com/tomoyaeibu/openvino2020.3.1-mnist-sample

I would be grateful for an `LGTM` or an "I tried it" report, so please do!

In this article

This is a tutorial that may be useful for the following people.

- In the development environment, training and inference ran in TensorFlow, but in the production environment inference needs to run at high speed.
- I want to build inference functionality into my own system in C++.
- I tried to run inference with OpenVINO, but for some reason it did not work, and I would like to see a working example.
- I don't read manuals, so I would like to start from working sample code and modify it as I develop.

Table of contents


  1. What is OpenVINO?
  2. Training and inference tutorial
  3. Building an OpenVINO environment
  4. Convert the model trained with Python/TensorFlow (Keras)
  5. Perform inference at high speed with C++/OpenVINO
  6. Comparison of execution results of Python/TensorFlow and C++/OpenVINO
  7. References


What is OpenVINO?

The OpenVINO Toolkit is a collection of tools for fast deep learning inference.

There are two key points to realizing high-speed inference:

- Optimize the trained model and convert it into an inference-only format.
- Run inference using forward-pass processing optimized for Intel devices.

Accordingly, the OpenVINO Toolkit includes the following:

- A Model Optimizer that optimizes and converts trained models from various formats
- Inference Engine APIs for fast inference (Python and C++)
- A handy tool for inspecting the details of a converted model
- A handy tool for analyzing inference bottlenecks

OpenVINO and TensorRT

Inference engine options include Intel's OpenVINO as well as Nvidia's TensorRT. Both are used in a similar way, but OpenVINO runs only on Intel devices and TensorRT runs only on Nvidia devices.

I think you will choose according to the target environment and requirements.

- Intel devices: Core, Xeon, Myriad
- Nvidia devices: GeForce, Tesla, Jetson

It is hard to answer a request like "just give me whichever is fastest." GPUs are generally well suited to neural network computation, so it is tempting to recommend TensorRT, but with OpenVINO, depending on the device, considerable speedups also seem to be possible by running at INT8 precision after calibration (which is not the same as optimization). https://docs.openvinotoolkit.org/latest/openvino_docs_IE_DG_Int8Inference.html

Once cost is taken into account, the answer depends on the execution environment and on how the model is optimized.

Training and inference tutorial

Let's get straight into the tutorial. It was created in the following environment.

Environment used when creating the tutorial:

- Date: 2020/12/28
- PC: Lenovo ThinkPad X280
- OS: Windows 10
- Python: 3.6.5
- Tensorflow: 1.15 (CPU)
- numpy: 1.19.3 (1.19.4 causes an error, so downgrade)
- h5py: 2.10 (3.1 causes an error, so downgrade)
- C++ build environment: Visual Studio 2019
- OpenVINO: 2020.3 (Stable)
- OpenVINO installation path: C:\Program Files (x86)\IntelSWTools\openvino

OpenVINO appears to support TensorFlow 2.x, but that support still seems to be in beta. I tried TensorFlow 2.x as well and it did not work, so I recommend sticking with TensorFlow 1.15 for now. https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_TensorFlow.html#Convert_From_TF2X

Building an OpenVINO environment

First, set up the environment by installing OpenVINO. Another article explains the OpenVINO environment setup clearly, so I will not repeat it here.

OpenVINO (2019.R1) Windows 10 version installation and sample testing (thanks to @Hanapage, whose article I used as a reference!)

- This environment setup is needed to build a C++ application that uses the inference engine library; it is not needed to run a pre-built application. If you ship the required DLLs alongside your application, it can run inference on its own. In other words, no environment setup is needed on the deployment machine.
- This tutorial uses the latest Stable version, OpenVINO 2020.3.

Convert the model trained with Python/TensorFlow (Keras)

Create a classification model with Python/TensorFlow (Keras), then optimize and convert it with OpenVINO's Model Optimizer.

This tutorial uses simple MNIST (handwritten digit) classification as an example. The full Python script is in the GitHub repository: https://github.com/tomoyaeibu/openvino2020.3.1-mnist-sample/blob/main/training.py

# Imports (inferred from the usage below).
import os
import time
from pathlib import Path

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

#######################################################################################
#%% Load data.
#
#

# Download the MNIST data.
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize so that the values fit into [0, 1].
x_train, x_test = x_train / 255.0, x_test / 255.0
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test  = x_test.reshape(x_test.shape[0], 28, 28, 1)
print(x_train.shape, x_test.shape); print()

#######################################################################################
#%% Setting model.
#
#

#Define a model for classification.
model = Sequential([
    Conv2D(50, (5, 5), activation='relu', input_shape=(28, 28, 1)),
    Conv2D(50, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dropout(0.2),
    Dense(100, activation='relu'),
    Dropout(0.4),
    Dense(10, activation='softmax')
])

#Compile the model.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

#######################################################################################
#%% Training.
#
#

# Save only the best model during training.
modelCheckpoint = ModelCheckpoint(filepath = 'model.h5',
                                  monitor = 'val_loss',
                                  verbose = 1,
                                  save_best_only = True,)
# Set up early stopping (renamed so it does not shadow the EarlyStopping class).
early_stopping = EarlyStopping(monitor='val_loss', patience=2, verbose=1, mode='auto')

# Run the training.
hist = model.fit(x_train, y_train, validation_split=0.1, epochs=5, verbose=1,
                  callbacks=[modelCheckpoint, early_stopping])

When training completes, the trained model is saved as model.h5. Since this trained model is in HDF5 format, we first run a Freeze Graph step to convert it to Protocol Buffer format. This strips out the parts needed only for training and writes the model to a file in a state where it can only be used for inference.

#######################################################################################
#%% Utility.
#
#

def backup_raw(imarray, filepath): # float64
    backup = imarray.tobytes()

    with open(filepath, "wb") as fout:  
        fout.write(backup)

    return backup

def convert_kerasmodel_to_frozen_pb(kerasmodelpath, pbmodelname):
    output_pb = os.path.splitext(os.path.basename(pbmodelname))[0] + ".pb"
    output_pb_path = Path(output_pb)

    #%% Reset session
    tf.keras.backend.clear_session()
    tf.keras.backend.set_learning_phase(0)

    model = tf.keras.models.load_model(kerasmodelpath, compile=False)
    session = tf.compat.v1.keras.backend.get_session()

    input_names = sorted([layer.op.name for layer in model.inputs])
    output_names = sorted([layer.op.name for layer in model.outputs])

    graph = session.graph

    #%% Freeze Graph
    with graph.as_default():
        # Convert variables to constants
        graph_frozen = tf.compat.v1.graph_util.convert_variables_to_constants(session, graph.as_graph_def(), output_names)
        # Remove training nodes
        graph_frozen = tf.compat.v1.graph_util.remove_training_nodes(graph_frozen)

        with open(output_pb, 'wb') as output_file :
            output_file.write(graph_frozen.SerializeToString())

        print ('Inputs = [%s], Outputs = [%s]' % (input_names, output_names))

#######################################################################################
#%% Evaluation
#
#

# Load the best model.
best_model = load_model('model.h5')

# Save one input sample so it can be compared with the OpenVINO result, and print the inference result.
backup_raw(x_test[5], 'x_test[5].raw')
np.set_printoptions(suppress=True)
print(x_test[5].shape)

start = time.perf_counter()
score_result = best_model.predict(x_test)[5]
end = time.perf_counter()
print("Time taken for inference : [{0}] ms".format(end-start))
print(score_result)

# Save the model in frozen .pb format for conversion with the OpenVINO Model Optimizer.
convert_kerasmodel_to_frozen_pb("model.h5", "model.pb")

The output looks like this:

(28, 28, 1)
Time taken for inference : [2.8801844] ms
[0.00000004 0.9999782  0.00000129 0.00000004 0.00001199 0.00000002
 0.00000262 0.00000498 0.00000056 0.00000028]

WARNING:tensorflow:From .\training_tf1.15.py:43: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.

WARNING:tensorflow:From .\training_tf1.15.py:53: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
WARNING:tensorflow:From C:\Projects\openvino2020.3.1-mnist-sample\venv-tf-1.15\lib\site-packages\tensorflow_core\python\framework\graph_util_impl.py:277: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
WARNING:tensorflow:From .\training_tf1.15.py:55: remove_training_nodes (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.remove_training_nodes`
Inputs = [['conv2d_input']], Outputs = [['dense_1/Softmax']]

When the conversion is complete, the trained model will be saved as model.pb in Protocol Buffer format.

To make sure that the inference result from Python/TensorFlow and the inference result from C++/OpenVINO match, I also wrote one of the input ndarrays to a binary file for use in the next chapter. As explained there, the C++/OpenVINO code takes its input as FP32 (32-bit float) binary data, so it is convenient to dump the data as raw binary here so that the [Numbers, Width, Height, Channels] index order is not lost.

Because it is raw binary data, it can also be viewed as image data with a tool such as ImageJ. (Note that Python floats are FP64.)

(Image: the dumped raw data of x_test[5] displayed as a 28x28 image.)
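As a quick sanity check (not part of the original script), the dumped file can be read back with NumPy to confirm the shape and value range; the file name x_test[5].raw and the float64 dtype follow from the backup_raw() call above.

import numpy as np

# ndarray.tobytes() wrote raw float64 values, so read them back with the same dtype.
raw = np.fromfile("x_test[5].raw", dtype=np.float64)

# Restore the (Height, Width, Channels) shape used by the training script.
img = raw.reshape(28, 28, 1)

print(img.shape)              # expected: (28, 28, 1)
print(img.min(), img.max())   # expected: values normalized into [0, 1]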

Then use the OpenVINO Model Optimizer to convert model.pb to the OpenVINO IR format.

C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer>python mo.py --input_model {model.Directory where pb resides}\model.pb --output_dir {model.Directory where pb resides} --input_shape [1,28,28,1]
Model Optimizer arguments:
Common parameters:
        - Path to the Input Model:      {model.Directory where pb resides}\model.pb
        - Path for generated IR:        {model.Directory where pb resides}
        - IR output name:       model
        - Log level:    ERROR
        - Batch:        Not specified, inherited from the model
        - Input layers:         Not specified, inherited from the model
        - Output layers:        Not specified, inherited from the model
        - Input shapes:         [1,28,28]
        - Mean values:  Not specified
        - Scale values:         Not specified
        - Scale factor:         Not specified
        - Precision of IR:      FP32
        - Enable fusing:        True
        - Enable grouped convolutions fusing:   True
        - Move mean values to preprocess section:       False
        - Reverse input channels:       False
TensorFlow specific parameters:
        - Input model in text protobuf format:  False
        - Path to model dump for TensorBoard:   None
        - List of shared libraries with TensorFlow custom layers implementation:        None
        - Update the configuration file with input/output node names:   None
        - Use configuration file used to generate the model with Object Detection API:  None
        - Use the config file:  None
Model Optimizer version:

[ SUCCESS ] Generated IR version 10 model.
[ SUCCESS ] XML file: {model.Directory where pb resides}\model.xml
[ SUCCESS ] BIN file: {model.Directory where pb resides}\model.bin
[ SUCCESS ] Total execution time: 5.50 seconds.
It's been a while, check for a new version of Intel(R) Distribution of OpenVINO(TM) toolkit here https://software.intel.com/en-us/openvino-toolkit/choose-download?cid=&source=upgrade&content=2020_3_LTS or on the GitHub*

When the conversion is complete, the trained model is output in OpenVINO IR format as model.xml and model.bin. In the next chapter of the tutorial, we load these two model files with the OpenVINO inference engine library and actually run inference.

__ [TIPS] If input_shape is not specified, an error will occur. __

It seems that most of mo.py's parameters can be omitted, but omitting --input_shape produced an ERROR in my case. If the input shape can be read from the model file before conversion it can be omitted, but in general it seems better to specify it explicitly.

This time the input is 28x28x1, so specify --input_shape [1,28,28,1] in (Numbers, Width, Height, Channels) format.

↓ Error details ↓

[ ERROR ]  Shape [-1 28 28] is not fully defined for output 0 of "reshape_input". Use --input_shape with positive integers to override model input shapes.
[ ERROR ]  Cannot infer shapes or values for node "reshape_input".
[ ERROR ]  Not all output shapes were inferred or fully defined for node "reshape_input".
 For more information please refer to Model Optimizer FAQ (https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html), question #40.
[ ERROR ]
[ ERROR ]  It can happen due to bug in custom shape infer function <function Parameter.infer at 0x000000C17FA6DD08>.
[ ERROR ]  Or because the node inputs have incorrect values/shapes.
[ ERROR ]  Or because input shapes are incorrect (embedded to the model or passed via --input_shape).
[ ERROR ]  Run Model Optimizer with --log_level=DEBUG for more information.
[ ERROR ]  Exception occurred during running replacer "REPLACEMENT_ID" (<class 'extensions.middle.PartialInfer.PartialInfer'>): Stopped shape/value propagation at "reshape_input" node.
 For more information please refer to Model Optimizer FAQ (https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_Model_Optimizer_FAQ.html), question #38.
__ [TIPS] Watch out for supported layers. __

In this tutorial, we converted the trained model along the following route.

HDF5 -> [Freeze Graph] -> Protocol Buffer -> [OpenVINO Model Optimizer] -> OpenVINO IR

For this route to work, every network layer used by the model must be supported by both the Freeze Graph step and the OpenVINO Model Optimizer. If you try to convert a layer that the OpenVINO Model Optimizer does not support, you get an error along the lines of "I don't know this layer and cannot interpret it; please define it with the custom layer plugin mechanism."

The layers supported by the OpenVINO Model Optimizer are summarized in the OpenVINO documentation. https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_Supported_Frameworks_Layers.html#tensorflow_supported_operations
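If you want to check in advance which operations your frozen graph contains, a small sketch like the following (not part of the tutorial script) lists the op types in model.pb so they can be compared against the supported-layers page:

import tensorflow as tf

# Parse the frozen Protocol Buffer graph and collect the unique op types it uses.
graph_def = tf.compat.v1.GraphDef()
with open("model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

for op_type in sorted({node.op for node in graph_def.node}):
    print(op_type)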

__ [TIPS] There are other conversion routes. __

Other model file formats supported by the OpenVINO Model Optimizer include ONNX and Caffe. The network layers they support differ slightly, so if you cannot convert along one route, try another. https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_Supported_Frameworks_Layers.html#onnx_supported_operators

The route when using ONNX is as follows.

HDF5 -> [keras2onnx] -> ONNX -> [OpenVINO Model Optimizer] -> OpenVINO IR
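As a rough sketch of the keras2onnx step (assuming the keras2onnx package is installed and supports every layer in the model), the conversion could look like this; the resulting model.onnx is then fed to the Model Optimizer in the same way as model.pb.

import keras2onnx
from tensorflow.keras.models import load_model

# Load the Keras model saved by the training script and convert it to ONNX.
model = load_model("model.h5", compile=False)
onnx_model = keras2onnx.convert_keras(model, model.name)
keras2onnx.save_model(onnx_model, "model.onnx")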

In the project I am in charge of, a model that uses Keras.layers.Lambda for metric learning failed to convert along the keras2onnx route but converted successfully along the Freeze Graph route.

# Excerpt from the project: x is an intermediate tensor, K is the Keras backend, and metric_alpha is a scaling hyperparameter.
x = layers.Dense(512, activation='relu')(x)
x = layers.Dropout(0.5)(x)
x = layers.Lambda(lambda xx: metric_alpha*(xx)/K.sqrt(K.sum(xx**2)))(x) #metric learning

x = layers.Dense(1, activation='sigmoid')(x)

Incidentally, with TensorRT instead of OpenVINO, inference worked via the keras2onnx route even when using Keras.layers.Lambda. Compatibility seems to vary from module to module, so try several routes.

Perform inference at high speed with C++/OpenVINO

Load the model.xml and model.bin output in the previous chapter and run inference.

Inference proceeds in the following steps.

1. Load inference engine instance (create an inference engine instance)
2. Read IR generated by Model Optimizer (load the trained model)
3. Configure input & output (set up the input and output layers)
4. Loading model to the device (load the model onto the device)
5. Create infer request
6. Prepare input (load the input data)
7. Do inference
8. Process output (get the inference result)

The C++ code is in the GitHub repository: https://github.com/tomoyaeibu/openvino2020.3.1-mnist-sample/blob/main/main.cpp

__1. Load inference engine instance / 2. Read IR generated by ModelOptimizer__


#include <inference_engine.hpp>   // OpenVINO Inference Engine headers (Core, CNNNetwork, ...)

using namespace InferenceEngine;

// NOTE: WEIGHTS_EXT is assumed to be defined as ".bin" elsewhere in the full source.
const std::string input_model = "model.xml";

// --------------------------- 1. Load inference engine instance -------------------------------------
Core ie;
// -----------------------------------------------------------------------------------------------------

// --------------------------- 2. Read IR Generated by ModelOptimizer (.xml and .bin files) ------------
CNNNetwork network = ie.ReadNetwork(input_model, input_model.substr(0, input_model.size() - 4) + WEIGHTS_EXT);
network.setBatchSize(1);
// -----------------------------------------------------------------------------------------------------

Specify the trained model files and load the model. Both model.xml and model.bin are required.

__3. Configure input & output __

// --------------------------- 3. Configure input & output ---------------------------------------------
// --------------------------- Prepare input blobs -----------------------------------------------------
InputInfo::Ptr input_info = network.getInputsInfo().begin()->second;
std::string input_name = network.getInputsInfo().begin()->first;

input_info->setLayout(Layout::NCHW);
input_info->setPrecision(Precision::FP32);

// --------------------------- Prepare output blobs ----------------------------------------------------
DataPtr output_info = network.getOutputsInfo().begin()->second;
std::string output_name = network.getOutputsInfo().begin()->first;

output_info->setPrecision(Precision::FP32);
// -----------------------------------------------------------------------------------------------------

Set up the input and output layers. This is the most important of all the settings: if you get it wrong, the inference results will be unexpected values.

Specify the correct index order of the data in setLayout(Layout::NCHW);. Check this order carefully, because it depends on the library used to create the model and on its settings. (Honestly, since TensorFlow defaults to NHWC I would expect NHWC to be correct, but for some reason it works correctly with NCHW. Still investigating why.)

Set the numerical precision of the input and output data with setPrecision(Precision::FP32);. In this tutorial the input data is normalized to [0, 1] as floating point, so the input precision is FP32. (If it were not normalized and stayed in [0, 255], it would be U8.) The precision of the output data can be specified in the Model Optimizer; since the model was converted with default settings, the output precision is FP32.

__4. Loading model to the device __

const std::string device_name = "CPU";

// --------------------------- 4. Loading model to the device ------------------------------------------
ExecutableNetwork executable_network = ie.LoadNetwork(network, device_name);
// -----------------------------------------------------------------------------------------------------

For device_name, specify the string for the device to run inference on. If the CPU's integrated graphics is available, you can specify "GPU.x" (where x is the device number).

Other devices that can be specified are described in the OpenVINO documentation. https://docs.openvinotoolkit.org/latest/openvino_docs_IE_DG_InferenceEngine_QueryAPI.html#query_api_in_the_core_class

__5. Create infer request __

// --------------------------- 5. Create infer request -------------------------------------------------
InferRequest infer_request = executable_network.CreateInferRequest();
// -----------------------------------------------------------------------------------------------------

Steps 1 through 4 are the inference setup; from here the code actually runs inference. By creating a separate infer_request in each thread of a multithreaded program, multiple infer_requests can be processed in parallel.

__6. Prepare input __

#include <fstream>
#include <string>
#include <vector>

inline void readRawFileFp64(const std::string& fileName, float* buffer, int inH, int inW, int inC)
{
	std::vector<double> temp(inH * inW * inC);

	std::ifstream file(fileName, std::ios::in | std::ios::binary | std::ios::ate);
	file.seekg(0, std::ios::end);
	int size = file.tellg();
	file.seekg(0, std::ios::beg);
	file.read((char*)(temp.data()), size);
	file.close();

	for (int itr = 0; itr < inH * inW * inC; itr++)
	{
		buffer[itr] = (float)temp[itr];
	}
}

void rawToBlob(const std::string rawFilePath, InferenceEngine::Blob::Ptr& blob)
{
	InferenceEngine::SizeVector blobSize = blob->getTensorDesc().getDims();
	const size_t width = blobSize[3];
	const size_t height = blobSize[2];
	const size_t channels = blobSize[1];
	float* blob_data = blob->buffer().as<float*>();

	std::vector<float> input(width * height * channels);
	readRawFileFp64(rawFilePath, input.data(), width, height, channels);

	for (int index = 0; index < width * height * channels; index++)
	{
		blob_data[index] = input[index];
	}
}
// --------------------------- 6. Prepare input --------------------------------------------------------
Blob::Ptr imgBlob = infer_request.GetBlob(input_name);
rawToBlob(input_image_path, imgBlob);
// -----------------------------------------------------------------------------------------------------

Copy the input data into the Blob pointer. This is also a good place to apply preprocessing such as data standardization while copying.

In this tutorial, the ndarray binary data written by the training script is copied into the Blob with the same values and index order. The ndarray binary is FP64 (double) while the input format is FP32 (float), so each value is cast from double to float.

__7. Do inference __

// --------------------------- 7. Do inference --------------------------------------------------------
auto t_infer_start = std::chrono::high_resolution_clock::now();   // requires <chrono>

infer_request.Infer();

auto t_infer_end = std::chrono::high_resolution_clock::now();
float infer_ms = std::chrono::duration<float, std::milli>(t_infer_end - t_infer_start).count();
printf("Time taken for inference : %lf ms\n", infer_ms);
// -----------------------------------------------------------------------------------------------------

Run the inference itself. This tutorial runs it synchronously, but it is also possible to start it asynchronously and wait for completion.

__8. Process output (get inference result) __

int ProcessOutput(InferRequest& async_infer_request, const std::string& output_name)
{
	int result = 0;
	float buf = 0;

	try
	{
		const float* oneHotVector = (async_infer_request.GetBlob(output_name))->buffer().as<float*>();

		for (int i = 0; i < 10; i++)
		{
			printf("%d : %lf \n", i, oneHotVector[i]);
		}

		for (int i = 0; i < 10; i++)
		{
			if (oneHotVector[i] > buf)
			{
				buf = oneHotVector[i];
				result = i;
			}
		}
	}
	catch (const std::exception & ex)
	{
		OutputDebugStringA(ex.what());
		result = -1;
	}

	return result;
}
// --------------------------- 8. Process output ------------------------------------------------------
int result = ProcessOutput(infer_request, output_name);
printf("result = %d\n", result);
// -----------------------------------------------------------------------------------------------------

Finally, retrieve the inference result. The outputs are also stored in a Blob, so they are read out one element at a time.

In this tutorial the output data is FP32, so the result is read through a float pointer.

__Visual Studio project settings__

To build and run this C++ code, the project must be configured correctly.

Project settings used for this tutorial:

C/C++ -> Additional Include Directories
- C:\Program Files (x86)\IntelSWTools\openvino\inference_engine\samples\cpp\common
- C:\Program Files (x86)\IntelSWTools\openvino\opencv\include
- C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\inference_engine\include

Linker -> Input -> Additional Dependencies [For Release builds]
- C:\Program Files (x86)\IntelSWTools\openvino\opencv\lib\opencv_core430.lib
- C:\Program Files (x86)\IntelSWTools\openvino\opencv\lib\opencv_imgcodecs430.lib
- C:\Program Files (x86)\IntelSWTools\openvino\opencv\lib\opencv_imgproc430.lib
- C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\inference_engine\lib\intel64\Release\inference_engine_legacy.lib
- C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\inference_engine\lib\intel64\Release\inference_engine.lib
- C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\inference_engine\lib\intel64\Release\inference_engine_c_api.lib
- C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\inference_engine\lib\intel64\Release\inference_engine_nn_builder.lib

Linker -> Input -> Additional Dependencies [For Debug builds]
- C:\Program Files (x86)\IntelSWTools\openvino\opencv\lib\opencv_core430d.lib
- C:\Program Files (x86)\IntelSWTools\openvino\opencv\lib\opencv_imgcodecs430d.lib
- C:\Program Files (x86)\IntelSWTools\openvino\opencv\lib\opencv_imgproc430d.lib
- C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\inference_engine\lib\intel64\Debug\inference_engine_legacy.lib
- C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\inference_engine\lib\intel64\Debug\inference_engine.lib
- C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\inference_engine\lib\intel64\Debug\inference_engine_c_api.lib
- C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\inference_engine\lib\intel64\Debug\inference_engine_nn_builder.lib

Build Events -> Post-Build Event: described below

Post-build event [For Release builds]


@rem For inference engine.
xcopy "C:\Program Files (x86)\IntelSWTools\openvino\inference_engine\bin\intel64\$(Configuration)" $(SolutionDir)$(Platform)\$(Configuration)\ /D /S /R /Y /I /K
xcopy "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\inference_engine\external\tbb\bin\tbb.dll" $(SolutionDir)$(Platform)\$(Configuration)\ /D /S /R /Y /I /K
xcopy "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\ngraph\lib\ngraph.dll" $(SolutionDir)$(Platform)\$(Configuration)\ /D /S /R /Y /I /K

@rem For opencv function.
xcopy "C:\Program Files (x86)\IntelSWTools\openvino\opencv\bin\opencv_core430.dll" $(SolutionDir)$(Platform)\$(Configuration)\ /D /S /R /Y /I /K
xcopy "C:\Program Files (x86)\IntelSWTools\openvino\opencv\bin\opencv_imgcodecs430.dll" $(SolutionDir)$(Platform)\$(Configuration)\ /D /S /R /Y /I /K
xcopy "C:\Program Files (x86)\IntelSWTools\openvino\opencv\bin\opencv_imgproc430.dll" $(SolutionDir)$(Platform)\$(Configuration)\ /D /S /R /Y /I /K

Post-build event [For Debug builds]


@rem For inference engine.
xcopy "C:\Program Files (x86)\IntelSWTools\openvino\inference_engine\bin\intel64\$(Configuration)" $(SolutionDir)$(Platform)\$(Configuration)\ /D /S /R /Y /I /K
xcopy "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\inference_engine\external\tbb\bin\tbb_debug.dll" $(SolutionDir)$(Platform)\$(Configuration)\ /D /S /R /Y /I /K
xcopy "C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\ngraph\lib\ngraphd.dll" $(SolutionDir)$(Platform)\$(Configuration)\ /D /S /R /Y /I /K

@rem For opencv function.
xcopy "C:\Program Files (x86)\IntelSWTools\openvino\opencv\bin\opencv_core430d.dll" $(SolutionDir)$(Platform)\$(Configuration)\ /D /S /R /Y /I /K
xcopy "C:\Program Files (x86)\IntelSWTools\openvino\opencv\bin\opencv_imgcodecs430d.dll" $(SolutionDir)$(Platform)\$(Configuration)\ /D /S /R /Y /I /K
xcopy "C:\Program Files (x86)\IntelSWTools\openvino\opencv\bin\opencv_imgproc430d.dll" $(SolutionDir)$(Platform)\$(Configuration)\ /D /S /R /Y /I /K

The configured solution and project files are also in the GitHub repository: https://github.com/tomoyaeibu/openvino2020.3.1-mnist-sample

Comparison of execution results of Python/TensorFlow and C++/OpenVINO

Let's compare the inference result from Python/TensorFlow with the inference result from C++/OpenVINO.

Python/Tensorflow inference results


(28, 28, 1)
Time taken for inference : [2.8801844] ms
[0.00000004 0.9999782  0.00000129 0.00000004 0.00001199 0.00000002
 0.00000262 0.00000498 0.00000056 0.00000028]

C++/OpenVINO inference results


Time taken for inference : 1.889800 ms
0 : 0.000000
1 : 0.999978
2 : 0.000001
3 : 0.000000
4 : 0.000012
5 : 0.000000
6 : 0.000003
7 : 0.000005
8 : 0.000001
9 : 0.000000
result = 1

Compare the accuracy of inference results

Looking at the inference result values, you can see that they match. A rigorous accuracy comparison would require examining the values in more detail, but within the range visible here the numbers agree up to the sixth decimal place.
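For reference, the IR can also be cross-checked from Python before writing any C++. This is only a rough sketch using the OpenVINO Python API; the attribute names (net.inputs, net.outputs) are those of the 2020.x API and may differ in other versions.

import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")
input_name = next(iter(net.inputs))
output_name = next(iter(net.outputs))

exec_net = ie.load_network(network=net, device_name="CPU")

# The raw dump is FP64; cast to FP32 and reshape to the IR input shape (e.g. [1, 1, 28, 28]).
x = np.fromfile("x_test[5].raw", dtype=np.float64).astype(np.float32)
x = x.reshape(net.inputs[input_name].shape)

result = exec_net.infer({input_name: x})
print(result[output_name])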

You can also trade some accuracy for speed by changing FP32 to FP16 in the Model Optimizer settings. However, FP16 cannot be used when the inference device is a CPU, so I have not tried it yet. I will give it a try if I get a chance to run on another device.
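For reference, a sketch of what the Model Optimizer invocation would look like with FP16 (same placeholder paths as before; --data_type selects the precision of the generated IR):

python mo.py --input_model {model.Directory where pb resides}\model.pb --output_dir {model.Directory where pb resides} --input_shape [1,28,28,1] --data_type FP16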

Compare execution times of inference results

Although this measures only a single inference with BatchSize = 1, you can see that C++/OpenVINO infers faster.

You can speed things up further by increasing BatchSize or by processing multiple infer_requests in parallel across cores. I would like to write another article once I have verified this.

A little story

- In the project I am in charge of, I run a classification model for 3D data using Conv3D with OpenVINO. For 3D input data, use setLayout(Layout::NCDHW); setLayout(Layout::NWHDC); is not available as an option.
- Also in the project I am in charge of, I wanted to build inference functionality into an existing system, but the existing system could only be built with Visual Studio 2010, which meant the inference code could not be built inside that project. So I split only the inference function out into a DLL, built it with Visual Studio 2017, loaded the DLL from the existing system, and called the inference method to integrate it.

References

- Handwriting recognition by OpenVINO running in C++ (MNIST)
- OpenVINO (2019.R1) Windows 10 version installation and sample testing
