[PYTHON] Try to profile with ONNX Runtime

This is a continuation of Speeding up Deep Learning with CPU of Raspberry Pi 4.

To speed up deep learning inference, we first need to find out which operations take how much time, and then reduce the time they actually take. So as a first step, let's profile the model with the profiling feature of ONNX Runtime.

How to enable profiling is described in the official ONNX Runtime tutorial (https://microsoft.github.io/onnxruntime/python/auto_examples/plot_profiling.html).

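sample.py

import onnxruntime

path_to_model = "model.onnx"             # placeholder: replace with the path to your ONNX model

options = onnxruntime.SessionOptions()
options.enable_profiling = True          # enable the profiling feature
session = onnxruntime.InferenceSession(path_to_model, options)

# [Profile target] run the inference you want to profile here

prof_file = session.end_profiling()      # stop profiling; returns the name of the JSON file
print(prof_file)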

Profile results are saved in JSON format and can be visualized with the tracing tool built into Chrome. (Enter chrome://tracing in Chrome's address bar to launch the tool.)
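Besides viewing the trace in Chrome, the JSON file can also be totaled per operator type in a few lines of Python. The following is a minimal sketch, not part of the original sample. It assumes the trace is a list of events in which node events have "cat" set to "Node", a "name" ending in "_kernel_time", a "dur" in microseconds, and an "args" entry containing "op_name". Those field names are an assumption about the file layout, so check them against your own profile. The helper names and the sample events are hypothetical.

```python
import json
from collections import defaultdict


def load_profile(path):
    """Load the JSON trace whose file name is returned by session.end_profiling()."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_profile(events):
    """Sum kernel time per operator type and return it in milliseconds.

    Assumed event layout (verify against your own trace): node events have
    "cat" == "Node", a "name" ending in "_kernel_time", a "dur" in
    microseconds, and "args" containing "op_name".
    """
    totals_us = defaultdict(int)
    for event in events:
        if event.get("cat") != "Node":
            continue  # skip session-level events such as model loading
        if not event.get("name", "").endswith("_kernel_time"):
            continue  # skip bookkeeping events around each kernel
        op_name = event.get("args", {}).get("op_name", "unknown")
        totals_us[op_name] += event.get("dur", 0)
    # Convert microseconds to milliseconds and sort by descending time.
    totals_ms = {op: us / 1000.0 for op, us in totals_us.items()}
    return dict(sorted(totals_ms.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical events for illustration only; a real trace has more fields.
sample_events = [
    {"cat": "Session", "name": "model_loading_uri", "dur": 5000},
    {"cat": "Node", "name": "conv1_kernel_time", "dur": 1200, "args": {"op_name": "Conv"}},
    {"cat": "Node", "name": "conv2_kernel_time", "dur": 800, "args": {"op_name": "Conv"}},
    {"cat": "Node", "name": "fc_kernel_time", "dur": 300, "args": {"op_name": "Gemm"}},
    {"cat": "Node", "name": "conv1_fence_before", "dur": 1, "args": {"op_name": "Conv"}},
]

summary = summarize_profile(sample_events)
for op, ms in summary.items():
    print(f"{op}: {ms:.3f} ms")
```

With a real profile, pass the result of load_profile(prof_file) to summarize_profile instead of the sample events.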

This time, let's profile image classification with MobileNetV1 depth 1.0 224x224. The model used has ONNX Runtime graph optimization applied.

Profile result

[Figure: mobilenetv1_prof1.png]
Processing proceeds in the order of model loading, session initialization, and model execution.

[Figure: mobilenetv1_prof3.png]
Expanding the model execution part shows that Convolution, with Batch Normalization fused into it, is executed many times. (Batch Normalization would normally run as a separate operation, but ONNX Runtime graph optimization has merged it into Convolution.)
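Batch Normalization can be merged into Convolution because, at inference time, it is just a fixed per-channel scale and shift. The sketch below illustrates the general folding arithmetic for a single output channel, with the convolution reduced to a scalar multiply. It is not taken from ONNX Runtime's implementation, and the function names and parameter values are hypothetical.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into the preceding convolution's weight and bias.

    Scalar version for one output channel; a real kernel applies the same
    scale to every weight that belongs to that channel.
    """
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


def conv_then_batchnorm(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Reference path: convolution (here a scalar multiply) followed by BatchNorm."""
    y = weight * x + bias
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


# Hypothetical parameter values, chosen only for illustration.
params = dict(weight=0.5, bias=0.1, gamma=1.5, beta=-0.2, mean=0.3, var=4.0)
folded_w, folded_b = fold_batchnorm(**params)

x = 2.0
separate = conv_then_batchnorm(x, **params)   # two operations
fused = folded_w * x + folded_b               # one operation with folded parameters
print(math.isclose(separate, fused))
```

Because the folded weight and bias are computed once ahead of time, the fused operation costs the same as a plain convolution at run time.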

The table below summarizes the profile results.

Item            Processing time (ms)   Percentage (%)
All processing  157.19                 -
Convolution     148.017                94.2
Gemm            6.053                  3.9
Other           3.12                   1.9
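The figures in the table also show how much an optimization of one item can shorten the total. The following sketch recomputes each item's share and estimates the total time if only one item were made faster. The times are copied from the table. The speedup factors and function names are hypothetical and only illustrate the arithmetic.

```python
# Times in milliseconds, copied from the table.
total_ms = 157.19
parts_ms = {"Convolution": 148.017, "Gemm": 6.053, "Other": 3.12}


def share_percent(part_ms, total):
    """Share of the total time taken by one item, in percent."""
    return part_ms / total * 100.0


def total_after_speedup(parts, name, factor):
    """Total time if only the item `name` becomes `factor` times faster."""
    return sum(ms / factor if key == name else ms for key, ms in parts.items())


for name, ms in parts_ms.items():
    print(f"{name}: {share_percent(ms, total_ms):.2f} %")

# Hypothetical speedup factors, for illustration only.
for factor in (2, 4):
    estimate = total_after_speedup(parts_ms, "Convolution", factor)
    print(f"Convolution {factor}x faster: {estimate:.2f} ms in total")
```

By the same calculation, even an arbitrarily large speedup of Gemm alone could save at most about 6 ms of the total.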

The results show that to reduce the overall processing time, we have to do something about Convolution. From next time onward, I would like to consider what kind of approach is needed.
