[PYTHON] [For beginners] I tried using the TensorFlow Object Detection API

Introduction

This is Ichi Lab from RHEMS Giken.

This time, I used TensorFlow's Object Detection API to detect a custom object, and finally ran the result on an Android device.

Many helpful sites got me through this, but there were still plenty of things that one or two pages alone couldn't solve.

In this article, I will pick up and describe the parts I stumbled over, in the hope that as few people as possible run into the same difficulties.

Environment

TensorFlow 1.15

  1. MacBook Pro OS : Catalina 10.15 CPU : Intel Core i5 2.3GHz Memory : 8GB 2133MHz

  2. MacBook Pro OS : Catalina 10.15 CPU : Intel Core i7 3.5GHz Memory : 16GB 2133MHz

I switched from machine 1 to machine 2 partway through.

Environment setup

Basically, everything is as written in the official guide below.

TensorFlow Object Detection - Installation

Get the latest source of Object Detection API with the following command.

$ git clone --depth 1 https://github.com/tensorflow/models.git

The official guide sets PYTHONPATH using `pwd`, but written out in full it looks like the following. (Some containers use /tf as the root directory instead of /tensorflow, so check yours.)

$ export PYTHONPATH=$PYTHONPATH:/tensorflow/models/research:/tensorflow/models/research/slim
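
To check that the path is set correctly, you can try importing the package (a quick sanity check of my own; it should exit silently if PYTHONPATH is right):

$ python -c "import object_detection"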

Precautions when setting up the environment

If you use Docker to run things in a container, there are several image variants, as shown on the official page below. Be careful if you use latest (the latest version).

Tensorflow - Docker

This is because, as of this writing (November 2019), **the Object Detection API does not support TensorFlow 2.0**.

Similarly, when installing with pip, it is essential to check the TensorFlow version. You can check it with the following command.

$ pip list | grep tensor
tensorboard                        1.15.0     
tensorflow                         1.15.0rc3  
tensorflow-estimator               1.15.1 

If you want to change the version from 2.0 to 1.X, run the following command (this example changes to 1.15):

$ pip install tensorflow==1.15.0rc3

Incidentally, since this version difference is quite important, one more note: TensorFlow officially provides a script that automatically converts code even if your version is 2.0. However, it is documented as working `except for contrib` (i.e. excluding contrib).

Migrate your TensorFlow 1 code to TensorFlow 2
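
For reference, that conversion script is invoked roughly like this (a minimal example; the file names are placeholders of mine):

$ tf_upgrade_v2 --infile old_code.py --outfile upgraded_code.py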

If you run the Object Detection API's model_builder_test.py under TensorFlow 2.0, it fails as follows.

AttributeError: module 'tensorflow' has no attribute 'contrib'

It trips over an error in contrib, which means the automatic conversion script cannot be used here at this time.

By the way, have you already run model_builder_test.py and seen it report OK? Run model_builder_test.py from the models/research directory as follows:

$ python object_detection/builders/model_builder_test.py

If it succeeds, OK is displayed as shown below (output shortened for length):

Running tests under Python 3.6.8: /usr/local/bin/python
[ RUN      ] ModelBuilderTest.test_create_faster_rcnn_model_from_config_with_example_miner  
...    

[ RUN      ] ModelBuilderTest.test_unknown_ssd_feature_extractor
[       OK ] ModelBuilderTest.test_unknown_ssd_feature_extractor
----------------------------------------------------------------------
Ran 16 tests in 0.313s

By the way, apart from TensorFlow itself, the versions of the other packages had no effect in my case, so I think you can install the latest of each. That is all for setting up the environment.

Preparing the training data

To detect the object you want to recognize, you create training data, and for that you have to prepare a lot of images of the object you want to detect.

If collecting them yourself sounds difficult, there is also a handy tool called google-images-download; if you are interested, look it up and give it a try.
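
For example, it can be run roughly like this (an illustrative invocation of mine; the keyword and limit are placeholders):

$ googleimagesdownload --keywords "apple" --limit 100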

Now, about the training data itself. It is prepared in the TFRecord format recommended by TensorFlow. TFRecord is a simple format for storing a sequence of binary records; the data is serialized (stored as a byte sequence) and can be read sequentially. There are roughly two ways to create TFRecord data.

  1. Use the annotation tool
  2. Write your own from source code

Annotation is what's commonly called tagging: while viewing photos or videos, you mark up "this region is this thing". For this I used a tool from Microsoft called VoTT. VoTT

Precautions when using VoTT

  • Select **Tensorflow Records** in Export Settings
  • In Export Settings, set Asset State to something other than **Only visited Assets**
  • Tags can disappear when you press the Active Learning button (the hat-like icon), so I don't recommend using it much
  • Tagging a large number of images at once makes processing heavy; in that case it is better to split the work into several smaller batches

As you can see, the tool has its share of rough edges, but it saves you the trouble of converting to TFRecord format yourself, and for that reason I recommend it.

If you write it yourself from source

You can generate TFRecord files from images with the libraries TensorFlow provides. However, the TFRecords generated by the official tutorial below use slightly different element names from the TFRecords generated by VoTT, so the two could not be mixed and trained together. Usage of TFRecords and tf.Example

This document will be more useful. using_your_own_dataset.md

The details get long, so I summarized them in a separate article; please have a look if you are interested. TFRecord file creation memorandum for object detection
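
As a rough idea of what "writing your own" involves, here is a minimal sketch that writes one image with one bounding box, assuming TF 1.15 and the standard feature keys from using_your_own_dataset.md (the file name, label, and box coordinates below are placeholders of mine):

import tensorflow as tf

def _bytes(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _floats(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

# Read the encoded JPEG as-is; the API decodes it during training.
with tf.io.gfile.GFile('apple.jpg', 'rb') as f:
    encoded_jpg = f.read()

example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': _int64(300),
    'image/width': _int64(300),
    'image/filename': _bytes(b'apple.jpg'),
    'image/source_id': _bytes(b'apple.jpg'),
    'image/encoded': _bytes(encoded_jpg),
    'image/format': _bytes(b'jpeg'),
    # Box coordinates are normalized to [0, 1].
    'image/object/bbox/xmin': _floats([0.1]),
    'image/object/bbox/xmax': _floats([0.9]),
    'image/object/bbox/ymin': _floats([0.2]),
    'image/object/bbox/ymax': _floats([0.8]),
    'image/object/class/text': _bytes(b'apple'),
    'image/object/class/label': _int64(1),
}))

with tf.io.TFRecordWriter('hoge0000.tfrecord') as writer:
    writer.write(example.SerializeToString())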

Training

From here on, the explanation assumes all paths are relative to models/research.

Once the TFRecord files are ready, it is finally time to train. Here we use "transfer learning": starting from an existing trained model, we build a more customized model of our own. Transfer learning reuses weights learned in advance on large-scale data, so sufficient performance can be expected even from a small amount of training data.

Download a trained model

First, download a trained model to transfer from. This time let's go with the new MobileNet v3, which I hadn't seen used in other Japanese articles.

The trained model can be downloaded from the following page. Tensorflow detection model zoo
From that page, download the one called ssd_mobilenet_v3_large_coco. Note that this model only works with the latest sources, updated around mid-October 2019.

Place the downloaded model in the `object_detection` directory. From the command line:

$ wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v3_large_coco_2019_08_14.tar.gz  
$ tar zxvf ssd_mobilenet_v3_large_coco_2019_08_14.tar.gz

Trained model file structure

The file structure of the trained model is basically as follows.

  • checkpoint
  • frozen_inference_graph.pb
  • model.ckpt.data-00000-of-00001
  • model.ckpt.index
  • model.ckpt.meta
  • pipeline.config
  • saved_model /
    • saved_model.pb
    • variables /

For transfer learning, you edit the pipeline.config file inside this directory and use that.

Directory structure and filenames for training and evaluation

This section covers the directory for the prepared TFRecord files. The TFRecord files are kept in separate directories for training and validation. The usual ratio is training:validation = 8:2, i.e. 80% of the data goes to training and the remaining 20% to validation (see p. 30 of "Hands-On Machine Learning with Scikit-Learn and TensorFlow"). Of course, training still works even if you don't split it exactly like this. The directory layout is below (train stands for training, val for validation).

  • test0001/
    • label_map (.pbtxt) : the label file
    • train/ : TFRecord (.tfrecord) training files
    • val/ : TFRecord (.tfrecord) validation files
    • save/ : a directory prepared for saving the trained data (empty at first)
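
For reference, a rough sketch of making that 8:2 split with a small script (the paths here are hypothetical; adjust them to your layout):

import glob
import random
import shutil

# Hypothetical 80/20 split of exported TFRecord files into train/ and val/.
files = sorted(glob.glob('export/*.tfrecord'))
random.seed(0)
random.shuffle(files)
n_train = int(len(files) * 0.8)
for i, path in enumerate(files):
    dest = 'train' if i < n_train else 'val'
    shutil.copy(path, 'object_detection/test0001/' + dest + '/')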

Name the TFRecord files with a single serial-number scheme rather than unrelated names. For example, here we name them as follows.

  • hoge0000.tfrecord
  • hoge0001.tfrecord
  • hoge0002.tfrecord
  • ... (and so on)

How the files are named here ties into the config file settings described next.

Edit config file

Open pipeline.config in the trained model you downloaded earlier. You can tune all sorts of parameters here, such as batch size, weights, and image augmentation; I will explain the items you are most likely to edit. (Excerpts from the actual config, for explanation.)

pipeline.config


model {
    ssd {
        num_classes: 1

num_classes : The number of classes to detect. It appears near the top of the config file.

pipeline.config


train_config: {
    batch_size: 32
    num_steps: 10000
    optimizer {
        momentum_optimizer: {
            learning_rate: {
                cosine_decay_learning_rate {
                    total_steps: 10000
                    warmup_steps: 10000
    fine_tune_checkpoint: "./object_detection/ssd_mobilenet_v3_large_coco/model.ckpt"
}

batch_size : With stochastic gradient descent, the dataset is trained in several subsets to reduce the influence of outliers; the number of samples in each subset is the batch size. By convention in machine learning this is usually a power of 2. The larger the value, the heavier the load during training, and depending on the environment the process may die and training may not proceed.

num_steps : The number of training steps. The step count can also be specified on the command line when launching training, and as far as I have tried, the command-line value takes precedence.

total_steps and warmup_steps : I am still investigating these, since they did not appear in other models' configs, but total_steps must be greater than or equal to warmup_steps. (If this condition is not met, an error occurs and training will not start.)

fine_tune_checkpoint : Specifies the model to transfer from. Writing the path into the downloaded model directory up to ".ckpt" is enough. This item was not present in ssd_mobilenet_v3 as downloaded, so I added it myself; most models already include it. You don't need this line if you are not doing transfer learning.

pipeline.config


train_input_reader: {
    tf_record_input_reader {
        input_path: "./object_detection/test0001/train/hoge????.tfrecord"
    }
    label_map_path: "./object_detection/test0001/tf_label_map.pbtxt"
}
eval_input_reader: {
    tf_record_input_reader {
        input_path: "./object_detection/test0001/val/hoge????.tfrecord"
    }
    label_map_path: "./object_detection/test0001/tf_label_map.pbtxt"
}

input_path : Specify the paths of the prepared training and validation TFRecord files. For example, since we named them with serial numbers like hoge0000.tfrecord, we write hoge????.tfrecord here.

label_map_path : Specify the prepared label file. It is fine to point train and eval at the same file.
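
For reference, a label map (.pbtxt) for the two hypothetical tags "apple" and "orange" would look like this (the standard Object Detection API label map format; class ids start at 1):

tf_label_map.pbtxt


item {
  id: 1
  name: 'apple'
}
item {
  id: 2
  name: 'orange'
}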

Start training

With the config file written, it is finally time to train. For training, use the model_main.py file in the `object_detection` directory. The run-time arguments:

  • --model_dir: Specify where to save the training data. Files named "model.ckpt-..." are created in this directory during training.

  • --pipeline_config_path: Specify the config file to use. Let's specify the config file you edited earlier.

  • --num_train_steps: Specify the number of training steps. This option is not needed if you want to run the count specified in the config; if you specify a different count here, it takes precedence over the config. (Result of actually trying it.)

The following is an execution example.

$ python object_detection/model_main.py \
--pipeline_config_path="object_detection/ssd_mobilenet_v3_large_coco/pipeline.config" \
--model_dir="./object_detection/test0001/save" \
--alsologtostderr

There are a lot of options and typing them one by one is tedious, so it is easier to run from a shell script.

#!/bin/bash

PIPELINE_CONFIG_PATH="./object_detection/ssd_mobilenet_v3_large_coco.config"
MODEL_DIR="./object_detection/test0001/save"
NUM_TRAIN_STEPS=10000

cd '/tensorflow/models/research'

# Add --num_train_steps=$NUM_TRAIN_STEPS to override the step count in the config.
# (Note: a commented-out line cannot sit inside a backslash continuation.)
python object_detection/model_main.py \
    --pipeline_config_path=$PIPELINE_CONFIG_PATH \
    --model_dir=$MODEL_DIR \
    --alsologtostderr

"Oh, I can't run the shell I made ...?" Did you forget to change the permissions?

$ chmod 775 hoge.sh

If you start learning but the process dies in the middle

Just when you think training has started, this can happen:

~ session_manager.py:500] Running local_init_op.
~ session_manager.py:502] Done running local_init_op.
~  basic_session_run_hooks.py:606] Saving checkpoints for 0 into {specified save directory/file name}.ckpt
Killed

There are several possible reasons for the process dying, but in any case it is very likely that the environment has run out of resources.

Here are the fixes that worked when I ran into this myself.

~ Solution # 1. Try changing the Docker Engine settings ~

This may help if you are running inside a Docker container. First open the settings: select Preferences → the Advanced tab and increase the resources.

~ Solution # 2. Try to reduce the batch size ~

This one is fixed by a pipeline.config setting.

Which parameter combinations kill the process depends, I think, on the operating environment and on the amount and size of the training data. Reducing the batch_size value alone may solve it, so if you are stuck here, give that a try.

Checking the training data

Training data is saved continuously to the save directory specified when running model_main.py. It is saved whenever "Saving" appears in the log, as shown below. From this point on you can visualize training progress with TensorBoard, explained next.

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 500: object_detection/test0001/save/model.ckpt-500
I1012 08:29:56.544728 139877141301056 estimator.py:2109] Saving 'checkpoint_path' summary for global step 500: object_detection/test0001/save/model.ckpt-500

TensorBoard ~ visualizing training ~

Visualizing the training data is essential, and TensorBoard is a great help for that.

Launch TensorBoard

You can start TensorBoard with the following command.

$ tensorboard --logdir=object_detection/test0001/save
  • --logdir : Specify the directory where the training data was saved (the one given at training time). If you point it at the save directory of a past run, you can view the results again even after training has finished.

To view TensorBoard in a web browser, open localhost. The default port is 6006.

http://localhost:6006/

If you are running in a Docker container, make sure you published the port with -p 6006:6006 when you ran docker run. In my case, the port mapping is set in docker-compose.yml.
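
For reference, the relevant part of a docker-compose.yml would look something like this (a hypothetical excerpt; the service name and image tag are examples):

docker-compose.yml


services:
  tensorflow:
    image: tensorflow/tensorflow:1.15.0-py3
    ports:
      - "6006:6006"   # host:container, for TensorBoard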

How to read TensorBoard

Being able to visualize training doesn't help if you have no idea what to look at. I've summarized what I found out, and I hope it serves as a reference.

GRAPHS

The graph is generated automatically from the processing flow of the source code.

| Item | Description |
|---|---|
| Run | Switch the subdirectory containing the logs |
| Upload | Upload a TensorFlow model file |
| Trace inputs | Trace a node's dependencies |
| Color | Choose the color-coding scheme |
| - Structure | model (network) structure |
| - Device | the device that did the processing (CPU vs. GPU) |
| - Compute time | processing time |
| - Memory | memory usage |
| - TPU Compatibility | whether ops can run on a tensor processing unit |

SCALARS

| Item | Description |
|---|---|
| Show data download links | Show links for downloading each chart's data, in CSV or JSON format |
| Ignore outliers in chart scaling | Whether to scale charts so outliers are ignored (check to ignore them) |
| Tooltip sorting method | Order of entries in the tooltip |
| - default | alphabetical order |
| - descending | descending by value |
| - ascending | ascending by value |
| - nearest | closest to the mouse cursor |
| Smoothing | Chart smoothing |
| Horizontal Axis | Choice of horizontal (X) axis for line charts |
| - STEP | step count (number of executions) |
| - RELATIVE | elapsed time (difference from the first point) |
| - WALL | wall-clock time |
| Runs | Show/hide individual runs |

Reference: Visualization of learning with TensorBoard

Next, I will explain the main charts. Before that, here are a couple of terms you need to know.

  • IOU :
    An abbreviation of Intersection over Union, an index of how much two regions overlap. The closer this value is to 1, the better the inference matches the ground truth.
  • Recall :
    The proportion of the things that should be found that were actually found correctly. Also called sensitivity.
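
As a concrete illustration (a sketch of my own, not code from the API), IOU for two axis-aligned boxes can be computed like this:

def iou(box_a, box_b):
    """IOU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Identical boxes give 1.0; disjoint boxes give 0.0.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7, about 0.143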

Reference: Meaning of IoU (evaluation index) and strictness of value
Reference: [For beginners] Explanation of evaluation indicators for classification problems in machine learning (correct answer rate, precision rate, recall rate, etc.)

The following is an excerpt based on the item descriptions in `object_detection/metrics/coco_tools.py`.

| Item | Description |
|---|---|
| Precision/mAP | mean average precision, averaged over IOU thresholds from 0.5 to 0.95 in steps of 0.05 |
| Precision/mAP@.50IOU | mean average precision at 50% IOU |
| Precision/mAP@.75IOU | mean average precision at 75% IOU |
| Precision/mAP (small) | mAP for small objects (less than 32 x 32 px) |
| Precision/mAP (medium) | mAP for medium objects (32 x 32 px to 96 x 96 px) |
| Precision/mAP (large) | mAP for large objects (96 x 96 px to 10000 x 10000 px) |
| Recall/AR@1 | average recall with at most 1 detection |
| Recall/AR@10 | average recall with at most 10 detections |
| Recall/AR@100 | average recall with at most 100 detections |
| Recall/AR@100 (small) | AR@100 for small objects |
| Recall/AR@100 (medium) | AR@100 for medium objects |

IMAGES

Displays image data. You can compare the ground truth of the validation data with the inference results, checkpoint (.ckpt file) by checkpoint.

Conversion to inference graph

When training finishes, it is finally time to convert the data into an inference graph. Concretely, this converts the .ckpt files produced by training into a .pb file.

There are several conversion scripts in the `object_detection` directory.

  • export_inference_graph.py
  • export_tflite_ssd_graph.py

To try it on Android, the model will later be converted to .tflite format, so **use the latter**.

For `export_inference_graph.py`

A description of the run-time arguments.

  • --input_type : For the inference graph, specify one of the following three depending on your use case:

    • image_tensor : a 4-D tensor [None, None, None, 3]. Normally, you should specify this.
    • encoded_image_string_tensor : a 1-D string tensor [None] containing encoded PNG or JPEG images. If multiple images are provided, they are assumed to have the same resolution.
    • tf_example : a 1-D string tensor [None] containing serialized TF Example protos. If multiple images are provided, they are assumed to have the same resolution.

  • --pipeline_config_path : Specify the config file used during training.

  • --trained_checkpoint_prefix : Specify the "model.ckpt-XXXX" file created in the training save directory, where XXXX is the latest (largest) step count reached. For example, after 10000 steps it is "model.ckpt-10000".

  • --output_directory : Specify the directory to export to. A set of files with exactly the same structure as the originally downloaded model is created there.

The following is an execution example.

$ python object_detection/export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path object_detection/ssd_mobilenet_v3_large_coco/pipeline.config \
--trained_checkpoint_prefix object_detection/test0001/save/model.ckpt-10000 \
--output_directory object_detection/test0001/output

For `export_tflite_ssd_graph.py`

A script that converts a trained model into a .tflite-compatible model. The arguments are basically the same as `export_inference_graph.py` (there is no `--input_type`).

The following is an execution example.

$ python object_detection/export_tflite_ssd_graph.py \
--pipeline_config_path=object_detection/ssd_mobilenet_v3_large_coco/pipeline.config \
--trained_checkpoint_prefix=object_detection/test0001/save/model.ckpt-10000 \
--output_directory=object_detection/test0001/tflite \
--add_postprocessing_op=true

Two files, tflite_graph.pb and tflite_graph.pbtxt, are created in the directory specified by --output_directory.

Convert to TensorFlow Lite format

Use tflite_convert to convert to the tflite format. This converter should be included from the start.

Typing tflite_convert --help shows the usage. Since there are many options, I put them in a shell script, shown below; change the directories to suit your setup. Also, my understanding is that --input_shapes should probably match the `image_resizer` in the config used for training, though I have not confirmed this. The numbers mean (batch size, input image height, input image width, input image depth (RGB channels)).

Converter command line reference

#!/bin/bash

OUTPUT_FILE="object_detection/test0001/tflite/test.tflite"
GRAPH_DEF_FILE="object_detection/test0001/tflite/tflite_graph.pb"
INFERENCE_TYPE="FLOAT"
INPUT_ARRAY="normalized_input_image_tensor"
OUTPUT_ARRAYS="TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3"
INPUT_SHAPES="1,300,300,3"

cd '/tensorflow/models/research'

tflite_convert \
    --output_file=$OUTPUT_FILE \
    --graph_def_file=$GRAPH_DEF_FILE \
    --inference_type=$INFERENCE_TYPE \
    --input_arrays=$INPUT_ARRAY \
    --input_shapes=$INPUT_SHAPES \
    --output_arrays=$OUTPUT_ARRAYS \
    --default_ranges_min=0 \
    --default_ranges_max=6 \
    --mean_values=128 \
    --std_dev_values=127 \
    --allow_custom_ops

Incidentally, several other sites gave instructions to install bazel and toco, but this command line alone was enough without them. (To be precise, I tried those methods too, and the result was the same.)

Note that with TensorFlow **1.12.0rc0** this step **failed**, while with **1.15.0rc3** the exact same command **worked**.

Also, the official documentation recommends the Python API. Converter command line reference
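
As a rough sketch of the Python-API equivalent of the shell script above (assuming TF 1.x; the paths are the same example paths used earlier):

import tensorflow as tf

# Equivalent of the tflite_convert invocation above, via the TF 1.x Python API.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file='object_detection/test0001/tflite/tflite_graph.pb',
    input_arrays=['normalized_input_image_tensor'],
    output_arrays=[
        'TFLite_Detection_PostProcess',
        'TFLite_Detection_PostProcess:1',
        'TFLite_Detection_PostProcess:2',
        'TFLite_Detection_PostProcess:3',
    ],
    input_shapes={'normalized_input_image_tensor': [1, 300, 300, 3]},
)
converter.allow_custom_ops = True  # the SSD postprocess op is a custom op

with open('object_detection/test0001/tflite/test.tflite', 'wb') as f:
    f.write(converter.convert())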

Actually run it on Android

It has been a long road, but now for the real thing at last.

Android Studio installation

First, install "Android Studio". If you find another tool easier to use, feel free to use that instead.

Download sample from official

Next, download the official collection of samples.

$ git clone --depth 1 https://github.com/tensorflow/examples.git

Sample modification

After launching Android Studio, open `examples/lite/examples/object_detection/android`.

Put your own tflite file

Place your test.tflite and labelmap.txt in the `examples/lite/examples/object_detection/android/app/src/main/assets` directory. labelmap.txt is a text file listing the tag names you tagged with.

For example, if you prepare two types of tag names, "apple" and "orange", the text file will be as follows.

labelmap.txt


???
apple
orange

The important thing is that the first line should be ???.

Editing DetectorActivity.java

It is a long path, but edit DetectorActivity.java in the `examples/lite/examples/object_detection/android/app/src/main/java/org/tensorflow/lite/examples/detection/` directory.

DetectorActivity.java


private static final boolean TF_OD_API_IS_QUANTIZED = false; //true->false
private static final String TF_OD_API_MODEL_FILE = "test.tflite"; //detect.tflite -> my tflite
private static final String TF_OD_API_LABELS_FILE = "file:///android_asset/labelmap.txt"; //my txt

Then edit build.gradle in the `examples/lite/examples/object_detection/android/app/` directory.

Comment out the following line around line 40. (If you don't comment it out, your files will be overwritten with the default sample data at build time.)

build.gradle


// apply from:'download_model.gradle'  // commented out

Build

Once that's done, all that's left is to build it, run it on your device, and see the result. Congratulations if it works!

Below are reference sites for the Android part.

  • Training and serving a realtime mobile object detector in 30 minutes with Cloud TPUs
  • How to Train Your Own Custom Model with Tensorflow Object Detection API and Deploy It into Android with TF Lite
  • Detecting Pikachu on Android using Tensorflow Object Detection

In conclusion

What did you think? This may look like a digest of other sites, but I think it covers a good deal of the information you are likely to need to use TensorFlow's Object Detection API.

In particular, TensorFlow's detailed behavior differs between versions, and since the various reference sites were each written against different versions, I suspect many people have struggled with the same things.

There are still many parts I don't understand myself, but I hope this helps. Thank you for reading such a long article.
