[PYTHON] I tried running an object detection tutorial using the latest deep learning algorithm

Introduction

Last time, I built a face classifier using a convolutional neural network (CNN), but it turns out that this is only the starting point of machine learning. I learned that the field is moving at tremendous speed. It felt like finding out that the boss I had barely managed to defeat was actually the weakest of more than 10,000 enemies.

New algorithms are published every year, and competitions are held to compare recognition accuracy. Progress is driven mainly by improvements in recognition accuracy and processing speed. This time I tried to run a tutorial for Single Shot MultiBox Detector (SSD), the latest of these algorithms.

If you want to read about the progress in this field in detail, the following page is very easy to follow.

SSD: Single Shot MultiBox Detector (ECCV2016)

In summary: CNN → R-CNN → Fast R-CNN → Faster R-CNN → SSD (where we are now)

Why this post took so long

I was stuck for about 5 days without finding any clue to a solution. The implementation itself is in Python, but there are a great many libraries to choose from in order to run it.

At first, I planned to use an implementation built on TensorFlow, but some of that code was too hard for me to follow, and simply getting it to run without understanding it would not be interesting. So I decided to run a Keras implementation instead, where the code is kept to a minimum.

Reference code

Keras itself was no problem, but this implementation runs recognition directly on video and requires OpenCV, FFmpeg, and GTK2. To state the conclusion first: I installed OpenCV and FFmpeg with Homebrew, but OpenCV was built without GTK2 support, so the video could not be loaded.

I used brew edit to rewrite the formula in various ways and tried different build options, but in the end Homebrew's OpenCV appeared to be configured so that GTK could not be built in.

The article that led me to give up

I also tried building everything from source, but gave up partway because building FFmpeg was, as before, very troublesome.

The result of trial and error

Based on the above, I decided to try an implementation that uses Chainer. It detects objects in still images, not in video.

chainer-SSD

Environment

Git

brew install git

Python 3.6.1

brew install python3

PATH setting

# Add Homebrew's Python 3.6 site-packages to PYTHONPATH if the directory exists
if [ -d "$(brew --prefix)/lib/python3.6/site-packages" ]; then
  export PYTHONPATH="$(brew --prefix)/lib/python3.6/site-packages:$PYTHONPATH"
fi

Cython

pip3 install cython

Numpy

pip3 install numpy

Chainer

pip3 install chainer  

Matplotlib

pip3 install matplotlib  
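
With the packages above installed, a quick import check can save a failed run later. The following is a minimal sketch, not part of chainer-SSD: `missing_modules` is a hypothetical helper, and the module list is an assumption based on the packages installed above plus `skimage`, which appears in the warning printed when the demo is run.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages installed above; `skimage` is included
# only because the demo output shows a warning coming from it.
REQUIRED = ["Cython", "numpy", "chainer", "matplotlib", "skimage"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

If anything is reported as not importable, installing it with pip3 in the same way as the other packages is the likely fix.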

git clone

cd {Workspace}
git clone https://github.com/ninhydrin/chainer-SSD.git  

Preparation

cd {Workspace}/chainer-SSD/util
python3 setup.py build_ext -i

Run

Two sample images are included in the repository, so let's run the demo on those first.

cd {Workspace}/chainer-SSD
python3 demo.py img/dog.jpg
/usr/local/lib/python3.6/site-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
  warn("The default mode, 'constant', will be changed to 'reflect' in "
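
To try several images, or to put a number on how long one detection takes, the command above can be wrapped in a small script. This is only a sketch under assumptions: `demo_command` and `time_demo` are hypothetical helpers, the script assumes `demo.py` takes the image path as its only argument exactly as in the command above, and if the demo opens a window and waits for it to be closed, that wait is included in the measured time.

```python
import subprocess
import time


def demo_command(image_path, python="python3", script="demo.py"):
    """Build the same command line as used above, for the given image."""
    return [python, script, str(image_path)]


def time_demo(image_path, repo_dir="."):
    """Run the demo once on `image_path` and return the elapsed wall-clock seconds.

    `repo_dir` should point at the cloned chainer-SSD directory.
    """
    start = time.perf_counter()
    subprocess.run(demo_command(image_path), cwd=repo_dir, check=True)
    return time.perf_counter() - start


# Example, run from {Workspace}/chainer-SSD:
#   for path in ["img/dog.jpg"]:
#       print(path, round(time_demo(path), 1), "seconds")
```

Passing the command as a list avoids shell quoting problems with image paths that contain spaces.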

Result

Dog, car and bicycle

スクリーンショット 2017-06-21 16.53.06.png

Fish-shaped bicycle and person

スクリーンショット 2017-06-21 16.58.01.png

I tried various things

Car and cat

スクリーンショット 2017-06-21 16.56.50.png

People and birds

スクリーンショット 2017-06-21 17.06.11.png

Raccoon

スクリーンショット 2017-06-21 17.08.17.png (not recognized?)

Wheelchair dog

スクリーンショット 2017-06-21 17.09.24.png

Many people

スクリーンショット 2017-06-21 17.10.58.png

Impressions

- Detection seemed to top out at about 3 objects per image.
- If an object in the image is too small, it is not detected. (I suspect this is because the image is resized.)
- Accuracy for objects seen from behind, seen from the side, or blurred does not seem very high yet. (Perhaps there was not enough training.)
- Processing a single image did not feel especially slow. (About 5 seconds, by feel.)
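
The guess about resizing can be checked with simple arithmetic. The sketch below assumes the network resizes every image to 300 × 300 pixels, which is the input size of the SSD300 configuration in the SSD paper. Whether chainer-SSD uses exactly this size is an assumption, and `scaled_box_size` is a hypothetical helper.

```python
def scaled_box_size(img_w, img_h, box_w, box_h, target=300):
    """Return the (width, height) of a box after the image is resized to target x target."""
    return (box_w * target / img_w, box_h * target / img_h)


if __name__ == "__main__":
    # A 64 x 64 pixel object in a 1920 x 1080 photo (made-up numbers for illustration).
    w, h = scaled_box_size(1920, 1080, 64, 64)
    print(f"{w:.1f} x {h:.1f} pixels after resizing")
```

Under these assumptions the object shrinks to roughly 10 × 18 pixels, which gives the detector very little to work with.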

Summary

- I would like to read the source code, train the model on my own image data, and try again.
- I want to try recognition directly from video or a camera. (If there is an easy way to build the dependencies on Mac or CentOS.)
- There is too little information written in Japanese or translated into Japanese. I want a teacher, or a community where people can teach each other.
