Last time, I built a face classifier with a convolutional neural network (CNN), but it turns out that was only the very beginning of machine learning. I learned that the field is advancing at a tremendous pace, and that the boss I had poured my life into defeating was actually the weakest of more than 10,000 enemies.
New algorithms are published every year and competitions are held to compete on recognition accuracy, with the main drivers of progress being better recognition accuracy and faster processing. This time I tried running a tutorial for Single Shot MultiBox Detector (SSD), one of the latest of these algorithms.
If you want to read about the detailed progress in this area, the following page is very easy to follow: SSD: Single Shot MultiBox Detector (ECCV 2016). In summary: CNN → R-CNN → Fast R-CNN → Faster R-CNN → SSD (where we are now).
I was stuck on this for about five days, and for a while I could not find any clue to a solution. The implementation itself is in Python, but the trouble is how many library choices are needed to get it running.
At first I planned to use a TensorFlow-based implementation, but some of the code was too hard for me to follow, and just running it without understanding it is not very interesting, so I decided to run an implementation written in minimal Keras instead.
The reference Keras code itself was fine, but it recognizes video directly and therefore requires OpenCV, FFmpeg, and GTK2. Bottom line: although I installed OpenCV and FFmpeg with Homebrew, OpenCV was not built with GTK2 support, so the video would not load.
I tried brew edit and rewrote various things, experimented with build options and so on, but in the end Homebrew's OpenCV seems to be configured so that it cannot be built with GTK. The article that finally led me to give up
By the way, I also tried building from source, but gave up partway through because building FFmpeg was as troublesome as ever.
Given all that, I decided to try an implementation using Chainer instead. It detects objects in still images rather than video. chainer-SSD
Git
brew install git
Python3.6.1
brew install python3
if [ -d $(brew --prefix)/lib/python3.6/site-packages ];then
export PYTHONPATH=$(brew --prefix)/lib/python3.6/site-packages:$PYTHONPATH
fi
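To confirm that the Homebrew site-packages directory actually ended up on the path, a quick check like this is enough (a sanity check I added, not part of the original steps):
python3 -c "import sys; print(sys.path)"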
Cython
pip3 install cython
Numpy
pip3 install numpy
Chainer
pip3 install chainer
Matplotlib
pip3 install matplotlib
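As another quick sanity check (not an official step), you can make sure everything imports and see which Chainer version you ended up with:
python3 -c "import cython, numpy, chainer, matplotlib; print(chainer.__version__)"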
git clone
cd {Workspace}
git clone https://github.com/ninhydrin/chainer-SSD.git
cd {Workspace}/chainer-SSD/util
python3 setup.py build_ext -i
Two sample images are included, so let's run the demo on one of them first.
cd {Workspace}/chainer-SSD
python3 demo.py img/dog.jpg
/usr/local/lib/python3.6/site-packages/skimage/transform/_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "
- The detection accuracy felt limited to roughly three objects per image.
- If an object in the image is too small, it is not detected. (I think this is because the image gets resized; see the sketch right after this list.)
- Accuracy for objects seen from behind, from the side, or blurred does not seem very high yet. (Maybe there is simply not enough training.)
- For a single image the speed did not feel slow enough to be a problem. (Roughly 5 seconds in my experience.)
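About the small-object point above: SSD300 resizes the whole input image to 300x300 before detection, so a small object ends up only a handful of pixels across. A back-of-the-envelope example (the photo and object sizes here are made up):
# A 40x40-pixel object in a 1920x1080 photo, after resizing to SSD300's 300x300 input
orig_w, orig_h = 1920, 1080
obj_w, obj_h = 40, 40
print(obj_w * 300 / orig_w, obj_h * 300 / orig_h)  # ~6 x 11 pixels left -> very hard to detect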
- I would like to read the source code, train on my own image data, and try again.
- I want to try recognition directly from video or a camera, if there is an easy way to build it on Mac or CentOS. (A rough sketch of what that might look like follows after this list.)
- There is too little information in Japanese, whether original or translated. I want a community where a teacher, or we ourselves, can teach each other.
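On the camera idea: if OpenCV's Python bindings can at least grab frames, and the display is done with matplotlib instead of OpenCV's GUI (which sidesteps the GTK problem above), a single-frame version could look roughly like this. It again leans on ChainerCV's SSD300 rather than chainer-SSD, and I have not actually built this on my Mac, so it is only a sketch:
import cv2
import matplotlib.pyplot as plt
from chainercv.links import SSD300
from chainercv.visualizations import vis_bbox

model = SSD300(pretrained_model='voc0712')
cap = cv2.VideoCapture(0)      # default camera

ret, frame = cap.read()        # frame is BGR, HWC, uint8
if ret:
    # ChainerCV expects RGB, CHW, float32
    img = frame[:, :, ::-1].transpose(2, 0, 1).astype('float32')
    bboxes, labels, scores = model.predict([img])
    vis_bbox(img, bboxes[0], labels[0], scores[0])
    plt.show()
cap.release()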