git: Data for training and verification
I wanted to detect the dishes in a set meal and extract the name of each individual dish, so I tried object detection on the dishes. Since I thought it would be difficult to separate fine-grained types of food with object detection alone, I decided to combine two CNNs, object detection plus image classification, and split the task into "dish detection" and "dish image classification". This article explains the object detection part using my own dataset.
The networks used are the following two:
・YOLO (darknet) for object detection
・VGG-16 for image classification
Google Colaboratory
I used Google Colaboratory (hereinafter colab) as the main environment for training this time. colab is a Python execution environment that lets you use a GPU for free. Training takes a long time without a GPU, so it is very convenient when you have no local GPU environment to train in. The environment needed for machine learning is already set up, so there is no time lost on environment construction. However, when the runtime is disconnected after the 12-hour continuous-use limit, the files on Colaboratory are also reset. For more about Colaboratory, see [Introduction to Python without environment construction! How to use Google Colaboratory, explained clearly].
As mentioned above, I mainly used Colaboratory this time, but since the preprocessing for training takes time on colab, I did only the data preprocessing in my local environment and then uploaded the result and trained on colab. So the execution environment actually used this time was a local machine for preprocessing and Colaboratory for training.
YOLO
First, we will train YOLO for object detection and then actually run detection.
To train YOLO, you need a dataset consisting of the images to learn from and, as labels, the coordinate data for each image. Since we are training on our own data, we have to label it ourselves. You could write the labels by hand, but that takes a lot of time, so this time I used labelImg, a tool for creating YOLO training datasets. For how to use labelImg, see "Yolo Learning Dataset Creation Tool: labelImg"; it lets you work efficiently.
By the way, the dataset created this time for learning dish detection consists of the following five classes: rice, miso soup, grilled fish, salad, and noodle. I labeled about 700 images per class. If you want to improve accuracy, it is better to prepare and label about 1000 images per class.
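For reference, labelImg in YOLO mode writes one .txt label file per image, with one line per bounding box: the class id followed by the box center x, center y, width, and height, all normalized by the image size. The numbers below are made-up values for an image containing rice (class 0) and miso soup (class 1), assuming the class order shown later in obj.names:

```
0 0.512 0.430 0.380 0.295
1 0.200 0.615 0.210 0.180
```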
This time we will train with darknet's YOLO, so we first do the preprocessing needed for training.
** 1. Install darknet ** Download darknet, which is needed for training. The official darknet website is here. Following the official instructions, run
$ git clone https://github.com/pjreddie/darknet.git
$ cd darknet
to clone the repository and move into the darknet folder.
** 2. Store the created dataset ** You should now have two kinds of files: the images used in labelImg and the txt files containing the label data you created. Move both into darknet for training. The directory layout for the images and txt files is:
darknet
  data
    images
      images001.jpg  images002.jpg  ...
    labels
      images001.txt  images002.txt  ...
Even if you create multiple classes with labelImg, store them in the same images and labels folders as the other classes.
** 3. Divide into train and test ** The dataset stored in data must be split into training data and test data. Doing this by hand takes a lot of time, so I use process.py. In this script, the line
percentage_test = 20;
splits the images randomly into 20% test and 80% train and writes their paths to files named "test.txt" and "train.txt". Put this "process.py" in darknet/data, and then running
$ python process.py
will create "train.txt" and "test.txt". If you want to change the ratio of training to test data, change "percentage_test = 20;" to a value you like.
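If you do not have process.py at hand, a minimal sketch of such a split script looks like the following. This is my own version, not the linked one; it assumes jpg images under data/images and writes train.txt and test.txt to the current directory.

```python
# Minimal sketch of a process.py-style split script
import glob
import random

percentage_test = 20  # percentage of images that go into test.txt

image_paths = sorted(glob.glob("data/images/*.jpg"))
random.shuffle(image_paths)

n_test = len(image_paths) * percentage_test // 100
with open("test.txt", "w") as f_test, open("train.txt", "w") as f_train:
    for i, path in enumerate(image_paths):
        (f_test if i < n_test else f_train).write(path + "\n")
```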
** 4. Pre-processing for training ** We will change various darknet parameters for training.
** ・ Class name setting ** Set the names of the classes to be learned in darknet. In the files obj.names and names.list, write the class names one per line. The folder structure is:
darknet
  data
    images
      obj.names
      images001.jpg  images002.jpg  ...
    labels
    names.list
For the dish detection used this time, write the following in both files:
obj.names/names.list
rice
miso soup
grilled fish
salad
noodle
** ・ Set the number of classes and the data paths ** As part of YOLO's preprocessing, you need to set the number of classes and the data paths. Edit cfg/obj.data so that its contents are:
obj.data
classes=5 #Number of classes
train = data/images/train.txt
valid = data/images/test.txt
labels = data/images/obj.names
backup = backup/
Enter the paths of the files you created earlier.
Next, change the parameters of the cfg for the version of YOLO you want to use. If you want to use YOLO v2, duplicate darknet/cfg/yolov2-voc.cfg; if you want to use v3, duplicate darknet/cfg/yolov3-voc.cfg in the cfg folder (e.g. as yolo-obj.cfg). Open that yolo-obj.cfg and change:
3rd line: batch=64
4th line: subdivisions=8
classes=5 (the number of classes; set this in every [region]/[yolo] layer)
filters: for yolov2, (classes + coords + 1) * 5 = (5 + 4 + 1) * 5 = 50; for yolov3, (classes + 5) * 3 = (5 + 5) * 3 = 30 (set this in the [convolutional] layer just before each [region]/[yolo] layer)
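As a rough guide, and only as my own illustration (not an excerpt from the article's files), the relevant part of each detection head in a yolov3 cfg with 5 classes ends up looking like this; the same classes/filters change has to be made for every [yolo] layer (there are three in yolov3-voc.cfg):

```
[convolutional]
size=1
stride=1
pad=1
# (classes + 5) * 3 = (5 + 5) * 3 for 5 classes
filters=30
activation=linear

[yolo]
classes=5
```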
** ・ Outputting the bounding boxes ** This time I wanted to detect each dish with YOLO and then classify it in detail with another network, so I need the bounding boxes of the detections in order to extract each dish. To output the bounding boxes, replace darknet/src/image.c with the one here; the coordinate data will then be written to a .txt file.
** ・ Initial weights ** When training for the first time, downloading suitable pretrained weights makes it easier for training to converge. Initial weight file for yolov2: https://pjreddie.com/media/files/darknet19_448.conv.23 / Initial weight file for yolov3: https://pjreddie.com/media/files/darknet53.conv.74. Download the one you need and put it in the darknet folder.
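For example, from inside the darknet folder you can fetch them with wget (same URLs as above):

```
$ wget https://pjreddie.com/media/files/darknet19_448.conv.23
$ wget https://pjreddie.com/media/files/darknet53.conv.74
```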
At this point the preprocessing is finally complete, and we can at last move on to training on Colaboratory.
** 1. Upload the folder ** First, upload the darknet folder that has now been preprocessed, labeled, and configured. Colaboratory has the 12-hour limit mentioned above, and when the runtime is disconnected all files on Colaboratory are reset. So instead of uploading directly to colab, upload to Google Drive and mount the Drive from colab; this saves you from uploading every time (a single upload can take quite a while...). Log in to your Google account and upload the whole darknet folder ** to the Drive you use. You could compress it, upload, and decompress on colab, but the decompression code is not included in the colab code used this time, so if you use the source as-is, please upload it uncompressed.
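For reference, mounting Drive in a colab cell typically looks like this; the darknet path below is just an example, so adjust it to wherever you uploaded the folder.

```python
# Mount Google Drive in a colab cell
from google.colab import drive
drive.mount('/content/drive')

# Move into the uploaded darknet folder (example path; change to match your Drive layout)
import os
os.chdir('/content/drive/My Drive/darknet')
```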
** 2. Train and detect on colab ** For training and detection, download the source code here as an .ipynb, ** upload it to Drive, open it, and execute the cells from the top **. You should then be able to train and run object detection. Put the image you want to detect in the darknet folder and change the file path and name in the "Perform object detection" part of the source code. (It may fail depending on the version of CUDA etc. installed on colab; in that case, update to the latest version with pip etc.) Because the Drive is mounted, the file paths differ from person to person depending on your Drive layout, so ** be careful with the file paths! **
Also, the weights output during training are automatically saved in darknet/backup. If you want to continue training from previously trained weights, change the weight path in the "Train with yolo" part to your own ** backup/yolo-XXX.weights **.
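I have not reproduced the notebook here, but once darknet is built on colab, the training cells presumably come down to the standard darknet commands run with "!" from the darknet folder. The cfg and weight file names below are assumptions based on the names used above:

```
# train from the downloaded pretrained convolutional weights
!./darknet detector train cfg/obj.data cfg/yolo-obj.cfg darknet19_448.conv.23
# or resume from weights already saved in backup/
!./darknet detector train cfg/obj.data cfg/yolo-obj.cfg backup/yolo-obj_1000.weights
```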
If you continue through the .ipynb source code, cropped images will be saved: the bounding-box coordinates output during object detection are used to crop those regions out of the image used for detection, and the crops are saved in the image folder.
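As an illustration of what that cropping step does, here is a minimal sketch in Python with Pillow. The format of the coordinate file (one "left top right bottom" line per box) and the file names are assumptions; adjust them to whatever your modified image.c actually writes.

```python
# Crop each detected bounding box out of the input image and save it
from PIL import Image

img = Image.open("data/test001.jpg")        # image used for detection (example name)
with open("bounding_boxes.txt") as f:       # coordinate file written at detection time (assumed name/format)
    for i, line in enumerate(f):
        left, top, right, bottom = map(int, line.split())
        img.crop((left, top, right, bottom)).save(f"image/crop_{i:03d}.jpg")
```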
This time, image classification was done with VGG-16 trained by transfer learning on a dataset called UEC-FOOD100. Basically, you can use it by executing the source code from the top. However, since we used transfer learning for the dish classification, please understand that anyone training from scratch will need to build a dataset for it. Also, if you want to run inference with the weights created this time, be careful about the file paths.
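The author's classification notebook is not reproduced here, but a minimal sketch of VGG-16 transfer learning in Keras looks like the following; the input size, the added layers, and the training setup are my assumptions, and the UEC-FOOD100 images would still need to be loaded and preprocessed separately.

```python
# Minimal VGG-16 transfer-learning sketch (Keras)
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

NUM_CLASSES = 100  # UEC-FOOD100 has 100 food categories

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional base; only the new head is trained

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=..., validation_data=...)
```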
Here are the results of detecting the dish regions. First, the loss (loss function); it looks like this. After 4073 training iterations the loss value was 0.09. It is said that it should drop to about 0.06, so I think it would have been fine to train a little longer.
Next, here is the result of actually feeding in an image and running object detection.
Not bad, is it? I think this is a good example: we were able to detect rice, miso soup, and grilled fish, the labels we trained this time. The result of classifying these regions in more detail using VGG transfer learning is as follows.
It looks like this. The region that object detection labeled simply as rice was re-classified as white rice, so the fine-grained classification on top of the coarse region detection, which was the goal this time, seems to have worked. However, the last one, grilled fish, got a completely different result: it was classified as omelet rice.
If you want to use the weights from the dish object detection above to run detection only, unzip this data (https://drive.google.com/open?id=1F8hbp5OGM5syGDQLpFPxbP1n9yFz804K), upload it to Drive, and run Object Detection.ipynb from the top. You should be able to detect the dishes we trained on. Note that ** you will have to change the file paths yourself! **
References
Counting objects using darknet (YOLO) on cats, space, music, and Google Colab [Image recognition]: https://wakuphas.hatenablog.com/entry/2018/09/19/025941#Google-Colaboratory%E3%81%A8%E3%81%AF
darknet official website: https://pjreddie.com/darknet/yolo/
YOLO V3: Learning original data: http://demura.net/misc/14458.html