Background
In a previous article, [OpenCV][C++] Detecting multiple objects with template matching, I used OpenCV's template matching to detect Goombas. I scanned the prepared Goomba image over the frame from top to bottom and computed a similarity score for each region, but the clouds at the top of the screen look similar to a Goomba, so when a cloud and a Goomba appeared together, the cloud was detected first. One option would be to roughly crop the high-similarity regions and then judge them against the template image using histograms or background subtraction, but this time I changed course and tried object detection with YOLOv3 instead.
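For reference, the template-matching approach looks roughly like this in Python (the original attempt was in C++; frame.png and goomba.png are placeholder file names):

import cv2

# Load a frame and the Goomba template (placeholder file names)
img = cv2.imread("frame.png")
template = cv2.imread("goomba.png")
h, w = template.shape[:2]

# Slide the template over the frame and score the similarity at each position
result = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

# The single best match can be a cloud rather than a Goomba when the shapes are similar
cv2.rectangle(img, max_loc, (max_loc[0] + w, max_loc[1] + h), (0, 0, 255), 2)
cv2.imwrite("detected.png", img)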
Device
Environment
I needed to install the required libraries, and the official PyTorch instructions proceed on the assumption of Anaconda. Until now I had created virtual environments with python3 -m venv [envname] and installed the necessary packages into them with pip, but deep learning libraries tend to have many dependencies on other packages, so this time I will use Anaconda.
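For example, a dedicated environment could be created and activated like this (the environment name yolo is arbitrary, and Python 3.7 is just one version known to work with PyTorch 1.6):

conda create -n yolo python=3.7
conda activate yolo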
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
My environment still has CUDA 10.0, but training worked without any problem even with cudatoolkit set to 10.1. Even a slight mismatch in the GPU-related settings can cause an error at run time, though, so strictly speaking it is better to match the versions.
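As a quick sanity check after installation, it is worth confirming that PyTorch can see the GPU and which CUDA version it was built against:

import torch

print(torch.__version__)          # should print 1.6.0
print(torch.version.cuda)         # the CUDA version PyTorch was built with
print(torch.cuda.is_available())  # True if the GPU can be used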
PyTorch + YOLOv3
Use PyTorch-YOLOv3. Clone the repository with git, then download the weight files and the dataset. The dataset is COCO, so it is quite heavy.
git clone https://github.com/eriklindernoren/PyTorch-YOLOv3
cd PyTorch-YOLOv3/
sudo pip3 install -r requirements.txt
cd weights/
bash download_weights.sh
cd ../data/
bash get_coco_dataset.sh
Then check if it works.
python3 test.py --weights_path weights/yolov3.weights
If that runs without problems, check whether objects can be detected with the default settings. Put the images you want to detect into data/samples and run detect.py.
python3 detect.py --image_folder data/samples/
An output folder is created directly under the working directory; if you look inside, the images marked up with the detection results are stored there.
Annotation
By default there is a trained weight file that uses COCO as its dataset and can detect 80 kinds of objects. This section describes how to train the model when there is an object you personally want to detect.
The overall flow: prepare around 100 images containing the target objects, annotate (label) which object appears where in each image, and finally run train.py to train.
The dataset uses the video of Super Mario Bros (NES) Level 1-1 mentioned in Background. I wondered what data to use for training, and one conclusion is that game footage (copyright permitting) is a good choice. Unlike images shot in the real world, a game's characters are fixed sprites that appear repeatedly with only a few shape variations, so I think it is easy to get usable results quickly. Also, since it is video, roughly 30 to 60 frames can be captured per second, so images are easy to collect.
So I annotate with the labelme tool. My work PC is a Mac, so I install Qt5 with Homebrew and labelme itself with pip.
brew install pyqt
pip install labelme
Before starting labelme, prepare the training data. This time I extracted seconds 23-28 of the Super Mario Bros (NES) Level 1-1 video and converted them into sequentially numbered images as follows.
wget https://raw.githubusercontent.com/wkentaro/dotfiles/f3c5ad1f47834818d4f123c36ed59a5943709518/local/bin/video_to_images
pip install imageio imageio-ffmpeg tqdm
python video_to_images your_video.mp4
When it runs, a your_video folder is created and the numbered images are stored in it.
Now that the training data is ready, specify the folder path and start labelme.
labelme ./your_video
After that, specify a label and a region for each object. There were about 150 images. When training YOLOv3 you only need to mark a rectangular bounding box, but since I may later work with a DNN trained for segmentation, I went out of my way to trace the full region :sweat_smile:
The labels and regions attached to the objects are generated as JSON files in the same folder the images were loaded from.
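For reference, each labelme JSON file looks roughly like this, trimmed to the fields that the conversion script below relies on (the values are illustrative):

{
  "shapes": [
    {
      "label": "mario",
      "points": [[10.0, 20.0], [14.0, 20.0], [14.0, 36.0], [10.0, 36.0]]
    }
  ],
  "imagePath": "00000000.jpg",
  "imageHeight": 240,
  "imageWidth": 256
}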
When you are done annotating, pack the folder into a compressed archive.
tar -zcvf output.tar.gz ./your_video
Training
Now that the training data is ready, let's try machine learning.
First, create the config file. Counting __ignore__, the number of classes here is 3, so run the script with <num-classes> = 3. After execution, config/yolov3-custom.cfg is output.
cd config/
bash create_custom_model.sh <num-classes>
Open config/custom.data and set the number of classes.
config/custom.data
classes=3
train=data/custom/train.txt
valid=data/custom/valid.txt
names=data/custom/classes.names
Open data/custom/classes.names and list the object names, one per line. By default only train is written there.
data/custom/classes.names
__ignore__
mario
kuribo
Next, place the compressed training-data archive directly under the repository root and extract it.
tar -zxvf output.tar.gz
Move the images from the extracted folder to data/custom/images/. By default there is a train image in data/custom/images/, so delete it first.
rm data/custom/images/train.jpg
mv ./your_video/*.jpg data/custom/images/
Next, based on the JSON files that hold the label information, rewrite each annotation as [class ID] [object center x] [object center y] [object width] [object height]. The values are output not as raw pixel coordinates but as ratios [0, 1.0] of the whole image. The rewritten files go into data/custom/labels/ (a driver sketch follows the conversion function below).
import os
import json
import numpy as np

def treat(filepath, classes):
    with open(filepath, "r") as fin:
        src = json.load(fin)
    dst = []
    for item in src["shapes"]:
        txt = item["label"]
        # Take the mean of the polygon points as the object center
        cx, cy = np.mean(np.array(item["points"]), axis=0)
        # Convert to a ratio with the full image size taken as 1.0
        cx_norm = cx / src["imageWidth"]
        cy_norm = cy / src["imageHeight"]
        # Width and height of the object, also as ratios
        min_x, min_y = np.min(np.array(item["points"]), axis=0)
        max_x, max_y = np.max(np.array(item["points"]), axis=0)
        rect_width = (max_x - min_x) / src["imageWidth"]
        rect_height = (max_y - min_y) / src["imageHeight"]
        # Look up the class ID from the label name
        idx = list(filter(lambda x: x[1] == txt, classes))[0][0]
        # One row per object: [class ID, cx, cy, width, height]
        dst.append([idx, cx_norm, cy_norm, rect_width, rect_height])
    return dst
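The function above only returns the converted rows, so a little driver code is needed to actually write the label files. A minimal sketch of my own (not in the original script), continuing the code above and assuming the labelme JSON files were moved into data/custom/images/ alongside the images:

# Class IDs in the same order as data/custom/classes.names
classes = [(0, "__ignore__"), (1, "mario"), (2, "kuribo")]

src_dir = "data/custom/images"
dst_dir = "data/custom/labels"
os.makedirs(dst_dir, exist_ok=True)

for name in sorted(os.listdir(src_dir)):
    if not name.endswith(".json"):
        continue
    rows = treat(os.path.join(src_dir, name), classes)
    txt_path = os.path.join(dst_dir, os.path.splitext(name)[0] + ".txt")
    with open(txt_path, "w") as fout:
        for row in rows:
            fout.write(" ".join(map(str, row)) + "\n")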
Finally, write the paths of the files stored in data/custom/images/ into data/custom/train.txt (for training) and data/custom/valid.txt (for validation). A split of about (training) : (validation) = 8 : 2 feels right; a script for generating it follows the listing below.
train.txt
data/custom/images/00000000.jpg
data/custom/images/00000001.jpg
data/custom/images/00000002.jpg
data/custom/images/00000003.jpg
data/custom/images/00000004.jpg
data/custom/images/00000005.jpg
data/custom/images/00000006.jpg
data/custom/images/00000007.jpg
data/custom/images/00000008.jpg
data/custom/images/00000009.jpg
data/custom/images/00000010.jpg
data/custom/images/00000011.jpg
data/custom/images/00000012.jpg
data/custom/images/00000013.jpg
data/custom/images/00000014.jpg
data/custom/images/00000015.jpg
...
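Writing these lists by hand is tedious, so here is a small sketch of my own (not part of the repository) that generates the 8 : 2 split:

import os
import random

image_dir = "data/custom/images"
paths = sorted(
    os.path.join(image_dir, f) for f in os.listdir(image_dir) if f.endswith(".jpg")
)
random.shuffle(paths)

# 8 : 2 split between training and validation
n_train = int(len(paths) * 0.8)
with open("data/custom/train.txt", "w") as f:
    f.write("\n".join(paths[:n_train]) + "\n")
with open("data/custom/valid.txt", "w") as f:
    f.write("\n".join(paths[n_train:]) + "\n")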
Now that the settings are complete, run train.py.
python3 train.py \
--model_def config/yolov3-custom.cfg \
--data_config config/custom.data \
--batch_size 2 \
--img_size 32 \
--epochs 200 \
--pretrained_weights weights/darknet53.conv.74
The default batch_size is 8 and the default img_size is 416, but on a machine with weak specs you will hit memory errors. Mine failed because the GPU has only 4 GB of memory. In that case, lowering these values lets training proceed normally.
As for the training results, a checkpoint yolov3_ckpt_{epoch}.pth is written to checkpoints/ after each epoch.
Detect Objects
First, prepare the image data for object detection. Here I use the video frames from the start to the goal of Super Mario Bros (NES) Level 1-1. To convert the video into numbered images, use the video_to_images script in the same way as when creating the training data. I obtained 1501 images in total.
Now, let's try detection using detect.py.
python3 detect.py --image_folder ./data/mario_1-1/ \
--weights_path ./checkpoints/yolov3_ckpt_199.pth \
--model_def config/yolov3-custom.cfg \
--class_path data/custom/classes.names
Specify the folder containing the images you want to test with --image_folder, and pass the checkpoint file generated by training to --weights_path. The results are stored in the output folder.
(239) Image: './data/mario_1-1/00000239.jpg'
+ Label: mario, Conf: 0.99997
...
Finally, convert the numbered images back into a video and you're done.
At first I converted them with ffmpeg as ffmpeg -r 30 -i %08d.png -vcodec libx264 -pix_fmt yuv420p -r 60 out.mp4, but the image quality degraded badly.
(See: Generating a video from numbered images with ffmpeg / generating numbered images from a video — preventing dropped frames.)
So I converted it to a video using OpenCV.
import cv2
import os

def main():
    # Collect the PNG files in the current directory in name order
    is_png = lambda x: os.path.splitext(x)[1] == ".png"
    imgs = list(filter(is_png, os.listdir()))
    imgs.sort()

    width = 480
    height = 270
    fps = 30

    fmt = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
    writer = cv2.VideoWriter('output.mp4', fmt, fps, (width, height))

    # Resize each frame and append it to the video
    for img in imgs:
        mat = cv2.imread(img)
        dst = cv2.resize(mat, dsize=(width, height))
        writer.write(dst)

    writer.release()

if __name__ == "__main__":
    main()
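Note that the script simply picks up the PNG files in the current directory, so run it inside the output folder that detect.py generated.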
Result
↓ Click to watch a YouTube video of the detection running from the start to the goal.
- Fire Mario was not in the training data, but it is identified as Mario, presumably because it looks similar.
- Small Mario was not trained either, and is sometimes mistaken for a Goomba, probably because of its similar height.
- I think accuracy would improve further if objects such as blocks, treasure chests, and Koopa Troopas were also targeted.