This article is day 5 of the Furukawa Lab Advent Calendar.
Various frameworks such as PyTorch, Chainer, Keras, and TensorFlow have appeared, and it is often said that anyone can easily use deep learning now. For people who already use it day to day, just getting something to run may indeed feel easy. For people who rarely touch Python, however, that part is harder than the deep learning itself. In my experience, running deep learning is like riding a bicycle: people who can already ride say "riding a bicycle is easy" or "you can ride any other bicycle the same way, right?", while people who cannot are left thinking, "what are you talking about?"
Furthermore, the skills required when using deep learning differ depending on how far you want to go, as shown in the figure below, which is another reason the hurdle feels high.
In this article, I will walk through the second step, the path I actually took, to help you learn to ride this bicycle called deep learning.
This time I will use Chainer, so let's install Chainer and ChainerCV first.
```
$ pip install chainer
$ pip install chainercv
```
With these installed, object detection with a pretrained model works like the following.
```python
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from chainercv.visualizations import vis_bbox
from chainercv.datasets import voc_bbox_label_names
from chainercv.links import FasterRCNNVGG16
# Labels to use (the default VOC labels this time)
label_names = voc_bbox_label_names
# Load the data; replace './fish/test.jpg' with any image file you like
test_data = Image.open('./fish/test.jpg')
test_data = np.asarray(test_data).transpose(2, 0, 1).astype(np.float32)
# Build the model; use the VOC07-pretrained weights for now
model_frcnn = FasterRCNNVGG16(n_fg_class=len(voc_bbox_label_names), pretrained_model='voc07')
# Predict
bboxes, labels, scores = model_frcnn.predict([test_data])
predict_result = [test_data, bboxes[0], labels[0], scores[0]]
# Draw the result
res = predict_result
fig = plt.figure(figsize=(6, 6))
ax = fig.subplots(1, 1)
line = 0.0  # score threshold; 0.0 keeps every detection
vis_bbox(res[0], res[1][res[3]>line], res[2][res[3]>line], res[3][res[3]>line], label_names=label_names, ax=ax)
plt.show()
```
I was able to recognize it well!
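By the way, if you want to inspect the raw output instead of plotting it, a minimal sketch like the following (reusing `model_frcnn`, `test_data`, and `label_names` from the script above) prints each detection's class name, score, and bounding box:

```python
# Minimal sketch: list the raw detections instead of drawing them.
# Reuses model_frcnn, test_data, and label_names defined above.
bboxes, labels, scores = model_frcnn.predict([test_data])
for bbox, label, score in zip(bboxes[0], labels[0], scores[0]):
    print('{}: {:.2f} {}'.format(label_names[label], score, bbox))
```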
Next, I fed it an image of nodoguro (blackthroat seaperch) and tried again.
Of course, with the defaults there is no "nodoguro" label, so it does not work. So I fine-tune the model into a nodoguro-specialized detector. I will skip the detailed explanation of fine-tuning; the point is that an already trained model is trained further on new data.
Since additional training requires training data, let's create some. For annotation I recommend a tool called labelImg. Installation and usage are described in the README on its GitHub page, so here I will only outline a simple workflow. First, install what you need to run labelImg.
```
$ brew install qt  # Install qt-5.x.x by Homebrew
$ brew install libxml2
$ pip3 install pyqt5 lxml  # Install PyQt5 and lxml by pip
$ make qt5py3
```
Run the commands above. There is not much to watch out for, except that `make qt5py3` (and labelImg itself) must be run inside the cloned labelImg directory; otherwise you will get an error like "No such file or directory".
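For reference, the usual flow (assuming the widely used tzutalin/labelImg repository; adjust the URL if you use a different fork) is to clone it and work from inside that directory:

```
$ git clone https://github.com/tzutalin/labelImg.git
$ cd labelImg
```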
When you execute labelImg.py:

```
$ python3 labelImg.py
```

the following screen will appear.
Open the image with Open, enter "nodoguro" in the label field on the right, and press the w key to draw a selection box around the nodoguro.
Then you can label it like this.
You can also attach two like this.
Finally, press the save button to create an XML file. This file records each label and where its bounding box is located.
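If you are curious what labelImg actually wrote, a minimal sketch like this (assuming the annotation was saved as `fish/image_1.xml`) prints the label names and box coordinates with xmltodict, the same library the conversion script below uses:

```python
# Minimal sketch: peek inside one labelImg annotation (Pascal VOC XML).
# NOTE: with only one object in the image, xmltodict returns a dict instead
# of a list for 'object', so wrap it in a list in that case.
import xmltodict

with open('fish/image_1.xml') as fd:
    ann = xmltodict.parse(fd.read())['annotation']
objs = ann['object'] if isinstance(ann['object'], list) else [ann['object']]
print(ann['filename'])
for obj in objs:
    print(obj['name'], obj['bndbox'])
```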
Please number the file names like `image_1.jpg`, `image_2.jpg`.
After that, create a file named `classes.txt` that lists the label names, one per line.
```
nodoguro
iwashi
cat
```
That completes the training data! The points to be careful about are to make the image sizes uniform and to prepare two or more label classes. With only one label class, training did not work for me.
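About the uniform image size: if your photos come in different sizes, a minimal sketch like this makes them uniform (assuming a fixed size such as 640x480 is acceptable for your data; run it before annotating in labelImg so the boxes stay valid):

```python
# Minimal sketch: resize all training images to one fixed size with Pillow.
# Run this BEFORE drawing bounding boxes, or the saved boxes will no longer
# match the resized pixels.
import glob
from PIL import Image

for path in glob.glob('fish/image_*.jpg'):
    img = Image.open(path).convert('RGB')
    img.resize((640, 480)).save(path, quality=95)
```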
NODOGURO tuning

Now that we have the training data, let's actually train. For the pretrained model I used ImageNet weights. This time we will additionally train on 7 images.
The directory structure looks like the following.
```
sample/
├ fish/
│ ├ res_images/
│ │ ├ images.npy
│ │ ├ bounding_box_data.npy
│ │ └ object_ids.npy
│ ├ classes.txt
│ ├ image_1.jpg
│ ├ image_1.xml
│ ├ ...
│ ├ image_7.xml
│ └ test.jpg
├ out/
├ learn.py
├ predict.py
└ xml2numpyarray.py
```
It was convenient to convert the data into NumPy arrays before training, so I converted it with the following code. If an import error occurs, install the missing packages with pip.
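For example, xmltodict and OpenCV (imported as cv2 below) are not installed in the steps above, so something like this should cover them:

```
$ pip install xmltodict opencv-python
```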
```python
import matplotlib.pyplot as plt
import numpy as np
import glob
import os
import cv2
from PIL import Image
import xmltodict
# Global Variables
classes_file = 'fish/classes.txt'
data_dir = 'fish'
classes = list()
with open(classes_file) as fd:
for one_line in fd.readlines():
cl = one_line.split('\n')[0]
classes.append(cl)
print(classes)
def getBBoxData(anno_file, classes, data_dir):
with open(anno_file) as fd:
pars = xmltodict.parse(fd.read())
ann_data = pars['annotation']
print(ann_data['filename'])
# read image
img = Image.open(os.path.join(data_dir, ann_data['filename']))
img_arr = np.asarray(img).transpose(2, 0, 1).astype(np.float32)
bbox_list = list()
obj_names = list()
for obj in ann_data['object']:
bbox_list.append([obj['bndbox']['ymin'], obj['bndbox']['xmin'], obj['bndbox']['ymax'], obj['bndbox']['xmax']])
obj_names.append(obj['name'])
bboxs = np.array(bbox_list, dtype=np.float32)
obj_names = np.array(obj_names)
obj_ids = np.array(list(map(lambda x:classes.index(x), obj_names)), dtype=np.int32)
return {'img':img, 'img_arr':img_arr, 'bboxs':bboxs, 'obj_names':obj_names, 'obj_ids':obj_ids}
def getBBoxDataSet(data_dir, classes):
anno_files = glob.glob(os.path.join(data_dir, '*.xml'))
img_list = list()
bboxs = list()
obj_ids = list()
# imgs = np.zeros([4, 3, 189, 267])
# num = 0
for ann_file in anno_files:
ret = getBBoxData(anno_file=ann_file, classes=classes, data_dir=data_dir)
print(ret['img_arr'].shape)
img_list.append(ret['img_arr'])
# imgs[num] = ret['img_arr']
bboxs.append(ret['bboxs'])
obj_ids.append(ret['obj_ids'])
imgs = np.array(img_list)
return (imgs, bboxs, obj_ids)
imgs, bboxs, obj_ids = getBBoxDataSet(data_dir=data_dir, classes=classes)
out_dir = os.path.join(data_dir, 'res_images')  # learn.py below expects the .npy files here
os.makedirs(out_dir, exist_ok=True)
np.save(os.path.join(out_dir, 'images.npy'), imgs)
np.save(os.path.join(out_dir, 'bounding_box_data.npy'), bboxs)
np.save(os.path.join(out_dir, 'object_ids.npy'), obj_ids)
```
Training is then run with the following code.
```python
import os
import numpy as np
import chainer
import random
from chainercv.chainer_experimental.datasets.sliceable import TupleDataset
from chainercv.links import FasterRCNNVGG16
from chainercv.links.model.faster_rcnn import FasterRCNNTrainChain
from chainer.datasets import TransformDataset
from chainercv import transforms
from chainer import training
from chainer.training import extensions
HOME = './'
data_dir = os.path.join(HOME, './fish/res_images')
file_img_set = os.path.join(data_dir, 'images.npy')
file_bbox_set = os.path.join(data_dir, 'bounding_box_data.npy')
file_object_ids = os.path.join(data_dir, 'object_ids.npy')
file_classes = os.path.join(HOME, './fish/classes.txt')  # classes.txt lives in fish/, not fish/res_images/
#Data set loading
imgs = np.load(file_img_set)
bboxs = np.load(file_bbox_set, allow_pickle=True)
objectIDs = np.load(file_object_ids, allow_pickle=True)
#Read label information
classes = list()
with open(file_classes) as fd:
for one_line in fd.readlines():
cl = one_line.split('\n')[0]
classes.append(cl)
dataset = TupleDataset(('img', imgs), ('bbox', bboxs), ('label', objectIDs))
N = len(dataset)
N_train = (int)(N*0.9)
N_test = N - N_train
print('total:{}, train:{}, test:{}'.format(N, N_train, N_test))
#Network construction
faster_rcnn = FasterRCNNVGG16(n_fg_class=len(classes), pretrained_model='imagenet')
faster_rcnn.use_preset('evaluate')
model = FasterRCNNTrainChain(faster_rcnn)
#GPU settings(Not used this time)
gpu_id = -1
# chainer.cuda.get_device_from_id(gpu_id).use()
# model.to_gpu()
#Set how to optimize
optimizer = chainer.optimizers.MomentumSGD(lr=0.001, momentum=0.9)
optimizer.setup(model)
optimizer.add_hook(chainer.optimizer_hooks.WeightDecay(rate=0.0005))
#Data preparation
class Transform(object):
def __init__(self, faster_rcnn):
self.faster_rcnn = faster_rcnn
def __call__(self, in_data):
img, bbox, label = in_data
_, H, W = img.shape
img = self.faster_rcnn.prepare(img)
_, o_H, o_W = img.shape
scale = o_H / H
bbox = transforms.resize_bbox(bbox, (H, W), (o_H, o_W))
# horizontally flip
img, params = transforms.random_flip(
img, x_random=True, return_param=True)
bbox = transforms.flip_bbox(
bbox, (o_H, o_W), x_flip=params['x_flip'])
return img, bbox, label, scale
idxs = list(np.arange(N))
random.shuffle(idxs)
train_idxs = idxs[:N_train]
test_idxs = idxs[N_train:]
#Various settings for learning
train_data = TransformDataset(dataset[train_idxs], Transform(faster_rcnn))
train_iter = chainer.iterators.SerialIterator(train_data, batch_size=1)
test_iter = chainer.iterators.SerialIterator(dataset[test_idxs], batch_size=1, repeat=False, shuffle=False)
updater = chainer.training.updaters.StandardUpdater(train_iter, optimizer, device=gpu_id)
n_epoch = 20
out_dir = './out'
trainer = training.Trainer(updater, (n_epoch, 'epoch'), out=out_dir)
step_size = 100
trainer.extend(extensions.snapshot_object(model.faster_rcnn, 'snapshot_model.npz'), trigger=(n_epoch, 'epoch'))
trainer.extend(extensions.ExponentialShift('lr', 0.1), trigger=(step_size, 'iteration'))
log_interval = 1, 'epoch'
plot_interval = 1, 'epoch'
print_interval = 1, 'epoch'
trainer.extend(chainer.training.extensions.observe_lr(), trigger=log_interval)
trainer.extend(extensions.LogReport(trigger=log_interval))
trainer.extend(extensions.PrintReport(['iteration', 'epoch', 'elapsed_time', 'lr', 'main/loss', 'main/roi_loc_loss', 'main/roi_cls_loss', 'main/rpn_loc_loss', 'main/rpn_cls_loss', 'validation/main/map', ]), trigger=print_interval)
trainer.extend(extensions.PlotReport(['main/loss'], file_name='loss.png', trigger=plot_interval), trigger=plot_interval)
trainer.extend(extensions.dump_graph('main/loss'))
#Learning
trainer.run()
```
The parameters to set here are:

・ The GPU setting (no GPU is used this time)
```python
# chainer.cuda.get_device_from_id(gpu_id).use()
# model.to_gpu()
```
・ The optimizer
```python
optimizer = chainer.optimizers.MomentumSGD(lr=0.001, momentum=0.9)
optimizer.setup(model)
optimizer.add_hook(chainer.optimizer_hooks.WeightDecay(rate=0.0005))
```
・ The amount of training (epochs and the learning-rate step size)
```python
n_epoch = 20
step_size = 100
```
These are the main parameters. There are others as well, such as `batch_size` and how much data to hold out for testing (`N_train = (int)(N * 0.9)`, `N_test = N - N_train`), but for now the three above are enough.
By the way, the trained network is saved to a file called `out/snapshot_model.npz`.
Now let's actually detect the blackthroat seaperch. Only detections with a score of 0.9 or higher are shown.
```python
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from chainercv.visualizations import vis_bbox
from chainercv.links import FasterRCNNVGG16
#Label reading
classes = list()
with open('./fish/classes.txt') as fd:
for one_line in fd.readlines():
cl = one_line.split('\n')[0]
classes.append(cl)
#Read test data
test_data = Image.open('./fish/test.jpg')
test_data = np.asarray(test_data).transpose(2, 0, 1).astype(np.float32)
#Load the trained model
pretrain_model = 'out/snapshot_model.npz'
#Network construction
model_frcnn = FasterRCNNVGG16(n_fg_class=len(classes), pretrained_model=pretrain_model)
# Predict
bboxes, labels, scores = model_frcnn.predict([test_data])
predict_result = [test_data, bboxes[0], labels[0], scores[0]]
# Set the threshold so that detections with a score below 0.9 are not shown
line = 0.9
#drawing
res = predict_result
fig = plt.figure(figsize=(6, 6))
ax = fig.subplots(1, 1)
vis_bbox(res[0], res[1][res[3]>line], res[2][res[3]>line], res[3][res[3]>line], label_names=classes, ax=ax)
plt.show()
```
The result is here.
I was able to recognize it properly!
You can also print the number of detected nodoguro with `print(np.sum(labels[0] == 0))`.
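Note that this counts every box labeled nodoguro regardless of its score. To count only the detections that clear the 0.9 threshold used for drawing, a small sketch like this (reusing `labels`, `scores`, and `line` from the script above) does it:

```python
# Count only detections labeled nodoguro (class index 0 in classes.txt)
# whose score clears the same threshold used for drawing.
count = int(np.sum((labels[0] == 0) & (scores[0] > line)))
print('nodoguro detected: {}'.format(count))
```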
This time, I fine-tuned a detector on nodoguro images to detect blackthroat seaperch. Once it was done, it turned out to be fairly easy, and since all you have to do is replace the nodoguro images with your own, it is relatively easy to reproduce. However, to achieve genuinely accurate detection and counting, harder problems remain, such as how to handle overlapping fish or rotated ones, and these may require reworking the network structure or even the problem setting itself. Bringing this to research or product level is difficult, but I think this walkthrough shows that it is relatively easy to "play with deep learning for the time being".
I mainly referred to this site: http://chocolate-ball.hatenablog.com/entry/2018/05/23/012449