In this article, I will walk through fine-tuning the Illustration2Vec model on the animeface-character dataset to train a model that classifies face images of 146 different characters with 90% or better accuracy, and explain the following points along the way.
With Chainer:

- How to create a dataset object
- How to split a dataset for training and validation
- How to load pre-trained weights and fine-tune them on a new task
- (Bonus: How to write a dataset class from scratch)
The environment and libraries used are as follows:

- Chainer >= 2.0.1 (confirmed to work with the latest 4.1.0)
- CuPy >= 1.0.1 (confirmed to work with the latest 4.1.0)

Since Chainer is backwards compatible, the code in this article also works with these newer versions.
**We will show a concrete example of how to obtain a dataset that is not bundled with Chainer from an external source and use it to train a network written in Chainer.** The basic steps are almost the same as those in the chapter of Chainer v4: Tutorial for Beginners on extending the CIFAR10 dataset class.
This time, I will also explain **how to use the weights of a network pre-trained on a dataset from a domain similar to the target data as initial values**. If you want to fine-tune a network distributed as a Caffe .caffemodel, you can apply almost the same procedure as in this article.
This article was originally written as a Jupyter notebook and exported to Markdown.
First, download the dataset. This time we use a dataset of anime character face thumbnails distributed here by nagadomi, a Kaggle Grandmaster.
%%bash
if [ ! -d animeface-character-dataset ]; then
curl -L -O http://www.nurs.or.jp/~nagadomi/animeface-character-dataset/data/animeface-character-dataset.zip
unzip animeface-character-dataset.zip
rm -rf animeface-character-dataset.zip
fi
Install the libraries you need with pip. **For the `cupy-cuda90` part, choose the package that matches the CUDA version of your environment: `cupy-cuda80` (for CUDA 8.0), `cupy-cuda90` (for CUDA 9.0), or `cupy-cuda91` (for CUDA 9.1).**
%%bash
pip install chainer
pip install cupy-cuda80 # or cupy-cuda90 or cupy-cuda91
pip install Pillow
pip install tqdm
This time, using the face images of the various characters included in animeface-character-dataset, we will train a network that, given a face image of an unknown character, outputs which character in the known class list the face most likely belongs to.
Instead of training a network whose parameters are initialized at random, we will **fine-tune on the target dataset, starting from a model that was pre-trained on data from a similar domain**.
The dataset used for training contains many images like the ones below, already sorted into one folder per character. So this, too, is an orthodox image classification problem.
Examples of the character folders include 000_hatsune_miku, 002_suzumiya_haruhi, 007_nagato_yuki, and 012_asahina_mikuru (sample face thumbnails for each are omitted here).
Here we show how to create a dataset object using `LabeledImageDataset`, a class that is often used for image classification problems. First, let's do some preparation with standard Python functionality.

First, get the list of paths to the image files. The image files are split into one directory per character under `animeface-character-dataset/thumb`. In the code below, if a folder contains a file named `ignore`, the images in that folder are skipped.
import os
import glob
from itertools import chain
#Image folder
IMG_DIR = 'animeface-character-dataset/thumb'
#Folder for each character
dnames = glob.glob('{}/*'.format(IMG_DIR))
#Image file path list
fnames = [glob.glob('{}/*.png'.format(d)) for d in dnames
if not os.path.exists('{}/ignore'.format(d))]
fnames = list(chain.from_iterable(fnames))
Next, since the name of the directory containing each image file represents the character name, we use it to create an ID that is unique to each character and assign it to every image.
#Give each a unique ID from the folder name
labels = [os.path.basename(os.path.dirname(fn)) for fn in fnames]
dnames = [os.path.basename(d) for d in dnames
if not os.path.exists('{}/ignore'.format(d))]
labels = [dnames.index(l) for l in labels]
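As a quick sanity check (assuming the download succeeded and the `ignore` filtering above behaved as expected), you can print how many images and classes were collected; the class count should match the 146 characters mentioned in the introduction:

```python
print('number of images:', len(fnames))
print('number of classes:', len(dnames))
```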
Now let's create the base dataset object. This is easy: just pass a list of tuples of (file path, label) to `LabeledImageDataset`. The resulting object returns `(img, label)` tuples when indexed.
from chainer.datasets import LabeledImageDataset
#Data set creation
d = LabeledImageDataset(list(zip(fnames, labels)))
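Indexing this dataset returns a channel-first image array together with its integer label; a quick check (the exact image shape varies from file to file):

```python
img, label = d[0]
print(img.shape, img.dtype, label)  # e.g. a (channels, height, width) float32 array and a class ID
```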
Next, let's use a convenient class provided by Chainer called `TransformDataset`. This is a wrapper class that takes a dataset object and a function that transforms each data point. It lets you keep data augmentation, preprocessing, and so on outside the dataset class itself.
import numpy as np
from chainer.datasets import TransformDataset
from PIL import Image

width, height = 160, 160

# Image resize function
def resize(img):
    img = Image.fromarray(img.transpose(1, 2, 0))
    img = img.resize((width, height), Image.BICUBIC)
    return np.asarray(img).transpose(2, 0, 1)

# Transformation applied to each data point
def transform(inputs):
    img, label = inputs
    img = img[:3, ...]
    img = resize(img.astype(np.uint8))
    img = img - mean[:, None, None]  # mean is the per-channel average computed below
    img = img.astype(np.float32)
    # Randomly flip left and right
    if np.random.rand() > 0.5:
        img = img[..., ::-1]
    return img, label

# Create the dataset with the transformation applied
td = TransformDataset(d, transform)
With this, we have created a dataset object that takes the `(img, label)` tuple returned by the `LabeledImageDataset` object `d`, passes it through the `transform` function, and then returns the result.
Now let's split this into two partial datasets, one for training and one for validation. We will use 80% of the entire dataset for training and the remaining 20% for validation. With `split_dataset_random`, the data in the dataset is shuffled once and then split at the specified boundary.
from chainer import datasets
train, valid = datasets.split_dataset_random(td, int(len(d) * 0.8), seed=0)
Several other functions for dataset splitting are also provided, such as `get_cross_validation_datasets_random`, which returns multiple different training/validation dataset pairs for cross-validation. Have a look here: SubDataset
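As a minimal sketch of that cross-validation helper (the 5-fold count is an arbitrary choice for illustration):

```python
from chainer import datasets

# Returns a list of (train, valid) SubDataset pairs, one per fold
cv_pairs = datasets.get_cross_validation_datasets_random(td, 5)
for fold, (cv_train, cv_valid) in enumerate(cv_pairs):
    print('fold', fold, len(cv_train), len(cv_valid))
```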
By the way, the `mean` used in the transformation above is the average image over the images in the training dataset we use this time. Let's compute it.
import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm_notebook

# Compute the average image if it has not been computed yet
if not os.path.exists('image_mean.npy'):
    # Compute the average over the training split, without the transformation applied
    t, _ = datasets.split_dataset_random(d, int(len(d) * 0.8), seed=0)

    mean = np.zeros((3, height, width))
    for img, _ in tqdm_notebook(t, desc='Calc mean'):
        img = resize(img[:3].astype(np.uint8))
        mean += img
    mean = mean / float(len(t))  # divide by the number of training images
    np.save('image_mean', mean)
else:
    mean = np.load('image_mean.npy')
Let's display the computed average image to see what it looks like.
#Display of average image
%matplotlib inline
plt.imshow(mean.transpose(1, 2, 0) / 255)
plt.show()
It's kind of scary ...
When subtracting the average from each image we use the per-channel averages, so let's compute the average pixel value (RGB) of this average image.
mean = mean.mean(axis=(1, 2))
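Now that `mean` holds the per-channel values that `transform` expects (a length-3 array), indexing the transformed dataset works end to end; a quick check:

```python
img, label = td[0]
print(img.shape, img.dtype)  # (3, 160, 160) float32 after resizing and mean subtraction
```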
Next, let's define the model to train. It is based on the network used in Illustration2Vec, a model trained on a large number of 2D illustration images that performs tag prediction and feature extraction: we remove its last two layers and add two randomly initialized fully connected layers in their place.
During training, the layers below the third layer counting from the output are initialized with the pre-trained Illustration2Vec weights, and those weights are then kept fixed. In other words, **only the two newly added fully connected layers are trained.**
First, download the trained parameters of the distributed Illustration2Vec model.
%%bash
if [ ! -f illust2vec_ver200.caffemodel ]; then
curl -L -O https://github.com/rezoo/illustration2vec/releases/download/v2.0.0/illust2vec_ver200.caffemodel
fi
These pre-trained parameters are provided as a caffemodel, but Chainer can load Caffe's trained models very easily ([CaffeFunction](http://docs.chainer.org/en/stable/reference/generated/chainer.links.caffe.CaffeFunction.html#chainer.links.caffe.CaffeFunction)), so we use it to load both the parameters and the model structure. However, loading takes time, so once the `Chain` object has been obtained, it is worth saving it to a file with Python's `pickle`. This speeds up loading from the next time onwards.
The actual network code looks like this:
import dill
import chainer
import chainer.links as L
import chainer.functions as F
from chainer import Chain
from chainer.links.caffe import CaffeFunction
from chainer import serializers
class Illust2Vec(Chain):

    CAFFEMODEL_FN = 'illust2vec_ver200.caffemodel'

    def __init__(self, n_classes, unchain=True):
        w = chainer.initializers.HeNormal()
        model = CaffeFunction(self.CAFFEMODEL_FN)  # Load the CaffeModel (this takes a while)
        del model.encode1  # Delete unnecessary layers to save memory
        del model.encode2
        del model.forwards['encode1']
        del model.forwards['encode2']
        model.layers = model.layers[:-2]

        super(Illust2Vec, self).__init__()
        with self.init_scope():
            self.trunk = model  # Include the original Illust2Vec model as the trunk of this model
            self.fc7 = L.Linear(None, 4096, initialW=w)
            self.bn7 = L.BatchNormalization(4096)
            self.fc8 = L.Linear(4096, n_classes, initialW=w)

    def __call__(self, x):
        h = self.trunk({'data': x}, ['conv6_3'])[0]  # Take the output of conv6_3 of the original Illust2Vec model
        h.unchain_backward()
        h = F.dropout(F.relu(self.bn7(self.fc7(h))))  # The layers from here on are the newly added ones
        return self.fc8(h)
n_classes = len(dnames)
model = Illust2Vec(n_classes)
model = L.Classifier(model)
/home/mitmul/chainer/chainer/links/caffe/caffe_function.py:165: UserWarning: Skip the layer "encode1neuron", since CaffeFunction does notsupport Sigmoid layer
'support %s layer' % (layer.name, layer.type))
/home/mitmul/chainer/chainer/links/caffe/caffe_function.py:165: UserWarning: Skip the layer "loss", since CaffeFunction does notsupport SigmoidCrossEntropyLoss layer
'support %s layer' % (layer.name, layer.type))
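The class above re-parses the caffemodel every time it is instantiated. As a minimal sketch of the caching idea mentioned earlier (a hypothetical helper, not part of the original code; it serializes with `dill`, which is imported above and would need to be installed separately with pip):

```python
import os
import dill
from chainer.links.caffe import CaffeFunction

def load_caffemodel_cached(caffemodel_fn, cache_fn='illust2vec_ver200.dill'):
    """Load a caffemodel, caching the parsed CaffeFunction so later runs are fast."""
    if os.path.exists(cache_fn):
        with open(cache_fn, 'rb') as f:
            return dill.load(f)
    model = CaffeFunction(caffemodel_fn)  # slow: parses the .caffemodel file
    with open(cache_fn, 'wb') as f:
        dill.dump(model, f)
    return model
```

Inside `Illust2Vec.__init__`, the `CaffeFunction(self.CAFFEMODEL_FN)` call could then be replaced with `load_caffemodel_cached(self.CAFFEMODEL_FN)`.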
The call `h.unchain_backward()` appears inside `__call__`. `unchain_backward` is called on some intermediate `Variable` in the network (here an intermediate output), and it disconnects all network nodes upstream of that point. As a result, during training no error is propagated to the layers before the point where it is called, and consequently their parameters are not updated.
As mentioned above, during training the layers below the third layer counting from the output are initialized with the pre-trained Illustration2Vec weights and then kept fixed. The code that accomplishes this is precisely this `h.unchain_backward()`.
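As a tiny self-contained illustration of this behavior (not part of the article's training code):

```python
import numpy as np
import chainer
import chainer.functions as F

x = chainer.Variable(np.array([[1.0, 2.0]], dtype=np.float32))
h = x * 2               # upstream computation
h.unchain_backward()    # cut the graph here: everything upstream of h is disconnected
y = F.sum(h * 3)
y.backward()

print(x.grad)  # None -> no gradient flows past the cut
print(h.grad)  # [[3. 3.]] -> backpropagation stops at h
```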
For more details on how this works, see this article, which explains the mechanism of Chainer's Define-by-Run autograd: Create 1-file Chainer
Now, let's train using this dataset and model. First, load the required modules.
from chainer import iterators
from chainer import training
from chainer import optimizers
from chainer.training import extensions
from chainer.training import triggers
from chainer.dataset import concat_examples
Next, set the training parameters. This time we will use:

- batch size 64
- a learning rate that starts at 0.01 and is multiplied by 0.1 at the 10th epoch
- 20 epochs of training in total
batchsize = 64
gpu_id = 0
initial_lr = 0.01
lr_drop_epoch = 10
lr_drop_ratio = 0.1
train_epoch = 20
Below is the training code.
train_iter = iterators.MultiprocessIterator(train, batchsize)
valid_iter = iterators.MultiprocessIterator(
valid, batchsize, repeat=False, shuffle=False)
optimizer = optimizers.MomentumSGD(lr=initial_lr)
optimizer.setup(model)
optimizer.add_hook(chainer.optimizer.WeightDecay(0.0001))
updater = training.StandardUpdater(
train_iter, optimizer, device=gpu_id)
trainer = training.Trainer(updater, (train_epoch, 'epoch'), out='AnimeFace-result')
trainer.extend(extensions.LogReport())
trainer.extend(extensions.observe_lr())
#The value you want to write to standard output
trainer.extend(extensions.PrintReport(
['epoch',
'main/loss',
'main/accuracy',
'val/main/loss',
'val/main/accuracy',
'elapsed_time',
'lr']))
#Loss plot automatically saved every epoch
trainer.extend(extensions.PlotReport(
['main/loss',
'val/main/loss'],
'epoch', file_name='loss.png'))
#Accuracy plots are also automatically saved every epoch
trainer.extend(extensions.PlotReport(
['main/accuracy',
'val/main/accuracy'],
'epoch', file_name='accuracy.png'))
#Extension that validates by setting the model's train property to False
trainer.extend(extensions.Evaluator(valid_iter, model, device=gpu_id), name='val')
# Multiply the learning rate by lr_drop_ratio every lr_drop_epoch epochs
trainer.extend(
extensions.ExponentialShift('lr', lr_drop_ratio),
trigger=(lr_drop_epoch, 'epoch'))
trainer.run()
epoch main/loss main/accuracy val/main/loss val/main/accuracy elapsed_time lr
1 1.58266 0.621792 0.623695 0.831607 29.4045 0.01
2 0.579938 0.835989 0.54294 0.85179 56.3893 0.01
3 0.421797 0.877897 0.476766 0.876872 83.9976 0.01
4 0.3099 0.909251 0.438246 0.879637 113.476 0.01
5 0.244549 0.928394 0.427892 0.884571 142.931 0.01
6 0.198274 0.938638 0.41589 0.893617 172.42 0.01
7 0.171127 0.946709 0.432277 0.89115 201.868 0.01
8 0.146401 0.953125 0.394634 0.902549 231.333 0.01
9 0.12377 0.964404 0.409338 0.894667 260.8 0.01
10 0.109239 0.967198 0.400371 0.907746 290.29 0.01
11 0.0948708 0.971337 0.378603 0.908831 319.742 0.001
12 0.0709512 0.98065 0.380891 0.90786 349.242 0.001
13 0.0699093 0.981892 0.384257 0.90457 379.944 0.001
14 0.0645318 0.982099 0.370053 0.908008 410.963 0.001
15 0.0619039 0.983547 0.379178 0.908008 441.941 0.001
16 0.0596897 0.983646 0.375837 0.911709 472.832 0.001
17 0.0579783 0.984789 0.379593 0.908008 503.836 0.001
18 0.0611943 0.982202 0.378177 0.90842 534.86 0.001
19 0.061885 0.98303 0.373961 0.90569 565.831 0.001
20 0.0548781 0.986341 0.3698 0.910624 596.847 0.001
Training finished in under 6 minutes. The standard output looked like the above. In the end, we get more than 90% accuracy on the validation dataset. Now let's display the loss and accuracy curves from the training process, which were saved as image files.
from IPython.display import Image
Image(filename='AnimeFace-result/loss.png')
Image(filename='AnimeFace-result/accuracy.png')
It looks like training converged nicely.
Finally, let's take some images from the validation dataset and look at the individual classification results.
%matplotlib inline
import matplotlib.pyplot as plt
from PIL import Image
from chainer import cuda
chainer.config.train = False
for _ in range(10):
    x, t = valid[np.random.randint(len(valid))]
    x = cuda.to_gpu(x)
    y = F.softmax(model.predictor(x[None, ...]))
    pred = os.path.basename(dnames[int(y.data.argmax())])
    label = os.path.basename(dnames[t])

    print('pred:', pred, 'label:', label, pred == label)

    x = cuda.to_cpu(x)
    x += mean[:, None, None]
    x = x / 256
    x = np.clip(x, 0, 1)
    plt.imshow(x.transpose(1, 2, 0))
    plt.show()
pred: 097_kamikita_komari label: 097_kamikita_komari True
pred: 127_setsuna_f_seiei label: 127_setsuna_f_seiei True
pred: 171_ikari_shinji label: 171_ikari_shinji True
pred: 042_tsukimura_mayu label: 042_tsukimura_mayu True
pred: 001_kinomoto_sakura label: 001_kinomoto_sakura True
pred: 090_minase_iori label: 090_minase_iori True
pred: 132_minamoto_chizuru label: 132_minamoto_chizuru True
pred: 106_nia label: 106_nia True
pred: 174_hayama_mizuki label: 174_hayama_mizuki True
pred: 184_suzumiya_akane label: 184_suzumiya_akane True
For these 10 randomly selected images, the model classified every one of them correctly.
Finally, let's save a snapshot of the model, since it may come in handy later.
from chainer import serializers
serializers.save_npz('animeface.model', model)
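As a sketch of how the saved weights could be restored in a later session (assuming the `Illust2Vec` class definition and `n_classes` from above are available there):

```python
import chainer.links as L
from chainer import serializers

# Rebuild the same architecture, then load the saved parameters into it
model = L.Classifier(Illust2Vec(n_classes))
serializers.load_npz('animeface.model', model)
```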
To write a dataset class from scratch, you can write your own class that inherits from `chainer.dataset.DatasetMixin`. The class must have a `__len__` method and a `get_example` method. For example:
class MyDataset(chainer.dataset.DatasetMixin):

    def __init__(self, image_paths, labels):
        self.image_paths = image_paths
        self.labels = labels

    def __len__(self):
        # Number of data points in the dataset
        return len(self.image_paths)

    def get_example(self, i):
        # Read the i-th image from disk and return it together with its label
        img = Image.open(self.image_paths[i])
        img = np.asarray(img, dtype=np.float32)
        img = img.transpose(2, 0, 1)
        label = self.labels[i]
        return img, label
This class takes a list of image file paths and a list of labels arranged in the corresponding order in its constructor. When an index is specified with the `[]` accessor, it reads the image from the corresponding path, pairs it with its label, and returns the tuple. For example, you can use it as follows.
image_files = ['images/hoge_0_1.png', 'images/hoge_5_1.png', 'images/hoge_2_1.png', 'images/hoge_3_1.png', ...]
labels = [0, 5, 2, 3, ...]
dataset = MyDataset(image_files, labels)
img, label = dataset[2]
# => returns the image data read from 'images/hoge_2_1.png' together with its label (2 in this case)
This object can be passed to an Iterator as it is and used for training with the Trainer. In other words,
train_iter = iterators.MultiprocessIterator(dataset, batchsize=128)
You can create an iterator like this, pass it together with an Optimizer to an Updater, pass the Updater to a Trainer, and start training with the Trainer; see the sketch below.
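A minimal sketch of that wiring (the optimizer, device, and epoch count here are arbitrary placeholders, and `model` is assumed to be a `Classifier`-wrapped network as in the training code above):

```python
from chainer import iterators, optimizers, training

train_iter = iterators.MultiprocessIterator(dataset, batchsize=128)

optimizer = optimizers.MomentumSGD(lr=0.01)
optimizer.setup(model)

updater = training.StandardUpdater(train_iter, optimizer, device=0)
trainer = training.Trainer(updater, (20, 'epoch'), out='result')
trainer.run()
```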
In fact, a dataset for use with Chainer's Trainer can be **just a Python list**. That is, as long as its length can be obtained with `len()` and its elements can be retrieved with the `[]` accessor, it can be **treated as a dataset object**. For example,
data_list = [(x1, t1), (x2, t2), ...]
If you create a list of `(data, label)` tuples like this, you can pass it directly to an Iterator.
train_iter = iterators.MultiprocessIterator(data_list, batchsize=128)
However, the drawback of this approach is that the entire dataset must be held in memory before training starts. To avoid this, there are classes such as [ImageDataset](http://docs.chainer.org/en/stable/reference/generated/chainer.datasets.ImageDataset.html#chainer.datasets.ImageDataset), [TupleDataset](http://docs.chainer.org/en/stable/reference/generated/chainer.datasets.TupleDataset.html#chainer.datasets.TupleDataset), and [LabeledImageDataset](http://docs.chainer.org/en/stable/reference/generated/chainer.datasets.LabeledImageDataset.html#chainer.datasets.LabeledImageDataset), which can be used in combination. Please refer to the documentation for details: http://docs.chainer.org/en/stable/reference/datasets.html#general-datasets
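As a sketch of that combination (reusing the hypothetical `image_files` and `labels` lists from above), the images are read lazily from disk while only the small label array stays in memory:

```python
import numpy as np
from chainer.datasets import ImageDataset, TupleDataset

images = ImageDataset(image_files)                # loads each image from disk on access
label_array = np.asarray(labels, dtype=np.int32)  # labels are small enough to keep in memory
dataset = TupleDataset(images, label_array)

img, label = dataset[2]  # behaves like the list-of-tuples dataset above
```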