[PYTHON] Categorize face images of anime characters with Chainer

In this article, I will walk through the steps of fine-tuning the Illustration2Vec model on the animeface-character dataset and training a model that can classify face images of 146 different characters with 90% or better accuracy, explaining the following topics along the way.

With Chainer, I will cover:

- How to create a dataset object
- How to split a dataset into training and validation sets
- How to bring in pre-trained weights and fine-tune them on a new task
- (Bonus: how to write a dataset class from scratch)

The libraries used are as follows.

- Chainer >= 2.0.1 (confirmed to work with the latest 4.1.0)
- CuPy >= 1.0.1 (confirmed to work with the latest 4.1.0)

Since Chainer maintains backward compatibility, the code below should also run on newer versions.

To summarize the contents roughly:

**This article shows a concrete example of how to take a dataset that is not bundled with Chainer, bring it in from outside, and use it to train a network written in Chainer.** The basic steps are almost the same as the chapter on extending the CIFAR10 dataset class described in the Chainer v4 tutorial for beginners.

This time, I will also explain **how to use the weights of a network pre-trained on a dataset from a domain similar to the target data as initial values**. If you want to fine-tune a network distributed as a Caffe .caffemodel file, you can apply almost the same procedure as in this article.

This article was originally written as a Jupyter notebook and exported to Markdown.

1. Download dataset

First, download the dataset. This time we use a dataset of anime character face thumbnails distributed by nagadomi, a Kaggle Grandmaster.

%%bash
if [ ! -d animeface-character-dataset ]; then
    curl -L -O http://www.nurs.or.jp/~nagadomi/animeface-character-dataset/data/animeface-character-dataset.zip
    unzip animeface-character-dataset.zip
    rm -rf animeface-character-dataset.zip
fi

Install the required libraries with pip. **For the cupy package, select the one that matches the CUDA version of your environment: cupy-cuda80 (for CUDA 8.0), cupy-cuda90 (for CUDA 9.0), or cupy-cuda91 (for CUDA 9.1).**

%%bash
pip install chainer
pip install cupy-cuda80 # or cupy-cuda90 or cupy-cuda91
pip install Pillow
pip install tqdm

2. Check the problem settings

This time, using the face images of the characters included in animeface-character-dataset, we will train a network that, given the face image of an unknown character, outputs which character in the known class list it appears to be.

Here, rather than training a network whose parameters are initialized at random, **we will try fine-tuning on the target dataset, starting from a model that was trained in advance on data from a similar domain**.

The dataset used for training contains many images like the ones shown below, with each character already separated into its own folder. So this, too, is an orthodox image classification problem.

A randomly chosen sample of the data:

| 000_hatsune_miku | 002_suzumiya_haruhi | 007_nagato_yuki | 012_asahina_mikuru |
|---|---|---|---|
| face_128_326_108.png | face_1000_266_119.png | face_83_270_92.png | face_121_433_128.png |

3. Create a dataset object

Here, I will show how to create a dataset object using LabeledImageDataset, a class that is often used for image classification problems. First, let's do some preparation with standard Python functionality.

First, get the list of image file paths. The image files are separated into one directory per character under `animeface-character-dataset/thumb`. In the code below, if a folder contains a file named `ignore`, the images in that folder are skipped.

import os
import glob
from itertools import chain

#Image folder
IMG_DIR = 'animeface-character-dataset/thumb'

#Folder for each character
dnames = glob.glob('{}/*'.format(IMG_DIR))

#Image file path list
fnames = [glob.glob('{}/*.png'.format(d)) for d in dnames
          if not os.path.exists('{}/ignore'.format(d))]
fnames = list(chain.from_iterable(fnames))

Next, since the name of the directory containing each image represents the character's name, use it to create an ID for each image that is unique to its character.

#Give each a unique ID from the folder name
labels = [os.path.basename(os.path.dirname(fn)) for fn in fnames]
dnames = [os.path.basename(d) for d in dnames
          if not os.path.exists('{}/ignore'.format(d))]
labels = [dnames.index(l) for l in labels]

Now let's create the base dataset object. This is easy: just pass a list of tuples of (file path, label) to LabeledImageDataset. The resulting object returns a (img, label) tuple when indexed.

from chainer.datasets import LabeledImageDataset

#Data set creation
d = LabeledImageDataset(list(zip(fnames, labels)))

Next, let's use a handy class provided by Chainer called TransformDataset. This is a wrapper that takes a dataset object and a function that transforms each data point. It lets you keep data augmentation, preprocessing, and similar steps outside of the dataset class itself.

import numpy as np

from chainer.datasets import TransformDataset
from PIL import Image

width, height = 160, 160

# Resize an image given as a (channel, height, width) uint8 array
def resize(img):
    img = Image.fromarray(img.transpose(1, 2, 0))
    img = img.resize((width, height), Image.BICUBIC)
    return np.asarray(img).transpose(2, 0, 1)

# Transformation applied to each (img, label) pair
def transform(inputs):
    img, label = inputs
    img = img[:3, ...]
    img = resize(img.astype(np.uint8))
    img = img - mean[:, None, None]  # mean is the per-channel average computed below
    img = img.astype(np.float32)
    # Randomly flip left and right
    if np.random.rand() > 0.5:
        img = img[..., ::-1]
    return img, label

# Create the dataset with the transformation applied
td = TransformDataset(d, transform)

This gives us a dataset object that takes each (img, label) tuple returned by the LabeledImageDataset object d, passes it through the transform function, and returns the result.

Now let's split this into two partial datasets, one for training and one for validation. We'll use 80% of the whole dataset for training and the remaining 20% for validation. split_dataset_random shuffles the data once and then splits it at the specified index.

from chainer import datasets

train, valid = datasets.split_dataset_random(td, int(len(d) * 0.8), seed=0)

Several other functions are also provided for splitting datasets, such as get_cross_validation_datasets_random, which returns multiple pairs of training and validation datasets for cross-validation. See the SubDataset documentation for details.
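As a rough illustration (my own snippet, not used in the rest of this article), it can be called like this:

from chainer import datasets

# Sketch: 5-fold cross-validation splits of the transformed dataset td;
# each element of folds is a (train, valid) pair of datasets
folds = datasets.get_cross_validation_datasets_random(td, 5, seed=0)
for train_fold, valid_fold in folds:
    print(len(train_fold), len(valid_fold))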

By the way, the mean used in transform above is the average image over the training split of this dataset. Let's compute it now.

import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm_notebook

# Compute the average image only if it has not been computed yet
if not os.path.exists('image_mean.npy'):
    # Compute the mean over the training split, without the transform applied
    t, _ = datasets.split_dataset_random(d, int(len(d) * 0.8), seed=0)

    mean = np.zeros((3, height, width))
    for img, _ in tqdm_notebook(t, desc='Calc mean'):
        img = resize(img[:3].astype(np.uint8))
        mean += img
    mean = mean / float(len(t))  # divide by the number of images actually summed
    np.save('image_mean', mean)
else:
    mean = np.load('image_mean.npy')

As a check, let's display the computed average image.

#Display of average image
%matplotlib inline
plt.imshow(mean.transpose(1, 2, 0) / 255)
plt.show()

(Average image)

It's kind of scary ...

When subtracting the mean from each image, the per-channel (RGB) average pixel value is used rather than the full mean image, so compute the average pixel value of each channel of this mean image.

mean = mean.mean(axis=(1, 2))

4. Model definition and fine-tuning preparation

Next, let's define the model to train. The base is the network used in Illustration2Vec, a model trained on a large number of 2D illustration images for tag prediction and feature extraction. Our new model removes its last two layers and adds two randomly initialized fully connected layers in their place.

During training, the layers from the third layer below the output downward are initialized with the pre-trained Illustration2Vec weights and then frozen. In other words, **only the two newly added fully connected layers are trained.**

First, download the trained parameters of the distributed Illustration2Vec model.

%%bash
if [ ! -f illust2vec_ver200.caffemodel ]; then
    curl -L -O https://github.com/rezoo/illustration2vec/releases/download/v2.0.0/illust2vec_ver200.caffemodel
fi

These pre-trained parameters are provided as a caffemodel file, and Chainer can load Caffe models very easily via CaffeFunction, so we use it to load both the parameters and the model structure. However, loading takes a while, so once it has been loaded, it is worth saving the resulting Chain object to a file with pickle (the code below imports dill, a pickle-compatible serialization library, for this purpose); this speeds up loading from the next time on.
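As a rough sketch of that caching pattern (my own illustration, not code from the original notebook; the cache file name is hypothetical, and it assumes dill can serialize the loaded CaffeFunction):

import os
import dill
from chainer.links.caffe import CaffeFunction

CACHE_FN = 'illust2vec_ver200.dill'  # hypothetical cache file name

if os.path.exists(CACHE_FN):
    # Fast path: load the already-parsed model from the cache
    with open(CACHE_FN, 'rb') as f:
        illust2vec = dill.load(f)
else:
    # Slow path: parse the caffemodel once, then cache the resulting Chain
    illust2vec = CaffeFunction('illust2vec_ver200.caffemodel')
    with open(CACHE_FN, 'wb') as f:
        dill.dump(illust2vec, f)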

The actual network code looks like this:

import dill

import chainer
import chainer.links as L
import chainer.functions as F

from chainer import Chain
from chainer.links.caffe import CaffeFunction
from chainer import serializers

class Illust2Vec(Chain):

    CAFFEMODEL_FN = 'illust2vec_ver200.caffemodel'

    def __init__(self, n_classes, unchain=True):
        w = chainer.initializers.HeNormal()        
        model = CaffeFunction(self.CAFFEMODEL_FN)  # Load the Caffe model (this takes a while)
        del model.encode1  #Delete unnecessary layers to save memory.
        del model.encode2
        del model.forwards['encode1']
        del model.forwards['encode2']
        model.layers = model.layers[:-2]
        
        super(Illust2Vec, self).__init__()
        with self.init_scope():
            self.trunk = model  #Include the original Illust2Vec model as a trunk in this model.
            self.fc7 = L.Linear(None, 4096, initialW=w)
            self.bn7 = L.BatchNormalization(4096)
            self.fc8 = L.Linear(4096, n_classes, initialW=w)
            
    def __call__(self, x):
        h = self.trunk({'data': x}, ['conv6_3'])[0]  # Take the output of conv6_3 of the original Illust2Vec model
        h.unchain_backward()
        h = F.dropout(F.relu(self.bn7(self.fc7(h))))  #The following layers are newly added layers.
        return self.fc8(h)

n_classes = len(dnames)
model = Illust2Vec(n_classes)
model = L.Classifier(model)
/home/mitmul/chainer/chainer/links/caffe/caffe_function.py:165: UserWarning: Skip the layer "encode1neuron", since CaffeFunction does notsupport Sigmoid layer
  'support %s layer' % (layer.name, layer.type))
/home/mitmul/chainer/chainer/links/caffe/caffe_function.py:165: UserWarning: Skip the layer "loss", since CaffeFunction does notsupport SigmoidCrossEntropyLoss layer
  'support %s layer' % (layer.name, layer.type))

Note the call to h.unchain_backward() inside __call__. unchain_backward is called on an intermediate Variable in the network and disconnects every node of the computational graph upstream of that point. As a result, during training no error is propagated to the layers before the call, and their parameters are not updated.

As mentioned above,

> During training, the layers from the third layer below the output downward are initialized with the pre-trained Illustration2Vec weights and then frozen.

and this freezing is achieved by this single call to h.unchain_backward().
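As a tiny standalone sketch (my own example, not from the original notebook) of what unchain_backward does:

import numpy as np
import chainer
import chainer.functions as F

x = chainer.Variable(np.array([[1.0, 2.0]], dtype=np.float32))
h = x * 2              # upstream part of the computational graph
h.unchain_backward()   # cut the graph here; h forgets how it was computed
y = F.sum(h * 3)
y.backward()
print(x.grad)  # None: no gradient flows past the cut
print(h.grad)  # gradients are still computed down to h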

For more details on how this works, see the article "Create 1-file Chainer", which explains Chainer's Define-by-Run autograd mechanism.

5. Learning

Now, let's train using this dataset and model. First, load the required modules.

from chainer import iterators
from chainer import training
from chainer import optimizers
from chainer.training import extensions
from chainer.training import triggers
from chainer.dataset import concat_examples

Next, set the training parameters. This time:

- batch size: 64
- learning rate: starts at 0.01 and is multiplied by 0.1 at the 10th epoch
- training runs for 20 epochs

batchsize = 64
gpu_id = 0
initial_lr = 0.01
lr_drop_epoch = 10
lr_drop_ratio = 0.1
train_epoch = 20

Below is the training code.

train_iter = iterators.MultiprocessIterator(train, batchsize)
valid_iter = iterators.MultiprocessIterator(
    valid, batchsize, repeat=False, shuffle=False)

optimizer = optimizers.MomentumSGD(lr=initial_lr)
optimizer.setup(model)
optimizer.add_hook(chainer.optimizer.WeightDecay(0.0001))

updater = training.StandardUpdater(
    train_iter, optimizer, device=gpu_id)

trainer = training.Trainer(updater, (train_epoch, 'epoch'), out='AnimeFace-result')
trainer.extend(extensions.LogReport())
trainer.extend(extensions.observe_lr())

# Values to report to standard output
trainer.extend(extensions.PrintReport(
    ['epoch',
     'main/loss',
     'main/accuracy',
     'val/main/loss',
     'val/main/accuracy',
     'elapsed_time',
     'lr']))

#Loss plot automatically saved every epoch
trainer.extend(extensions.PlotReport(
        ['main/loss',
         'val/main/loss'],
        'epoch', file_name='loss.png'))

#Accuracy plots are also automatically saved every epoch
trainer.extend(extensions.PlotReport(
        ['main/accuracy',
         'val/main/accuracy'],
        'epoch', file_name='accuracy.png'))

# Extension that runs validation (evaluation is done with chainer.config.train set to False)
trainer.extend(extensions.Evaluator(valid_iter, model, device=gpu_id), name='val')

# Multiply the learning rate by lr_drop_ratio every lr_drop_epoch epochs
trainer.extend(
    extensions.ExponentialShift('lr', lr_drop_ratio),
    trigger=(lr_drop_epoch, 'epoch'))

trainer.run()
epoch       main/loss   main/accuracy  val/main/loss  val/main/accuracy  elapsed_time  lr        
1           1.58266     0.621792       0.623695       0.831607           29.4045       0.01        
2           0.579938    0.835989       0.54294        0.85179            56.3893       0.01        
3           0.421797    0.877897       0.476766       0.876872           83.9976       0.01        
4           0.3099      0.909251       0.438246       0.879637           113.476       0.01        
5           0.244549    0.928394       0.427892       0.884571           142.931       0.01        
6           0.198274    0.938638       0.41589        0.893617           172.42        0.01        
7           0.171127    0.946709       0.432277       0.89115            201.868       0.01        
8           0.146401    0.953125       0.394634       0.902549           231.333       0.01        
9           0.12377     0.964404       0.409338       0.894667           260.8         0.01        
10          0.109239    0.967198       0.400371       0.907746           290.29        0.01        
11          0.0948708   0.971337       0.378603       0.908831           319.742       0.001       
12          0.0709512   0.98065        0.380891       0.90786            349.242       0.001       
13          0.0699093   0.981892       0.384257       0.90457            379.944       0.001       
14          0.0645318   0.982099       0.370053       0.908008           410.963       0.001       
15          0.0619039   0.983547       0.379178       0.908008           441.941       0.001       
16          0.0596897   0.983646       0.375837       0.911709           472.832       0.001       
17          0.0579783   0.984789       0.379593       0.908008           503.836       0.001       
18          0.0611943   0.982202       0.378177       0.90842            534.86        0.001       
19          0.061885    0.98303        0.373961       0.90569            565.831       0.001       
20          0.0548781   0.986341       0.3698         0.910624           596.847       0.001       

Training finished in a little under 10 minutes. The standard output during training looked like the above. In the end, we get over 90% accuracy on the validation set. Now let's display the loss and accuracy curves that were saved as image files during training.

from IPython.display import Image
Image(filename='AnimeFace-result/loss.png')

output_35_0.png

Image(filename='AnimeFace-result/accuracy.png')

output_36_0.png

It looks like training converged nicely.

Finally, let's take some images from the validation dataset and look at the individual classification results.

%matplotlib inline
import matplotlib.pyplot as plt

from PIL import Image
from chainer import cuda

chainer.config.train = False
for _ in range(10):
    x, t = valid[np.random.randint(len(valid))]
    x = cuda.to_gpu(x)
    y = F.softmax(model.predictor(x[None, ...]))
    
    pred = os.path.basename(dnames[int(y.data.argmax())])
    label = os.path.basename(dnames[t])
    
    print('pred:', pred, 'label:', label, pred == label)

    x = cuda.to_cpu(x)
    x += mean[:, None, None]
    x = x / 256
    x = np.clip(x, 0, 1)
    plt.imshow(x.transpose(1, 2, 0))
    plt.show()
pred: 097_kamikita_komari label: 097_kamikita_komari True

output_38_1.png

pred: 127_setsuna_f_seiei label: 127_setsuna_f_seiei True

output_38_3.png

pred: 171_ikari_shinji label: 171_ikari_shinji True

output_38_5.png

pred: 042_tsukimura_mayu label: 042_tsukimura_mayu True

output_38_7.png

pred: 001_kinomoto_sakura label: 001_kinomoto_sakura True

output_38_9.png

pred: 090_minase_iori label: 090_minase_iori True

output_38_11.png

pred: 132_minamoto_chizuru label: 132_minamoto_chizuru True

output_38_13.png

pred: 106_nia label: 106_nia True

output_38_15.png

pred: 174_hayama_mizuki label: 174_hayama_mizuki True

output_38_17.png

pred: 184_suzumiya_akane label: 184_suzumiya_akane True

output_38_19.png

For these 10 randomly chosen images, the model classified all of them correctly.

Finally, save a snapshot of the trained model, since it may come in handy later.

from chainer import serializers

serializers.save_npz('animeface.model', model)
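To use these saved parameters later, rebuild the same architecture and load the file with serializers.load_npz. A minimal sketch (assuming the Illust2Vec class and n_classes defined above; note that constructing Illust2Vec re-parses the caffemodel, which takes a while):

# Sketch: restore the saved weights into a freshly built model
restored = L.Classifier(Illust2Vec(n_classes))
serializers.load_npz('animeface.model', restored)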

6. Bonus 1: How to write a dataset class from scratch

To write a dataset class from scratch, define your own class that inherits from chainer.dataset.DatasetMixin. The class must have a __len__ method and a get_example method. For example:

class MyDataset(chainer.dataset.DatasetMixin):
    
    def __init__(self, image_paths, labels):
        self.image_paths = image_paths
        self.labels = labels
        
    def __len__(self):
        return len(self.image_paths)
    
    def get_example(self, i):
        img = Image.open(self.image_paths[i])
        img = np.asarray(img, dtype=np.float32)
        img = img.transpose(2, 0, 1)
        label = self.labels[i]
        return img, label

This dataset class takes a list of image file paths and a list of labels in the corresponding order in its constructor. When you access an element with the [] accessor, it reads the image from the corresponding path and returns it as a tuple together with its label. For example, you can use it as follows.

image_files = ['images/hoge_0_1.png', 'images/hoge_5_1.png', 'images/hoge_2_1.png', 'images/hoge_3_1.png', ...]
labels = [0, 5, 2, 3, ...]

dataset = MyDataset(image_files, labels)

img, label = dataset[2]

# => returns the image data read from 'images/hoge_2_1.png' and its label (2 in this case)

This object can be passed directly to an Iterator and used for training with the Trainer. In other words,

train_iter = iterators.MultiprocessIterator(dataset, batchsize=128)

you can create an iterator like this, pass it to an Updater together with an Optimizer, pass that updater to a Trainer, and start training with the Trainer, as in the sketch below.
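A minimal wiring sketch (my own illustration; it assumes model is an L.Classifier-wrapped network suited to this data, such as the one from section 5, and reuses the train_iter just created):

from chainer import optimizers, training

optimizer = optimizers.MomentumSGD(lr=0.01)
optimizer.setup(model)

updater = training.StandardUpdater(train_iter, optimizer, device=-1)  # -1 = CPU; use 0 for the first GPU
trainer = training.Trainer(updater, (10, 'epoch'), out='result')
trainer.run()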

7. Bonus 2: How to create the simplest dataset object

In fact, a dataset used with Chainer's Trainer can be **just a Python list**. In other words, as long as its length can be obtained with len() and its elements can be retrieved with the [] accessor, it can be **treated as a dataset object**. For example,

data_list = [(x1, t1), (x2, t2), ...]

create a list of (data, label) tuples like this, and you can pass it to an Iterator as is.

train_iter = iterators.MultiprocessIterator(data_list, batchsize=128)

However, the drawback of this approach is that the entire dataset must be held in memory before training starts. To avoid this, you can, for example, combine ImageDataset and TupleDataset, or use classes such as LabeledImageDataset. See the documentation for details: http://docs.chainer.org/en/stable/reference/datasets.html#general-datasets
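As a rough sketch of that approach (my own illustration, reusing the fnames and labels lists from section 3), ImageDataset reads each image from disk only when it is accessed, and TupleDataset pairs it with its label:

from chainer.datasets import ImageDataset, TupleDataset

# Sketch: images are loaded lazily, so the whole dataset need not fit in memory
images = ImageDataset(fnames)            # loads each image file on access
dataset = TupleDataset(images, labels)   # pairs each image with its label

img, label = dataset[0]  # only this one image is read from disk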
