[PYTHON] Implementation of a "boke" (photo-caption joke) generating neural network with Chainer

Introduction


"I want to train an RNN that generates boke" (Chainer meetup 01) was very interesting, so I implemented it with Chainer as well. (Here, "boke" means the humorous one-line captions posted for photos, as on the bokete service, not optical blur.) I also made an app that lets you check the trained model directly from the browser.

To be upfront, what I did is basically a rehash of the original slides; the only differences are that I experimented with more training data (it did not train well) and that I built a web application. That's about it.

The web application I made looks like this. (demo GIF)

What I made this time is publicly available on GitHub. Documentation may be written eventually.

Model for generating boke

There is research showing that a caption for an image can be generated by feeding image features extracted with a Convolutional Neural Network (CNN) into a Recurrent Neural Network (RNN). Well-known examples include [Karpathy+2015] and [Vinyals+2014].

(Figure from [Vinyals+2014])

In this figure, the input image passes through a multi-layer CNN, the result is fed into an LSTM (a type of RNN), and a description is generated word by word. It is a simple model, but it is known to work surprisingly well; you can see actual generated examples here (this one is by Karpathy).

Now, if you can translate an input image into a "description", shouldn't you also be able to translate an input image into a "boke"? In other words, can a neural network make jokes? Based on this intuition, I implemented these models with Chainer and built a "fully automatic boke neural network".

Data resources used

  1. A trained CNN model
  2. Boke data paired with images

For 1, the easiest option was to use a trained model distributed for Caffe. This time I used CaffeNet, but other networks could probably be used as well. However, since the features are extracted from the output of the fc7 layer (a fully connected layer), lighter models that lack such a layer, like GoogLeNet, are (probably) unusable.
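As a side note, loading the Caffe model itself is a one-liner in Chainer. A minimal sketch (not the repository's actual code), assuming the standard bvlc_reference_caffenet.caffemodel file downloaded to the working directory:

from chainer.links.caffe import CaffeFunction

# Load a Caffe-trained CaffeNet (sketch; the file path is an assumption).
enc_img = CaffeFunction('bvlc_reference_caffenet.caffemodel')

# Intermediate outputs such as fc7 can then be requested by layer name:
# y, = enc_img(inputs={'data': x}, outputs=['fc7'])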

Regarding 2, there is a wonderful web service called bokete, where users post a one-line joke for a photo, so I crawled it and collected data with enthusiasm. As the author of the original slides mentioned, each boke page has a simple "1 image + 1 text" structure, so collecting the data is not that difficult.
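As a very rough sketch of the kind of crawler involved (not the actual one; the URL handling and CSS selectors below are hypothetical placeholders, and the real page structure and the site's terms of use need to be checked):

import requests
from bs4 import BeautifulSoup

def fetch_boke(page_url):
    # Fetch one boke page and pull out its "1 image + 1 text" pair.
    # The selectors 'img.photo' and 'div.boke-text' are hypothetical.
    html = requests.get(page_url).text
    soup = BeautifulSoup(html, 'html.parser')
    img_url = soup.select_one('img.photo')['src']
    text = soup.select_one('div.boke-text').get_text(strip=True)
    return img_url, text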

Environment

Feature extraction from the original image

Chainer is very convenient in that it can load Caffe's trained models. I defined the following method in the model class and used its output.

Model.py


def encode_image(self, img_array):
    batchsize = img_array.shape[0]
    if self.config["use_caffenet"]:
        # Run the preprocessed image batch through CaffeNet and take the
        # output of the fc7 (fully connected) layer as the image feature.
        img_x = chainer.Variable(img_array, volatile='on')
        y = self.enc_img(
            inputs={"data": img_x},
            outputs={"fc7"})[0]
    else:
        # Debug path: use a random 4096-dim vector instead of CaffeNet.
        x = self.xp.random.rand(batchsize, 4096).astype(np.float32)
        y = chainer.Variable(x)
    y.volatile = 'off'
    # Project the 4096-dim fc7 feature into the decoder's embedding space.
    return self.img2x(y)

img_array holds the image file converted to an array with PIL etc. For this part I referred to "Reading caffemodel with Chainer and classifying images" (Qiita).
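For reference, a minimal preprocessing sketch (an assumption, not the article's exact code): CaffeNet typically expects a 227x227 BGR image in CHW order with the dataset mean subtracted.

import numpy as np
from PIL import Image

# Commonly used ImageNet mean in BGR order (assumption; the actual
# preprocessing in the repository may differ).
MEAN_BGR = np.array([104.0, 117.0, 123.0], dtype=np.float32)

def load_image(path):
    img = Image.open(path).convert('RGB').resize((227, 227))
    arr = np.asarray(img, dtype=np.float32)[:, :, ::-1]  # RGB -> BGR
    arr = arr - MEAN_BGR                                 # subtract per-channel mean
    arr = arr.transpose(2, 0, 1)                         # HWC -> CHW
    return arr[np.newaxis]                               # add a batch dimension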

Feeding the image features into the LSTM

The extracted image features are fed into the LSTM. This is also defined in the model class.

Model.py


def __call__(self, x, img_vec=None):
    # Add the image feature vector to the word embedding only when it is
    # given (i.e., at the first time step).
    if img_vec is not None:
        h0 = self.embed_mat(x) + img_vec
    else:
        h0 = self.embed_mat(x)
    h1 = self.dec_lstm(h0)
    # Map the LSTM hidden state to scores over the vocabulary.
    y = self.l1(h1)
    return y

We actually want to feed the image features only at time t = 0, so the image vector is passed only when n == 0 in the loss computation.

Trainer.py


def _calc_loss(self, batch):
    boke, img = batch
    boke = self.xp.asarray(boke, dtype=np.int32)
    img = self.xp.asarray(img, dtype=np.float32)

    # 1. Feed the vectorized images to the CNN to obtain feature vectors
    img_vec = self.model.predictor.encode_image(img)

    # 2. Learn to decode the boke word by word
    #    (curr_words are the inputs, next_words the targets shifted by one)
    accum_loss = 0
    n = 0
    for curr_words, next_words in zip(boke.T, boke[:, 1:].T):
        if n == 0:
            # Give the image vector only at the first time step
            accum_loss += self.model(curr_words, img_vec, next_words)
        else:
            accum_loss += self.model(curr_words, next_words)
        n += 1
    return accum_loss

You may be wondering, "Why not feed the image features at every time step?", but [Karpathy+2015] notes:

Note that we provide the image context vector b_v to the RNN only at the first iteration, which we found to work better than at each time step.

so it seems you get better output by providing it only at t = 0 (I have not actually verified this myself).

Training on boke: large-scale data

The pioneering "I want to train an RNN that generates boke" (Chainer meetup 01) trained on 500 samples and mentioned wanting to try about 20,000 samples next, so I wanted to work at a similar scale and trained on roughly 30,000 samples. (Word embeddings and the LSTM hidden layer are both 100-dimensional, batch size 16, with dropout, etc.)
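For reference, a minimal sketch of how the links used in the __call__ method above might be declared and how training could be set up with these settings; this is an assumption, not the repository's actual code, and the vocabulary size of 10,000 is a hypothetical placeholder.

import chainer
import chainer.links as L
from chainer import optimizers

class BokeDecoder(chainer.Chain):
    def __init__(self, n_vocab, n_units=100):
        super(BokeDecoder, self).__init__(
            embed_mat=L.EmbedID(n_vocab, n_units),  # word id -> 100-dim embedding
            img2x=L.Linear(4096, n_units),          # fc7 feature (4096-dim) -> embedding space
            dec_lstm=L.LSTM(n_units, n_units),      # decoder LSTM (100-dim hidden state)
            l1=L.Linear(n_units, n_vocab),          # hidden state -> vocabulary scores
        )

    def __call__(self, x, img_vec=None):
        # Same logic as the Model.py snippet shown earlier.
        if img_vec is not None:
            h0 = self.embed_mat(x) + img_vec
        else:
            h0 = self.embed_mat(x)
        return self.l1(self.dec_lstm(h0))

model = L.Classifier(BokeDecoder(n_vocab=10000))    # softmax cross-entropy loss
optimizer = optimizers.Adam()
optimizer.setup(model)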

However, the loss did not decrease well.

average_loss.log


"average_loss":[
    10.557335326276828,
    9.724091438064605,
    9.051927699901125,
    8.728849313754363,
    8.36422316245738,
    8.1049892753394,
    7.999240087562069,
    7.78314874008182,
    7.821357278519156,
    7.629313596859783
]

(For reference, this is the loss on the training data.)

Even after running for about 30 epochs, the loss stopped decreasing at this level. I also tried generating boke for images at hand, but could not get decent output...

This is probably because there were only about 2,000 source images behind the 30,000 boke (for convenience when collecting the data from bokete, multiple boke are attached to a single image). Since one image thus has ten or more "correct answers" (translation targets), the model parameters presumably cannot be pushed in any one consistent direction.

Training on boke: small-scale data

Since "I want to check just by reducing the loss", I made an experiment with small-scale data (about 300 blurs), taking into consideration that one blur corresponds to one image:

average_loss.log


"average_loss": [
    6.765932078588577,
    1.7259380289486477,
    0.7160143222127642,
    0.3597904167004994,
    0.1992428061507997
]

This time the loss certainly decreases. (I ran a total of 100 epochs.)

From this, we can expect that even with large-scale data, the loss will decrease (i.e., the model can be trained) if images and boke are in one-to-one correspondence.

Training results

When I tried it on an image at hand, I got the following output. (screenshot of the generated output) (Does this count as a boke...?)

Creating a web app

Having trained the model, it would be boring if I could not try it from the browser, so I made a web application. (The code is publicly available.)

It looks like this: you can view statistics of the data used for training, and by pressing the Generate button you can see boke generated for the training / development data.

The trained Chainer model is loaded behind the web application, and when you press a button in the browser, the boke generation method fires.
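As a rough sketch of how such a backend could be wired up (an assumption, not the actual app in the repository): model, images, ivocab (id-to-word mapping), and the BOS/EOS token ids below are all hypothetical names assumed to be loaded at startup, and decoding is simple greedy search.

import numpy as np
from flask import Flask, jsonify

app = Flask(__name__)

def generate_boke(model, img_vec, ivocab, bos_id=0, eos_id=1, max_len=20):
    # Greedy decoding: give the image vector only at t = 0, then feed the
    # previously generated word back in until EOS or max_len is reached.
    model.predictor.dec_lstm.reset_state()
    word = np.array([bos_id], dtype=np.int32)
    result = []
    for t in range(max_len):
        y = model.predictor(word, img_vec if t == 0 else None)
        next_id = int(np.argmax(y.data, axis=1)[0])
        if next_id == eos_id:
            break
        result.append(ivocab[next_id])
        word = np.array([next_id], dtype=np.int32)
    return ''.join(result)

@app.route('/generate/<int:sample_id>')
def generate(sample_id):
    # `model`, `images` and `ivocab` are assumed to be loaded at startup.
    img_vec = model.predictor.encode_image(images[sample_id])
    return jsonify({'boke': generate_boke(model, img_vec, ivocab)})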

In conclusion

This time I used the image caption generation models of [Karpathy+2015] and [Vinyals+2014] to train on and generate boke, but I do not think this model is the best fit for handling boke. Since the model is designed and evaluated on the assumption that an image has essentially one correct description, I do not think it is well suited to boke data, where any answer that is funny counts as "correct". In fact, when I tried training with multiple correct answers (boke) attached to a single image, I ran into the problem that the loss does not decrease.

Also, even if the loss on the training data decreases, the loss on the development data probably will not. (It would surely be necessary to narrow down the input/output domains further, e.g., restricting the input images to "ossan" (middle-aged men), fixing the output to a fill-in-the-blank style, and so on.)

To begin with, is the approach of "generating a boke for the input image" even appropriate? Isn't there a more straightforward approach? For example, deliberately attaching the description of a completely different image to the input image might already make for an amusing boke.

Something like this, for example. (image)

Is a neural network really necessary for funny boke in the first place? It is a rather vexing question.
