Another style conversion method using Convolutional Neural Network

"A Neural Algorithm of Artistic Style" (hereinafter Neural style) is famous as a style conversion method using Convolutional Neural Network (CNN), and the following implementation there is.

Another style conversion method is "Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis" (hereinafter "style conversion using MRF"), which I will introduce here.

I implemented this technique in Chainer. The source code can be found at https://github.com/dsanno/chainer-neural-style.

What is style conversion?

Style conversion example

Samples by the paper's author: https://github.com/chuanli11/CNNMRF

Samples from the Chainer implementation, each showing a generated image, content image, and style image: oil painting, watercolor (style image: Chihiro Iwasaki's "Hanaguruma"), and pen drawing.
Overview

Style conversion using MRF generates images like the examples shown above.

The difference is that Neural style brings the overall style of the image closer to the style image, whereas style conversion using MRF brings the local style closer to the style image.

Existing implementation

An implementation by the paper's author is available at https://github.com/chuanli11/CNNMRF.

Algorithm

Although the title mentions Markov Random Fields (MRF), MRF is not the heart of the algorithm, so I will omit its explanation.

Input and output

The inputs are a content image and a style image. The content image is denoted $x_c$ and the style image $x_s$. The generated image to be output is denoted $x$.

CNN

Like Neural style, this method uses a CNN trained for image recognition, such as VGG.

Layer output

The output of a particular layer when the image $x$ is input to the CNN is denoted $\Phi(x)$. The layer output when the content image is input is $\Phi(x_c)$, and the layer output when the style image is input is $\Phi(x_s)$.
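As an illustration, here is a minimal sketch of computing $\Phi(x)$ with Chainer's pretrained VGG16 wrapper (`chainer.links.VGG16Layers`). The layer name `conv3_1` is an assumption for illustration; the actual implementation may use different layers.

```python
# Minimal sketch: computing Phi(x) with Chainer's pretrained VGG16 wrapper.
# The layer name 'conv3_1' is an assumption for illustration only.
import chainer.links as L

vgg = L.VGG16Layers()  # downloads the pretrained weights on first use

def phi(image, layer='conv3_1'):
    """Return the output of one VGG layer for a single H x W x 3 image."""
    # extract() resizes the image and subtracts the mean internally
    return vgg.extract([image], layers=[layer])[layer]

# phi_xc = phi(content_image)  # Phi(x_c)
# phi_xs = phi(style_image)    # Phi(x_s)
```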

Patch

Patches are generated from the CNN layer output. A patch gathers a $k \times k$ region of the layer output into a single vector of length $k \times k \times C$, where $C$ is the number of channels of the layer output. Multiple patches can be generated from one layer, and the $i$-th patch is denoted $\Psi_i(\Phi(x))$.
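For concreteness, here is a minimal sketch of patch extraction with NumPy, assuming the layer output is an array of shape (C, H, W) and patches are taken with stride 1 (the actual implementation may use a different stride).

```python
# Minimal sketch: extracting k x k patches Psi_i(Phi(x)) from a (C, H, W)
# feature map with stride 1; each patch becomes a vector of length k*k*C.
import numpy as np

def extract_patches(feature, k=3):
    """Return an array of shape (num_patches, k * k * C), one row per patch."""
    c, h, w = feature.shape
    patches = []
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            patches.append(feature[:, y:y + k, x:x + k].ravel())
    return np.stack(patches)
```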

Definition of energy

This algorithm defines an energy as a function of $x$ and then computes the $x$ that minimizes it. The energy function is defined as follows; each term is explained below.

E(x) = E_s(\Phi(x), \Phi(x_s)) + \alpha_1 E_c(\Phi(x), \Phi(x_c)) + \alpha_2 \Upsilon(x) 

MRFs loss function

The first term, $E_s$, is called the MRF loss function and is defined as follows.

E_s(\Phi(x), \Phi(x_s)) = \sum^{m}_{i=1}||\Psi_i(\Phi(x)) - \Psi_{NN(i)}(\Phi(x_s))||^2

Here, $NN(i)$ is defined by the following formula.

NN(i) := \mathop{\rm arg\,max}\limits_{j=1,...,m_s} \frac{\Psi_i(\Phi(x)) \cdot \Psi_j(\Phi(x_s))}{|\Psi_i(\Phi(x))| \cdot |\Psi_j(\Phi(x_s))|}

In the paper this is written as argmin, but judging from the implementation, argmax appears to be correct. The formula means that, for each patch $\Psi_i(\Phi(x))$ of the generated image, we select the style patch $\Psi_j(\Phi(x_s))$ with the highest normalized cross-correlation (cosine similarity).
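As a sketch of these two formulas, the following NumPy code finds $NN(i)$ by cosine similarity and evaluates the MRF loss on patch matrices such as those produced by extract_patches above (a naive all-pairs search, written for clarity rather than speed).

```python
# Minimal sketch: NN(i) via normalized cross-correlation, and the MRF loss
# E_s as the sum of squared distances to the matched style patches.
import numpy as np

def nearest_neighbors(patches_x, patches_s, eps=1e-8):
    """Return NN(i) for every patch i of the generated image."""
    nx = patches_x / (np.linalg.norm(patches_x, axis=1, keepdims=True) + eps)
    ns = patches_s / (np.linalg.norm(patches_s, axis=1, keepdims=True) + eps)
    # cosine similarity of every (i, j) pair; argmax over the style patches j
    return (nx @ ns.T).argmax(axis=1)

def mrf_loss(patches_x, patches_s):
    """E_s(Phi(x), Phi(x_s)) for given patch matrices."""
    nn = nearest_neighbors(patches_x, patches_s)
    return np.sum((patches_x - patches_s[nn]) ** 2)
```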

Content loss function

The second term, $E_c$, is called the Content loss function and is defined as follows.

E_c(\Phi(x), \Phi(x_c)) = ||\Phi(x) - \Phi(x_c)||^2

This means that the closer the CNN layer output generated from $x$ is to the layer output generated from $x_c$, the smaller the energy.
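As a sketch, the content loss reduces to a squared difference of feature maps:

```python
# Minimal sketch: E_c as the squared difference between the layer outputs
# of the generated image and the content image (arrays of equal shape).
import numpy as np

def content_loss(phi_x, phi_xc):
    return np.sum((phi_x - phi_xc) ** 2)
```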

Regularizer

The third term, $\Upsilon$, is a regularization term that smooths the image. It is defined as follows, where $x_{i,j}$ is the value of the pixel whose x coordinate is $i$ and whose y coordinate is $j$. The smaller the differences between adjacent pixels, the smaller the energy.


\Upsilon(x) = \sum_{i,j}((x_{i,j+1} - x_{i,j})^2 + (x_{i+1,j} - x_{i,j})^2)
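A minimal sketch of this regularizer, assuming $x$ is an image array of shape (H, W) or (H, W, 3):

```python
# Minimal sketch: Upsilon(x) as the sum of squared differences between
# adjacent pixels (vertical and horizontal neighbors).
import numpy as np

def regularizer(x):
    dy = x[1:, :] - x[:-1, :]   # differences between vertically adjacent pixels
    dx = x[:, 1:] - x[:, :-1]   # differences between horizontally adjacent pixels
    return np.sum(dx ** 2) + np.sum(dy ** 2)
```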

Implementation

I implemented it with Chainer. For the CNN, I use the 16-layer VGG model, which is also familiar from chainer-gogh. The source code can be found at https://github.com/dsanno/chainer-neural-style.

Comparison of methods

Execution time

The execution time is shorter than Neural style's: Neural style requires thousands of iterations, whereas style conversion using MRF needs only hundreds.

Difference in style

With Neural style, the colors change drastically, whereas style conversion using MRF does not change the colors much; my impression is that it is the touch of the picture that changes.

Application

One application of style conversion using MRF is neural-doodle. neural-doodle lets you specify which style to apply to which region of the image. In its examples, a face photograph is converted into the style of a Van Gogh portrait, and specifying a style for each region produces a more natural conversion.

In neural-doodle, style specification is realized by concatenating each patch vector obtained from the CNN layer output with a vector representing the style number (a one-hot vector indicating which style the patch corresponds to), as sketched below.
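A minimal sketch of that idea, where the patch/style assignment (style_ids) and the number of styles are assumptions for illustration:

```python
# Minimal sketch: append a one-hot style vector to each patch vector so the
# nearest-neighbor search favors patches from the same style region.
import numpy as np

def tag_patches(patches, style_ids, num_styles):
    """patches: (num_patches, dim); style_ids: (num_patches,) style numbers."""
    one_hot = np.eye(num_styles)[style_ids]
    return np.concatenate([patches, one_hot], axis=1)
```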
