Recently I have often come across an interesting structure in generative models called the GAN (Generative Adversarial Network). The setup, in which a Generator and a Discriminator compete with each other and improve in the process, is reminiscent of a contest between a forger (the Generator) and an appraiser (the Discriminator) in the art world, like something out of Gallery Fake. I find the mechanism itself romantic, and it is genuinely remarkable that it can generate convincing images.
However, GANs are known to be difficult to train. BEGAN is said to address this problem by explicitly balancing the adversarial game between the two networks during training. Since BEGAN is fairly simple and easy to understand, I implemented it with Keras.
BEGAN inherits various earlier GAN improvements (including ideas other people proposed before) while adding a mechanism for keeping the two networks in balance, and I personally found the features described in the paper interesting. The part I found most interesting is the following.
Let $L(x)$ be the reconstruction loss of the Discriminator's AutoEncoder, $L_{D}$ the Discriminator's loss, and $L_{G}$ the Generator's loss:
$$L_{D} = L(\text{real image}) - k_{t} \cdot L(\text{Generator-generated image})$$
$$L_{G} = L(\text{Generator-generated image})$$
$$k_{t+1} = k_{t} + \lambda_{k} \left( \gamma \cdot L(\text{real image}) - L(\text{Generator-generated image}) \right)$$
Training proceeds with losses in this form. The coefficient $k_{t}$ starts from 0 and gradually increases.
To reduce its loss, the Discriminator is pushed both to "reconstruct the real image as well as possible (make that AutoEncoder loss small)" and to "fail to reconstruct the fake image (make that AutoEncoder loss large)". At first, since $k = 0$, it simply optimizes the AutoEncoder on real images; as $k$ gradually increases, it also puts more effort into raising the loss on Generator images. Eventually $k$ reaches an equilibrium state where $\gamma \cdot L(\text{real image}) - L(\text{Generator-generated image}) = 0$ (this is where $\gamma$ comes into play).
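The loss bookkeeping above is simple enough to sketch directly. Below is a minimal illustration (not the author's code; the function names, default $\lambda_{k}$, and the clipping of $k$ to $[0, 1]$ follow the BEGAN paper, and the sample loss values are made up):

```python
import numpy as np

def began_losses(loss_real, loss_gen, k_t):
    """Build the Discriminator and Generator losses from the
    AutoEncoder reconstruction losses L(.):
    L_D = L(real) - k_t * L(generated),  L_G = L(generated)."""
    loss_d = loss_real - k_t * loss_gen
    loss_g = loss_gen
    return loss_d, loss_g

def began_update_k(k_t, loss_real, loss_gen, gamma=0.5, lambda_k=0.001):
    """One step of the control-variable update:
    k_{t+1} = k_t + lambda_k * (gamma * L(real) - L(generated)),
    clipped to [0, 1] as in the paper."""
    k_next = k_t + lambda_k * (gamma * loss_real - loss_gen)
    return float(np.clip(k_next, 0.0, 1.0))

# Toy step: k starts at 0, so at first only the real-image
# reconstruction term drives the Discriminator.
k = 0.0
loss_d, loss_g = began_losses(loss_real=0.8, loss_gen=0.6, k_t=k)
k = began_update_k(k, loss_real=0.8, loss_gen=0.6)
```

At equilibrium the update term $\gamma \cdot L(\text{real}) - L(\text{generated})$ is zero, so $k$ stops moving.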
Meanwhile, the Generator keeps refining its generated images so that the AutoEncoder's loss on them becomes smaller, so the contest gradually becomes more sophisticated.
I found this dilemma-like mechanism very interesting, and I thought it was expressed clearly (though, admittedly, I don't know much about other GANs...).
The implementation is available at https://github.com/mokemokechicken/keras_BEGAN. It generates images that look plausible, so I think it serves as a reasonable implementation...
With Keras, unusual models and training procedures like this can be tricky (there are not many samples to follow), but once you understand how to write them it is actually not so difficult. And once built, Keras's high modularity makes the result easy to read and easy to adapt in various ways.
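As an illustration of that kind of hand-written training loop, here is a minimal sketch (not the author's code; the model sizes and random data are toy placeholders) of the alternating `train_on_batch` pattern that GAN-style training in Keras typically reduces to. For simplicity it uses a plain real/fake classifier as the discriminator; BEGAN instead uses an AutoEncoder discriminator with the weighted losses described above.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

latent_dim, data_dim = 8, 16

# Generator: latent vector -> fake sample.
generator = keras.Sequential([
    layers.Dense(32, activation='relu', input_shape=(latent_dim,)),
    layers.Dense(data_dim),
])

# Discriminator: a plain real/fake classifier here for brevity.
discriminator = keras.Sequential([
    layers.Dense(32, activation='relu', input_shape=(data_dim,)),
    layers.Dense(1, activation='sigmoid'),
])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

# Combined model for the generator step: D is frozen inside it,
# so only G's weights are updated through this model.
discriminator.trainable = False
combined = keras.Sequential([generator, discriminator])
combined.compile(optimizer='adam', loss='binary_crossentropy')

batch = 4
real = np.random.rand(batch, data_dim).astype('float32')
z = np.random.rand(batch, latent_dim).astype('float32')
ones = np.ones((batch, 1), dtype='float32')
zeros = np.zeros((batch, 1), dtype='float32')

# One training iteration: update D on real and fake, then update G.
loss_d_real = discriminator.train_on_batch(real, ones)
loss_d_fake = discriminator.train_on_batch(generator.predict(z), zeros)
loss_g = combined.train_on_batch(z, ones)
```

Each epoch is just this iteration repeated over all batches, with any extra per-batch bookkeeping (like BEGAN's $k_{t}$ update) done in plain Python between the calls.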
Plotting the various loss values for each batch gives the following. Training uses $\gamma = 0.5$.
The meaning of each value is as follows.
What is immediately apparent at a glance is that
loss_real_x * gamma = loss_gen_x
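That relation is exactly the equilibrium condition from the $k_{t}$ update. The BEGAN paper also builds a global convergence measure from it, which can be computed directly from the two plotted losses (a small sketch; the function name is mine):

```python
def began_convergence(loss_real, loss_gen, gamma=0.5):
    """BEGAN's global convergence measure:
    M = L(real) + |gamma * L(real) - L(generated)|.
    At equilibrium gamma * L(real) == L(generated), so the second
    term vanishes and M reduces to the real reconstruction loss."""
    return loss_real + abs(gamma * loss_real - loss_gen)
```

Watching this single number fall is a convenient way to confirm training is progressing, instead of eyeballing the two loss curves separately.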
converges properly.

Any 64x64-pixel square images will do, but as sample images I used the LFW dataset: http://vis-www.cs.umass.edu/lfw/
[new] All images aligned with deep funneling
(111MB, md5sum 68331da3eb755a505a502b5aacb3c201)
This is the set I used. Excluding grayscale images, there are 13194 samples.
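Filtering out the grayscale images can be done by checking each file's color mode. A hedged sketch of that preprocessing step (the directory layout and function names are placeholders, not the author's actual script):

```python
from pathlib import Path

from PIL import Image  # Pillow


def is_rgb(path):
    """True if the image file decodes to 3-channel RGB
    (grayscale JPEGs open with mode 'L')."""
    with Image.open(path) as img:
        return img.mode == 'RGB'


def collect_rgb_images(root):
    """Walk root recursively and return paths of all RGB JPEGs,
    skipping grayscale ones."""
    return [p for p in sorted(Path(root).rglob('*.jpg')) if is_rgb(p)]
```

Run over the extracted LFW directory, this kind of filter is what leaves the 13194 color samples mentioned above.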
Arranging the generated images according to the progress of training, it looked like this.
[Generated sample images at Epochs 1, 25, 50, 75, 100, 125, 150, 175, 200, and 215]
As face photos, the results up to about epoch 125 are quite good. After that, probably because the model starts trying to capture the background as well, the face regions become noticeably distorted. If the goal is to focus on faces, the results might be cleaner with images where the background is blurred out and only the face remains. The number of Conv layers in the model also seems related to how clean the images are, and mine may have been slightly too few.
Training took about 680 seconds per epoch with the following machine specs.
Linux
Dataset: All images aligned with deep funneling (13194 samples)
Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
GeForce GTX 1080
Environment Variables
KERAS_BACKEND=theano
THEANO_FLAGS=device=gpu,floatX=float32,lib.cnmem=1.0
I feel like I have finally learned a little about GANs.