Recently I have often come across an interesting structure in generative models called the GAN (Generative Adversarial Network). The setup, in which a Generator and a Discriminator compete with each other and improve in the process, is reminiscent of a contest between a forger (the Generator) and an appraiser (the Discriminator) in the art world, like something out of Gallery Fake. I find the mechanism itself romantic, and it is genuinely remarkable that it can generate convincing images.
However, GANs are known to be difficult to train. BEGAN is said to address this problem by explicitly balancing the adversarial game between the two networks during training. Since BEGAN is fairly simple and easy to understand, I implemented it with Keras.
BEGAN inherits various earlier GAN improvements (including ideas other people proposed before) while adding a mechanism for keeping the two networks in balance, and I personally found the features described in the paper interesting. The part I found most interesting is the following.
Let $L(x)$ be the reconstruction loss of the Discriminator's AutoEncoder, $L_{D}$ the Discriminator's loss, and $L_{G}$ the Generator's loss:
$$L_{D} = L(\text{real image}) - k_{t} \cdot L(\text{Generator-generated image})$$
$$L_{G} = L(\text{Generator-generated image})$$
$$k_{t+1} = k_{t} + \lambda_{k} \left( \gamma \cdot L(\text{real image}) - L(\text{Generator-generated image}) \right)$$
Training proceeds with losses in this form. The coefficient $k_{t}$ starts from 0 and gradually increases.
To reduce its loss, the Discriminator is pushed both to "reconstruct the real image as well as possible (make that AutoEncoder loss small)" and to "fail to reconstruct the fake image (make that AutoEncoder loss large)". At first, since $k = 0$, it simply optimizes the AutoEncoder on real images; as $k$ gradually increases, it also puts more effort into raising the loss on Generator images. Eventually $k$ reaches an equilibrium state where $\gamma \cdot L(\text{real image}) - L(\text{Generator-generated image}) = 0$ (this is where $\gamma$ comes into play).
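The loss bookkeeping above is simple enough to sketch directly. Below is a minimal illustration (not the author's code; the function names, default $\lambda_{k}$, and the clipping of $k$ to $[0, 1]$ follow the BEGAN paper, and the sample loss values are made up):

```python
import numpy as np

def began_losses(loss_real, loss_gen, k_t):
    """Build the Discriminator and Generator losses from the
    AutoEncoder reconstruction losses L(.):
    L_D = L(real) - k_t * L(generated),  L_G = L(generated)."""
    loss_d = loss_real - k_t * loss_gen
    loss_g = loss_gen
    return loss_d, loss_g

def began_update_k(k_t, loss_real, loss_gen, gamma=0.5, lambda_k=0.001):
    """One step of the control-variable update:
    k_{t+1} = k_t + lambda_k * (gamma * L(real) - L(generated)),
    clipped to [0, 1] as in the paper."""
    k_next = k_t + lambda_k * (gamma * loss_real - loss_gen)
    return float(np.clip(k_next, 0.0, 1.0))

# Toy step: k starts at 0, so at first only the real-image
# reconstruction term drives the Discriminator.
k = 0.0
loss_d, loss_g = began_losses(loss_real=0.8, loss_gen=0.6, k_t=k)
k = began_update_k(k, loss_real=0.8, loss_gen=0.6)
```

At equilibrium the update term $\gamma \cdot L(\text{real}) - L(\text{generated})$ is zero, so $k$ stops moving.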
Meanwhile, the Generator keeps refining its generated images so that the AutoEncoder's loss on them becomes smaller, so the contest gradually becomes more sophisticated.
I found this dilemma-like mechanism very interesting, and I thought it was expressed clearly (though, admittedly, I don't know much about other GANs...).
The implementation is available at https://github.com/mokemokechicken/keras_BEGAN. It generates images that look plausible, so I think it serves as a reasonable implementation...
With Keras, unusual models and training procedures like this can be tricky (there are not many samples to follow), but once you understand how to write them it is actually not so difficult. And once built, Keras's high modularity makes the result easy to read and easy to adapt in various ways.
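As an illustration of that kind of hand-written training loop, here is a minimal sketch (not the author's code; the model sizes and random data are toy placeholders) of the alternating `train_on_batch` pattern that GAN-style training in Keras typically reduces to. For simplicity it uses a plain real/fake classifier as the discriminator; BEGAN instead uses an AutoEncoder discriminator with the weighted losses described above.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

latent_dim, data_dim = 8, 16

# Generator: latent vector -> fake sample.
generator = keras.Sequential([
    layers.Dense(32, activation='relu', input_shape=(latent_dim,)),
    layers.Dense(data_dim),
])

# Discriminator: a plain real/fake classifier here for brevity.
discriminator = keras.Sequential([
    layers.Dense(32, activation='relu', input_shape=(data_dim,)),
    layers.Dense(1, activation='sigmoid'),
])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

# Combined model for the generator step: D is frozen inside it,
# so only G's weights are updated through this model.
discriminator.trainable = False
combined = keras.Sequential([generator, discriminator])
combined.compile(optimizer='adam', loss='binary_crossentropy')

batch = 4
real = np.random.rand(batch, data_dim).astype('float32')
z = np.random.rand(batch, latent_dim).astype('float32')
ones = np.ones((batch, 1), dtype='float32')
zeros = np.zeros((batch, 1), dtype='float32')

# One training iteration: update D on real and fake, then update G.
loss_d_real = discriminator.train_on_batch(real, ones)
loss_d_fake = discriminator.train_on_batch(generator.predict(z), zeros)
loss_g = combined.train_on_batch(z, ones)
```

Each epoch is just this iteration repeated over all batches, with any extra per-batch bookkeeping (like BEGAN's $k_{t}$ update) done in plain Python between the calls.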
Plotting the various loss values for each batch gives the following. Training uses $\gamma = 0.5$.
The meaning of each value is as follows.
What is immediately apparent at a glance is that
loss_real_x * gamma = loss_gen_x
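That relation is exactly the equilibrium condition from the $k_{t}$ update. The BEGAN paper also builds a global convergence measure from it, which can be computed directly from the two plotted losses (a small sketch; the function name is mine):

```python
def began_convergence(loss_real, loss_gen, gamma=0.5):
    """BEGAN's global convergence measure:
    M = L(real) + |gamma * L(real) - L(generated)|.
    At equilibrium gamma * L(real) == L(generated), so the second
    term vanishes and M reduces to the real reconstruction loss."""
    return loss_real + abs(gamma * loss_real - loss_gen)
```

Watching this single number fall is a convenient way to confirm training is progressing, instead of eyeballing the two loss curves separately.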
converges properly.

Any 64x64-pixel square images will do, but as sample images I used the LFW dataset: http://vis-www.cs.umass.edu/lfw/
[new] All images aligned with deep funneling
(111MB, md5sum 68331da3eb755a505a502b5aacb3c201)
This is the set I used. Excluding grayscale images, there are 13194 samples.
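Filtering out the grayscale images can be done by checking each file's color mode. A hedged sketch of that preprocessing step (the directory layout and function names are placeholders, not the author's actual script):

```python
from pathlib import Path

from PIL import Image  # Pillow


def is_rgb(path):
    """True if the image file decodes to 3-channel RGB
    (grayscale JPEGs open with mode 'L')."""
    with Image.open(path) as img:
        return img.mode == 'RGB'


def collect_rgb_images(root):
    """Walk root recursively and return paths of all RGB JPEGs,
    skipping grayscale ones."""
    return [p for p in sorted(Path(root).rglob('*.jpg')) if is_rgb(p)]
```

Run over the extracted LFW directory, this kind of filter is what leaves the 13194 color samples mentioned above.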
Arranging the generated images according to the progress of training, it looked like this.
[Generated sample images at Epochs 1, 25, 50, 75, 100, 125, 150, 175, 200, and 215]
As face photos, the results up to about epoch 125 are quite good. After that, probably because the model starts trying to capture the background as well, the face regions become noticeably distorted. If the goal is to focus on faces, the results might be cleaner with images where the background is blurred out and only the face remains. The number of Conv layers in the model also seems related to how clean the images are, and mine may have been slightly too few.
Training took about 680 seconds per epoch with the following machine specs.
Linux
Dataset: All images aligned with deep funneling (13194 samples)
Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
GeForce GTX 1080
Environment Variables
KERAS_BACKEND=theano
THEANO_FLAGS=device=gpu,floatX=float32,lib.cnmem=1.0
I feel like I have finally learned a little about GANs.