GAN and VAE

Table of contents

  1. What is an autoencoder?
  2. Variational autoencoder (VAE)
  3. GAN

What is an autoencoder?

The figure below makes the idea easy to understand. In short, the encoder compresses the information by reducing the dimensionality of the image data, and the decoder reconstructs the image from that compressed representation. Training computes and minimizes a pixel-wise distance (here, the MAE) between the input image and the output image. Since the targets are the inputs themselves, this is unsupervised learning.

[Figure: autoencoder architecture (encoder → latent space → decoder).]

Now, how do you generate images with this autoencoder? First, train the autoencoder on images so that the parameters of the two networks, the encoder and the decoder, become appropriate. You then get a rough sense of how the input images are represented in the latent space. When using the autoencoder as a generative model, the encoder part is basically discarded: only the latent space and the decoder are used.
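
To make this concrete, here is a minimal sketch of such an autoencoder in Keras. The 28x28 input size, the 32-dimensional latent code, and the functional Model API are my own illustrative choices, not from this article:

from keras.layers import Dense, Flatten, Input, Reshape
from keras.models import Model

# Encoder: compress a 28x28 image into a 32-dimensional latent code
inp = Input(shape=(28, 28))
code = Dense(32, activation='relu')(Flatten()(inp))

# Decoder: reconstruct the image from the latent code
out = Reshape((28, 28))(Dense(28 * 28, activation='sigmoid')(code))

autoencoder = Model(inp, out)
# Pixel-wise MAE between input and reconstruction, as described above
autoencoder.compile(optimizer='adam', loss='mae')

# The targets are the inputs themselves: unsupervised learning, e.g.
# autoencoder.fit(X_train, X_train, epochs=10, batch_size=128)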

Variational autoencoder

While an ordinary autoencoder learns the latent space as a fixed array of values, a variational autoencoder learns the parameters that define a probability distribution over the latent space (typically the mean and variance of a Gaussian). To reconstruct an image, a concrete value is sampled from this latent distribution and fed into the decoder.

[Figure: the left is an ordinary autoencoder; the right is a variational autoencoder.]

For details, this article is very easy to understand.
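
As a minimal sketch of the sampling step (the so-called reparameterization trick), assume the encoder outputs two tensors z_mean and z_log_var describing a Gaussian over the latent space. The names and the Lambda-layer approach here are illustrative, not from this article:

from keras import backend as K
from keras.layers import Lambda

# Reparameterization trick: z = mu + sigma * epsilon with epsilon ~ N(0, I),
# so the sampling stays differentiable with respect to z_mean and z_log_var
def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=K.shape(z_mean))
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

# Used inside the encoder, e.g.:
# z = Lambda(sampling)([z_mean, z_log_var])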

GAN

For more details on how GANs work, see other articles; here I will describe the properties that are characteristic of GANs. In the GAN architecture, both the generator and the discriminator are trained through the discriminator's loss function. The discriminator's own training tries to minimize the discriminator loss over all training data. The generator, on the other hand, tries to maximize the discriminator's loss on the fake samples it produces. In other words, while ordinary neural-network training is an optimization problem, GAN training is not optimization but a game in which the generator and the discriminator compete. It stabilizes when a Nash equilibrium is reached.
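
Formally, this game is usually written as the minimax objective from the original GAN paper (a standard result, not spelled out in this article):

min_G max_D V(D, G) = E_{x ~ p_data}[log D(x)] + E_{z ~ p_z}[log(1 - D(G(z)))]

where p_data is the real-data distribution and p_z is the noise prior the generator samples from.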

The GAN training algorithm can be summarized as follows.

For each iteration of training:

1. Train the discriminator
   a. Randomly sample a mini-batch X of real images from the training data
   b. Draw a mini-batch of random vectors z and create a mini-batch of fake samples x' = G(z)
   c. Compute the discriminator loss for D(x) and D(x'), and backpropagate the total error to update the discriminator's parameters

2. Train the generator
   a. Draw a mini-batch of random vectors z and create a mini-batch of fake samples x' = G(z)
   b. Compute the discriminator loss for D(x'), and backpropagate it to update the generator's parameters so as to maximize the discriminator's error

Note that the generator's parameters are not updated while the discriminator is trained in step 1, and the discriminator's parameters are not updated while the generator is trained in step 2!

Using the basics of GANs covered so far, let's implement a GAN as simply as possible. (A more practical implementation will follow in a later article.)

Simplified GAN implementation

This time, we will implement code that generates images from MNIST data using Keras's Sequential API.

1. Import declarations

%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np

from keras.datasets import mnist
from keras.layers import Dense, Flatten, Reshape
from keras.layers.advanced_activations import LeakyReLU
from keras.models import Sequential
from keras.optimizers import Adam

2. Input dimension setting

#Input dimension settings
img_rows = 28
img_cols = 28
channels = 1
img_shape = (img_rows, img_cols, channels)

#Dimension of input noise to generator
z_dim = 100

3. Generator generation

#Generator
def build_generator(img_shape, z_dim):
    model = Sequential()
    # Hidden layer: map the z_dim-dimensional noise vector to 128 units
    model.add(Dense(128, input_dim=z_dim))
    model.add(LeakyReLU(alpha=0.01))
    # Output layer: tanh keeps pixel values in [-1, 1], matching the input scaling used later
    model.add(Dense(28 * 28 * 1, activation='tanh'))
    model.add(Reshape(img_shape))
    return model

4. Discriminator generation

#Discriminator
def build_discriminator(img_shape):
    model = Sequential()
    # Flatten the 28x28x1 image into a vector
    model.add(Flatten(input_shape=img_shape))
    model.add(Dense(128))
    model.add(LeakyReLU(alpha=0.01))
    # Output: the probability that the input image is real
    model.add(Dense(1, activation='sigmoid'))
    return model

5. Compile

#compile!

def build_gan(generator, discriminator):
    model = Sequential()
    model.add(generator)
    model.add(discriminator)
    return model

#Build and compile the discriminator
discriminator = build_discriminator(img_shape)
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(),
                      metrics=["accuracy"])

#Build the generator
generator = build_generator(img_shape, z_dim)
#Freeze the discriminator's parameters while the generator is trained
discriminator.trainable = False
#Build and compile the GAN model (generator followed by the frozen discriminator)
gan = build_gan(generator, discriminator)
gan.compile(loss="binary_crossentropy", optimizer=Adam())
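
One subtlety worth noting (my own aside, not from this article): in Keras, the trainable flag is captured when a model is compiled. Because the discriminator was compiled before being frozen, discriminator.train_on_batch still updates its weights, while inside the compiled gan it stays fixed. A quick way to check:

#Optional sanity check of the freezing behavior
print(discriminator.trainable)   # False: the current value of the flag
gan.summary()                    # the discriminator's weights show up as non-trainable params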

6. Training settings

#Training!
losses = []
accuracies = []
iteration_checkpoints = []

def train(iterations, batch_size, sample_interval):
    (X_train, _), (_, _) = mnist.load_data()  # X_train.shape = (60000, 28, 28)
    # Scale pixel values from [0, 255] to [-1, 1] to match the generator's tanh output
    X_train = X_train / 127.5 - 1.0
    X_train = np.expand_dims(X_train, axis=3)

    # Labels: 1 for real images, 0 for fake images
    real = np.ones((batch_size, 1))
    fake = np.zeros((batch_size, 1))

    for iteration in range(iterations):

        # Make a randomly picked batch of real images
        idx = np.random.randint(0, X_train.shape[0], batch_size)
        imgs = X_train[idx]
        # Create a batch of fake images
        z = np.random.normal(0, 1, (batch_size, z_dim))
        gen_imgs = generator.predict(z)

        # Discriminator training: one update on real images, one on fakes,
        # then average the two losses and accuracies
        d_loss_real = discriminator.train_on_batch(imgs, real)
        d_loss_fake = discriminator.train_on_batch(gen_imgs, fake)
        d_loss, accuracy = 0.5 * np.add(d_loss_real, d_loss_fake)

        # Generator training: fresh noise labeled as "real", so the generator
        # is updated to fool the (frozen) discriminator
        z = np.random.normal(0, 1, (batch_size, z_dim))
        g_loss = gan.train_on_batch(z, real)

        if (iteration + 1) % sample_interval == 0:

            # Record the loss and accuracy values for this iteration
            losses.append((d_loss, g_loss))
            accuracies.append(100 * accuracy)
            iteration_checkpoints.append(iteration + 1)

            print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" %
                  (iteration + 1, d_loss, 100.0 * accuracy, g_loss))
            sample_images(generator)

7. Image output

def sample_images(generator, image_grid_rows=4, image_grid_columns=4):

    # Sample random noise vectors, one per grid cell
    z = np.random.normal(0, 1, (image_grid_rows * image_grid_columns, z_dim))
    gen_imgs = generator.predict(z)

    # Rescale pixel values from [-1, 1] back to [0, 1] for display
    gen_imgs = 0.5 * gen_imgs + 0.5

    fig, axs = plt.subplots(image_grid_rows,
                            image_grid_columns,
                            figsize=(4, 4),
                            sharey=True,
                            sharex=True)
    cnt = 0
    for i in range(image_grid_rows):
        for j in range(image_grid_columns):
            # Output a grid of generated images
            axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
            axs[i, j].axis('off')
            cnt += 1

8. Let's learn!

iterations = 20000
batch_size = 128
sample_interval = 1000
train(iterations, batch_size, sample_interval)

Result

1000 [D loss: 0.129656, acc.: 96.09%] [G loss: 3.387729]
2000 [D loss: 0.079047, acc.: 97.66%] [G loss: 3.964481]
3000 [D loss: 0.071152, acc.: 97.27%] [G loss: 5.072118]
4000 [D loss: 0.217956, acc.: 91.02%] [G loss: 3.993687]
5000 [D loss: 0.380112, acc.: 86.72%] [G loss: 3.941338]
6000 [D loss: 0.292950, acc.: 89.45%] [G loss: 4.491636]
7000 [D loss: 0.345073, acc.: 85.55%] [G loss: 4.056399]
8000 [D loss: 0.396545, acc.: 86.33%] [G loss: 3.101150]
9000 [D loss: 0.744731, acc.: 70.70%] [G loss: 2.761991]
10000 [D loss: 0.444913, acc.: 80.86%] [G loss: 3.474383]
11000 [D loss: 0.362310, acc.: 82.81%] [G loss: 3.101751]
12000 [D loss: 0.383188, acc.: 84.38%] [G loss: 3.111648]
13000 [D loss: 0.283140, acc.: 89.06%] [G loss: 3.082010]
14000 [D loss: 0.411019, acc.: 81.64%] [G loss: 2.747284]
15000 [D loss: 0.386751, acc.: 82.03%] [G loss: 2.795580]
16000 [D loss: 0.475734, acc.: 80.86%] [G loss: 2.436490]
17000 [D loss: 0.285364, acc.: 89.45%] [G loss: 2.764011]
18000 [D loss: 0.202013, acc.: 91.80%] [G loss: 4.058733]
19000 [D loss: 0.285773, acc.: 86.72%] [G loss: 3.038511]
20000 [D loss: 0.354960, acc.: 81.64%] [G loss: 2.719907]

[Figures: grids of generated images at 1000, 2000, 10000, and 20000 iterations.]

At the beginning of training the outputs were just noise-like images, but by the end even this simple two-layer generator can produce relatively realistic handwritten digits. However, white dots appear in the background of images generated by this simple GAN, so it is immediately obvious that they are not handwritten. Next time, I would like to implement a DCGAN using convolutions to address this weakness!
