Continuing from my previous posts, I have been implementing various things as programming practice. This time I use a VAEGAN to reconstruct Soviet propaganda posters. (Why the Soviet Union all of a sudden? It's purely a hobby; there is no political message.)
If you have any technical suggestions, I would appreciate your comments. The main takeaways of this article are how well VAEGAN can reconstruct images and what to watch out for when training a GAN.
**The flow of reconstruction is as follows.**
The PyTorch implementation of VAEGAN is based on the repository linked here. Thank you very much.
Autoencoding beyond pixels using a learned similarity metric

In an ordinary GAN there are two models, a Generator and a Discriminator, and by training them adversarially against each other you obtain a good Generator. The problem with a GAN, however, is that **the Generator's input is random noise.** Because the input is random, even if you say "I explicitly want this particular data!", an ordinary GAN cannot generate the specific data you are targeting. VAEGAN, on the other hand, is designed so that you can explicitly reconstruct the data you want: the Generator is replaced by a VAE. This makes it possible to explicitly reconstruct data close to the input data. This time, I will reconstruct Soviet-style poster data.
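As a rough sketch of this structure: the "Generator" is an Encoder plus a Decoder (a VAE), and the Discriminator sits on top. The layer sizes and module names below are illustrative assumptions of mine, not the actual implementation from the referenced repository.

```python
import torch
import torch.nn as nn

# Minimal sketch of the VAEGAN generator side: the GAN Generator is
# replaced by a VAE (Encoder + reparameterization + Decoder).
# Layer widths are illustrative; input is assumed 3 x 128 x 128.

class Encoder(nn.Module):
    def __init__(self, nz=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),   # 128 -> 64
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2), # 64 -> 32
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 32 * 32, nz)
        self.fc_logvar = nn.Linear(128 * 32 * 32, nz)

    def forward(self, x):
        h = self.conv(x)
        return self.fc_mu(h), self.fc_logvar(h)

class Decoder(nn.Module):
    def __init__(self, nz=32):
        super().__init__()
        self.fc = nn.Linear(nz, 128 * 32 * 32)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.LeakyReLU(0.2),  # 32 -> 64
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),            # 64 -> 128
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 128, 32, 32)
        return self.deconv(h)

class VAEGAN_G(nn.Module):
    """The VAE that replaces the plain GAN Generator."""
    def __init__(self, nz=32):
        super().__init__()
        self.encoder = Encoder(nz)
        self.decoder = Decoder(nz)

    def forward(self, x):
        mu, logvar = self.encoder(x)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        return self.decoder(z), mu, logvar
```

Because the Encoder maps a real image to a latent code before decoding, the output can be pushed toward the specific input image, which is exactly what makes explicit reconstruction possible.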
I used Image Downloader. This Chrome extension is insanely useful for collecting training data; I recommend it. I collected about 300 images like the following.
Ah, this avant-garde feel is wonderful! (It's just a hobby)
For the implementation, I basically referred to the repository above. However, training sometimes failed as-is (reconstruction did not work), so I made some corrections. **There are two main changes: the error function and the model definitions.**
For the error function, I removed L_llike from Enc_loss in train.py as shown below, and likewise removed L_gan_fake from Dec_loss. These two terms, L_llike and L_gan_fake, incorporate the Discriminator's features into the error function, but in my environment the original version unfortunately did not converge...
train.py
```python
# Enc_loss ---
Enc_loss = L_prior + L_recon  # L_llike removed
Enc_loss.backward()
Enc_optimizer.step()
Enc_running_loss += Enc_loss.item()

# train Decoder ===
Dec_optimizer.zero_grad()
x_real = Variable(data)
z_fake_p = Variable(torch.randn(opt.batch_size, opt.nz))
if opt.use_gpu:
    x_real = x_real.cuda()
    z_fake_p = z_fake_p.cuda()
x_fake, mu, logvar = G(x_real)

# L_gan ---
y_real_loss = bce_loss(D(x_real), t_fake)
y_fake_loss = bce_loss(D(x_fake), t_real)
y_fake_p_loss = bce_loss(D(G.decoder(z_fake_p)), t_real)
L_gan_fake = (y_real_loss + y_fake_loss + y_fake_p_loss) / 3.0

# L_recon / L_llike ---
L_recon = opt.gamma * l1_loss(x_fake, x_real)
L_llike = l1_loss(D.feature(x_fake), D.feature(x_real))

# Dec_loss ---
Dec_loss = L_recon + L_llike  # L_gan_fake removed
Dec_loss.backward()
Dec_optimizer.step()
Dec_running_loss += Dec_loss.item()
```
In the model definitions, the main changes are that the activation function of each model is set to LeakyReLU and that the Discriminator's BatchNorm2d layers are removed. The input size is 128 x 128 with 3 channels, since the images are color.
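The Discriminator changes above can be sketched roughly as follows. The layer widths are my own illustrative guesses, not the actual architecture from the referenced repository; the point is the LeakyReLU activations, the absence of BatchNorm2d, and the `feature` method used by the L_llike term in train.py.

```python
import torch
import torch.nn as nn

# Sketch of a Discriminator matching the changes described in the text:
# LeakyReLU activations and NO BatchNorm2d. Input is 3 x 128 x 128.
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.features_net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),    # 128 -> 64
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),  # 64 -> 32
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2), # 32 -> 16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 16 * 16, 1),
            nn.Sigmoid(),  # real/fake probability for bce_loss
        )

    def feature(self, x):
        # Intermediate feature maps, as used by the L_llike term
        return self.features_net(x)

    def forward(self, x):
        return self.classifier(self.features_net(x))
```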
I ran train.py and saved the reconstruction results to the logs directory every 100 epochs. I trained with epochs = 1000, batch size = 16, and a 32-dimensional latent variable. The reconstruction results are shown below; in each image, the top row is the input and the bottom row is the VAEGAN output.
(Reconstruction results at 100, 300, 600, and 1000 epochs)
Honestly, I was impressed by VAEGAN. At epoch 100 the reconstruction is blurry, but after 1000 epochs it reproduces the input almost perfectly, to the point where it is hard to tell the output from the real thing. The VAEGAN paper mentioned at the beginning of this article also compares against VAE, so I knew it could reconstruct remarkably cleanly, but this experiment drove the point home.

**However, there are many things to keep in mind when training a GAN.** First, GAN training tends to be delicate and hard to stabilize, so you must watch how the losses of the Discriminator and the other models evolve. Typical failure cases are when the Discriminator's loss converges at a tremendous rate, or when the other losses do not converge at all and instead diverge. In my case, the Discriminator converged abruptly, so I removed its BatchNorm2d layers. Also, recent GAN models seem to use LeakyReLU rather than ReLU. The page here is also a helpful reference for situations where GAN training does not go well.
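The loss-watching habit described above can be automated with a small check run once per epoch. The thresholds below are illustrative placeholders of my own, not tuned values from this experiment; the point is to flag the two failure modes mentioned (the Discriminator collapsing toward zero loss, or the other losses diverging).

```python
# Sketch: flag unhealthy GAN loss patterns each epoch.
# d_floor / g_ceiling are illustrative thresholds, not tuned values.
def check_gan_health(d_loss, g_loss, d_floor=0.05, g_ceiling=10.0):
    """Return a list of warning strings for this epoch's losses."""
    warnings = []
    if d_loss < d_floor:
        warnings.append(
            f"D loss {d_loss:.4f} is near zero: the Discriminator may be "
            "winning too fast (consider weakening it, e.g. removing BatchNorm)")
    if g_loss > g_ceiling:
        warnings.append(
            f"other loss {g_loss:.4f} is diverging: training may be unstable")
    return warnings
```

Printing the returned warnings inside the epoch loop makes an abrupt Discriminator collapse visible early, instead of only after 1000 epochs of wasted training.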