This time, I tried implementing CycleGAN. The implementation is based on code published on GitHub. On this page, I will give a light explanation of the paper and then run the implementation. Applying it to my own dataset is something I would like to try next time.
- About CycleGAN
- Implementation on Linux
It is brief, but I will explain things following these two items.
Paper: https://arxiv.org/pdf/1703.10593.pdf. The explanation below follows this paper.
CycleGAN is a Generative Adversarial Network (GAN) that enables style transfer. The figure in the paper contrasts two settings. For style transfer such as the colorization shown on the left, methods like **pix2pix** learn from pairs of input and output images; in other words, a one-to-one correspondence is required, as shown under "Paired" in the figure. Methods that enable unpaired style transfer, as shown on the right, have also been proposed, but these had to define a different metric space for each style-transfer task, such as a class label space, an image feature space, or pixel space, and use it to pull the input and output closer together.
**CycleGAN** was therefore proposed as a method that needs neither one-to-one image pairs nor a learning method tailored to each task. Here are images converted by **CycleGAN**: landscape photographs are transformed into the styles of the world-famous painters Monet and Van Gogh. This cannot be done with pair-based learning like **pix2pix**, because to obtain a photograph of the very landscape that Van Gogh painted, you would have to travel back in time. CycleGAN also enables conversion between zebras and horses, and between summer and winter. With **CycleGAN**, you can learn style transfer without changing the learning method for each task.
This is made possible by the introduction of the **cycle-consistency loss**, which is the heart of the method, so I will explain it below. The figure in the paper shows the losses used in **CycleGAN**. First, **(a)** is the **adversarial loss**, the standard GAN loss, written out below. In its first term, the discriminator tries to identify the real data *y* as genuine; the second term corresponds to identifying the data produced by the generator as fake. Training maximizes this adversarial loss for the discriminator (so that it identifies correctly) and minimizes it for the generator (so that the discriminator is fooled). For the discriminator, maximizing the first term means pushing the probability assigned to real data *y* toward 1 (genuine), and maximizing the second term means pushing the probability assigned to the fake *G(x)*, generated by applying *G()* to *x*, toward 0 (fake). For the generator the opposite holds: the goal is a *G()* whose outputs the discriminator cannot tell apart from real data. During each maximization or minimization step the other network is held fixed, and training proceeds by alternating the two. This is done for both domains; that is, we optimize both *L_GAN(G, D_Y, X, Y)* and *L_GAN(F, D_X, Y, X)*.
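For reference, the adversarial loss for the mapping *G: X → Y* and its discriminator *D_Y*, as written in the paper, is:

```math
\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]
```

An identical loss *L_GAN(F, D_X, Y, X)* is defined for the reverse mapping *F: Y → X* and its discriminator *D_X*.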
Next, let's talk about **(b)** and **(c)**. This is called the **cycle-consistency loss** and is expressed by the formula below. The first term takes data *x*, maps it to the other domain with *G()*, maps the result back to the original domain with *F()*, and uses the L1 norm to require that *F(G(x))* matches the original *x*. The second term does the same in the opposite direction. The idea is simple.
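Written out, the cycle-consistency loss from the paper is:

```math
\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\lVert F(G(x)) - x \rVert_1] + \mathbb{E}_{y \sim p_{data}(y)}[\lVert G(F(y)) - y \rVert_1]
```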
Finally, combining **(a)** through **(c)** gives the objective function below. Solving this optimization problem yields the desired *G* and *F*.
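The full objective from the paper, with *λ* controlling the weight of the cycle-consistency term, and the resulting optimization problem are:

```math
\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F)
```

```math
G^*, F^* = \arg \min_{G, F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y)
```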
If you want to see more results, please take a look at the paper.
Public code: https://github.com/xhujoy/CycleGAN-tensorflow
Implementation environment
First, clone the repository into any directory, then change into the CycleGAN-tensorflow/ directory.
This time, we will download the **horse2zebra** dataset that was also used in the paper.
$ git clone https://github.com/xhujoy/CycleGAN-tensorflow
$ cd CycleGAN-tensorflow/
$ bash ./download_dataset.sh horse2zebra
Next, we will train on the downloaded **horse2zebra** dataset.
$ CUDA_VISIBLE_DEVICES=0 python main.py --dataset_dir=horse2zebra
When specifying a GPU, use the `CUDA_VISIBLE_DEVICES=` environment variable as above.
Learning begins.
Epoch: [ 0] [ 0/1067] time: 14.2652
Epoch: [ 0] [ 1/1067] time: 16.9671
Epoch: [ 0] [ 2/1067] time: 17.6442
Epoch: [ 0] [ 3/1067] time: 18.3194
Epoch: [ 0] [ 4/1067] time: 19.0001
Epoch: [ 0] [ 5/1067] time: 19.6724
Epoch: [ 0] [ 6/1067] time: 20.3511
Epoch: [ 0] [ 7/1067] time: 21.0326
Epoch: [ 0] [ 8/1067] time: 21.7106
Epoch: [ 0] [ 9/1067] time: 22.3866
Epoch: [ 0] [ 10/1067] time: 23.0501
Epoch: [ 0] [ 11/1067] time: 23.7298
.
.
.
By default, the number of epochs is set to 200. You can change this to suit the dataset you apply. If the transformation you are learning is not such a drastic one, you might try reducing the number of epochs (see the example below).
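As a sketch, assuming this version of main.py accepts an `--epoch` option (check `python main.py --help` for the exact name), shortening training might look like this:

```
# --epoch is an assumed flag name; verify with `python main.py --help`
$ CUDA_VISIBLE_DEVICES=0 python main.py --dataset_dir=horse2zebra --epoch=100
```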
Note that the downloaded datasets/horse2zebra/ directory contains testA/, testB/, trainA/, and trainB/, and each of these directories contains images. Even during training, if either testA/ or testB/ contains no data, the following error will be thrown.
ValueError: Cannot feed value of shape (1, 256, 256, 6) for Tensor 'real_A_and_B_images:0', which has shape '(?, 512, 512, 6)'
Be careful about this when building your own dataset; a sketch of the expected layout follows below.
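As a rough sketch, with `my_dataset` as a hypothetical name, a custom dataset is expected to mirror the horse2zebra layout, with images present in all four subdirectories:

```
# my_dataset is a placeholder name; mirror the horse2zebra layout
$ mkdir -p datasets/my_dataset/trainA datasets/my_dataset/trainB
$ mkdir -p datasets/my_dataset/testA datasets/my_dataset/testB
# place domain-A images in trainA/ and testA/, domain-B images in trainB/ and testB/
```

Training would then be started with `--dataset_dir=my_dataset`.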
The test is done with the following command.
$ CUDA_VISIBLE_DEVICES=0 python main.py --dataset_dir=horse2zebra --phase=test --which_direction=AtoB
Specify AtoB or BtoA with the `--which_direction=` option.
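For example, to convert in the opposite direction (zebra to horse), the same test command can be run with the direction flipped:

```
$ CUDA_VISIBLE_DEVICES=0 python main.py --dataset_dir=horse2zebra --phase=test --which_direction=BtoA
```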
Images in datasets/horse2zebra/testA or datasets/horse2zebra/testB are converted and saved in test/.
Each converted image is easy to identify, as its filename is prefixed with `AtoB_` or `BtoA_`.
The following is an example of the test results.
horse2zebra (AtoB)
zebra2horse (BtoA)
The conversion works convincingly. It's great. The above covers the implementation on Linux. Next time, I would like to apply it to my own dataset.
Paper: https://arxiv.org/pdf/1703.10593.pdf
GitHub: https://github.com/xhujoy/CycleGAN-tensorflow