[PYTHON] [Introduction to PyTorch] I played with SinGAN ♬

I think SinGAN is probably one of the biggest discoveries of the year, so I played with it as an introduction to PyTorch. The story is almost the same as in the earlier articles listed below, so I will focus on the difficulties I (Uwan) ran into and add a little commentary on what I noticed. The references are as follows.

Citation
If you use this code for your research, please cite our paper:
@inproceedings{rottshaham2019singan,
  title={SinGAN: Learning a Generative Model from a Single Natural Image},
  author={Rott Shaham, Tamar and Dekel, Tali and Michaeli, Tomer},
  booktitle={Computer Vision (ICCV), IEEE International Conference on},
  year={2019}
}

【reference】
① SinGAN: Learning a Generative Model from a Single Natural Image @ arXiv:1905.01164v2 [cs.CV] 4 Sep 2019
② Code available at: https://github.com/tamarott/SinGAN
③ Tera was amazing when I read SinGAN's paper
④ [Paper commentary] SinGAN: Learning a Generative Model from a Single Natural Image
⑤ [SinGAN] Enables various image generation tasks from just one image

What I did

・ Environment and execution
・ Brief commentary on the paper
・ About Training
・ Animation
・ Super Resolution
・ Paint to Image

・ Environment and execution

First, download the Zip from the GitHub repository in Reference ② above and unzip it. You can install the dependencies into the PyTorch environment I set up the other day with the following command.

Install dependencies

python -m pip install -r requirements.txt

This code was tested with Python 3.6. What you can do with it is as follows (translated from the GitHub README above):

Train

To train a SinGAN model on your own image, place the training image under Input/Images and do the following:

python main_train.py --input_name <input_file_name>

This also uses the resulting trained model to generate random samples starting from the coarsest scale (n = 0). After training, a trained model is stored for each scale (n).

To run this code on a CPU-only machine, specify --not_cuda when calling main_train.py.

Random samples

To generate random samples starting from any scale, first train a SinGAN model on the desired image (as above), then do the following:

python random_samples.py --input_name <training_image_file_name> --mode random_samples --gen_start_scale <generation start scale number>

Note: To use the full model, specify 0 as the generation start scale; to start generation from the second scale, specify 1, and so on. In my experience, the results look best with start scale 0.

Random samples of arbitrary sizes

To generate random samples of an arbitrary size, first train a SinGAN model on the desired image (as above), then do the following:

python random_samples.py --input_name <training_image_file_name> --mode random_samples_arbitrary_sizes --scale_h <horizontal scaling factor> --scale_v <vertical scaling factor>

Animation from a single image

To generate a short animation from a single image, do the following:

python animation.py --input_name <input_file_name>

This automatically starts a new training phase in noise padding mode. When it finishes, GIF animations are generated for each starting scale and stored in separate directories. The change is largest at start_scale = 0 and becomes smaller as start_scale increases.

Harmonization

To harmonize a pasted object with an image (see the example in Figure 13 of the paper), first train a SinGAN model on the desired background image (as above), then save the naively pasted reference image and its binary mask under "Input/Harmonization" (see the example in the downloaded directory). Then do the following:

python harmonization.py --input_name <training_image_file_name> --ref_name <naively_pasted_reference_image_file_name> --harmonization_start_scale <scale to inject>

Note that different injection scales produce different harmonization effects. The coarsest injection scale is 1.

Editing

To edit an image (see the example in Figure 12 of the paper), first train a SinGAN model on the desired unedited image (as above), then save the naively edited image as a reference image, together with its corresponding binary map, under "Input/Editing" (see the saved example images). Then do the following:

python editing.py --input_name <training_image_file_name> --ref_name <edited_image_file_name> --editing_start_scale <scale to inject>

Both the masked and unmasked outputs are saved. Again, different injection scales produce different editing effects. The coarsest injection scale is 1.

Paint to Image

To convert a paint into a realistic image (see the example in Figure 11 of the paper), first train a SinGAN model on the desired image (as above), then save your paint under "Input/Paint" and do the following:

python paint2image.py --input_name <training_image_file_name> --ref_name <paint_image_file_name> --paint_start_scale <scale to inject>

Again, different injection scales produce different editing effects. The coarsest injection scale is 1.

Advanced option: Set quantization_flag to True to re-train only the injection level of the model on a color-quantized version of the upsampled images generated at the previous scale. For some images, this might lead to more realistic results.

Super Resolution

To super-resolve an image, do the following:

python SR.py --input_name <LR_image_file_name>

This automatically trains a SinGAN model corresponding to a 4x upsampling factor (if one does not already exist). For a different SR factor, specify it with the --sr_factor parameter when calling the script. The SR factor is 4 by default, and the larger the value, the larger the output image.

SinGAN results for the BSD100 dataset can be downloaded from the Downloads folder.

Additional Data and Functions

Single Image Fréchet Inception Distance (SIFID score)

To calculate the SIFID between real images and their corresponding fake samples, do the following:

python SIFID/sifid_score.py --path2real <real images path> --path2fake <fake images path> --images_suffix <e.g. jpg, png>

Make sure that each of the fake image file names is the same as the corresponding actual image file name.
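For reference, FID-style scores are based on the Fréchet distance between two Gaussians fitted to feature statistics, which can be written as below. This is the standard FID formula; as far as I understand the paper, SIFID applies it to the internal deep-feature statistics of a single real image and its fake counterpart. The symbols ($\mu_r, \Sigma_r$ for the real image, $\mu_g, \Sigma_g$ for the fake one) are my own notation, not the repository's.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \mathrm{Tr}\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

A lower value means the feature statistics of the fake image are closer to those of the real one.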

・ Brief commentary on the paper

For details, see the paper and the other references; in my view, the novel points of SinGAN are as follows.

・ Learning from a single image
・ Use of ResGAN (WGAN-GP Loss)
・ Stepwise learning of features from global to local
・ Bonus: supports multiple tasks

Learning from a single image

Learning from a single sample has probably become fairly popular recently, but I think this is the first time I have actually trained and used such a model myself.

Use ResGAN (WGAN-GP Loss)

ResGAN is described in Reference ⑥ and WGAN-GP in Reference ⑦; both are proposed as methods with good convergence behavior.

【reference】
⑥ Generative Adversarial Network based on Resnet for Conditional Image Restoration @ arXiv:1707.04881v1 [cs.CV] 16 Jul 2017
⑦ Improved Training of Wasserstein GANs

First, ResGAN in Reference ⑥ uses the following Generator.

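On the other hand, the Generator at each stage of SinGAN is basically built from the ResGAN structure shown below, except for the first (coarsest) stage. That is, the noise $z_n$ and $(\bar{x}_{n+1})\uparrow^r$, the upsized version of the image generated at the coarser scale, are the inputs of $G_n$, which learns the difference (residual) needed to sharpen it and generates the image $\bar{x}_n$. Note) Here $\uparrow^r$ denotes upsizing of the image, and the scale index follows the paper, where $n = N$ is the coarsest scale (the code and README count the coarsest scale as 0). By the way, the loss function is as follows.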

ResGAN loss function

$$ \min_{G_n}\max_{D_n} L_{adv}(G_n, D_n) + \alpha L_{rec}(G_n) $$

The first term is the WGAN-GP loss of Reference ⑦ (see that paper for its formula). The second term is

$$ L_{rec} = \| G_n(0, (\bar{x}^{rec}_{n+1})\uparrow^r) - x_n \|^2, $$

and for n = N, we use

$$ L_{rec} = \| G_N(z^*) - x_N \|^2 $$

"The input noise image at that time is $z_n = 0$ for $n = 0, \dots, N-1$, and only $z_N$ is a fixed random number set at the beginning of training." (Quoted from Reference ④)

Global to local stepwise feature learning

Training proceeds by stacking ResGAN-style stages as shown in the figure below. Training starts from the bottom row, where only the noise $z_N$ generated from random numbers is input. The Discriminator compares the generated output with $x_N$, a downscaled version of the original image whose size is determined automatically once the number of scales is fixed. After that, the generated image $\bar{x}_N$ is upsized to $(\bar{x}_N)\uparrow^r$ and input to the next stage together with $z_{N-1}$, and the same step is repeated up to the finest scale. The various applications then reuse the parameters and images learned in this way.
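The coarse-to-fine generation described above can be sketched in a few lines. This is only a toy sketch in plain Python, not the repository's code: `upsample`, `noise`, `toy_generator` and `sample` are hypothetical names, images are single-channel nested lists, and the "generator" just adds a small noise-dependent residual to the upsized previous image to mimic the residual structure.

```python
import random

def upsample(img, new_h, new_w):
    """Nearest-neighbour resize of a 2D list (stand-in for the upsizing operator)."""
    h, w = len(img), len(img[0])
    return [[img[i * h // new_h][j * w // new_w] for j in range(new_w)]
            for i in range(new_h)]

def noise(h, w, rng):
    """Gaussian noise map z_n of the given size."""
    return [[rng.gauss(0.0, 1.0) for _ in range(w)] for _ in range(h)]

def toy_generator(z, prev):
    """Hypothetical stand-in for G_n: output = prev + residual(z, prev)."""
    return [[p + 0.1 * zv for zv, p in zip(z_row, p_row)]
            for z_row, p_row in zip(z, prev)]

def sample(sizes, generators, seed=0):
    """Run the stages from coarsest to finest; `sizes` lists (h, w) coarsest first."""
    rng = random.Random(seed)
    h, w = sizes[0]
    prev = [[0.0] * w for _ in range(h)]   # the coarsest stage has nothing to refine
    for (h, w), g in zip(sizes, generators):
        prev = upsample(prev, h, w)        # upsized output of the coarser stage
        prev = g(noise(h, w, rng), prev)   # add detail at this scale
    return prev

sizes = [(4, 6), (6, 8), (8, 11)]
image = sample(sizes, [toy_generator] * len(sizes))
print(len(image), len(image[0]))
```

In the real model each stage is a small convolutional network, but the data flow (upsize, add noise, refine) is the part this sketch is meant to show.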

・ About Training

With the commands above, training should work. My (Uwan's) PyTorch environment uses a 1060, so the GPU memory is only about 3GB. With this, some images, such as cows.png, could not be trained to the end. I therefore tried reducing the size of the input images (Input/Images), but the size of the downscaled training images, such as the one at n = 0, did not change, and the memory error did not go away easily. When I reduced the image to about one third, the final value of n became a little smaller and training completed without errors, but the training image was so small that the result was not very interesting.
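Why shrinking the input did not change the n = 0 size can be sketched as follows. This is a simplified sketch, assuming (from my reading of the repository, not verified here) that the image is first capped at a maximum side length and the pyramid is then built down to a fixed minimum side length. The function name `pyramid_sizes` and the default values 25, 250 and 0.75 are my assumptions, not the repository's API.

```python
import math

def pyramid_sizes(height, width, min_size=25, max_size=250, scale_factor=0.75):
    """Return (h, w) per scale, coarsest first, under the assumptions above."""
    # Cap the longer side at max_size (larger inputs are shrunk first).
    cap = min(max_size / max(height, width), 1.0)
    h, w = height * cap, width * cap
    # Number of downscaling steps needed to bring the shorter side to min_size.
    steps = math.ceil(math.log(min_size / min(h, w)) / math.log(scale_factor))
    steps = max(steps, 0)
    if steps == 0:
        return [(round(h), round(w))]
    # Re-tune the factor so the coarsest scale lands exactly on min_size.
    factor = (min_size / min(h, w)) ** (1.0 / steps)
    return [(round(h * factor ** (steps - n)), round(w * factor ** (steps - n)))
            for n in range(steps + 1)]

for shape in [(141, 250), (46, 80)]:
    sizes = pyramid_sizes(*shape)
    print(shape, len(sizes), "scales, coarsest", sizes[0], "finest", sizes[-1])
```

Under these assumptions, a 250x141 image gives 8 scales and an 80x46 image gives 4, and in both cases the short side of the coarsest scale is 25 pixels. A smaller input therefore changes the number of scales and the finest size, which is what matters for memory, but not the size at n = 0.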

・ Animation

This is interesting because the image moves. Looking at animation.py, it appears to move the image by taking random steps in the latent (noise) space, and the locality of the features that change is controlled by the value of start_scale. As a result, the animation fluctuates greatly for a small start_scale and hardly moves for a large one. Below are a few examples.

[Figure: GIF animations generated with start_scale=0, start_scale=1, and start_scale=2]
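The idea of moving by random numbers in the latent space can be illustrated with a smoothed random walk over a noise vector. This is only a sketch of the general idea, not the update rule used in animation.py; `noise_walk`, `step_size` and `momentum` are hypothetical names and parameters.

```python
import random

def noise_walk(z_start, steps, step_size=0.1, momentum=0.9, seed=0):
    """Return a list of noise vectors, one per frame, drifting away from z_start."""
    rng = random.Random(seed)
    z = list(z_start)
    velocity = [0.0] * len(z)
    frames = [list(z)]                      # frame 0 is the starting noise itself
    for _ in range(steps):
        # Momentum keeps consecutive frames similar, so the motion looks smooth.
        velocity = [momentum * v + step_size * rng.gauss(0.0, 1.0) for v in velocity]
        z = [zi + vi for zi, vi in zip(z, velocity)]
        frames.append(list(z))
    return frames

frames = noise_walk([0.0] * 4, steps=10)
print(len(frames))
```

Feeding each frame's noise to the generators from a given start scale upward would then give one image per frame; the coarser the scale at which the noise is varied, the more global the resulting change.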

・ Super Resolution

According to the table in the paper (below), the accuracy is comparable to SRGAN, which I (Uwan) introduced the other day. So I tried it as follows. In the table below, the super-resolution factor increases toward the right, and the image size grows accordingly. You can see the actual size and the super-resolution effect by clicking each image and displaying it on its own.

[Figure: original, Expansion 1, Expansion 2, Expansion 3]

・ Paint to Image

This means converting a simple picture into a realistic image. The paper shows the following example: if you train on the image on the left, put the second, simple picture in "Input/Paint" and execute the command, the image on the right is output. The figure also shows that the SinGAN results are superior to those of other methods. My (Uwan's) execution result is as follows. I tried the same thing, but with a 1060 I got a memory error and could not train on the image on the left. Therefore, the 250x141 image was reduced to 80x46. The Paint image is 300x200. The result is too small, but for the time being it shows that the coarser the scale of the trained parameters used, the coarser the reproduced image. On the other hand, when n = 1, a cow-like image appears to some extent.

I will try this again on a machine with somewhat more memory.

[Figure: the original image, Paint, and results for n=1, n=2, n=3, n=4]

Summary

・ I played with SinGAN
・ For the time being, I understood the principle
・ I was able to experience the power of global-to-local learning using the new ResGAN

・ With a 1060, the GPU memory is insufficient and the size of the picture is limited
・ I think it is a discovery that gives us a sense of progress

Bonus

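The parameters of the ResGAN-style Generator and Discriminator are adjusted according to the input image size and have the following structure. In the printout below, for example, the number of channels is 32 at one scale and 64 at another.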

GeneratorConcatSkip2CleanAdd(
  (head): ConvBlock(
    (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
    (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (body): Sequential(
    (block1): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block2): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block3): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
  )
  (tail): Sequential(
    (0): Conv2d(32, 3, kernel_size=(3, 3), stride=(1, 1))
    (1): Tanh()
  )
)
WDiscriminator(
  (head): ConvBlock(
    (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
    (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (body): Sequential(
    (block1): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block2): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block3): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
  )
  (tail): Conv2d(32, 1, kernel_size=(3, 3), stride=(1, 1))
)
...

GeneratorConcatSkip2CleanAdd(
  (head): ConvBlock(
    (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1))
    (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (body): Sequential(
    (block1): ConvBlock(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block2): ConvBlock(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block3): ConvBlock(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
  )
  (tail): Sequential(
    (0): Conv2d(64, 3, kernel_size=(3, 3), stride=(1, 1))
    (1): Tanh()
  )
)
WDiscriminator(
  (head): ConvBlock(
    (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1))
    (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (body): Sequential(
    (block1): ConvBlock(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block2): ConvBlock(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block3): ConvBlock(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
  )
  (tail): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1))
)
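As a small check on the printout above: each network is a stack of five 3x3 convolutions (head, three body blocks, tail) with stride 1, and no padding appears in the printout, which as far as I know means the padding is zero. Under that reading, the receptive field and the amount by which the output shrinks can be computed as follows. `stack_properties` is a hypothetical helper name, and the input size 35 is just an example value.

```python
def stack_properties(kernel_sizes, in_size):
    """Receptive field and output size of stacked stride-1, unpadded convolutions."""
    receptive_field = 1
    out_size = in_size
    for k in kernel_sizes:
        receptive_field += k - 1   # each layer widens the receptive field by k - 1
        out_size -= k - 1          # without padding, each layer trims k - 1 pixels
    return receptive_field, out_size

rf, out = stack_properties([3] * 5, in_size=35)
print(rf, out)  # 11 25
```

So each output pixel sees an 11x11 patch of its input, and an unpadded input loses 10 pixels per side length, which is why the input has to be padded beforehand if the output is to keep the same size.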