[Introduction to PyTorch] I played with SinGAN ♬

I think SinGAN is probably one of the biggest discoveries of the year, so I played with it as an introduction to PyTorch. Much of this overlaps with the earlier write-ups listed below, so I will focus on the difficulties I (Uwan) ran into and add a little commentary on what I noticed. The references are as follows.

Citation
If you use this code for your research, please cite our paper:
@inproceedings{rottshaham2019singan,
  title={SinGAN: Learning a Generative Model from a Single Natural Image},
  author={Rott Shaham, Tamar and Dekel, Tali and Michaeli, Tomer},
  booktitle={Computer Vision (ICCV), IEEE International Conference on},
  year={2019}
}

【reference】
① SinGAN: Learning a Generative Model from a Single Natural Image @ arXiv:1905.01164v2 [cs.CV] 4 Sep 2019
② Code available at: https://github.com/tamarott/SinGAN
③ Tera was amazing when I read SinGAN's paper
④ [Paper commentary] SinGAN: Learning a Generative Model from a Single Natural Image
⑤ [SinGAN] Enables various image generation tasks from just one image

What I did

・ Environment and execution
・ Brief commentary on the paper
・ About Training
・ Animation
・ Super Resolution
・ Paint to Image

・ Environment and execution

First, download the Zip file from the GitHub repository in Reference ② above and unzip it. You can then install the dependencies into the PyTorch environment I set up the other day with the following command.

Install dependencies

python -m pip install -r requirements.txt

This code was tested with Python 3.6. Here is what you can do with it (my translation of the GitHub README above):

Train

To train a SinGAN model on your own image, place the training image under Input/Images and run the following:

python main_train.py --input_name <input_file_name>

This also uses the resulting trained model to generate random samples starting from the coarsest scale (n = 0). After training, a trained model is stored for each scale n.

To run this code on a CPU machine, specify --not_cuda when calling main_train.py.
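If you run the scripts many times with different images, it can be handy to assemble the command lines from Python. The helper below is only a convenience sketch: build_command is a hypothetical name that is not part of the SinGAN repository, and it uses nothing beyond the script name and flags quoted above.

```python
def build_command(script, **options):
    """Assemble an argument list for one of the SinGAN scripts.

    Each keyword becomes a "--name value" pair. A value of True yields a
    bare flag such as --not_cuda; None or False is skipped.
    (Hypothetical helper, not part of the SinGAN repository.)
    """
    args = ["python", script]
    for name, value in options.items():
        if value is None or value is False:
            continue
        args.append(f"--{name}")
        if value is not True:
            args.append(str(value))
    return args


train_cmd = build_command("main_train.py", input_name="cows.png", not_cuda=True)
print(" ".join(train_cmd))
# python main_train.py --input_name cows.png --not_cuda
```

The resulting list can be passed to subprocess.run(train_cmd, check=True) from the unzipped repository directory.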

Random samples

To generate random samples starting from any scale, first train a SinGAN model on the desired image (as described above), then run the following:

python random_samples.py --input_name <training_image_file_name> --mode random_samples --gen_start_scale <generation start scale number>

Note: To use the full model, specify 0 as the generation start scale; to start generation from the second scale, specify 1, and so on. The results seem to look best with scale 0.

Random samples of arbitrary sizes

To generate random samples of any size, first train a SinGAN model on the desired image (as described above), then run the following:

python random_samples.py --input_name <training_image_file_name> --mode random_samples_arbitrary_sizes --scale_h <horizontal scaling factor> --scale_v <vertical scaling factor>

Animation from a single image

To generate a short animation from a single image, run the following:

python animation.py --input_name <input_file_name> 

This automatically starts a new training phase in noise padding mode. When the run finishes, GIF animations are generated automatically for each start scale and stored in separate directories. The change is largest when start_scale = 0 and becomes smaller as start_scale increases.

Harmonization

To harmonize a pasted object into an image (see the example in Figure 13 of the paper), first train a SinGAN model on the desired background image (as described above), then save the naively pasted reference image and its binary mask under "Input/Harmonization" (see the example in the downloaded file directory). Then run the following:

python harmonization.py --input_name <training_image_file_name> --ref_name <naively_pasted_reference_image_file_name> --harmonization_start_scale <scale to inject>

Note that different injection scales produce different harmonization effects. The coarsest injection scale is 1.

Editing

To edit an image (see the example in Figure 12 of the paper), first train a SinGAN model on the desired unedited image (as described above), then save the naively edited image as the reference image under "Input/Editing", together with the corresponding binary map (see the saved images for an example). Then run the following:

python editing.py --input_name <training_image_file_name> --ref_name <edited_image_file_name> --editing_start_scale <scale to inject>

Both the masked and unmasked outputs are saved. Again, different injection scales produce different editing effects. The coarsest injection scale is 1.

Paint to Image

To convert a paint into a realistic image (see the example in Figure 11 of the paper), first train a SinGAN model on the desired image (as described above), then save your paint under "Input/Paint" and run the following:

python paint2image.py --input_name <training_image_file_name> --ref_name <paint_image_file_name> --paint_start_scale <scale to inject>

Again, different injection scales produce different editing effects. The coarsest injection scale is 1.

Advanced option: Set quantization_flag to True to re-train only the injection level of the model on a color-quantized version of the upsampled generated images from the previous scale. For some images, this might lead to more realistic results.

Super Resolution

To super-resolve an image, run the following:

python SR.py --input_name <LR_image_file_name>

This automatically trains a SinGAN model for a 4x upsampling factor (if one does not already exist). To use a different SR factor, specify it with the --sr_factor parameter when calling the script. The SR factor is 4 by default, and the larger the value, the larger the output image.

SinGAN results for the BSD100 dataset can be downloaded from the Downloads folder.

Additional Data and Functions

Single Image Fréchet Inception Distance (SIFID score)

To calculate the SIFID between real images and their corresponding fake samples, run the following:

python SIFID/sifid_score.py --path2real <real images path> --path2fake <fake images path> --images_suffix <e.g. jpg, png>

Make sure that each fake image file has the same name as its corresponding real image file.
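As the name suggests, the score is a Fréchet distance between the feature statistics of the real image and those of the fake sample. The sketch below shows only that distance formula between two Gaussians in NumPy; it is not the code in sifid_score.py. The feature extraction step is left out, the function names are hypothetical, and the random arrays merely stand in for feature vectors.

```python
import numpy as np


def feature_statistics(features):
    """Mean and covariance of a (num_samples, num_features) array."""
    features = np.asarray(features, dtype=float)
    return features.mean(axis=0), np.cov(features, rowvar=False)


def frechet_distance(mu1, cov1, mu2, cov2):
    """Fréchet distance between the Gaussians N(mu1, cov1) and N(mu2, cov2)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    cov1, cov2 = np.asarray(cov1, dtype=float), np.asarray(cov2, dtype=float)
    mean_term = np.sum((mu1 - mu2) ** 2)
    # Tr(sqrt(cov1 @ cov2)) is the sum of the square roots of the eigenvalues
    # of cov1 @ cov2; for positive semi-definite covariances these are real
    # and non-negative up to round-off, hence the real part and the clip.
    eigvals = np.linalg.eigvals(cov1 @ cov2)
    sqrt_trace = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(mean_term + np.trace(cov1) + np.trace(cov2) - 2.0 * sqrt_trace)


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))   # stand-in for features of the real image
fake = real + 1.0                  # same spread, every feature shifted by 1

d_same = frechet_distance(*feature_statistics(real), *feature_statistics(real))
d_shift = frechet_distance(*feature_statistics(real), *feature_statistics(fake))
print(d_same, d_shift)
```

Identical statistics give a distance of (numerically) zero, and a pure shift of the mean contributes its squared length, here 4 for four features shifted by 1.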

・ Brief commentary on the paper

For details, see the paper and the other references; in my view, the novel points of SinGAN are the following.

・ Learning from a single image
・ Use of ResGAN (WGAN-GP loss)
・ Stepwise feature learning from global to local
・ Bonus: supports multiple tasks

Learning from a single image

Learning from a single data sample has probably become fairly popular recently, but I think this is the first time it has actually been trained and put to practical use.

Use of ResGAN (WGAN-GP loss)

ResGAN is described in Reference ⑥ and WGAN-GP in Reference ⑦; both are proposed as methods with good convergence behavior.

【reference】
⑥ Generative Adversarial Network based on Resnet for Conditional Image Restoration @ arXiv:1707.04881v1 [cs.CV] 16 Jul 2017
⑦ Improved Training of Wasserstein GANs

First, ResGAN in Reference ⑥ is the following Generator.

resGAN_original.jpg

On the other hand, the Generator at each stage of SinGAN, except for the first (coarsest) one, basically has the ResGAN-style structure shown below. That is, the noise $z_n$ and $(\bar{x}_{n+1})\uparrow^r$, the upsampled version of the coarser generated image, are fed into $G_n$, which learns the residual and generates the sharper image $\bar{x}_n$.
Note) Here, $\uparrow^r$ denotes upsampling of the image by a factor of $r$.

resGAN.jpg

ResGAN loss function

$$\min_{G_n}\max_{D_n} L_{adv}(G_n, D_n) + \alpha L_{rec}(G_n)$$

The first term is the WGAN-GP loss of Reference ⑦, which is expressed by the following formula.

WGAN_loss.jpg

The second term is the reconstruction loss

$$L_{rec} = \|G_n(0, (\bar{x}^{rec}_{n+1})\uparrow^r) - x_n\|^2,$$

and for n = N, we use

$$L_{rec} = \|G_N(z^*) - x_N\|^2$$

"The input noise image at that time is $z_n = 0$ for $n = 0, ..., N-1$, and only $z_N$ is a fixed random number set at the beginning of training." (quoted from Reference ④)

Global to local stepwise feature learning

Training proceeds by stacking ResGAN-style stages as shown in the figure below. It starts from the bottom row, where only $z_N$, generated from random numbers, is fed in. The Discriminator compares the output with $x_N$, a downscaled version of the original real image whose size is fixed automatically once the number of training stages (scales) is determined. After that, the image $\bar{x}_N$ generated in this way is upsampled to $(\bar{x}_N)\uparrow^r$ and fed into the next stage together with $z_{N-1}$, and the same step is repeated up to the finest scale.

multi_resGAN.jpg

In this way, the various applications make use of the trained parameters and images.
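The residual generator step, the reconstruction term, and the coarse-to-fine pass described above can be sketched in NumPy as follows. This is a toy illustration under stated assumptions, not the repository's implementation: psi stands for the convolutional body of a generator and is replaced here by simple functions, upsampling is nearest-neighbour with an integer factor, and all function names are hypothetical.

```python
import numpy as np


def upsample(image, r=2):
    """Nearest-neighbour upsampling by an integer factor r (stand-in for the up-arrow)."""
    return np.repeat(np.repeat(image, r, axis=0), r, axis=1)


def generator_step(psi, z_n, coarse_up):
    """Residual generator: the upsampled coarse image plus the detail psi predicts."""
    return coarse_up + psi(z_n + coarse_up)


def reconstruction_loss(psi, x_rec_coarse, x_n, r=2):
    """L_rec for n < N: squared error of G_n(0, up(x_rec_{n+1})) against x_n."""
    coarse_up = upsample(x_rec_coarse, r)
    x_rec = generator_step(psi, np.zeros_like(coarse_up), coarse_up)
    return float(np.sum((x_rec - x_n) ** 2))


def sample_coarse_to_fine(psis, noises, r=2):
    """Run all stages; psis and noises are ordered from scale N down to scale 0."""
    # The coarsest stage sees only noise, i.e. an all-zero "coarse image".
    image = generator_step(psis[0], noises[0], np.zeros_like(noises[0]))
    for psi, z_n in zip(psis[1:], noises[1:]):
        image = generator_step(psi, z_n, upsample(image, r))
    return image


def toy_psi(t):
    """Toy stand-in for a trained convolutional body."""
    return 0.5 * t


def zero_psi(t):
    """A body that adds no detail at all."""
    return np.zeros_like(t)


sample = sample_coarse_to_fine(
    [toy_psi, toy_psi], [np.ones((2, 2)), np.zeros((4, 4))]
)
loss_zero = reconstruction_loss(zero_psi, np.ones((2, 2)), np.ones((4, 4)))
loss_off = reconstruction_loss(zero_psi, np.ones((2, 2)), np.zeros((4, 4)))
print(sample.shape, loss_zero, loss_off)
```

With the zero body the stage simply returns the upsampled coarse image, so the reconstruction loss measures how far plain upsampling is from the target at that scale.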

・ About Training

Training works as described above. My PyTorch environment runs on a GTX 1060, so there is only about 3 GB of GPU memory. With that, some images, such as cows.png, could not be trained through to the final scale. I therefore tried shrinking the input image (under Input/Images), but the size of the downscaled training images, such as the one at n = 0, did not change, and the memory error did not go away easily. Only when I shrank the image to about one third did the final value of n become a little smaller, and training then completed without errors; however, the training image was so small that the results were not very interesting.
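To get a feel for why the finest scale dominates memory use, here is a small sketch of how the image sizes in such a pyramid grow. It assumes the pyramid is built by dividing the finest side length by r until a chosen coarsest size is reached; the numbers 256, 32 and r = 2 are purely illustrative and are not the repository's settings, and both function names are hypothetical.

```python
def pyramid_sizes(finest_side, coarsest_side, r):
    """Side lengths from the coarsest scale (n = N) to the finest scale (n = 0)."""
    sides = [float(finest_side)]
    # Keep shrinking by r while the next scale is still at least coarsest_side.
    while sides[-1] / r >= coarsest_side:
        sides.append(sides[-1] / r)
    return [round(side) for side in reversed(sides)]


def training_schedule(sizes):
    """List (n, side, inputs) for every stage, in training order (N first)."""
    last = len(sizes) - 1  # this is N
    schedule = []
    for n in range(last, -1, -1):
        inputs = ["z_N"] if n == last else [f"z_{n}", f"up(x_{n + 1})"]
        schedule.append((n, sizes[last - n], inputs))
    return schedule


sizes = pyramid_sizes(finest_side=256, coarsest_side=32, r=2)
schedule = training_schedule(sizes)
for n, side, inputs in schedule:
    print(f"n={n}: {side}x{side} px, inputs: {', '.join(inputs)}")
```

In this toy setting each step up multiplies the pixel count by r squared, so the last stage alone handles 64 times as many pixels as the first.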

・ Animation

This one is fun because the image moves. Looking at animation.py, the motion appears to come from a random walk in the latent (noise) space, and changing the value of start_scale changes how local the varied features are. As a result, a small start_scale gives an animation that fluctuates greatly, while a large start_scale gives one that hardly moves. Below are a few examples.

start_scale=0 start_scale=1 start_scale=2
alpha=0.100000_beta=0.800000.gif alpha=0.100000_beta=0.800000.gif alpha=0.100000_beta=0.800000.gif
alpha=0.100000_beta=0.800000.gif alpha=0.100000_beta=0.800000.gif alpha=0.100000_beta=0.800000.gif
alpha=0.100000_beta=0.800000.gif alpha=0.100000_beta=0.800000.gif alpha=0.100000_beta=0.800000.gif
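The GIF file names carry two parameters, alpha and beta. One plausible way to read them is as the knobs of a random walk with momentum around the fixed training noise, sketched below. This is my assumption of a possible update rule and is not copied from animation.py; the names latent_walk and z_opt are hypothetical.

```python
import numpy as np


def latent_walk(z_opt, num_frames, alpha, beta, rng):
    """Random walk around the fixed noise z_opt, with momentum.

    alpha pulls every frame back towards z_opt, while beta sets how much of
    the previous step is reused (smoothness) versus fresh random noise.
    """
    z_prev1 = z_opt.copy()
    z_prev2 = z_opt.copy()
    frames = []
    for _ in range(num_frames):
        z_rand = rng.normal(size=z_opt.shape)
        step = beta * (z_prev1 - z_prev2) + (1.0 - beta) * z_rand
        z_curr = alpha * z_opt + (1.0 - alpha) * (z_prev1 + step)
        frames.append(z_curr)
        z_prev2, z_prev1 = z_prev1, z_curr
    return np.stack(frames)


rng = np.random.default_rng(0)
z_opt = rng.normal(size=(8, 8))
walk = latent_walk(z_opt, num_frames=10, alpha=0.1, beta=0.8, rng=rng)
frozen = latent_walk(z_opt, num_frames=3, alpha=1.0, beta=0.8, rng=rng)
print(walk.shape, np.allclose(frozen, z_opt))
```

Feeding each frame of such a walk into the generators from a given start scale would produce one animation frame; with alpha = 1 the walk never leaves z_opt, so nothing moves.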

・ Super Resolution

According to the table below from the paper, the accuracy is comparable to that of SRGAN, which I introduced the other day.

SR_evaluation.jpg
SR_comparison.jpg

So I tried it myself, with the results below. In the table, the super-resolution factor increases toward the right, and the image size grows along with it. To get a feel for the actual size and the effect of super resolution, click each image to display it on its own.

original Expansion 1 Expansion 2 Expansion 3
mayuyu128.jpg mayuyu128_HR.png mayuyu128_HR.png mayuyu128_HR.png
33039_LR.png 33039_LR_HR.png 33039_LR_HR.png 33039_LR_HR.png
romanesco.jpg romanesco_HR.png romanesco_HR.png

・ Paint to Image

Paint to Image means converting a simple painting into a realistic image. The paper shows the example below: train on the image on the left, put the simple painting (second from the left) under "Input/Paint", run the command, and the image on the right is produced. The figure also shows that the SinGAN results are superior to those of other methods.

paint2image.jpg

My own results are as follows. With the GTX 1060 I got a memory error and could not train on the original image, so I reduced the 250x141 training image to 80x46. The paint image is 300x200. The outputs are too small to judge well, but for now I can say that the coarser the trained parameters used, the more coarsely the image is reproduced. On the other hand, at n = 1 a somewhat cow-like image does appear.

The original image Paint n=1 n=2 n=3 n=4
cows2.png cows.png start_scale=1.png start_scale=2.png start_scale=3.png start_scale=4.png

Summary

・ I played with SinGAN
・ For the time being, I understood the principle
・ I got a real sense of the power of global-to-local learning using the new ResGAN

・ With a GTX 1060, GPU memory runs short and the image size is limited
・ I think it is a discovery that gives a real sense of progress

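Bonus

The parameters of the ResGAN Generator and Discriminator are adjusted according to the input image size and have the following structure. In the printout below, the first Generator/Discriminator pair uses 32 channels and the last pair uses 64; the "..." marks output omitted in between.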

GeneratorConcatSkip2CleanAdd(
  (head): ConvBlock(
    (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
    (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (body): Sequential(
    (block1): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block2): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block3): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
  )
  (tail): Sequential(
    (0): Conv2d(32, 3, kernel_size=(3, 3), stride=(1, 1))
    (1): Tanh()
  )
)
WDiscriminator(
  (head): ConvBlock(
    (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1))
    (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (body): Sequential(
    (block1): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block2): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block3): ConvBlock(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
  )
  (tail): Conv2d(32, 1, kernel_size=(3, 3), stride=(1, 1))
...

GeneratorConcatSkip2CleanAdd(
  (head): ConvBlock(
    (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1))
    (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (body): Sequential(
    (block1): ConvBlock(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block2): ConvBlock(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block3): ConvBlock(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
  )
  (tail): Sequential(
    (0): Conv2d(64, 3, kernel_size=(3, 3), stride=(1, 1))
    (1): Tanh()
  )
)
WDiscriminator(
  (head): ConvBlock(
    (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1))
    (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (body): Sequential(
    (block1): ConvBlock(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block2): ConvBlock(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
    (block3): ConvBlock(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
      (norm): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (LeakyRelu): LeakyReLU(negative_slope=0.2, inplace=True)
    )
  )
  (tail): Conv2d(64, 1, kernel_size=(3, 3), stride=(1, 1))
)
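From the printout, each network is a stack of five 3x3 convolutions with stride 1, and no padding is listed for them. Under that reading, the receptive field, the number of pixels lost per dimension, and the convolution parameter count follow from simple arithmetic, sketched below. conv_stack_summary is a hypothetical helper, and the BatchNorm parameters are deliberately left out of the count.

```python
def conv_stack_summary(layers):
    """Summarise a stack of unpadded, stride-1 convolutions.

    layers is a list of (in_channels, out_channels, kernel_size) tuples.
    Returns (receptive_field, pixels_lost_per_dimension, conv_parameters).
    """
    receptive_field = 1
    pixels_lost = 0
    conv_parameters = 0
    for in_ch, out_ch, k in layers:
        receptive_field += k - 1   # each stride-1 conv widens the field by k - 1
        pixels_lost += k - 1       # and, without padding, trims k - 1 pixels
        conv_parameters += in_ch * out_ch * k * k + out_ch  # weights + biases
    return receptive_field, pixels_lost, conv_parameters


# The 32-channel Generator from the printout: head, three body blocks, tail.
generator_32 = [(3, 32, 3)] + [(32, 32, 3)] * 3 + [(32, 3, 3)]
rf, lost, params = conv_stack_summary(generator_32)
print(rf, lost, params)  # 11 10 29507
```

So every output pixel of such a stack depends on an 11x11 patch of its input, which fits the idea that each stage only judges and refines local structure at its own scale.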
