[PYTHON] I played with SinGAN (+ a summary of the points I struggled with: setting the PATH, Linux commands, using Google Colab, etc.)

Introduction

GAN (Generative Adversarial Network) technology is advancing day by day. Even for someone like me who works outside IT (I am an engineer in the manufacturing industry), it is a technology of great interest. The best way to deepen your understanding of a technology is to implement it and play with it. So this time I decided to implement **SinGAN**, an algorithm announced in 2019 that generates synthetic images from a single image. This article describes my attempt to actually run it.

However, implementing this recent paper involved several hurdles for me, so this article focuses on those difficulties.

This paper

Here is the paper I implemented this time.

SinGAN: Learning a Generative Model from a Single Natural Image https://arxiv.org/abs/1905.01164

It needs only a single image as training data and can generate new images close to it. It can also generate an image close to the original from a hand-drawn sketch (Paint to Image), or superimpose another image and convert it to the same style (Harmonization).


I cannot claim to fully understand the detailed algorithm, so please also see other people's explanations:

[Paper commentary] SinGAN: Learning a Generative Model from a Single Natural Image https://qiita.com/takoroy/items/27f918a2fe54954b29d6

When I read SinGAN's paper, it was amazing https://qiita.com/yoyoyo_/items/81f0b4ca899152ac8806

Now, to implement it, I first downloaded the code from this GitHub repository and unzipped the zip file. https://github.com/tamarott/SinGAN

Setting the PATH environment variable

By the way, when implementing the contents of a paper, you are often instructed to run commands from the terminal, like the one below.

Terminal


python -m pip install -r requirements.txt

When I first tried to install the required libraries with this, I got the following error message.

This means that python.exe cannot be launched by the `python` command, i.e., the PATH is not set. Therefore, we need to add Python to the PATH.

Right-click the Windows icon and open Settings (my machine runs Windows 10 Home).

Next, type "environment" in the search field, and the editor for system environment variables will appear.

Then click Environment Variables in the System Properties dialog.

Edit the Path variable here: select New and enter the path to the folder where python.exe is stored. Python is now on the PATH, which solves the problem.

If successful, you can confirm it like this.
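Besides running `python --version` in a new terminal, you can also check from Python itself whether a command is resolvable on the PATH. A minimal sketch using the standard library (`shutil.which` returns the full path of the executable, or `None` if the command is not on the PATH):

```python
import shutil

# shutil.which resolves a command name against PATH, like `where` on
# Windows or `which` on Linux; it returns None if nothing is found.
exe = shutil.which("python") or shutil.which("python3")
print(exe)  # e.g. the full path to python.exe once PATH is set
```

If this prints `None`, the folder containing python.exe is still missing from the PATH (remember to open a new terminal after editing the environment variables).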

Understanding ArgumentParser

Next, with the python command now working, I proceeded to the next step and encountered a command like the one below. I understand that it runs random_samples.py, but it is followed by double-hyphen options. On investigation, these use a module called ArgumentParser (argparse) that lets you specify arguments from terminal commands.

Reference URL https://qiita.com/kzkadc/items/e4fc7bc9c003de1eb6d0

Terminal


python random_samples.py --input_name <training_image_file_name> --mode random_samples --gen_start_scale <generation start scale number>

Being able to specify arguments from the command line is convenient, but what if you want to run the script from a kernel in VS Code or Jupyter? This URL describes it in detail. http://flat-leon.hatenablog.com/entry/python_argparse


# 3. Parse startup parameters using the ArgumentParser object
args = parser.parse_args()

The startup parameters are parsed here, so it seems you can run it from a kernel by building a list of argument strings and passing it to parse_args.
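For example (a minimal sketch with a toy parser; the flag names mimic the SinGAN command above but this is not SinGAN's actual parser), `parse_args` accepts an explicit list of strings instead of reading `sys.argv`:

```python
import argparse

# A toy parser mimicking the command-line flags shown above
# (hypothetical defaults; SinGAN defines its own parser).
parser = argparse.ArgumentParser()
parser.add_argument("--input_name", type=str)
parser.add_argument("--mode", type=str, default="train")
parser.add_argument("--gen_start_scale", type=int, default=0)

# In a notebook kernel sys.argv is not meaningful, so pass the
# arguments explicitly as a list of strings.
args = parser.parse_args(["--input_name", "cows.png", "--gen_start_scale", "2"])
print(args.input_name, args.mode, args.gen_start_scale)  # cows.png train 2
```

This is exactly the trick the article above describes: any list of strings in the command-line format works, so you can run the script's logic from Jupyter without touching the terminal.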

Start training

Now that the file path and arguments are understood, let's run it. However, I found that training takes a very long time on a modest PC.

In my first training run, about 3 hours had passed at the point where 6 of the 9 scales (up to scale 5 of scale 8) had finished. Clearly, image-processing computations like GANs take a very long time, so I decided to just use the GPU on Google Colab.

Upload the whole folder to Google Drive. Then, first change the working directory to that folder.

GoogleColab


cd  /content/drive/My Drive/SinGAN-master

Now you can run the .py files and so on using Linux commands.
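The `cd` above is handled by Colab itself. In plain Python code the equivalent is `os.chdir`, which takes the path as a single string, so the space in "My Drive" needs no shell-style escaping. A sketch, assuming Drive is already mounted at the usual location:

```python
import os

# os.chdir takes one string argument, so a path containing spaces
# ("My Drive") needs no quoting or escaping.
target = "/content/drive/My Drive/SinGAN-master"  # assumed mount point
if os.path.isdir(target):
    os.chdir(target)
print(os.getcwd())
```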

Image generation from noise (Train)

Training teaches the model to generate images resembling the original from a noise image. The process starts with a very small image size and gradually grows toward the original image size.
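This coarse-to-fine idea can be sketched as a geometric pyramid of image sizes. The scale factor and scale count below are illustrative only, not SinGAN's actual defaults:

```python
def scale_pyramid(width, height, num_scales, factor=0.75):
    """Return (width, height) per scale, coarsest first.

    `factor` is an illustrative downscaling ratio, not SinGAN's
    exact configuration.
    """
    sizes = []
    for n in range(num_scales):
        f = factor ** (num_scales - 1 - n)  # smallest at scale 0
        sizes.append((max(1, round(width * f)), max(1, round(height * f))))
    return sizes

# The last entry is always the full input resolution.
print(scale_pyramid(250, 188, 9))
```

Each stage of the GAN then only has to learn the residual detail between one size and the next, which is what makes training on a single image feasible.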

GoogleColab


!python main_train.py --input_name cows.png

When using Linux commands in Colab, prefix them with `!`. Running it this way is very fast: the computation finished in about 30 minutes. Installing libraries is also very easy, so for anything with heavy processing time, Google Colab is the way to go.


Now, let's compare the generated images with the original. A higher scale number means more computation has been applied. **Hmm, it's indistinguishable from the real thing.** The images are small at low scale numbers, but they are shown at the same size here for easy comparison. You can see the image gradually becoming clearer and closer to the original. Moreover, not only does the quality improve, but the placement of the cows differs each time, so this is not a process that simply improves image quality.
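To put the outputs side by side I upscaled the smaller ones to the original resolution. A sketch with Pillow (the file names are hypothetical):

```python
from PIL import Image

def resize_like(img_path, ref_path, out_path):
    """Resize an image to a reference image's size for visual comparison."""
    ref = Image.open(ref_path)
    img = Image.open(img_path).resize(ref.size, Image.BICUBIC)
    img.save(out_path)

# Hypothetical file names, for illustration only:
# resize_like("gen_scale3.png", "cows.png", "gen_scale3_big.png")
```

Note that this naive upscaling is only for display; the low-scale outputs really are lower-resolution.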

Making a hand-drawn image resemble the original (Paint to Image)

Next, run the program that makes a hand-drawn image resemble the training image. To do this, the training image you want to imitate must first have been trained as above.

GoogleColab


!python paint2image.py --input_name volacano.png --ref_name volacano3.png --paint_start_scale 1


Let's look at the result. **I couldn't reproduce it well.** The original image is at the bottom right, and the smaller the start_scale value, the more training stages are applied. This time, start_scale 3 and 4 seem the closest.

It is probably difficult to imitate the original unless the hand-drawn image is reasonably similar to it.

Try changing the image size freely (Random samples of arbitrary sizes)

Next, let's change the size of the generated image based on the original.

GoogleColab


!python random_samples.py --input_name cows.png --mode random_samples_arbitrary_sizes --scale_h 5 --scale_v 4

scale_h is the horizontal scale (1 means 1x), and scale_v is the vertical scale.
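If these flags simply multiply the original dimensions (my reading of the options, not confirmed from the code), the output size can be estimated like this, using a hypothetical 250x188 input:

```python
def output_size(width, height, scale_h=1.0, scale_v=1.0):
    """Estimate output dimensions: scale_h stretches horizontally,
    scale_v vertically (assumed interpretation of the flags)."""
    return round(width * scale_h), round(height * scale_v)

print(output_size(250, 188, scale_h=5, scale_v=4))  # (1250, 752)
```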


As a test, I made a large image. **It doesn't look great, though: the result is a prairie crowded with cows. Sorry about that.**

Synthesizing in the style of the original (Harmonization)

Finally, this process modifies a pasted image to match the style of the original. This also requires training first. This time, I combined a photo I took with a free stock image of fish.

GoogleColab


!python harmonization.py --input_name fish.png --ref_name fish1.png --harmonization_start_scale 1


The big fish turned into a light-blue school, like Swimmy (or the Pokemon Yowashi). It must have been processed based on the original vermilion school of fish. It's a very interesting algorithm.

In conclusion

I actually got my hands dirty playing with SinGAN, a recent GAN paper. It turned out to be very easy to use.

I learned a lot about setting up the environment along the way. In particular, I was impressed that Google Colab makes it easy to run even computationally heavy models and see the results. I felt Google's greatness once again.

This time, I focused on implementing and playing with it, so next I would like to deepen my understanding of the theory. Several derived papers have already been published, so I would like to study how they relate.
