The story of making a Mel icon generator

Introduction

Do you know this icon?

Yes, it is the icon style of the famous artist Melville. Many people have had Melville draw their favorite characters to use as their Twitter profile pictures, and these icons have gained a great following. Because of their distinctive style, icons drawn by this artist are often called "Mel icons." Here are some typical examples of Mel icons.

(The icons of Yukatayu and Shun, respectively, as of February 19, 2020)

I want an icon like this too! So I built a Mel icon generator using machine learning. In this article, I will briefly introduce the methods used to make it.

What is a GAN?

A GAN (Generative Adversarial Network) is used for generation.

(Figure: overview of a GAN, quoted from the original source)

This method combines a neural network that generates images (the Generator) with a neural network that judges whether its input is a Mel icon (the Discriminator). The Generator tries to generate images that resemble Mel icons as closely as possible in order to fool the Discriminator, while the Discriminator learns to tell them apart more and more accurately. As the two networks train against each other, the Generator becomes able to generate images that are close to real Mel icons.

Dataset collection

For the Generator to learn to generate Mel-icon-like images, and for the Discriminator to learn to judge whether an input image is a Mel icon, you need to gather as many real Mel icons as possible as training data and build a dataset from them. So I went around Twitter, found profile pictures using Mel icons, and saved them one by one, collecting a little over 100 images. These are used for training.
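As a point of reference, here is a minimal sketch of how such a folder of collected icons could be turned into a training dataset with torchvision. The directory layout `./dataset/mel_icons/` and the batch size are my assumptions, not details from the article (the batch size of 5 matches the mini-batch size used later).

```python
import torch
from torchvision import datasets, transforms

# Resize every collected icon to 64x64 RGB and scale pixels to [-1, 1],
# matching the Tanh output range commonly used for GAN generators.
transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),  # (3, 64, 64), values in [0, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # -> [-1, 1]
])

# ImageFolder expects images inside at least one subdirectory,
# e.g. ./dataset/mel_icons/*.png (hypothetical path).
dataset = datasets.ImageFolder(root="./dataset", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=5, shuffle=True)
```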

Creating a Generator

We let the Generator look at the Mel icons prepared above and learn to generate images that look like them. The generated image is 64 x 64 pixels with 3 RGB color channels. If the Generator produced similar data every time, training would not proceed well, so it must be able to generate as many different images as possible. Therefore, a sequence of random numbers is fed to the Generator as the seed for image generation. This sequence is passed through a series of layers performing an operation called "transposed convolution" (described below), gradually transforming it into a 64 x 64 pixel, 3-channel RGB image.

What is transposed convolution?

In an ordinary convolution, as shown below, the kernel is slid across the input and the sum of products is output at each position. In PyTorch this can be implemented with, for example, torch.nn.Conv2d.

(Figure: ordinary convolution)

In the transposed convolution used this time, on the other hand, each input element is multiplied by the kernel, and the overlapping results are summed. Intuitively, it feels like each target element is being expanded. In PyTorch this can be implemented with, for example, torch.nn.ConvTranspose2d.

(Figure: transposed convolution)
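To see the difference concretely, here is a small runnable comparison: with the same kernel size, stride, and padding, the ordinary convolution halves the spatial size while the transposed convolution doubles it. The specific numbers are illustrative only.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 8, 8)  # (batch, channels, height, width)

# Ordinary convolution: slide the kernel and take sums of products;
# with stride 2 the spatial size shrinks from 8x8 to 4x4.
conv = nn.Conv2d(3, 6, kernel_size=4, stride=2, padding=1)
print(conv(x).shape)    # torch.Size([1, 6, 4, 4])

# Transposed convolution: each input element is expanded by the kernel
# and the overlaps are summed; the spatial size grows from 8x8 to 16x16.
deconv = nn.ConvTranspose2d(3, 6, kernel_size=4, stride=2, padding=1)
print(deconv(x).shape)  # torch.Size([1, 6, 16, 16])
```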

These transposed convolution layers are stacked together with self_attention layers (described later), and the number of output channels in the last layer is 3 (corresponding to R, G, and B). Based on the above, the outline of the Generator we are building is as shown in the figure below.

(Figure: Generator structure)

This Generator has a total of 5 transposed convolution layers, with a self_attention layer inserted between the 3rd and 4th layers and between the 4th and 5th layers. By attending to pixels with similar values all at once, the self_attention layer makes it possible to evaluate the whole image with a relatively small amount of computation.
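The author's actual implementation is in the repository linked at the end; the following is only a minimal sketch of the structure just described. The noise dimension (100), the channel widths, and the BatchNorm/ReLU choices are my assumptions, not values stated in the article; the SelfAttention module follows the standard SAGAN formulation.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """SAGAN-style self-attention: relates every pixel pair via 1x1 convolutions."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned mixing weight, starts at 0

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)  # (b, hw, c//8)
        k = self.key(x).view(b, -1, h * w)                     # (b, c//8, hw)
        attn = torch.softmax(torch.bmm(q, k), dim=-1)          # (b, hw, hw)
        v = self.value(x).view(b, -1, h * w)                   # (b, c, hw)
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return x + self.gamma * out

class Generator(nn.Module):
    """Five transposed convolutions; self-attention after the 3rd and 4th."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            # (z_dim, 1, 1) -> (512, 4, 4)
            nn.ConvTranspose2d(z_dim, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(True),
            # (512, 4, 4) -> (256, 8, 8)
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(True),
            # (256, 8, 8) -> (128, 16, 16)
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            SelfAttention(128),
            # (128, 16, 16) -> (64, 32, 32)
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            SelfAttention(64),
            # (64, 32, 32) -> (3, 64, 64): the last layer outputs the 3 RGB channels
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

# A batch of 5 random-number sequences becomes 5 images of shape (3, 64, 64).
images = Generator()(torch.randn(5, 100))
```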

In its untrained state, the Generator configured this way outputs an image like the following (the result depends on the input random-number sequence). Since it has not been trained yet, it can only output something like noise. However, by training it against the Discriminator explained next, the neural network that judges whether the input data is a Mel icon, it becomes able to output images like the ones shown in the Generate section below.

Creating a Discriminator

We ask the Discriminator to look at images, including ones generated by the Generator above, and judge whether they are Mel icons. In essence, it is an image classifier. The input image is 64 x 64 pixels with 3 RGB channels, and the output is a value (in the range 0 to 1) indicating how much the input looks like a Mel icon. The structure stacks 5 ordinary convolution layers, with a self_attention layer sandwiched between the 3rd and 4th layers and between the 4th and 5th layers, as shown in the figure below.

(Figure: Discriminator structure)
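Again a hedged sketch rather than the author's code, reusing the SelfAttention module from the Generator sketch; the channel widths and LeakyReLU are my assumptions. Note that with the hinge loss described later, the raw (unbounded) Discriminator score is typically used directly, so this sketch omits a final sigmoid even though the text speaks of a 0-to-1 value.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Five ordinary convolutions; self-attention after the 3rd and 4th.
    Reuses the SelfAttention module from the Generator sketch above."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.1, inplace=True),     # 64 -> 32
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.1, inplace=True),   # 32 -> 16
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.1, inplace=True),  # 16 -> 8
            SelfAttention(256),
            nn.Conv2d(256, 512, 4, 2, 1), nn.LeakyReLU(0.1, inplace=True),  # 8 -> 4
            SelfAttention(512),
            nn.Conv2d(512, 1, 4, 1, 0),                                     # 4 -> 1 score
        )

    def forward(self, x):
        return self.net(x).view(-1)  # one scalar score per image
```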

Training method / loss function

The training procedures for the Discriminator and the Generator are described below.

Training the Discriminator

When given an image, the Discriminator returns a number from 0 to 1 indicating how much the image looks like a Mel icon. First, input a real Mel icon and call the output (a value from 0 to 1) $d_{real}$. Next, feed random numbers to the Generator and have it generate an image. Passing this image to the Discriminator likewise returns a value from 0 to 1; call this $d_{fake}$. The $d_{real}$ and $d_{fake}$ obtained this way are plugged into the loss function described below to obtain the value used for backpropagation.
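Expressed in code, one Discriminator step's forward passes could look like the following sketch, reusing the Generator and Discriminator sketches and the DataLoader from above (the noise dimension of 100 is still an assumption):

```python
import torch

G, D = Generator(), Discriminator()  # the sketches defined above

real_images, _ = next(iter(loader))         # a mini-batch of real Mel icons
d_real = D(real_images)                     # score: "how Mel-icon-like?"

z = torch.randn(real_images.size(0), 100)   # random-number sequences
fake_images = G(z).detach()                 # detach: only D is updated here
d_fake = D(fake_images)                     # score for the generated images
```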

Loss function

SAGAN, one type of GAN, uses the loss function below, known as the "hinge version of the adversarial loss." Simply put: let $l_{i}$ and $l_{i}^{\prime}$ be the correct labels, let $y_{i}$ and $y_{i}^{\prime}$ be the corresponding output values from the Discriminator, and let $M$ be the number of data points per mini-batch. Then the loss is expressed as

$$
-\frac{1}{M}\sum_{i=1}^{M}\left(l_{i}\min(0,-1+y_{i})+(1-l_{i}^{\prime})\min(0,-1-y_{i}^{\prime})\right)
$$

[^1] This time we set $y_{i}=d_{real}$, $y_{i}^{\prime}=d_{fake}$, $l_{i}=1$ (meaning the input is definitely a Mel icon), and $l_{i}^{\prime}=0$ (meaning the input is definitely not a Mel icon), so the loss becomes

$$
-\frac{1}{M}\sum_{i=1}^{M}\left(\min(0,-1+d_{real})+\min(0,-1-d_{fake})\right)
$$

This is the Discriminator loss function used this time. Adam was used as the optimizer for backpropagation, with the learning rate set to 0.0004 and Adam's first- and second-moment coefficients (the exponential decay rates used for moment estimation) set to 0.0 and 0.9, respectively.
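As a sketch, the loss above and the stated Adam settings translate to the following, using the identity $-\min(0,-1+y)=\max(0,1-y)$ (i.e. relu); `d_real` and `d_fake` come from the previous sketch:

```python
import torch

# Created once: lr 0.0004, betas (first and second moments) (0.0, 0.9).
d_optimizer = torch.optim.Adam(D.parameters(), lr=0.0004, betas=(0.0, 0.9))

# Hinge loss from the formula above; the mean implements the 1/M sum.
d_loss = torch.relu(1.0 - d_real).mean() + torch.relu(1.0 + d_fake).mean()

d_optimizer.zero_grad()
d_loss.backward()
d_optimizer.step()
```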

Training the Generator

When given a sequence of random numbers, the Generator generates an image, trying to make it look as much like a Mel icon as possible. First, input a sequence $z_{i}$ of random numbers into the Generator to obtain an image. Then input that image into the Discriminator, which outputs a value indicating how much it looks like a Mel icon. Call this value $r_{i}$.

Loss function

In SAGAN's "hinge version of the adversarial loss", the generator's loss function is defined as follows:

$$
-\frac{1}{M}\sum_{i=1}^{M}r_{i}
$$

In SAGAN, this definition is empirically known to work well. [^1] Given that $M$ is the number of data points per mini-batch, this means the Discriminator's judgment is used as the loss as-is, which surprised me a little. Adam was again used as the optimizer, with the learning rate set to 0.0001 and Adam's first and second moments set to 0.0 and 0.9, respectively (the same as the Discriminator except for the learning rate).
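In code, the Generator step is correspondingly short; again a sketch under the same assumptions (G and D from the sketches above, noise dimension 100):

```python
import torch

# Created once: same Adam settings as the Discriminator except lr 0.0001.
g_optimizer = torch.optim.Adam(G.parameters(), lr=0.0001, betas=(0.0, 0.9))

M = 5                          # mini-batch size
r = D(G(torch.randn(M, 100)))  # the Discriminator's judgment r_i of the fakes
g_loss = -r.mean()             # the hinge-version Generator loss above

g_optimizer.zero_grad()
g_loss.backward()
g_optimizer.step()
```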

Overall picture

Reprinting the figure introduced above: the Generator and Discriminator created earlier are combined in this way to form the GAN.

(Figure: overview of the GAN)

Generate

Train on the collected real Mel icons and have the Generator produce Mel icons. The number of data points per mini-batch, $M$, is kept at 5. The results are as follows.

(Figure: real Mel icons (top row) and generated images (bottom row))

__Awesome!__ __I'm impressed!__ For comparison, examples of the input data are shown in the top row, and the actually generated images in the bottom row. The generated results change each time the program runs. Personally, I was quite surprised that this could be done with source code that is not all that long. GANs are really amazing!
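For completeness, a sketch of the generation step itself, assuming the trained Generator G from the sketches above; the output filename is hypothetical.

```python
import torch
from torchvision.utils import save_image

# Sample 5 random-number sequences (M = 5, as above) and save the results.
G.eval()
with torch.no_grad():
    fake = G(torch.randn(5, 100))  # (5, 3, 64, 64), values in [-1, 1]
save_image(fake, "generated_mel_icons.png", nrow=5, normalize=True)
```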

Remaining issues

Although I made something that can do such a great thing, a few issues remain unsolved.

Source code

The code I wrote is available in this repository: https://github.com/zassou65535/image_generator

Summary

GAN is an insanely powerful technique. Even though mode collapse occurred, I was able to produce something quite close to real Mel icons with a dataset of only about 100 images. Try generating your own exciting images with a GAN.

Bonus

Simply averaging all the collected Mel icons yields an image like the following.
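For the curious, computing such an average image is a short exercise over the dataset; this sketch reuses the DataLoader from the dataset section (shuffling does not matter for a mean) and a hypothetical output filename.

```python
import torch
from torchvision.utils import save_image

# Per-pixel mean over every collected icon in the dataset.
all_icons = torch.cat([batch for batch, _ in loader], dim=0)  # (N, 3, 64, 64)
save_image(all_icons.mean(dim=0), "average_mel_icon.png", normalize=True)
```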

References

[^1]: "Learn While Making! Advanced Deep Learning with PyTorch" (つくりながら学ぶ! PyTorchによる発展ディープラーニング)
