I want to color black-and-white photos of memories with GAN

Motivation

Colorizing black-and-white photographs is a common machine-learning theme, and that is what I would like to try here. The code is also on GitHub. [GitHub]

The other day, while going through my late father's belongings, I came across an old black-and-white photograph. I had left it as it was when he passed away, but it occurred to me that I could now colorize it myself. So this is more of a personal hobby post than a technical one. There is also a blog (and book) I relied on while studying GANs; the articles there are very instructive, and parts of this post draw on them. The link is below, so please have a look.

[Reference URL] Shikoan's ML Blog: "Cutting-edge deep learning learned from mosaic removal"

Blog structure

- Overview of pix2pix
  - GAN
  - PatchGAN
  - U-net
- About this dataset
  - Google PhotoScan
  - Image size
- Start learning
  - Part 1
  - Part 2
    - Cross entropy loss and Hinge loss
    - Instance Normalization
  - Part 3
  - Part 4
- Summary. And the result of the important photo

Overview of pix2pix

GAN

In a GAN, a Discriminator is trained alongside the Generator, and the two are learned in parallel. The idea is to add a dynamic loss term, based on the Discriminator's predictions, to the Generator's loss. There are many kinds of GAN, and how they are divided depends on the definition, but one common division is the following.

| Type | Overview |
| --- | --- |
| Conditional GAN | There is a relationship between the Generator's input and output |
| Non-Conditional GAN | The Generator's input and output are not related |

pix2pix, used this time, is a typical Conditional GAN (CGAN). DCGAN, which often appears in GAN tutorials, generates its output from noise, so it is a Non-Conditional GAN. Because pix2pix is a CGAN, the input information is very important. In DCGAN, for example, learning proceeds using only the adversarial loss that comes from the Discriminator, whereas pix2pix also uses the difference between the fake image and the real image (for example, L1 loss or MSE) in addition to the adversarial loss. As a result, learning progresses faster than in other GANs and the results are more stable. Conversely, when training with L1 loss alone, the output tends to become blurry overall, or the whole image tends to be painted with average-looking pixels, because that is what minimizes the loss. By adding the adversarial loss, training tends to move toward outputs that look more realistic, even if the L1 loss ends up somewhat larger. This trade-off between perceptual quality and distortion is one of the important factors to keep in mind when thinking about GANs.

Reference: The Perception-Distortion Tradeoff (2017) [arXiv]
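To make the combination concrete, here is a minimal sketch of a pix2pix-style Generator loss that mixes an adversarial term with an L1 term. The weight of 100 on the L1 term follows the pix2pix paper, but the function and variable names are illustrative, not taken from the repository.

import torch
import torch.nn as nn

adv_criterion = nn.BCEWithLogitsLoss()   # adversarial loss on the Discriminator's patch output
l1_criterion = nn.L1Loss()               # pixel-wise difference between fake and real image
lambda_l1 = 100.0                        # weight on the L1 term (value used in the pix2pix paper)

def generator_loss(d_out_fake, fake_img, real_img):
    # Adversarial term: push the Discriminator's judgment of the fake image toward "real" (= 1)
    ones = torch.ones_like(d_out_fake)
    loss_adv = adv_criterion(d_out_fake, ones)
    # Reconstruction term: keep the fake image close to the ground-truth color image
    loss_l1 = l1_criterion(fake_img, real_img)
    return loss_adv + lambda_l1 * loss_l1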

PatchGAN

The following is the original pix2pix paper; for the details of PatchGAN it is best to refer to it directly.

Image-to-Image Translation with Conditional Adversarial Networks (2016) [arXiv]

In PatchGAN, when the Discriminator judges whether an image is real or fake, the image is divided into several regions and a real/fake judgment is made for each region.

[Image: france.jpg]

[Screenshot: 2020-05-26 15.57.00.png]

"Dividing into regions" does not mean that the image is actually cut up and fed to the Discriminator piece by piece. Conceptually that is what happens, but in terms of implementation a single image goes into the Discriminator and its output is a 2-dimensional tensor. Each element of that tensor is derived from a patch-sized region of the input image, so PatchGAN is realized by taking the loss between each element of that tensor and the true label (1 for real, 0 for fake). This is hard to explain in words alone, so here is an example using the French flag above.


import cv2
import numpy as np
import matplotlib.pyplot as plt
import torch.nn as nn
from torchvision.transforms import ToTensor

fig, axes = plt.subplots(1, 2)

# Load the image
img = cv2.imread('france.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (600, 300))   # array shape -> (300, 600, 3)
axes[0].imshow(img)
axes[0].set_xticks([])
axes[0].set_yticks([])

img = ToTensor()(img)            # (300, 600, 3) -> (3, 300, 600)
img = img.view(1, 3, 300, 600)   # (3, 300, 600) -> (1, 3, 300, 600)
img = nn.AvgPool2d(100)(img)     # (1, 3, 300, 600) -> (1, 3, 3, 6)
img = nn.Conv2d(3, 1, kernel_size=1)(img)   # (1, 3, 3, 6) -> (1, 1, 3, 6)
img = np.squeeze(img.detach().numpy())

axes[1].imshow(img, cmap='gray')
axes[1].set_xticks([])
axes[1].set_yticks([])

[Result]

[Image: download.png]

In the example above, the input image is reduced to a (3, 6) feature map, which is nothing other than compressing each patch region of the original image down to a single (1, 1) value. In this example, real/fake is therefore judged at 3 × 6 = 18 positions.

U-net

pix2pix uses a U-net as the Generator. The U-net, the same architecture used in segmentation, has an Encoder-Decoder structure with skip connections so that as little of the input information as possible is lost. Below is the forward method of the Generator used in this experiment; you can see that the encoder features are concatenated back in at each step with torch.cat.

    def forward(self, x):
        x1 = self.enc1(x)
        x2 = self.enc2(x1)
        x3 = self.enc3(x2)
        x4 = self.enc4(x3)
        out = self.dec1(x4)
        out = self.dec2(torch.cat([out, x3], dim=1))
        out = self.dec3(torch.cat([out, x2], dim=1))
        out = self.dec4(torch.cat([out, x1], dim=1))
        return out
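The enc/dec blocks themselves are not shown above. As a rough idea, an encoder block is typically a strided convolution followed by normalization and an activation, and a decoder block is its transposed-convolution counterpart; the following is a minimal sketch under that assumption (illustrative, not the actual blocks from the repository).

import torch.nn as nn

def enc_block(in_ch, out_ch):
    # Halve the spatial resolution while increasing the number of channels
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

def dec_block(in_ch, out_ch):
    # Double the spatial resolution; in_ch includes the channels concatenated from the skip connection
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )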

By the way, the U-net illustration below is from ML Visuals (by dair.ai), a set of machine-learning slide illustrations released free of charge that was recently making the rounds on Twitter. [ML Visuals] There are many nice-looking figures, so please take a look.

[Image: U-net illustration from ML Visuals by dair.ai]

About this dataset

Google PhotoScan

The MIT-Adobe FiveK Dataset is used as the training data. It is often used in image-enhancement papers and consists of pairs of original images and images retouched by professional editors; this time the retouched images are used. Many of the retouched photos have more vivid colors than ordinary color photos, so I thought they would suit this task well. There is also a smaller-sized version of the dataset, so the amount of data is not huge and the download is reasonably manageable.

As for the actual black-and-white photo, it was digitized with an iPhone app called "Google PhotoScan". You can find the details by googling it, but it is quite good: it digitizes photos beautifully (and instantly) without a trip to a photo shop. The original photo was quite old and yellowed, but once converted to a black-and-white image it did not look much different from an ordinary black-and-white photograph.

Image size

The images in fivek and the photo I actually want to colorize are not square but rectangular, and their aspect ratios differ. I therefore considered the following patterns and adopted one of them.

  1. Resize everything to a uniform square
  2. Resize to fit in a square without changing the original aspect ratio, and fill the blank area with black
  3. Resize to fit in a square without changing the original aspect ratio, and fill the blank area with information from the original image
  4. Rotate portrait images to landscape, then resize everything to the same rectangular size
  5. Crop to a square

For now, I went with method 3. Methods 1 and 4 were rejected because they could significantly change the characteristics of the image, and between 2 and 3 I judged that 3 retains more information. The straightforward choice would of course be 5, but I did not want to go as far as square-cropping the actual photo. The output is cropped back to the original aspect ratio in post-processing.
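As a concrete illustration of method 3, here is a minimal sketch that resizes an image to fit inside a square while keeping the aspect ratio, and fills the remaining area with reflected content from the image itself. Reflection padding is just one way to "fill the blank area with information from the original image"; the actual preprocessing used for the experiments may differ, and the function name here is hypothetical.

import cv2

def resize_with_reflect_pad(img, size=320):
    # Scale so the longer side becomes `size`, keeping the aspect ratio
    h, w = img.shape[:2]
    scale = size / max(h, w)
    resized = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))

    # Pad the shorter side with reflected pixels so the result is size x size
    rh, rw = resized.shape[:2]
    pad_top = (size - rh) // 2
    pad_bottom = size - rh - pad_top
    pad_left = (size - rw) // 2
    pad_right = size - rw - pad_left
    return cv2.copyMakeBorder(resized, pad_top, pad_bottom, pad_left, pad_right,
                              borderType=cv2.BORDER_REFLECT)

Since the padding amounts are known from the original aspect ratio, the output can be cropped back to that ratio in post-processing, as mentioned above.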

[Screenshot: 2020-05-26 17.43.00.png]

Start learning

Part 1

Training run 1.

[Plots: bce_loss.png, l1_loss.png]

You can see that, after an unstable early phase, the Discriminator gradually becomes stronger and the Generator's loss increases. As for the L1 loss, it is hard to tell whether it is actually decreasing. (Part of what makes evaluating GANs difficult is that "small L1 loss = close to the real colors" does not necessarily mean "looks real".)

Part 2

Training run 2. In run 1 the Discriminator tended to become too strong, so I experimented with the following changes.

- Changed the adversarial loss of D and G from BCE to Hinge loss
- Changed D's Batch Normalization to Instance Normalization
- Halved D's weight-update frequency (half that of G)
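The first two changes are explained in the subsections below. For the third change, here is a minimal sketch of what halving the Discriminator's update frequency could look like inside a training loop; the names (G, D, opt_G, opt_D, generator_loss, discriminator_loss) are hypothetical, and details such as conditioning D on the input image are omitted.

for step, (input_img, real_img) in enumerate(dataloader):
    fake_img = G(input_img)

    # Update D only every other step (half of G's update frequency)
    if step % 2 == 0:
        opt_D.zero_grad()
        d_loss = discriminator_loss(D(real_img), D(fake_img.detach()))
        d_loss.backward()
        opt_D.step()

    # Update G every step
    opt_G.zero_grad()
    g_loss = generator_loss(D(fake_img), fake_img, real_img)
    g_loss.backward()
    opt_G.step()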

Cross entropy loss and Hinge loss

In Part 2, one change from Part 1 is switching the adversarial loss from the binary cross entropy used in Part 1 to hinge loss. Roughly speaking, the aim is to use a "weaker" loss so that neither network (D nor G) becomes too strong. The following is the standard binary cross entropy (the D version).

D: loss = -\sum_{i=1}^{n}\bigl(t_i \log y_i + (1-t_i) \log(1-y_i) \bigr)

In short, when the target is 1, the loss becomes smaller the larger the output is (with a sigmoid in the final layer, a larger raw output means a value closer to 1), and when the target is 0, the loss becomes smaller the more negative the output is.

G, on the other hand, is trained to push the above in the opposite direction, i.e. to increase D's loss. Moreover, G is evaluated only on the fake images it generates (target = 0), so the first term of the formula above drops out and things become simpler.

G: loss = \sum_{i=1}^{n}\log\Bigl(1 - D\bigl(G(z_i)\bigr)\Bigr) \quad (\text{minimize})

or, in the non-saturating form that many implementations use instead,

G: loss = -\sum_{i=1}^{n}\log D\bigl(G(z_i)\bigr) \quad (\text{minimize})

These are two patterns of the same optimization, and implementations of both exist.

As for hinge loss, I found this site easy to understand. [Reference] The site says it is rarely used outside of SVMs, so it is interesting that it now shows up in other methods. Where cross entropy expresses the target as (0, 1), hinge loss expresses it as (-1, 1).

t = \pm 1 \\
\mathrm{Loss} = \max(0,\ 1 - t \cdot y)

Looking at the formula, the loss becomes small when the output is pushed down for target = -1 and pushed up for target = 1. Unlike cross entropy, however, the loss is clipped to 0 once the margin is satisfied. With cross entropy the loss never fully disappears unless the prediction is exactly 0 or 1, but that is not the case with hinge loss, which is why I called it a "weaker" loss. A PyTorch implementation is below. With hinge loss the D and G cases are easy to mix up, so I think it is clearer to lay them all out together.

# ones:  patch-shaped tensor with all values 1
# zeros: patch-shaped tensor with all values 0

# G loss (BCE)
loss = torch.nn.BCEWithLogitsLoss()(d_out_fake, ones)

# G loss (Hinge)
loss = -1 * torch.mean(d_out_fake)

# D loss (BCE)
loss_real = torch.nn.BCEWithLogitsLoss()(d_out_real, ones)
loss_fake = torch.nn.BCEWithLogitsLoss()(d_out_fake, zeros)
loss = loss_real + loss_fake

# D loss (Hinge)
loss_real = -1 * torch.mean(torch.min(d_out_real - 1, zeros))   # = mean(relu(1 - d_out_real))
loss_fake = -1 * torch.mean(torch.min(-d_out_fake - 1, zeros))  # = mean(relu(1 + d_out_fake))
loss = loss_real + loss_fake

Instance Normalization

Instance Normalization is a variant of Batch Normalization. The following article gives a good overview, including Batch Normalization itself.

[GIF] Explanation from CNN for beginners to batch normalization and friends

Batch Normalization standardizes each channel across all the samples in a mini-batch, whereas Instance Normalization standardizes each sample on its own rather than over the whole mini-batch. In effect, it is Batch Normalization with a batch size of 1. It is also used, for example, in pix2pixHD, a derivative of pix2pix, where the intent is to suppress gradient growth and keep training from converging too quickly. Here the main aim is to balance D and G by applying it to D (see the sketch below).
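As a minimal sketch of this change, a Discriminator block that used BatchNorm2d can simply have it swapped for InstanceNorm2d; the block definition below is illustrative, not the actual one from the repository.

import torch.nn as nn

def d_block(in_ch, out_ch, use_instance_norm=True):
    # A typical PatchGAN Discriminator block: strided conv + normalization + LeakyReLU
    norm = nn.InstanceNorm2d(out_ch) if use_instance_norm else nn.BatchNorm2d(out_ch)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        norm,
        nn.LeakyReLU(0.2, inplace=True),
    )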

The results are below. The Discriminator clearly converges more slowly than before, and the L1 loss also seems to decrease more than before.

[Plots: adv_loss.png, l1_loss.png]

Part 3

Training run 3 changed the following points relative to run 1.

- Basically follow run 1
- Align the learning rate with the original paper (1e-4 -> 2e-4)
- Remove the learning-rate adjustment via the Scheduler
- Change the image size from 320 to 256 (following the paper)
- Change the number of PatchGAN regions from 10x10 to 4x4
- Remove Blur from the training-time augmentation

However, the results were almost the same as in run 1.
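For reference, the number of PatchGAN regions is determined by the Discriminator's architecture (how many downsampling layers it stacks) relative to the input size, so changing the region count means changing the network rather than a single parameter. Below is a rough, hypothetical illustration (not the actual Discriminator from the repository; in pix2pix the first layer would also take the conditioning input's channels).

import torch
import torch.nn as nn

# Each stride-2 convolution halves the spatial size, so the number of such layers
# (together with the input size) determines the size of the output patch grid.
D = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),    # 256 -> 128
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 128 -> 64
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1), # 64 -> 32
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1), # 32 -> 16
    nn.LeakyReLU(0.2, inplace=True),
    nn.Conv2d(512, 1, kernel_size=4, stride=4),               # 16 -> 4: a 4x4 patch grid
)

print(D(torch.randn(1, 3, 256, 256)).shape)   # torch.Size([1, 1, 4, 4])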

Part 4

Looking at the results so far, there is a clear tendency: natural landscapes get colorized reasonably well, but people hardly work at all. That defeats the original purpose, so I decided to collect more data myself. I used Bing Image Search for this, as described in a previous Qiita article. [Qiita article link] The setup is based on Part 2 with a few further tweaks. Also, since the amount of data roughly tripled (to more than 13,000 images), the 200 epochs were reduced to about 100.

Changes from training run 1:

- Add training data (fivek -> fivek + person images)
- Number of epochs: 200 -> 97
- Change the adversarial loss of D (and G) from BCE to Hinge loss (as in Part 2)
- Change D's Batch Normalization to Instance Normalization (as in Part 2)
- Halve D's weight-update frequency (as in Part 2)
- Remove the learning-rate adjustment via the Scheduler (as in Part 3)
- Change the learning rate from 1e-4 to 2e-4 (as in Part 3)

New this time (the output resolution felt a little lacking, though this may not actually help):

- Change the image size (320 -> 352)
- Along with that, change the number of PatchGAN regions from (10, 10) to (11, 11)

The results are below. Compared to Part 2, the curves oscillate more toward the end, probably because the scheduler was removed.

[Plots: adv_loss.png, l1_loss.png]

Summary. And the result of the important photo

Through various rounds of trial and error, I found that landscape images can be colorized naturally at a fairly high rate, while images of people and images with strong, vivid contrast (e.g. people, clothes, flowers, man-made objects) tend not to get colorized well.

(Left: fake image, Right: real image)

[Result images: 000062.png through 002480.png]

If anything, some of these failures are mysterious in their own way, and they are interesting to look at. My actual target photo also ended up getting colorized fairly convincingly (laughs).

(Final thoughts) On my own test image (my father's photo), the results are more uneven than on the training and validation data. I suspect the underlying problem is resolution. The photo above is one of the better ones; with my other digitized black-and-white photos, the outlines are visible when enlarged, but the resolution is uneven in small regions and the results are disappointing. So if I want to do this seriously, I will probably need to run a super-resolution task in parallel, or take measures such as blurring the training data. That is planned for next time.
