GAN: this article covers Generative Adversarial Networks. A GAN does not necessarily converge through training to a model that produces images indistinguishable from the real thing. The reasons training stalls are instabilities such as vanishing gradients and mode collapse.
To deal with this instability, it is said to be important to control the Lipschitz continuity and the Lipschitz constant of the Discriminator, and Spectral Normalization is a useful technique for doing so.
There are several terms here I did not understand, so this time I would like to summarize my own interpretation of what they mean.
Here is the reference I used this time as well.
"I wrote a book on deep learning and the latest GAN developments, learned through inpainting" https://qiita.com/koshian2/items/aefbe4b26a7a235b5a5e
A function $f(x)$ being Lipschitz continuous means that for any $x_1$ and $x_2$ there exists a constant $k$ satisfying

\left|\frac{f(x_1)-f(x_2)}{x_1-x_2}\right| \leq k \qquad \text{(Equation 1)}

This $k$ is called the Lipschitz constant.
Before getting further into Lipschitz continuity, let me first review ordinary continuity of functions. A function being continuous at $x = x_0$ means that

\lim_{x \to x_0} f(x) = f(x_0) \qquad \text{(Equation 2)}

holds. And $f(x)$ is a continuous function when it is continuous at every point of interest.
For example, the figures below show a function that is continuous and one that is not. I think the distinction is intuitively easy to grasp.
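To make this concrete with a formula of my own: $f(x) = x$ satisfies Equation 2 at every point, whereas the step function

f(x) = \begin{cases} 0 & (x < 0) \\ 1 & (x \geq 0) \end{cases},\qquad \lim_{x \to 0^-} f(x) = 0 \neq 1 = f(0)

fails it at $x_0 = 0$.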
Lipschitz continuity, on the other hand, means that a constant $k$ satisfying Equation 1 above exists. As the figure above illustrates, if you draw straight lines with slopes $\pm k$ through any point on the function's graph, the graph always stays between those two lines. Take $y = x$ as an example. Substituting into Equation 1,
\left|\frac{f(x_1)-f(x_2)}{x_1-x_2}\right| = \left|\frac{x_1-x_2}{x_1-x_2}\right| = 1 \leq k
so $k$ must be at least 1. If we instead demanded a value such as $k = 0.01$, the inequality would not hold, and $y = x$ could not be called Lipschitz continuous with that constant. The relationship between being continuous and being Lipschitz continuous is therefore
\text{Lipschitz continuous} \subset \text{continuous}

that is, Lipschitz continuity is the stricter condition, and the set of continuous functions contains the Lipschitz continuous ones.
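To make this concrete, here is a small numerical sketch of my own (the estimate_lipschitz helper below is hypothetical, not from the reference): it samples random point pairs and takes the largest observed slope, which must stay at or below any valid Lipschitz constant $k$.

python
import numpy as np

def estimate_lipschitz(f, low=-5.0, high=5.0, n=100000, seed=0):
    """Empirically estimate the smallest k satisfying Equation 1 by sampling point pairs."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(low, high, n)
    x2 = rng.uniform(low, high, n)
    mask = x1 != x2                       # skip identical points to avoid dividing by zero
    slopes = np.abs((f(x1[mask]) - f(x2[mask])) / (x1[mask] - x2[mask]))
    return slopes.max()

print(estimate_lipschitz(lambda x: x))      # about 1.0 -> y = x needs k >= 1
print(estimate_lipschitz(lambda x: 3 * x))  # about 3.0 -> y = 3x needs k >= 3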
In GANs, it is a commonly cited rule of thumb that constraining the Discriminator to $k = 1$ improves training stability.
Reference URL https://mathwords.net/lipschitz
Next, singular value decomposition (SVD). This is a matrix operation that will be needed for Spectral Normalization below, so I summarize it here.
Singular value decomposition means that any $m \times n$ matrix $A$ can be factored as $A = U\Sigma V^T$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a matrix whose off-diagonal components are 0 and whose diagonal components are non-negative and arranged in descending order. The diagonal components of $\Sigma$ are called the singular values. For how to compute $U$, $V$, and $\Sigma$, please refer to the following pdf.
http://www.cfme.chiba-u.jp/~haneishi/class/iyogazokougaku/SVD.pdf
In Python, the singular value decomposition can be obtained easily.
SN.ipynb
import numpy as np

data = np.array([[1, 2, 3, 4], [3, 4, 5, 6]])
# np.linalg.svd returns U, the singular values S in descending order,
# and V already transposed (the V printed below is V^T).
U, S, V = np.linalg.svd(data)
print(U)
print(S)
print(V)
[[-0.50566621 -0.86272921]
[-0.86272921 0.50566621]]
[10.73807223 0.8329495 ] #Singular value
[[-0.28812004 -0.41555404 -0.54298803 -0.67042202]
[ 0.7854851 0.35681206 -0.07186099 -0.50053403]
[-0.40008743 0.25463292 0.69099646 -0.54554195]
[-0.37407225 0.79697056 -0.47172438 0.04882607]]
In this way, the singular values are confirmed to be [10.73807223 0.8329495]; the maximum singular value is about 10.74.
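As a quick sanity check of my own (continuing the same notebook), the factors returned above reconstruct the original matrix, keeping in mind that NumPy hands back $V$ already transposed:

python
# Rebuild A = U Σ V^T from the factors above (the V returned by NumPy is already V^T).
Sigma = np.zeros(data.shape)
Sigma[:len(S), :len(S)] = np.diag(S)      # embed the singular values on the diagonal
print(np.allclose(U @ Sigma @ V, data))   # True
print(S.max())                            # maximum singular value, about 10.74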
Reference URL https://thinkit.co.jp/article/16884
Now for the main topic, Spectral Normalization. Batch Normalization (hereafter Batch Norm) is a famous technique used when building neural network layers. Batch Norm was proposed in 2015 and is a layer inserted after fully connected and convolution layers. Its commonly cited effects are faster learning, less sensitivity to the initial weight values, and a regularizing effect that suppresses overfitting.
The processing is as follows. Given a mini-batch of $m$ inputs $x_1, x_2, \dots, x_m$, the mean $\mu_B$ and the variance $\sigma_B^2$ of the batch are computed, and each input is normalized using them.
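Written out, this is the standard Batch Norm computation from the 2015 paper ($\epsilon$ is a small constant for numerical stability, and $\gamma$, $\beta$ are learned scale and shift parameters):

\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i,\qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2 \\
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}},\qquad y_i = \gamma \hat{x}_i + \beta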
Batch Norm brings these benefits, but in GAN training it is cited as a factor that breaks continuity. As you can see from the formula above, Batch Norm divides by the standard deviation, so it behaves like a fractional function, and a fractional function such as $1/x$ is not continuous at $x = 0$; in this sense the operation can lose continuity. Spectral Normalization is the method that solves this problem.
Spectral Normalization for Generative Adversarial Networks https://arxiv.org/abs/1802.05957
The paper was written by Japanese authors and comes from researchers at Preferred Networks, Inc. Spectral Normalization is the idea of dividing the weight coefficients by their maximum singular value. This ensures Lipschitz continuity and keeps the model's Lipschitz constant at 1. The singular value decomposition described above is used to find this maximum singular value.
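As a minimal NumPy sketch of this idea (my own illustration using the data matrix from earlier as a stand-in for a weight matrix, not the paper's implementation), dividing by the maximum singular value brings the spectral norm down to exactly 1:

python
# Spectral Normalization in one line: W_SN = W / sigma(W), where sigma(W) is the largest singular value.
sigma_max = np.linalg.svd(data, compute_uv=False).max()
data_sn = data / sigma_max
print(np.linalg.svd(data_sn, compute_uv=False).max())  # 1.0 (up to floating-point error)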
It is also very easy to implement. With TensorFlow, it can be done simply by using a ConvSN2D layer (a spectrally normalized convolution layer, here imported from the inpainting_layers module that accompanies the reference) as follows.
SN.ipynb
import tensorflow as tf
from inpainting_layers import ConvSN2D  # spectrally normalized Conv2D from the reference implementation

inputs = tf.random.normal((16, 256, 256, 3))
x = ConvSN2D(64, 3, padding='same')(inputs)
print(x.shape)  # (16, 256, 256, 64)
Now, about how the singular value is actually computed: applying the svd method directly would be computationally expensive, so an algorithm called the power method (power iteration) is used instead.
For an $(N, M)$ matrix $X$, the maximum singular value is estimated by starting from a random $U$ and repeating

V = L_2(U X^T),\qquad U = L_2(V X)

and then taking $\sigma \simeq V X U^T$. Here $L_2$ denotes $L_2$ normalization, $L_2(x) = x / (\sqrt{\sum_{i,j} x_{i,j}^2} + \epsilon)$, with $\epsilon$ a small constant.
Implemented, it looks like the following. The data matrix is the same one used above.
python
import matplotlib.pyplot as plt

# L2 normalization used in the power iteration: x / (sqrt(sum of squares) + eps)
def l2_normalize(x, eps=1e-12):  # eps value assumed; any small constant works
    return x / (np.sqrt(np.sum(x ** 2)) + eps)

results = []
for p in range(1, 6):                     # p = number of power-iteration steps
    U = np.random.randn(1, data.shape[1])
    for i in range(p):
        V = l2_normalize(np.dot(U, data.T))
        U = l2_normalize(np.dot(V, data))
    sigma = np.dot(np.dot(V, data), U.T)  # estimate of the maximum singular value
    results.append(sigma.flatten())

plt.plot(np.arange(1, 6), results)
plt.ylim([10, 11])
plt.show()
The estimate settles at around 10.74, the same result obtained earlier with np.linalg.svd. This is how the maximum singular value is computed in the actual implementation.
This time I summarized the ideas behind Spectral Normalization. I have grasped the overall flow, but my understanding of the mathematical side is still lacking, so I would like to deepen it as I continue implementing.
The program is stored here. https://github.com/Fumio-eisan/SN_20200404