GAN: this article covers Generative Adversarial Networks. A GAN does not necessarily converge through training to a model that produces images indistinguishable from the real thing. The reasons training stalls are instabilities such as vanishing gradients and mode collapse.
To deal with this instability, it is said to be important to control the Lipschitz continuity and the Lipschitz constant of the Discriminator, and Spectral Normalization is a useful technique for doing so.
There are several terms here I did not understand, so this time I would like to summarize my own interpretation of what they mean.
Here is the reference I used this time as well.
"I wrote a book on deep learning and the latest GAN developments, learned through inpainting" https://qiita.com/koshian2/items/aefbe4b26a7a235b5a5e
A function $f(x)$ being Lipschitz continuous means that for any $x_1$ and $x_2$ there exists a constant $k$ satisfying

\left|\frac{f(x_1)-f(x_2)}{x_1-x_2}\right| \leq k \qquad \text{(Equation 1)}

This $k$ is called the Lipschitz constant.
Before getting further into Lipschitz continuity, let me first review ordinary continuity of functions. A function being continuous at $x = x_0$ means that

\lim_{x \to x_0} f(x) = f(x_0) \qquad \text{(Equation 2)}

holds. And $f(x)$ is a continuous function when it is continuous at every point of interest.
For example, the figures below show a function that is continuous and one that is not. I think the distinction is intuitively easy to grasp.
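To make this concrete with a formula of my own: $f(x) = x$ satisfies Equation 2 at every point, whereas the step function

f(x) = \begin{cases} 0 & (x < 0) \\ 1 & (x \geq 0) \end{cases},\qquad \lim_{x \to 0^-} f(x) = 0 \neq 1 = f(0)

fails it at $x_0 = 0$.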
Lipschitz continuity, on the other hand, means that a constant $k$ satisfying Equation 1 above exists. As the figure above illustrates, if you draw straight lines with slopes $\pm k$ through any point on the function's graph, the graph always stays between those two lines. Take $y = x$ as an example. Substituting into Equation 1,
\left|\frac{f(x_1)-f(x_2)}{x_1-x_2}\right| = \left|\frac{x_1-x_2}{x_1-x_2}\right| = 1 \leq k
so $k$ must be at least 1. If we instead demanded a value such as $k = 0.01$, the inequality would not hold, and $y = x$ could not be called Lipschitz continuous with that constant. The relationship between being continuous and being Lipschitz continuous is therefore
\text{Lipschitz continuous} \subset \text{continuous}

that is, Lipschitz continuity is the stricter condition, and the set of continuous functions contains the Lipschitz continuous ones.
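To make this concrete, here is a small numerical sketch of my own (the estimate_lipschitz helper below is hypothetical, not from the reference): it samples random point pairs and takes the largest observed slope, which must stay at or below any valid Lipschitz constant $k$.

python
import numpy as np

def estimate_lipschitz(f, low=-5.0, high=5.0, n=100000, seed=0):
    """Empirically estimate the smallest k satisfying Equation 1 by sampling point pairs."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(low, high, n)
    x2 = rng.uniform(low, high, n)
    mask = x1 != x2                       # skip identical points to avoid dividing by zero
    slopes = np.abs((f(x1[mask]) - f(x2[mask])) / (x1[mask] - x2[mask]))
    return slopes.max()

print(estimate_lipschitz(lambda x: x))      # about 1.0 -> y = x needs k >= 1
print(estimate_lipschitz(lambda x: 3 * x))  # about 3.0 -> y = 3x needs k >= 3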
In GANs, it is a commonly cited rule of thumb that constraining the Discriminator to $k = 1$ improves training stability.
Reference URL https://mathwords.net/lipschitz
Next, singular value decomposition (SVD). This is a matrix operation that will be needed for Spectral Normalization below, so I summarize it here.
Singular value decomposition means that any $m \times n$ matrix $A$ can be factored as $A = U\Sigma V^T$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a matrix whose off-diagonal components are 0 and whose diagonal components are non-negative and arranged in descending order. The diagonal components of $\Sigma$ are called the singular values. For how to compute $U$, $V$, and $\Sigma$, please refer to the following pdf.
http://www.cfme.chiba-u.jp/~haneishi/class/iyogazokougaku/SVD.pdf
In Python, the singular value decomposition can be obtained easily.
SN.ipynb
import numpy as np

data = np.array([[1, 2, 3, 4], [3, 4, 5, 6]])
# np.linalg.svd returns U, the singular values S in descending order,
# and V already transposed (the V printed below is V^T).
U, S, V = np.linalg.svd(data)
print(U)
print(S)
print(V)
[[-0.50566621 -0.86272921]
[-0.86272921 0.50566621]]
[10.73807223 0.8329495 ] #Singular value
[[-0.28812004 -0.41555404 -0.54298803 -0.67042202]
[ 0.7854851 0.35681206 -0.07186099 -0.50053403]
[-0.40008743 0.25463292 0.69099646 -0.54554195]
[-0.37407225 0.79697056 -0.47172438 0.04882607]]
In this way, the singular values are confirmed to be [10.73807223 0.8329495]; the maximum singular value is about 10.74.
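As a quick sanity check of my own (continuing the same notebook), the factors returned above reconstruct the original matrix, keeping in mind that NumPy hands back $V$ already transposed:

python
# Rebuild A = U Σ V^T from the factors above (the V returned by NumPy is already V^T).
Sigma = np.zeros(data.shape)
Sigma[:len(S), :len(S)] = np.diag(S)      # embed the singular values on the diagonal
print(np.allclose(U @ Sigma @ V, data))   # True
print(S.max())                            # maximum singular value, about 10.74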
Reference URL https://thinkit.co.jp/article/16884
Now for the main topic, Spectral Normalization. Batch Normalization (hereafter Batch Norm) is a famous technique used when building neural network layers. Batch Norm was proposed in 2015 and is a layer inserted after fully connected and convolution layers. Its commonly cited effects are faster learning, less sensitivity to the initial weight values, and a regularizing effect that suppresses overfitting.
The processing is as follows. Given a mini-batch of $m$ inputs $x_1, x_2, \dots, x_m$, the mean $\mu_B$ and the variance $\sigma_B^2$ of the batch are computed, and each input is normalized using them.
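Written out, this is the standard Batch Norm computation from the 2015 paper ($\epsilon$ is a small constant for numerical stability, and $\gamma$, $\beta$ are learned scale and shift parameters):

\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i,\qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2 \\
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}},\qquad y_i = \gamma \hat{x}_i + \beta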
Batch Norm brings these benefits, but in GAN training it is cited as a factor that breaks continuity. As you can see from the formula above, Batch Norm divides by the standard deviation, so it behaves like a fractional function, and a fractional function such as $1/x$ is not continuous at $x = 0$; in this sense the operation can lose continuity. Spectral Normalization is the method that solves this problem.
Spectral Normalization for Generative Adversarial Networks https://arxiv.org/abs/1802.05957
The paper was written by Japanese authors and comes from researchers at Preferred Networks, Inc. Spectral Normalization is the idea of dividing the weight coefficients by their maximum singular value. This ensures Lipschitz continuity and keeps the model's Lipschitz constant at 1. The singular value decomposition described above is used to find this maximum singular value.
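As a minimal NumPy sketch of this idea (my own illustration using the data matrix from earlier as a stand-in for a weight matrix, not the paper's implementation), dividing by the maximum singular value brings the spectral norm down to exactly 1:

python
# Spectral Normalization in one line: W_SN = W / sigma(W), where sigma(W) is the largest singular value.
sigma_max = np.linalg.svd(data, compute_uv=False).max()
data_sn = data / sigma_max
print(np.linalg.svd(data_sn, compute_uv=False).max())  # 1.0 (up to floating-point error)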
It is also very easy to implement. With TensorFlow, it can be done simply by using a ConvSN2D layer (a spectrally normalized convolution layer, here imported from the inpainting_layers module that accompanies the reference) as follows.
SN.ipynb
import tensorflow as tf
from inpainting_layers import ConvSN2D  # spectrally normalized Conv2D from the reference implementation

inputs = tf.random.normal((16, 256, 256, 3))
x = ConvSN2D(64, 3, padding='same')(inputs)
print(x.shape)  # (16, 256, 256, 64)
Now, about how the singular value is actually computed: applying the svd method directly would be computationally expensive, so an algorithm called the power method (power iteration) is used instead.
For an $(N, M)$ matrix $X$, the maximum singular value is estimated by starting from a random $U$ and repeating

V = L_2(U X^T),\qquad U = L_2(V X)

and then taking $\sigma \simeq V X U^T$. Here $L_2$ denotes $L_2$ normalization, $L_2(x) = x / (\sqrt{\sum_{i,j} x_{i,j}^2} + \epsilon)$, with $\epsilon$ a small constant.
Implemented, it looks like the following. The data matrix is the same one used above.
python
import matplotlib.pyplot as plt

# L2 normalization used in the power iteration: x / (sqrt(sum of squares) + eps)
def l2_normalize(x, eps=1e-12):  # eps value assumed; any small constant works
    return x / (np.sqrt(np.sum(x ** 2)) + eps)

results = []
for p in range(1, 6):                     # p = number of power-iteration steps
    U = np.random.randn(1, data.shape[1])
    for i in range(p):
        V = l2_normalize(np.dot(U, data.T))
        U = l2_normalize(np.dot(V, data))
    sigma = np.dot(np.dot(V, data), U.T)  # estimate of the maximum singular value
    results.append(sigma.flatten())

plt.plot(np.arange(1, 6), results)
plt.ylim([10, 11])
plt.show()
The estimate settles at around 10.74, the same result obtained earlier with np.linalg.svd. This is how the maximum singular value is computed in the actual implementation.
This time I summarized the ideas behind Spectral Normalization. I have grasped the overall flow, but my understanding of the mathematical side is still lacking, so I would like to deepen it as I continue implementing.
The program is stored here. https://github.com/Fumio-eisan/SN_20200404