Do you do singular value decomposition? > Greeting
At my current workplace everyone gives a short announcement at a weekly meeting, and since it's almost always along the lines of "here's how I did SVD", [I also tried SVD](http://qiita.com/kaityo256/items/48de63526b469235d16a). This article is a continuation of that one.
The source code is at https://github.com/kaityo256/image_svd.
First of all, I'm not good at tensors, or linear algebra in general[^denki], so I can't guarantee the accuracy of what follows. This is a memorandum I wrote because I kept getting confused along the lines of "Wait, which leg am I crushing again?"
Also, the [conjugate transpose](https://ja.wikipedia.org/wiki/%E9%9A%8F%E4%BC%B4%E8%A1%8C%E5%88%97) of a matrix $A$ (the transpose with every component replaced by its complex conjugate) is written $A^\dagger$.
[^denki]: Which is why I failed my electromagnetism course.
First, consider matrix multiplication. Multiplying an $m \times n$ matrix by an $n \times k$ matrix gives an $m \times k$ matrix.
\begin{align}
C &= A B \\
C_{ik} &= A_{ij} B_{jk}
\end{align}
The formula above uses Einstein notation (repeated indices are summed over). Here the middle leg, $j$, gets crushed. Illustrated, it looks like this.
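As a sanity check, this contraction can be written directly with `numpy.einsum`; the sizes below are just made up for illustration.

```py
import numpy as np

m, n, k = 4, 3, 5
A = np.random.rand(m, n)  # m x n
B = np.random.rand(n, k)  # n x k

# C_{ik} = A_{ij} B_{jk}: the shared leg j is summed over ("crushed")
C = np.einsum('ij,jk->ik', A, B)

print(C.shape)                 # (4, 5)
print(np.allclose(C, A @ B))   # True
```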
Now consider compressing the information in a matrix, which amounts to reducing its dimensions. For example, suppose we want to approximate an $m \times n$ matrix $X$ by an $m \times n'$ matrix with $n' < n$. Assume there is an $n \times n'$ matrix $P$ that acts as the "compressor".
Multiplying $X$ by $P$ gives $\chi$, whose number of columns has been reduced from $n$ to $n'$. Illustrated, it looks like this.
The height remains the same, but the width has become narrower.
To go back from the compressed $\chi$ toward the original $X$, multiply by $P^\dagger$.
The result is again an $m \times n$ matrix, but because information has been lost it does not come back to $X$; instead we get a matrix $X'$ that approximates $X$.
In short, $P$ is the information-compressing matrix and $P^\dagger$ is the information-restoring matrix; the picture is something like Doraemon's "Gulliver Tunnel". In this case, however, the process is irreversible...
Now, such an information compressing/restoring matrix $P$ can be created by singular value decomposition (SVD).
SVD decomposes an $m \times n$ matrix $X$ into the form
X = U \Sigma V^{\dagger}
where $U$ is an $m \times m$ unitary matrix, $V$ is an $n \times n$ unitary matrix, and $\Sigma$ is an $m \times n$ matrix whose only nonzero components sit on the diagonal. These diagonal components are called singular values.
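In numpy this is `numpy.linalg.svd`. A minimal sketch with made-up sizes, just to check the shapes and the reconstruction:

```py
import numpy as np

m, n = 6, 4
X = np.random.rand(m, n)

# full_matrices=True: U is (m, m), Vh = V^dagger is (n, n), s holds min(m, n) singular values
U, s, Vh = np.linalg.svd(X, full_matrices=True)

# Embed the singular values into an m x n matrix Sigma
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(X, U @ Sigma @ Vh))  # True: X = U Sigma V^dagger
```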
The $U$ and $V$ obtained this way are the information compressors. If you do SVD in Python or the like, the singular values come sorted in descending order. For example, the matrix $v^\dagger$ made from the top $n'$ rows of $V^\dagger$ can be used as a restorer, and its conjugate transpose $v$ as a compressor.
Similarly, the matrix $u$ made from the leftmost $m'$ columns of $U$ serves as the restorer for the row direction, and its conjugate transpose $u^\dagger$ is the compressor.
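Here is a minimal sketch of this compression and restoration for the column direction (real matrices, so the dagger is just a transpose; sizes are made up):

```py
import numpy as np

m, n, n2 = 6, 4, 2          # n2 plays the role of n'
X = np.random.rand(m, n)

U, s, Vh = np.linalg.svd(X)

vh = Vh[:n2, :]             # restorer v^dagger: top n' rows of V^dagger, shape (n', n)
v = vh.conj().T             # compressor v: shape (n, n')

chi = X @ v                 # compressed matrix, shape (m, n')
X_approx = chi @ vh         # restored matrix, shape (m, n), approximates X

# The error is governed by the discarded singular values
print(np.linalg.norm(X - X_approx))
```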
In the example above we compressed a matrix, but more generally you may want to compress a higher-order tensor, one with many legs. In machine learning, data is apparently often regarded as a tensor, compressed as preprocessing, and then fed into training. The Tucker decomposition is used to compress such a tensor.
For example, consider a tensor $X$ with three legs, whose dimensions are $m, n, k$. We want to approximate it by a tensor $\chi$ whose legs have the smaller dimensions $m', n', k'$.
To do this, the leg with dimension $m$, say, must be crushed down to $m'$. This is done with SVD, just as for a matrix.
First, combine the legs other than the one you want to crush.
We have simply regarded what was originally a rank-3 tensor with dimensions $(m, n, k)$ as a rank-2 tensor, that is, a matrix, of shape $(nk, m)$: the legs we are not crushing are bundled into the rows, and the leg we want to crush becomes the columns. Since this is now a matrix, SVD can be used.
X = U \Sigma V^\dagger
Here, since $V$ is a unitary matrix, $V V^\dagger = I$, the identity operator.
Now, let $v^\dagger$ be the matrix formed by the top $m'$ rows of $V^\dagger$. Then $v$ compresses the leg from $m$ to $m'$, and $v^\dagger$ restores it from $m'$ back to $m$.
If you build such a compressor for every leg and apply them all, you obtain a tensor $\chi$ that is a shrunken version of the original tensor $X$.
The reduced tensor obtained this way is called the core tensor. To recover (an approximation of) the original tensor from the core tensor, apply the restorer to each leg.
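Here is a minimal HOSVD sketch along these lines for a made-up $(m, n, k)$ tensor: for each leg, the other legs are bundled into the rows, the top rows of $V^\dagger$ give that leg's restorer, its transpose compresses, and the same matrices rebuild an approximation from the core tensor at the end.

```py
import numpy as np

m, n, k = 6, 5, 4        # original leg dimensions (made up)
m2, n2, k2 = 3, 3, 2     # reduced dimensions m', n', k'
X = np.random.rand(m, n, k)

def leg_restorer(T, axis, dim):
    """Top `dim` rows of V^dagger for the given leg, shape (dim, leg size)."""
    # Move the target leg last and bundle the remaining legs into the rows
    mat = np.moveaxis(T, axis, -1).reshape(-1, T.shape[axis])
    _, _, Vh = np.linalg.svd(mat, full_matrices=False)
    return Vh[:dim, :]

vhs = [leg_restorer(X, a, d) for a, d in enumerate((m2, n2, k2))]

# Core tensor: contract every leg with its compressor v = vh.T (real case)
core = X
for axis, vh in enumerate(vhs):
    core = np.tensordot(core, vh.T, axes=(axis, 0))
    core = np.moveaxis(core, -1, axis)   # put the compressed leg back in place
print(core.shape)  # (3, 3, 2)

# Restoration: contract every compressed leg with its restorer vh
X_approx = core
for axis, vh in enumerate(vhs):
    X_approx = np.tensordot(X_approx, vh, axes=(axis, 0))
    X_approx = np.moveaxis(X_approx, -1, axis)
print(X_approx.shape)  # (6, 5, 4), approximates X
```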
This kind of decomposition is apparently called higher-order SVD (HOSVD).
Now let's actually compress an image using SVD.
Two-dimensional image data can be regarded as a rank-3 tensor, since a value is determined by specifying an x coordinate, a y coordinate, and a color. If the width is $w$ and the height is $h$, the dimensions (thicknesses) of the legs are $w, h, c$, where $c$ is the color leg and its dimension is 3. Reading the data straightforwardly gives the order (height, width, color), so we will write it as `X[y, x, c]`.
The easiest way to compress this with SVD is to apply SVD to each of the RGB planes, as in the previous article.
from PIL import Image
import numpy as np

# filename, rank, and perform_svd are defined elsewhere in the repository
img = Image.open(filename)
w = img.width
h = img.height
X = np.asarray(img)
# Approximate each RGB plane with the top `rank` singular values
r = perform_svd(X[:,:,0],rank).reshape(w*h)
g = perform_svd(X[:,:,1],rank).reshape(w*h)
b = perform_svd(X[:,:,2],rank).reshape(w*h)
# Stack the three planes and rearrange back into an (h, w, 3) image
B = np.asarray([r,g,b]).transpose(1,0).reshape(h,w,3)
If you open a file with PIL's `Image` and feed it to `numpy.asarray`, you get a rank-3 tensor `X` of shape (height, width, color). For example, `X[:,:,0]` gives the red plane, which we approximate by SVD. After making approximate planes for green and blue in the same way, an appropriate `transpose` and `reshape` recover an approximation of the original image. [^1]
[^1]: To spell it out: `perform_svd` returns its result as a one-dimensional array, so `r` is a one-dimensional vector of size `w*h`. Stacking the three of them with `[r, g, b]` and passing that to `numpy.asarray` gives a rank-2 tensor of shape (c, h*w); `transpose` turns it into (h*w, c), and `reshape` into (h, w, c), which completes the image.
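For reference, here is a hypothetical sketch of what a rank-`rank` approximation like `perform_svd` could look like; the actual function in the repository may differ in details such as how it reshapes or rounds the result.

```py
import numpy as np

def perform_svd_sketch(a, rank):
    """Hypothetical stand-in for perform_svd: rank-`rank` SVD approximation of a 2D array."""
    u, s, vh = np.linalg.svd(a.astype(float), full_matrices=False)
    # Keep only the top `rank` singular values and vectors
    return u[:, :rank] @ np.diag(s[:rank]) @ vh[:rank, :]
```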
Next, let's use HOSVD. Everything is the same up to the point where the image becomes the tensor `X` of shape (h, w, c). First, to obtain the matrices that crush the width leg and the height leg, build a matrix `X1` that bundles the height and color legs together, and a matrix `X2` that bundles the width and color legs together.
X1 = X.transpose(0,2,1).reshape(h*3,w)  # (h, w, c) -> (h, c, w) -> (h*c, w)
X2 = X.transpose(1,2,0).reshape(w*3,h)  # (h, w, c) -> (w, c, h) -> (w*c, h)
The legs are reordered and reshaped from (h, w, c) into (h*c, w) and (w*c, h), respectively. Now apply SVD to each.
U,s,A1 = linalg.svd(X1)  # A1 is V^dagger for the width leg
U,s,A2 = linalg.svd(X2)  # A2 is V^dagger for the height leg
The restorers `a1` and `a2` can be made by taking the top `r2` rows of the `A1` and `A2` obtained above.
a1 = A1[:r2, :]
a2 = A2[:r2, :]
These are real matrices here, so just transposing them gives the dagger (conjugate transpose). We would then apply `a1.T` to crush and `a1` to restore, but since doing that in two steps is a hassle, we first build the matrix (projector) that performs "compress, then restore" in one go.
pa1 = a1.T.dot(a1)  # (w, w) projector for the width leg
pa2 = a2.T.dot(a2)  # (h, h) projector for the height leg
An approximate tensor is obtained by contracting (`tensordot`) the resulting projectors with the appropriate legs.
X2 = np.tensordot(X,pa1,(1,0))   # contract the width leg (axis 1): result (h, c, w)
X3 = np.tensordot(X2,pa2,(0,0))  # contract the height leg (axis 0): result (c, w, h)
X4 = X3.transpose(2,1,0)         # reorder back to (h, w, c)
Here only the x-coordinate leg and the y-coordinate leg are crushed; the color leg is left alone. The code above is implemented as `color_tucker` in https://github.com/kaityo256/image_svd.
Now let's compare applying SVD to each RGB plane treated as a matrix against bundling the three legs into two and applying SVD twice (HOSVD). First, how much does each compress the information? Consider an $L_1 \times L_2$ color image and approximate it keeping ranks up to $r$.
When SVDing each RGB plane, each color plane is an $L_1 \times L_2$ matrix that gets compressed into an $L_1 \times r$ matrix, an $L_2 \times r$ matrix, and $r$ singular values. With three colors, the original $3 L_1 L_2$ numbers are compressed to about $3r(L_1 + L_2 + r)$, which for $L_1, L_2 \gg r$ is roughly $3r(L_1 + L_2)$.
With HOSVD, the $L_1 \times L_2 \times 3$ tensor is compressed into an $r \times r \times 3$ core tensor plus an $L_1 \times r$ matrix and an $L_2 \times r$ matrix, so the amount of data after compression is $r(L_1 + L_2 + 3r) \sim r(L_1 + L_2)$. Therefore, to end up with the same amount of information as keeping ranks up to $r$ for each RGB plane, HOSVD can keep ranks up to about $3r$.
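As a rough, made-up example, take $L_1 = L_2 = 512$ and $r = 10$: SVD of each RGB plane keeps about $3r(L_1 + L_2 + r) = 30 \cdot 1034 = 31020$ numbers, while HOSVD with rank $3r = 30$ keeps about $3r(L_1 + L_2 + 9r) = 30 \cdot 1114 = 33420$; both are roughly 4% of the original $3 L_1 L_2 = 786432$.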
Below is a comparison using a suitable image.

The original image:

Each RGB plane approximated with its top 10 ranks:

HOSVD, keeping the top 30 ranks in the x- and y-leg projectors:
Both should retain the same amount of information, but HOSVD looks better.
Regarding the image data as a rank-3 tensor with height, width, and color legs, I applied a Tucker decomposition and made approximate images. Comparing at the same amount of information, HOSVD, which applies SVD with the legs bundled together, seems to do better than applying SVD to each RGB plane separately. Also, this time I applied SVD only once per leg, but there is apparently a method called higher-order orthogonal iteration (HOOI) that improves the approximation accuracy by iterating this step.