When dealing with tensors, I often get confused as to "that? Which foot is where?" So, based on Low-rank approximation of images by HOSVD, let's proceed with the steps while checking each one.
Below, assuming practical training, I will write the process in persistent detail.
Below, for convenience of display, it is written in Python interactive mode, but in reality it is more convenient to do it with ipython.
First, let's import the required library.
>>> import numpy as np
>>> from scipy import linalg
>>> from PIL import Image
Next, load an appropriate image. I just had a reasonably sized image at hand, so let's use it.
This is the Itanum 2 spirit horse.
Let's read this and check the size.
>>> img = Image.open("uma.jpg ")
>>> img.size
(480, 340)
It is 480X340 pixel image data. For later, drop the height and width into variables.
>>> w, h = img.size
Now, let's make the image data an array of Numpy.
>>> X = np.asarray(img)
>>> X.shape
(340, 480, 3)
Note that the order is "height, width, 3 (RGB)".
I'm going to do SVD from now on, but I need to bring the foot to do SVD to the far right and make the tensor on the 3rd floor into a matrix. Use transpose to swap legs.
Check the dimension of X again.
>>> X.shape
(340, 480, 3)
This is now "height, width, color", but first I want to crush my legs about width, so I have to order "height, color, width". Since the order of "height, width, color" is (0,1,2), the target order is (0,2,1).
>>> X.transpose(0,2,1).shape
(340, 3, 480)
The order has changed. This is further combined into a matrix with the two left legs.
>>> X.transpose(0,2,1).reshape(h*3,w).shape
(1020, 480)
You have successfully created a matrix with dimensions (height and color, width). Let's save this as X1.
>>> X1 = X.transpose(0,2,1).reshape(h*3,w)
Similarly, I want to create a matrix with a height dimension on the far right. Look at the X dimension again.
>>> X.shape
(340, 480, 3)
I want this to be (480, 3, 340) [^ rgb]. For that purpose, the order should be (1,2,0).
>>> X.transpose(1,2,0).shape
(480, 3, 340)
Similarly, put the two left legs together in a matrix and save this in X2.
>>> X2 = X.transpose(1,2,0).reshape(w*3,h)
>>> X2.shape
(1440, 340)
The dimension of the right foot is the height. At this point, the SVD is ready.
First, SVD X1. Let's check each size further.
>>> U, s, A1 = linalg.svd(X1)
>>> U.shape
(1020, 1020)
>>> s.shape
(480,)
>>> A1.shape
(480, 480)
Now, compress the information of A1 to 5%. Originally it was 480 lines, so only the top 24 lines are fetched. Let's call this a1.
>>> a1 = A1[:24, :]
>>> a1.shape
(24, 480)
Similarly, SVD X2 and fetch only the top 17 lines (5% of 340) of A2.
>>> U, s, A2 = linalg.svd(X2)
>>> A2.shape
(340, 340)
>>> a2 = A2[:17,:]
>>> a2.shape
(17, 340)
Taking the transpose of a1 and a2 obtained here, we get a compression matrix that crushes the width and height.
>>> a1.T.shape
(480, 24)
>>> a2.T.shape
(340, 17)
It can be seen that the matrix is crushed into 480 → 24 and 340 → 17, respectively. The core tensor is obtained by taking the inner product of this with the appropriate foot of X.
First, multiply X by a1.T. You can tell which leg and which leg to crush by looking at the shape.
>>> X.shape
(340, 480, 3)
>>> a1.T.shape
(480, 24)
With the leftmost as the 0th, crush the 1st leg of X and the 0th leg of a1.T. Substitute this for X3.
>>> X3 = np.tensordot(X, a1.T, (1,0))
>>> X3.shape
(340, 3, 24)
It was successfully compressed from 480 to 24. What should be noted here is the order of the legs after the tensor dot. Until you get used to it, you should check the shape of which foot of the new tensor each time.
Next, apply a2.T to the foot that represents the height of X3 and compress it. Find out which one should be crushed again.
>>> X3.shape
(340, 3, 24)
>>> a2.T.shape
(340, 17)
If all the feet are different in size, it is convenient to know immediately that you should crush the same dimension. In this case, you can crush the numbers 0, so
>>> x = np.tensordot(X3,a2.T,(0,0))
>>> x.shape
(3, 24, 17)
We have successfully obtained a core tensor x with reduced dimensions of X.
Finally, X was compressed into a width compression / restoration matrix a1, a height compression / restoration matrix a2, and a core tensor x. Let's examine this compression ratio.
>>> float(x.size + a1.size + a2.size)/X.size
0.03783496732026144
So, we were able to compress up to about 3.8% of the original data.
Now, I have a core tensor x and width and height reconstruction matrices a1 and a2. If you multiply the core tensor by a1 and a2, you will get the approximate tensor Y of the original tensor, so let's make it.
First, check the dimensions.
>>> x.shape
(3, 24, 17)
>>> a1.shape
(24, 480)
Because you only have to crush the 1st of x and the 0th of a1
>>> x2 = np.tensordot(x,a1,(1,0))
>>> x2.shape
(3, 17, 480)
The width has been restored from 24 to 480. Similarly, let x3 be the one whose height is restored.
>>> x3 = np.tensordot(x2,a2,(1,0))
>>> x3.shape
(3, 480, 340)
However, the order of the legs is different as it is. Take a look at X again.
>>> X.shape
(340, 480, 3)
If you compare it with X, you can see that the legs should be in the order of (2,1,0), so
>>> Y = x3.transpose(2,1,0)
>>> Y.shape
(340, 480, 3)
This is the tensor restored from the core tensor x using the restoration matrices a1 and a2. Now that we have the same dimensions as the original data, we can create an Image object. However, it must be uint8 type before it can be inserted into ʻImage.fromarray`.
>>> img2 = Image.fromarray(np.uint8(Y))
>>> img2.save("approximated.jpg ")
The restored image created in this way looks like this.
Well, I feel that I was able to restore it from the data compressed to 3.8% without using the information that "the original is an image" at all.
I tried to compress and restore the image using HOSVD (Higher Order Singular Value Decomposition). IPython is very convenient because it works like tab completion. After crushing the tensor's leg, it is easy to get confused as to what number is the leg that represents what, but I feel that mistakes / confusion will be reduced if you work while checking the dimensions with each shape.
Python is convenient, isn't it ... [^ ruby].
[^ rgb]: Here, the reason why the height and color of the legs are not exchanged in the order of (3,480,340) is that the summarized foot data is lined up with (R, G, B, R, G, B ...). I want you to. In the order of (3,480,340), the data will be arranged in (RRR ..., GGG ...., BBB ...), which is different from the order of X1.
[^ ruby]: For the time being, my favorite language is Ruby, but the library is rich enough that Python has a long day ...
Recommended Posts