Decomposing a tensor into a small core tensor and a set of matrices is called Tucker decomposition. Among the methods for computing it, HOSVD (higher order singular value decomposition) and HOOI (higher order orthogonal iteration) are used here to perform low-rank approximation of images.
The source code is available at https://github.com/kaityo256/hooi_sample.
For example, suppose you have a third-order tensor $X$ whose legs have dimensions $R_1, R_2, R_3$. Tucker decomposition breaks this into a third-order core tensor $\chi$ whose legs have dimensions $r_1, r_2, r_3$ and matrices $A_i (i = 1, 2, 3)$ with $r_i$ rows and $R_i$ columns. When $A_i$ is contracted with the $i$-th leg of the core tensor, the dimension of that leg changes from $r_i$ to $R_i$; that is, $A_i$ is the reconstruction matrix that maps the core tensor back toward the original tensor. Conversely, contracting the $i$-th leg of the original tensor with $A_i^\dagger$ changes the dimension of that leg from $R_i$ to $r_i$; that is, $A_i^\dagger$ is the projection matrix. Each of these is illustrated below.
First, the core tensor $\chi$ is obtained by multiplying each leg of the original tensor $X$ by a projection matrix.
Also, multiplying each leg of the core tensor $\chi$ by the corresponding reconstruction matrix gives the tensor $X'$, which is an approximation of the original tensor.
Originally the tensor has $R_1 R_2 R_3$ elements, while the core tensor has $r_1 r_2 r_3$ elements and each matrix $A_i$ has $R_i r_i$ elements, so the number of elements after compression is $r_1 r_2 r_3 + R_1 r_1 + R_2 r_2 + R_3 r_3$.
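As a quick sanity check with hypothetical dimensions (the numbers below are just an example, not taken from the article), the compression ratio can be computed like this:

# hypothetical dimensions of the original tensor and the core tensor
R = (640, 480, 3)
r = (128, 96, 3)

original = R[0] * R[1] * R[2]
compressed = r[0] * r[1] * r[2] + sum(Ri * ri for Ri, ri in zip(R, r))
print(original, compressed, compressed / original)  # -> 921600 164873 (about 18%)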
Tucker decomposition is thus the problem of finding, given the original tensor $X$ and the dimensions of the core tensor, the set of matrices $A_i$ that approximates the original tensor as accurately as possible. Let's do this with two methods, HOSVD and HOOI, using low-rank approximation of an image as the example.
HOSVD
In HOSVD (higher order singular value decomposition), the many-legged tensor is first regarded as a matrix by separating its legs into two groups: the leg of interest and all the others. Since this is now a matrix, SVD (singular value decomposition) can be applied. Taking the top rows of the matrix that appears on the right side of the SVD gives the reconstruction matrix for the leg of interest. HOSVD is the procedure that repeats this for every leg.
In the figure above, we focus on the first leg of the three-legged tensor $X$, combine the second and third legs, and apply SVD to the resulting $R_2 R_3 \times R_1$ matrix. The SVD then produces an $R_2 R_3 \times R_2 R_3$ square matrix on the left side and an $R_1 \times R_1$ matrix on the right side; taking the top $r_1$ rows of the right matrix gives the reconstruction matrix $A_1$ of the first leg. $A_2$ and $A_3$ are obtained in the same way.
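As a rough sketch (not the code from the repository, and with hypothetical dimensions), one HOSVD step for the first leg might look like this with numpy:

import numpy as np
from scipy import linalg

# hypothetical dimensions of the original tensor and the target rank of the first leg
R1, R2, R3 = 30, 40, 50
r1 = 10
X = np.random.rand(R1, R2, R3)

# combine the second and third legs into one index, giving an (R2*R3) x R1 matrix
M = X.transpose(1, 2, 0).reshape(R2 * R3, R1)

# SVD; the right matrix is R1 x R1, and its top r1 rows form the reconstruction matrix A1
U, s, Vh = linalg.svd(M)
A1 = Vh[:r1, :]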
HOOI
Now, it is known that HOSVD does not give the best approximation. HOOI (higher order orthogonal iteration) improves on it by iteration.
First, prepare the reconstruction matrices $A_i$ randomly. Then, leaving the leg of interest (say, the first one) alone, crush the other legs of the original tensor with $A_i^\dagger (i \neq 1)$. Next, as in HOSVD, regard the result as a matrix separating the leg of interest from the other legs, apply SVD, and take the top $r_1$ rows of the resulting right matrix as the new $A_1$.
Note that the computational cost of this SVD drops significantly, because the dimensions of all legs except the one of interest have already been reduced by the projection matrices.
Executing this once for every leg constitutes one loop iteration, and the accuracy improves as the loop is repeated.
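As a rough sketch with hypothetical dimensions (again, not the repository code), the HOOI loop for a generic third-order tensor might look like this, where each $A_i$ has shape $(r_i, R_i)$ and is updated from the SVD of the tensor whose other legs have already been crushed:

import numpy as np
from scipy import linalg

R1, R2, R3 = 30, 40, 50   # dimensions of the original tensor (example values)
r1, r2, r3 = 5, 6, 7      # dimensions of the core tensor (example values)
X = np.random.rand(R1, R2, R3)

# random initial matrices with orthonormal rows
A1 = linalg.svd(np.random.rand(R1, R1))[2][:r1, :]
A2 = linalg.svd(np.random.rand(R2, R2))[2][:r2, :]
A3 = linalg.svd(np.random.rand(R3, R3))[2][:r3, :]

for it in range(10):
    # update A1: crush legs 2 and 3, flatten them, SVD, take top r1 rows of the right matrix
    Y = np.tensordot(np.tensordot(X, A2.T, (1, 0)), A3.T, (1, 0))  # shape (R1, r2, r3)
    A1 = linalg.svd(Y.transpose(1, 2, 0).reshape(r2 * r3, R1))[2][:r1, :]
    # update A2: crush legs 1 and 3
    Y = np.tensordot(np.tensordot(X, A1.T, (0, 0)), A3.T, (1, 0))  # shape (R2, r1, r3)
    A2 = linalg.svd(Y.transpose(1, 2, 0).reshape(r1 * r3, R2))[2][:r2, :]
    # update A3: crush legs 1 and 2
    Y = np.tensordot(np.tensordot(X, A1.T, (0, 0)), A2.T, (0, 0))  # shape (R3, r1, r2)
    A3 = linalg.svd(Y.transpose(1, 2, 0).reshape(r1 * r2, R3))[2][:r3, :]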
A two-dimensional image can be regarded as a third-order tensor with dimensions (height, width, 3), because a pixel value is determined by specifying the x-coordinate, y-coordinate, and color (RGB). Here we obtain the matrices that crush the height and width with HOSVD and HOOI and use them to low-rank approximate the image. For HOSVD, see the past article.
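For example, assuming Pillow is available (the file name below is just a placeholder), the image can be loaded into such a tensor like this:

import numpy as np
from PIL import Image

# load the image as a (height, width, 3) array of RGB values
X = np.asarray(Image.open("input.jpg").convert("RGB"), dtype=float)
h, w, _ = X.shape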
For an image of width $w$ and height $h$, consider crushing the width dimension to $r_1$ and the height dimension to $r_2$. What we want to find are the projection matrices that crush the width and the height; call them $a_1^\dagger$ and $a_2^\dagger$. The color leg has only 3 dimensions, so it is not crushed.
First, we need initial values for $a_1$ and $a_2$. The initial values can be anything [^1], but their rows must be orthonormal. Not having a better idea, I prepared random matrices of size $w \times w$ and $h \times h$, applied SVD to them, and took the top $r_1$ and $r_2$ rows of the resulting right matrices as the initial values.
[^1]: If the goal is to improve on the HOSVD approximation, it is effective to use the HOSVD result as the initial value. However, if you want to keep the computational cost below that of HOSVD, random initial values are fine. In either case, iterating several times gives sufficient accuracy.
import numpy as np
from scipy import linalg   # numpy.linalg would also work here

# initial values: SVD random matrices and take the top r1 / r2 rows of the right
# matrices, so that the rows of a1 and a2 are orthonormal
X1 = np.random.rand(w, w)
X2 = np.random.rand(h, h)
U, s, A1 = linalg.svd(X1)
U, s, A2 = linalg.svd(X2)
a1 = A1[:r1, :]
a2 = A2[:r2, :]
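Since the rows of the right matrices returned by the SVD are orthonormal, the initial $a_1$ and $a_2$ obtained this way satisfy the orthogonality requirement, which can be checked quickly (this check is not part of the original script):

# rows of a1 and a2 are orthonormal: a1 a1^T and a2 a2^T are identity matrices
assert np.allclose(a1.dot(a1.T), np.eye(r1))
assert np.allclose(a2.dot(a2.T), np.eye(r2))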
For later convenience, let's prepare a function that returns the tensor obtained by compressing $X$ with $a_1$ and $a_2$ and then restoring it.
def restored_tensor(X, a1, a2):
    # projectors built from a1 (width) and a2 (height)
    pa1 = a1.T.dot(a1)
    pa2 = a2.T.dot(a2)
    # contract the width leg with pa1 and the height leg with pa2,
    # then restore the original (height, width, 3) axis order
    X2 = np.tensordot(X, pa1, (1, 0))
    X3 = np.tensordot(X2, pa2, (0, 0))
    return X3.transpose(2, 1, 0)
This function multiplies each leg of $X$ by $a_i^\dagger$ to form the core tensor, and then multiplies by $a_i$ to return the restored tensor.
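As a quick usage sketch (assuming `X`, `a1`, and `a2` from above), the approximation at this point, before any optimization, is obtained by:

# restore an approximate image from the current (still random) a1 and a2;
# the result has the same (height, width, 3) shape as X
Xd = restored_tensor(X, a1, a2)
print(X.shape, Xd.shape)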
After that, we just repeat the following two updates: contract the height leg of $X$ with $a_2^\dagger$, reshape the result into a $3 r_2 \times w$ matrix, SVD it, and take the top $r_1$ rows of the right matrix as the new $a_1$; then contract the width leg of $X$ with the new $a_1^\dagger$, reshape the result into a $3 r_1 \times h$ matrix, SVD it, and take the top $r_2$ rows as the new $a_2$. The following code implements this.
for i in range(10):
    # update a1: crush the height leg with a2^dagger, then SVD the (3*r2, w) matrix
    X1 = np.tensordot(X, a2.T, (0, 0)).transpose(2, 1, 0).reshape(r2 * 3, w)
    U, s, A1 = linalg.svd(X1)
    a1 = A1[:r1, :]
    # update a2: crush the width leg with the new a1^dagger, then SVD the (3*r1, h) matrix
    X2 = np.tensordot(X, a1.T, (1, 0)).transpose(2, 1, 0).reshape(r1 * 3, h)
    U, s, A2 = linalg.svd(X2)
    a2 = A2[:r2, :]
For the original tensor $X$ and its approximate tensor $X'$, a residual $r$ is defined from the norm of the difference $X - X'$; the smaller the residual, the better the approximation.
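As a minimal sketch, one common Frobenius-norm-based residual can be computed like this (the normalization by the norm of $X$ is an assumption on my part and may not match the definition used for the numbers below):

def residual(X, Xd):
    # Frobenius norm of the difference; normalizing by the norm of X is an
    # assumption here and may differ from the definition in the original post
    return np.sqrt(np.sum((X - Xd) ** 2)) / np.sqrt(np.sum(X ** 2))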
Consider taking a suitable image as input and compressing both the width and height dimensions to 20%. As the input image, I use a photo of an Itanium2 ema (votive tablet) that I happened to have at hand.
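For the 20% compression, the target dimensions might be set up like this (the integer rounding is an arbitrary choice):

# crush the width and height dimensions to 20% of their original sizes
r1 = w // 5
r2 = h // 5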
The following image is an approximation of this with HOSVD.
The residual was 0.966415890942.
When HOOI is run for 10 iterations, the residual decreases as follows.
1.28857987024
0.97532217632
0.95714093422
0.952636586271
0.950770629606
0.949809071863
0.949257702502
0.948920613334
0.948705104294
0.948562468306
The first value is the residual before any optimization (with the random initial matrices). After two loop iterations the residual becomes smaller than that of HOSVD. The image after 10 iterations is as follows.
The residual for this image is 0.948562468306, a little less than a 2% improvement over HOSVD, but the difference is hard to tell visually (at least to me).
Treating an image as a third-order tensor, I tried low-rank approximation with two methods, HOSVD and HOOI. For an image of size $(w, h)$, crushing the width to $r_1$ and the height to $r_2$, HOSVD requires one SVD each of two matrices of sizes $(3h, w)$ and $(3w, h)$, whereas HOOI instead SVDs the much smaller matrices of sizes $(3r_2, w)$ and $(3r_1, h)$, but needs to loop several times. Still, in the test with this image the residual dropped below that of HOSVD after only a few iterations, so when the dimensions of the original tensor are large, the total amount of computation should be smaller for HOOI.