[PYTHON] Tucker decomposition of the daimyo procession with HOOI

Past articles in the daimyo-matrix decomposition series.

Introduction

Linear algebra has an operation called singular value decomposition (SVD), which can be used to approximate matrices; for details, see "Singular value decomposition of the daimyo matrix". The low-rank approximation of a matrix given by SVD is the best approximation in the sense of the Frobenius norm. Now, one sometimes wants to approximate not only matrices but also higher-order tensors with many legs. Keeping the number of legs the same as in the original tensor, the tensor is decomposed into a core tensor whose legs have lower dimensions and reconstruction matrices that restore the dimension of each leg of the core tensor; this decomposition is called the Tucker decomposition. For the Tucker decomposition it is not obvious how to obtain the best approximation. In the earlier article "Tucker decomposition of the daimyo matrix" I tried a Tucker decomposition using Higher Order SVD (HOSVD), a straightforward application of SVD to tensors, but HOSVD is known not to give the best approximation. In this article we apply Higher Order Orthogonal Iteration of tensors (HOOI), which improves the approximation accuracy by iteration, and compare the results with HOSVD.

The code is below.

https://github.com/kaityo256/daimyo_hooi

If you want to open Jupyter Notebook in Google Colab, click here (https://colab.research.google.com/github/kaityo256/daimyo_hooi/blob/main/daimyo_hooi.ipynb).

What is Tucker decomposition?

Below, tensors are drawn as diagrams (graphs); if you are not familiar with this notation, please refer to "Graph representation of tensors".

The Tucker decomposition decomposes a tensor with thick legs into a core tensor with thin legs and reconstruction matrices that turn the thin legs back into thick legs.

tucker.png

Each reconstruction matrix is an isometry (a piece cut out of an orthogonal matrix); its transpose is a compression matrix that narrows a "thick (high-dimensional) leg" down to a "thin (low-dimensional) leg". You obtain the core tensor by applying the corresponding compression matrix to each leg of the original tensor.

core.png

Therefore, the Tucker decomposition is the following problem: given a tensor, find a restoration/compression matrix for each of its legs.

Now, applying the compression matrices to the legs of the original tensor gives the core tensor, and applying the restoration matrices to the legs of the core tensor gives the approximate tensor. Drawing everything together looks like this.

full.png

In other words, the Tucker decomposition is a low-rank approximation that squeezes the thick legs of the original tensor in the middle. Since each pair of triangles is the transpose (adjoint) of the other, the Tucker decomposition amounts to finding one such triangle per leg.
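
Written as a formula (the index notation here is my own, not taken from the original figures), the Tucker decomposition of a three-leg tensor $T_{ijk}$ reads

T_{ijk} \approx \sum_{a,b,c} C_{abc} \, U^{(1)}_{ia} U^{(2)}_{jb} U^{(3)}_{kc}

where $C$ is the core tensor and the $U^{(n)}$ are the restoration matrices (isometries); their transposes are the compression matrices.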

Now, from this point of view, the matrix approximation by SVD can also be seen as "squeezing a leg". Consider a $D$-dimensional unitary matrix $V$. By the definition of a unitary matrix, its product with the conjugate transpose $V^\dagger$ is the identity matrix.

V V^\dagger = I

Now, $V$ is a square matrix, but consider the matrix $\tilde{V}$ made of only its first $d (<D)$ columns. The corresponding conjugate transpose is $\tilde{V}^\dagger$, made of the first $d$ rows of $V^\dagger$. $\tilde{V}$ is a D-by-d matrix, and multiplying by it compresses the "leg dimension" from D down to d. Conversely, $\tilde{V}^\dagger$ acts as a restoration matrix that brings the dimension of the leg back from d to D.

Now, consider the singular value decomposition of a matrix $M$:

M = U \Sigma V^{\dagger}

Building the compression/restoration matrices $\tilde{V}, \tilde{V}^\dagger$ from the $V$ that appears in this decomposition gives the best low-rank approximation of the original matrix in the sense of the Frobenius norm. This is the principle of matrix approximation by singular value decomposition.

svd.png
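
As a quick sanity check (this snippet is my own addition, not part of the original article), the following NumPy code confirms that compressing and then restoring the right leg of $M$ with the first $d$ columns of $V$ reproduces the rank-$d$ truncated SVD:

import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 6))
U, s, Vh = np.linalg.svd(M, full_matrices=False)
d = 3
Vd = Vh[:d, :].conj().T                    # D-by-d isometry built from V
M_approx = (M @ Vd) @ Vd.conj().T          # compress the right leg to d, then restore it
M_trunc = (U[:, :d] * s[:d]) @ Vh[:d, :]   # rank-d truncated SVD
print(np.allclose(M_approx, M_trunc))      # True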

Looking at the singular value decomposition of a matrix in this way, we can build a compression matrix and a restoration matrix from the matrix obtained by the decomposition ($V$, corresponding to the right leg in the figure above). HOSVD applies this idea directly to tensors.

In HOSVD, everything except the leg of interest is bundled together, the result is regarded as a matrix, and that matrix is singular-value decomposed. This yields the compression/restoration matrix for the leg of interest. Doing this for every leg gives a compression/restoration matrix for each leg.

hosvd.png

The HOSVD algorithm is simple, but it is computationally expensive and the resulting decomposition is not the best approximation. HOOI (Higher Order Orthogonal Iteration of tensors) addresses this by iteration. First, prepare initial values for the compression/restoration matrices of all legs. The values can be random, but each matrix must be an isometry (its column or row vectors are orthonormal). Then, for the leg of interest, crush all the other legs with their current compression matrices, perform an SVD on the remaining matrix, and use it to update the compression matrix of that leg. Do the same for the next leg, this time using the already-updated compression matrices for the other legs. HOOI improves the compression matrices by repeating this procedure.

Let's look at the concrete procedure. Suppose you have a three-legged tensor that you want to Tucker-decompose. Label the legs 1, 2, and 3. Ultimately, we want a compression matrix for each leg.

First, prepare initial values for the compression matrices. You need random isometries; SciPy's ortho_group.rvs creates a random orthogonal matrix in one shot, from which you can slice one. Let $V_i^{(k)}$ denote the compression matrix for the $i$-th leg at step $k$. First, focus on leg 1 and crush legs 2 and 3 with $V_2^{(0)}$ and $V_3^{(0)}$. In that state, bundle legs 2 and 3 together, regard the result as a matrix whose only thick leg is leg 1, and perform a singular value decomposition. This yields the compression matrix that crushes leg 1 most accurately, given that legs 2 and 3 are crushed by the current compression matrices. Take it as the compression matrix of leg 1 for the next step and call it $V_1^{(1)}$.

hooi1.png

Next, focus on leg 2 and do the same, but use the updated $V_1^{(1)}$ to crush leg 1. This gives $V_2^{(1)}$. Similarly, obtaining $V_3^{(1)}$ completes one step of the HOOI iteration. HOOI then keeps cycling through these steps, updating the $V_i^{(k)}$.
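
For reference, here is a minimal sketch of one such sweep for a general three-leg tensor with every leg compressed to dimension d (this generic version is my own illustration; the daimyo-procession code later in this article keeps the color leg uncompressed):

import numpy as np
from scipy import linalg
from scipy.stats import ortho_group

def hooi_sweep(X, V1, V2, V3, d):
    # One HOOI sweep: update the isometry of each leg in turn,
    # using the already-updated isometries of the other legs.
    D1, D2, D3 = X.shape
    # Leg 1: crush legs 2 and 3, then SVD the resulting D1 x (d*d) matrix.
    Y = np.tensordot(np.tensordot(X, V2, (1, 0)), V3, (1, 0))  # (D1, d, d)
    V1 = linalg.svd(Y.reshape(D1, d * d), full_matrices=False)[0][:, :d]
    # Leg 2: use the updated V1 and the current V3.
    Y = np.tensordot(np.tensordot(X, V1, (0, 0)), V3, (1, 0))  # (D2, d, d)
    V2 = linalg.svd(Y.reshape(D2, d * d), full_matrices=False)[0][:, :d]
    # Leg 3: use the updated V1 and V2.
    Y = np.tensordot(np.tensordot(X, V1, (0, 0)), V2, (0, 0))  # (D3, d, d)
    V3 = linalg.svd(Y.reshape(D3, d * d), full_matrices=False)[0][:, :d]
    return V1, V2, V3

# Random isometries as initial values, then iterate the sweep.
X = np.random.rand(20, 30, 40)
d = 5
V1 = ortho_group.rvs(20)[:, :d]
V2 = ortho_group.rvs(30)[:, :d]
V3 = ortho_group.rvs(40)[:, :d]
for _ in range(10):
    V1, V2, V3 = hooi_sweep(X, V1, V2, V3, d)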

Since the low-rank approximation given by SVD is the best approximation in the sense of the Frobenius norm, the update of the compression matrix at each step is a least-squares fit carried out with "how the other legs are crushed" held fixed. In other words, HOOI, which at each step finds how to crush the leg of interest so as to minimize the Frobenius norm while keeping the crushing of the other legs fixed, is equivalent to ALS (Alternating Least Squares) [^1].

[^1]: I first learned of this method as HOOI, and it was later pointed out to me on Twitter that it is equivalent to ALS. I am not sure which came first.

Therefore, if this iteration converges, the resulting Tucker decomposition should be "the best approximation of the original tensor in the sense of the Frobenius norm". However, I could not prove convergence, and it seems to be unknown in general whether the iteration converges.

Applying HOOI to the daimyo procession

The introduction has become long, but let's Tucker-decompose the daimyo procession with HOOI. Clicking here (https://colab.research.google.com/github/kaityo256/daimyo_hooi/blob/main/daimyo_hooi.ipynb) opens the Jupyter Notebook in Google Colab, where you can run the same steps as below.

What we actually decompose is a color image of the characters "大名行列" (daimyo procession). A color image can be regarded as a third-order tensor with three legs: height (H), width (W), and color (C). Since the color leg is already thin, we crush only the height and width legs. In the end we make the following decomposition.

lcr.png

The legs with dimensions H and W are crushed down to dimension r. The color leg is left as is. Since H and W have different sizes it might be better to crush them in proportion, but for simplicity we crush both down to the same dimension. Let L be the restoration matrix for the left (height) leg and R the restoration matrix for the right (width) leg. Their transposes are the compression matrices.

core_tensor.png
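
For reference, the shapes involved are as follows (assuming X has already been transposed so that the color leg is in the middle, as is done below):

# X : (H, 3, W)   original tensor with the color leg in the middle
# L : (H, r)      restoration matrix for the height leg
# C : (r, 3, r)   core tensor
# R : (r, W)      restoration matrix for the width leg
# X ~ L C R       contract L with the first leg of C, and R with the last leg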

Create a color image to be decomposed in the same way as in the previous article (https://qiita.com/kaityo256/items/2e3f45377a6b9760f3e0).

from PIL import Image, ImageDraw, ImageFont
import numpy as np
from scipy import linalg
from IPython.display import display
from scipy.stats import ortho_group
import matplotlib.pyplot as plt

!apt-get -y install fonts-ipafont-gothic

fpath='/usr/share/fonts/opentype/ipafont-gothic/ipagp.ttf'
fontsize = 50
font = ImageFont.truetype(fpath, fontsize)

W = 200
H = fontsize
img  = Image.new('RGB', (W,H),color="black")
draw = ImageDraw.Draw(img)
draw.text((0,0), "Big", fill=(255,255,0), font=font)
draw.text((0,0), "Name", fill=(0,255,255), font=font)
draw.text((0,0), "line", fill=(255,0,255), font=font)
draw.text((0,0), "Column", fill="white", font=font)
img

You can get such an image.

daimyo1.png

After that, you can get the tensor corresponding to the image by doing the following.

data = np.array(img.getdata()).reshape(H,W,3)

It is convenient to have the color leg in the middle, so let's transpose the tensor [^dot].

[^dot]: If you do not arrange the legs carefully, their order will be wrong after np.tensordot. If you put the color leg in the middle and apply the compression matrices from the left and right, the legs do not get shuffled. When applying this to a general tensor, transpose appropriately after np.tensordot.
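
A small illustration of this point (the shapes here are arbitrary, my own example): contracting matrices onto the first and last legs with np.tensordot keeps the legs in their original order, so no transpose is needed.

A = np.zeros((4, 3, 5))          # (height, color, width)
P = np.zeros((2, 4))             # compresses the height leg: 4 -> 2
Q = np.zeros((5, 2))             # compresses the width leg: 5 -> 2
B = np.tensordot(P, A, (1, 0))   # shape (2, 3, 5): order stays (height, color, width)
B = np.tensordot(B, Q, (2, 0))   # shape (2, 3, 2): still no transpose needed
print(B.shape)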

X = data.transpose(0,2,1)

This tensor X is the tensor you want to decompose.

Now, let L be the restoration matrix for the height (H) leg, and R the restoration matrix for the width (W) leg. The initial values are created randomly.

L = ortho_group.rvs(H)[:, :r]  # H-by-r isometry: restoration matrix for the height leg
R = ortho_group.rvs(W)[:r, :]  # r-by-W isometry: restoration matrix for the width leg

r is the dimension you want to keep.

First, let's update R. Apply the transpose of L to X to crush the height leg, reshape the result into a matrix, perform a singular value decomposition, and take the first r rows of the resulting V as the new R.

    LX = np.tensordot(L.transpose(), X, (1,0)).reshape(r*3, W)  # crush the height leg with L^T
    U,s,V = linalg.svd(LX)
    R = V[:r, :]  # first r rows of V become the new R

Next, use the updated R to crush the width leg of X, and take the first r columns of the U obtained by the singular value decomposition as the new L [^2].

[^2]: I am not sure whether expressions such as "from the left", "from the top", "rows", and "columns" are used correctly here. Please check for yourself.

    XR = np.tensordot(X, R.transpose(), (2,0)).reshape(H, 3*r)  # crush the width leg with R^T
    U,s,V = linalg.svd(XR)
    L = U[:, :r]  # first r columns of U become the new L

This completes one step of HOOI. Repeating it improves the accuracy. The number of iterations needed depends on the rank r to keep, but for an image of this size it converges in about 5 to 10 steps.

After convergence, you have the restoration matrices L and R; their transposes are the compression matrices. Applying them from the left and right of the original tensor X gives the core tensor C.

  LX = np.tensordot(L.transpose(), X, (1,0))
  C = np.tensordot(LX, R.transpose(), (2,0))

It's not that difficult. A function that puts all of the above together looks like this.

def tucker_hooi(X, r):
  # initial restoration matrices: random isometries
  L = ortho_group.rvs(H)[:, :r]
  R = ortho_group.rvs(W)[:r, :]
  X = X.transpose(0,2,1)  # put the color leg in the middle: (H, 3, W)
  for i in range(20):
    # update R: crush the height leg with L^T, then SVD
    LX = np.tensordot(L.transpose(), X, (1,0)).reshape(r*3, W)
    U,s,V = linalg.svd(LX)
    R = V[:r, :]
    # update L: crush the width leg with R^T, then SVD
    XR = np.tensordot(X, R.transpose(), (2,0)).reshape(H, 3*r)
    U,s,V = linalg.svd(XR)
    L = U[:, :r]
  # apply both compression matrices to X to obtain the core tensor
  LX = np.tensordot(L.transpose(), X, (1,0))
  C = np.tensordot(LX, R.transpose(), (2,0))
  return L, C, R

This function takes a third-order tensor X and returns the core tensor C approximated at the specified rank r, together with the left and right reconstruction matrices L and R.
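
For example, a quick shape check (data is the (H, W, 3) image tensor built earlier; with H = 50, W = 200, and r = 10 the shapes below follow):

L, C, R = tucker_hooi(data, 10)
print(L.shape, C.shape, R.shape)  # (50, 10) (10, 3, 10) (10, 200)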

Comparison of HOSVD and HOOI

Let's create a function that restores the image from the obtained Tucker decomposition (X ~ LCR).

def restore_image(L, C, R):
  LC = np.tensordot(L,C,(1,0))
  Y = np.tensordot(LC, R, (2,0))
  Y = Y.transpose((0,2,1))
  Y = np.clip(Y, 0, 255)
  Y = np.uint8(Y)
  return Image.fromarray(Y)

After the Tucker decomposition, the legs are in the order H, C, W, so they must be reordered to H, W, C. Also, the original values were np.uint8 in the range 0 to 255, but the approximation pushes some values outside that range, so we correct them with clip.
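
Note that tucker_hosvd used below is the HOSVD function from the previous article and is not defined in this notebook. For completeness, here is a minimal sketch with the same interface; it is my own reconstruction and not necessarily identical to the original:

def tucker_hosvd(X, r):
  X = X.transpose(0,2,1)  # (H, 3, W): put the color leg in the middle
  H, _, W = X.shape
  # SVD of the H x (3W) unfolding gives the restoration matrix of the height leg
  U, s, V = linalg.svd(X.reshape(H, 3*W))
  L = U[:, :r]
  # SVD of the (3H) x W unfolding gives the restoration matrix of the width leg
  U, s, V = linalg.svd(X.transpose(1,0,2).reshape(3*H, W))
  R = V[:r, :]
  # apply both compression matrices to obtain the core tensor
  LX = np.tensordot(L.transpose(), X, (1,0))
  C = np.tensordot(LX, R.transpose(), (2,0))
  return L, C, R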

Let's compare the images created by HOSVD and HOOI with r = 10. First, HOSVD.

L, C, R = tucker_hosvd(data, 10)
restore_image(L, C, R)

restore_hosvd.png

Next is HOOI.

L, C, R = tucker_hooi(data, 10)
restore_image(L, C, R)

restore_hooi.png

Yeah, the two are indistinguishable. Let's look at the residuals. We create a function that returns the squared Frobenius norm of the difference between the original tensor and the tensor reconstructed from L, C, and R, normalized by the squared norm of the original tensor.

def restore(L, C, R):
  LC = np.tensordot(L,C,(1,0))
  Y = np.tensordot(LC, R, (2,0))
  Y = Y.transpose((0,2,1))
  return Y

def residual(X, U, C, V):
  Y = restore(U, C, V)
  return np.square(X-Y).sum() / np.square(X).sum()

Let's use this to see the rank dependence of the residuals.

fig, ax = plt.subplots()
r_hosvd = []
r_hooi = []
D = 10
for r in range(D,H):
  L, C, R = tucker_hosvd(data, r)
  r_hosvd.append(residual(data, L, C, R))
  L, C, R = tucker_hooi(data, r)
  r_hooi.append(residual(data, L, C, R))

x = list(range(D, H))
ax.set_xlabel('Rank')
ax.set_ylabel('Error')
ax.plot(x, r_hosvd,label="HOSVD")
ax.plot(x, r_hooi, label="HOOI")
ax.legend(loc=0)

The result looks like this.

comparison.png

Yeah, it's almost indistinguishable.

Let's also look at the difference between HOSVD and HOOI.

diff = np.array(r_hosvd) - np.array(r_hooi)
fig, ax = plt.subplots()
ax.set_xlabel('Rank')
ax.set_ylabel('HOSVD - HOOI')
plt.plot(x, diff)

diff.png

This plots the difference between the two residuals. Since the HOOI value is subtracted from the HOSVD value, the more positive the difference, the more accurate HOOI is. HOOI is indeed more accurate, but with errors on the order of 0.14 the difference is about 0.004, so it is not large enough to worry about.

Computational complexity

In the range tried here, there was not much difference in accuracy between HOSVD and HOOI. However, HOOI is computationally cheaper. For simplicity, consider the Tucker decomposition of a third-order tensor whose legs all have dimension D, compressed to dimension d.

The computational complexity of the singular value decomposition of an m-by-n matrix is $O(\mathrm{min}(mn^2, m^2n))$. In HOSVD, the original tensor is regarded as a matrix with $D^2$ rows and $D$ columns and singular-value decomposed, so the cost is $O(D^4)$.

On the other hand, HOOI decomposes the original tensor as a matrix with $d^2$ rows and $D$ columns, so the cost is $O(\mathrm{min}(D^2 d^2, D d^4))$. For $d \ll D$ this is far smaller than the HOSVD cost, and the larger $D$ is, the larger the saving.
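
For example (a rough back-of-the-envelope estimate of my own, not from the original article): with $D = 200$ and $d = 10$, the HOSVD cost is of order $D^4 = 1.6 \times 10^9$, while a single HOOI update is of order $\mathrm{min}(D^2 d^2, D d^4) = 2 \times 10^6$, roughly three orders of magnitude less per SVD (although HOOI performs several sweeps).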

Summary

We regarded a color image of "大名行列" (daimyo procession) as a three-legged tensor and Tucker-decomposed it using HOOI. Comparing HOOI and HOSVD, HOOI was slightly more accurate, but the difference was small. The amount of computation should be lighter for HOOI, but I did not measure it properly this time; I will leave that to someone else.

I hope this article helps those who have trouble understanding Tucker decomposition.

