Linear algebra has an operation called singular value decomposition (SVD), which can be used to approximate matrices. For details, see the earlier article "Singular value decomposition of a daimyo matrix".
Now, a matrix is like a two-dimensional array in a program. You get a value from a two-dimensional array by specifying two indices; similarly, a matrix specifies an element with two indices, a row and a column. A tensor extends this to a multidimensional array. Treating tensors rigorously is a lot of work, but for this article it is fine to think of a tensor simply as a "multidimensional array".
Just as with matrices, there is a need to approximate tensors. For matrices, the singular value decomposition gives the best low-rank approximation (in the sense of the Frobenius norm), but how best to approximate a general tensor is not well understood [^1]. The decomposition that splits a tensor into a small tensor (the core tensor) and a set of matrices, one for each mode, is called the Tucker decomposition. In the earlier article I treated the monochrome image directly as a matrix and applied singular value decomposition; in this article I treat the color image as a third-order tensor and apply the Tucker decomposition. Several algorithms for computing a Tucker decomposition have been proposed, but here we introduce the simplest one, Higher Order SVD (HOSVD), which straightforwardly applies singular value decomposition.
[^1]: Maybe there is a general theory, but I don't know it.
TODO: Upload code to GitHub
Before explaining the Tucker decomposition, let me summarize the terminology and the graph representation of matrices and tensors used in this article [^2]. An index of a matrix or tensor is called a "leg". A matrix has two legs. The number of rows or columns is called the "dimension" of the corresponding leg. Note the difference from multidimensional arrays: for an array, "dimension" usually means the number of legs, while for matrices and tensors it means the number of values an index can take.
[^2]: I'm not very familiar with this area, so be aware that the notation below may differ from standard usage.
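In NumPy terms (a toy example just to illustrate the terminology; the shapes here are arbitrary), the number of legs corresponds to `ndim` and the dimension of each leg to an entry of `shape`:

```python
import numpy as np

A = np.zeros((3, 4))     # a matrix: 2 legs, with dimensions 3 and 4
T = np.zeros((3, 4, 5))  # a third-order tensor: 3 legs, with dimensions 3, 4, 5
print(A.ndim, A.shape)   # 2 (3, 4)
print(T.ndim, T.shape)   # 3 (3, 4, 5)
```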
A matrix or tensor is drawn as a "square" with "lines" coming out of it. Each line is a leg. A matrix has two legs, and a third-order tensor has three. If you want to make the dimension of a leg explicit, write it next to the line.
The most important operation in this graph representation is connecting lines. Connecting two legs of the same dimension and summing over them produces a new matrix; for example, the matrix product of an m-by-n matrix and an n-by-l matrix connects the two n-dimensional legs and yields an m-by-l matrix. Drawn with squares and lines, it looks like this.
Tensors work the same way: you can contract legs of the same dimension by summing over them to form a new tensor. In the graph representation you simply connect the legs, just as with matrices. The resulting tensor has the remaining legs, each keeping its dimension, and the number of remaining legs is its order.
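As a rough sketch of this contraction in NumPy (the array sizes here are arbitrary examples, not taken from the article), `np.tensordot` sums over the pair of legs you specify:

```python
import numpy as np

# Matrix product: connect the n-dimensional legs of an (m, n) and an (n, l) matrix
m, n, l = 3, 4, 5
A = np.random.rand(m, n)
B = np.random.rand(n, l)
C = np.tensordot(A, B, axes=(1, 0))  # same as A @ B, shape (m, l)

# Tensor contraction: connect one leg of a three-legged tensor with a matrix leg
T = np.random.rand(2, n, 6)          # legs of dimension 2, n, 6
S = np.tensordot(T, B, axes=(1, 0))  # contract the two n-dimensional legs -> shape (2, 6, l)
print(C.shape, S.shape)
```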
Now, color image data can be represented by a three-dimensional array with three indices: height, width, and color. We treat it as a three-legged tensor and approximate it. In general the Tucker decomposition compresses all the legs, but since the "color" leg has only three dimensions, this time we approximate only the "height" and "width" legs.
The original tensor, with three legs of height h, width w, and color c, is split into the product of one tensor and two matrices. The tensor in the middle is called the "core tensor"; here it has the color leg plus two legs of dimension r that connect to the matrices on either side. The left matrix has a height-h leg and a dimension-r leg, and the right matrix has a width-w leg and a dimension-r leg; the r legs are the ones that "carry the information". The Tucker decomposition compresses the data by keeping r small. Let's try it below.
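Written with indices (my own labeling; the ordering of the core tensor's legs is just a convention), the decomposition used here looks roughly like this, where y, x, c index height, width, and color, U is the h-by-r matrix, V is the r-by-w matrix, and C is the core tensor:

```math
X_{yxc} \approx \sum_{i=1}^{r}\sum_{j=1}^{r} U_{yi}\, C_{icj}\, V_{jx}
```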
First, let's create the target "daimyo matrix" image. Last time it was monochrome, but this time we make a color image. To make the color "bleeding" caused by the approximation easy to see, the four characters are colored yellow (R, G, 0), cyan (0, G, B), magenta (R, 0, B), and white (R, G, B), respectively. Something like this. The code installs a font for Google Colab and specifies the font path along the way; if you use another environment, adjust it accordingly.
from PIL import Image, ImageDraw, ImageFont
import numpy as np
from scipy import linalg
from IPython.display import display

# Install a Japanese font (for Google Colab); adjust the font path for other environments
!apt-get -y install fonts-ipafont-gothic
fpath = '/usr/share/fonts/opentype/ipafont-gothic/ipagp.ttf'
fontsize = 50
font = ImageFont.truetype(fpath, fontsize)
LX = 200
LY = fontsize
img = Image.new('RGB', (LX, LY), color="black")
draw = ImageDraw.Draw(img)
# Draw the four characters of "大名行列" side by side in different colors
draw.text((0, 0), "大", fill=(255, 255, 0), font=font)    # yellow
draw.text((50, 0), "名", fill=(0, 255, 255), font=font)   # cyan
draw.text((100, 0), "行", fill=(255, 0, 255), font=font)  # magenta
draw.text((150, 0), "列", fill="white", font=font)        # white
img
Running this produces an image like the following.
Let's get the data from the image.
data = np.array(img.getdata()).reshape(LY,LX,3)
You can think of `data` as a three-dimensional array, i.e., a tensor with three legs whose dimensions are the height LY, the width LX, and the color 3. Approximating this tensor is the task of this article. Before that, let's look at each of the R, G, and B planes.
R = data[:,:,0]
G = data[:,:,1]
B = data[:,:,2]
display(Image.fromarray(np.uint8(R)))
display(Image.fromarray(np.uint8(G)))
display(Image.fromarray(np.uint8(B)))
Looking at the R-only, G-only, and B-only worlds, one character is missing from each. Also, since the last character "列" is white, it is not missing from any of them.
First, let's define a function that performs singular value decomposition. See the earlier article for details. A function that takes a matrix and the rank to keep, and returns two matrices giving the approximation, looks like this:
def perform_svd(X, rank):
    # Singular value decomposition, keeping only the top `rank` singular values
    U, s, V = linalg.svd(X)
    Ur = U[:, :rank]
    Sr = np.diag(np.sqrt(s[:rank]))  # split the singular values evenly between the two factors
    Vr = V[:rank, :]
    A = Ur @ Sr
    B = Sr @ Vr
    return A, B
If you pass a matrix `X`, the two matrices `A` and `B` are returned, and computing `A @ B` gives the low-rank approximation of `X` with rank `rank`.
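As a quick sanity check (this snippet is only illustrative; it reuses the `data` array and `perform_svd` defined above), the relative reconstruction error should shrink as the rank grows:

```python
R = data[:, :, 0].astype(float)  # the R plane as a 50x200 matrix
for rank in (4, 8, 16, 32):
    A, B = perform_svd(R, rank)
    err = linalg.norm(R - A @ B) / linalg.norm(R)
    print(rank, err)  # relative Frobenius-norm error
```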
Now let's approximate the tensor. Since we've come this far, let's try several different approximation methods.
This tensor has height 50, width 200, and color 3, so it has 30,000 elements. First, let's completely ignore the original structure, view it as a 200-by-150 matrix, and approximate it by singular value decomposition. The function looks something like this.
def simple_image(X, rank):
    # Ignore the (height, width, color) structure and treat the data as a 200x150 matrix
    X = X.reshape((200, 150))
    A, B = perform_svd(X, rank)
    Y = (A @ B).reshape(LY, LX, 3)
    Y = np.clip(Y, 0, 255)
    Y = np.uint8(Y)
    print((A.size + B.size) / X.size)  # compression ratio
    return Image.fromarray(Y)
The execution result looks like this.
display(simple_image(data, 50))
display(simple_image(data, 21))
display(simple_image(data, 8))
The data compression ratio is also printed: from the top, 58.3%, 24.5%, and 9.3%. This is the total number of elements of A and B divided by the number of elements of X, where the 200-by-150 matrix X is split into the two matrices A and B. Despite the very crude method, the original structure is surprisingly well preserved.
The previous method completely ignored the height, width, and color structure. Let's now take the original structure into account a little. A simple way to approximate a tensor representing a color image is to treat each of the R, G, and B planes as a matrix, apply singular value decomposition to each, and then put them back together. Let's do that.
def rgb_svd(X, rank):
    # Split the tensor into its R, G, B planes and apply SVD to each plane independently
    R = X[:, :, 0]
    G = X[:, :, 1]
    B = X[:, :, 2]
    Rh, Rw = perform_svd(R, rank)
    Gh, Gw = perform_svd(G, rank)
    Bh, Bw = perform_svd(B, rank)
    return Rh, Rw, Gh, Gw, Bh, Bw
The tensor `X` is split into the matrices representing the R, G, and B planes: R is decomposed into Rh and Rw, G into Gh and Gw, and B into Bh and Bw. Computing `Rh @ Rw` restores the R plane, and restoring the G and B planes in the same way recovers the original image. The image-restoration function looks like this.
def rgb_image(X, rank):
    Rh, Rw, Gh, Gw, Bh, Bw = rgb_svd(X, rank)
    # Reconstruct each plane and stack them back into a (height, width, color) tensor
    R = Rh @ Rw
    G = Gh @ Gw
    B = Bh @ Bw
    Y = np.asarray([R, G, B]).transpose(1, 2, 0)
    Y = np.clip(Y, 0, 255)
    Y = np.uint8(Y)
    print((Rh.size + Rw.size) * 3 / X.size)  # compression ratio (each plane has the same size)
    return Image.fromarray(Y)
Let's run it.
display(rgb_image(data, 50))
display(rgb_image(data, 10))
display(rgb_image(data, 4))
The compression ratios are 125%, 25%, and 10% from the top. If the rank kept is large, the singular value decomposition can end up with more elements than the original data. For the other two, I matched the compression ratios to those of the crude method above. I expected the approximation to be better because the structure is taken into account, but it is surprisingly poor.
Now for the main subject, the Tucker decomposition. See also "Low-rank approximation of images with Tucker decomposition" for the principle behind it. The following function splits the three-legged tensor `X` into the core tensor `C` and the left and right matrices `U` and `V`.
def tucker_decomposition(X, rank):
    # Reorder the legs to (height, color, width)
    X = X.transpose(0, 2, 1)
    # SVD with the width leg as columns -> right matrix V (rank x width)
    XR = X.reshape(LY * 3, LX)
    _, _, V = linalg.svd(XR)
    V = V[:rank, :]
    Vt = V.transpose(1, 0)
    # SVD with the height leg as rows -> left matrix U (height x rank)
    XL = X.reshape(LY, LX * 3)
    U, _, _ = linalg.svd(XL)
    U = U[:, :rank]
    Ut = U.transpose(1, 0)
    # Make the core tensor by contracting U^T and V^T with the original tensor
    UX = np.tensordot(Ut, X, (1, 0))
    C = np.tensordot(UX, Vt, (2, 0))
    return U, C, V
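To see the structure described earlier in concrete shapes (the rank of 10 here is just an example), the three pieces returned by the function look like this:

```python
U, C, V = tucker_decomposition(data, 10)
print(U.shape, C.shape, V.shape)
# (50, 10): height leg x r,  (10, 3, 10): core tensor,  (10, 200): r x width leg
```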
A function that restores the image from the decomposed matrices and core tensor looks like this.
def tucker_image(X, rank):
    U, C, V = tucker_decomposition(X, rank)
    # Contract U and V back onto the core tensor, then restore the original leg order
    UC = np.tensordot(U, C, (1, 0))
    Y = np.tensordot(UC, V, (2, 0))
    Y = Y.transpose((0, 2, 1))
    Y = np.clip(Y, 0, 255)
    Y = np.uint8(Y)
    print((U.size + C.size + V.size) / X.size)  # compression ratio
    return Image.fromarray(Y)
Let's run it.
display(tucker_image(data, 50))
display(tucker_image(data, 23))
display(tucker_image(data, 10))
The compression ratios are 66.7%, 24.5%, and 9.3% from the top. It is clearly better than applying singular value decomposition to each RGB plane separately, but compared with the crude method the difference is marginal.
We treated the color "daimyo matrix" image as a three-legged tensor and compressed it using singular value decomposition. It was surprising that simply flattening everything into one matrix and applying SVD worked better than applying SVD to each RGB plane separately. Unlike the matrix case, how best to approximate a tensor is far from obvious. I hope you'll try this code and come away thinking, "I see, low-rank approximation of tensors is deep."
TODO: Write a commentary on the Tucker decomposition algorithm in this article.