numpy.matrix is deprecated. Use numpy.ndarray and the `@` operator instead.
A while ago, after I wrote the article "Singular value decomposition of a daimyo matrix", a former colleague pointed out that numpy.matrix is deprecated. I was surprised to hear it [^1].

[^1]: The article has since been revised so that matrix is no longer used, but it was used when it first appeared.

When I looked into it, I found a very detailed explanation on StackOverflow, so I will explain the situation based on that answer.
Linear algebra has an operation called singular value decomposition (SVD). It factors an m-by-n matrix X into the product of an m-by-m unitary matrix U, an m-by-n diagonal matrix S, and an n-by-n unitary matrix V. Let's make a suitable matrix.
```python
import numpy as np
from scipy import linalg

X = (np.arange(6) + 1).reshape(2, 3)
```
Now X becomes a 2x3 matrix like this:
```
[[1 2 3]
 [4 5 6]]
```
Let's SVD this.
```python
U, s, V = linalg.svd(X)
```
linalg.svd returns three ndarrays that make up the singular value decomposition. U is an m-by-m unitary matrix and V is an n-by-n unitary matrix. The matrix in the middle is an m-by-n diagonal matrix, but it is returned as a one-dimensional array because only the diagonal elements are nonzero.
```python
print(s)  # => [9.508032   0.77286964]
```
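The shapes of the three return values can be checked directly (a quick sketch using the same X as above):

```python
import numpy as np
from scipy import linalg

X = (np.arange(6) + 1).reshape(2, 3)
U, s, V = linalg.svd(X)
print(U.shape)  # (2, 2): m-by-m unitary matrix
print(s.shape)  # (2,):   just the singular values, as a 1-D array
print(V.shape)  # (3, 3): n-by-n unitary matrix
```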
Now, multiplying U, S, and V in that order recovers the original matrix. However, since the diagonal matrix in the middle came back as a one-dimensional array, we first need to reshape it into a matrix. There is a function called linalg.diagsvd for that.
```python
S = linalg.diagsvd(s, X.shape[0], X.shape[1])
```
Now S is a 2x3 diagonal matrix.
```python
print(S)
# => [[9.508032   0.         0.        ]
#     [0.         0.77286964 0.        ]]
```
After that, taking the product of the three matrices recovers the original matrix. However, the `*` product of numpy.ndarray is elementwise, so for matrix multiplication you need numpy.ndarray.dot, which makes for clunky nested calls.
```python
print(U.dot(S.dot(V)))
# => [[1. 2. 3.]
#     [4. 5. 6.]]
```
We are back to the original matrix, but instead of a method call like `U.dot(S.dot(V))`, we would like to use an infix operator like `*`. That is what numpy.matrix was made for. Let's convert the diagonal matrix S.
```python
Z = np.matrix(S)
```
```python
type(S)  # => numpy.ndarray
type(Z)  # => numpy.matrix
```
With numpy.matrix, the infix operator `*` performs matrix multiplication. The product of a numpy.matrix and an ndarray is also a matrix product. So from here we can write `U * Z * V`.
```python
print(U * Z * V)
# => [[1. 2. 3.]
#     [4. 5. 6.]]
```
As you can see, matrix was used because it was convenient, but it has some problems. In particular, it can behave unintuitively when mixed with ndarray. Let me give an example from the StackOverflow answer mentioned earlier.
Let's prepare a suitable ndarray `arr` and a matrix `mat` made from it.
```python
arr = np.zeros((3, 4))
mat = np.matrix(arr)
```
Both `arr` and `mat` represent a 3-by-4 matrix.
First, let's add the two.
```python
(arr + mat).shape  # => (3, 4)
```
There is no problem, because we are adding matrices of the same shape. Next, let's slice, adding just the first row of each. Since we took the first row of a 3-by-4 matrix, we expect a 1-by-4 result.
```python
(arr[0, :] + mat[0, :]).shape  # => (1, 4)
```
This is no problem either.
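It is worth checking why this works even though the two slices have different shapes: an ndarray row slice collapses to one dimension, while a matrix row slice stays two-dimensional, and broadcasting `(4,)` against `(1, 4)` happens to give `(1, 4)` (a quick check, not in the original):

```python
import numpy as np

arr = np.zeros((3, 4))
mat = np.matrix(arr)
print(arr[0, :].shape)  # (4,)   -- ndarray slice collapses to 1-D
print(mat[0, :].shape)  # (1, 4) -- matrix slice stays 2-D
print((arr[0, :] + mat[0, :]).shape)  # (1, 4) by broadcasting
```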
Now let's try columns instead of rows. We add the first column of each, expecting a 3-by-1 result.
```python
(arr[:, 0] + mat[:, 0]).shape  # => (3, 3)
```
It has become 3 rows and 3 columns. What happened?
This is because `arr[:, 0]` has shape `(3,)` while `mat[:, 0]` has shape `(3, 1)`, so NumPy's broadcasting rules kicked in.
Let's actually create an array of shape (3,) and one of shape (3, 1) and add them.
```python
x = np.ones(3)
y = np.arange(3).reshape(3, 1)
x.shape  # => (3,)
y.shape  # => (3, 1)
```
The situation is the same as before. Let's add it.
```python
(x + y).shape  # => (3, 3)
```
It is (3, 3). By the way, the value looks like this.
```python
print(x)
# => [1. 1. 1.]
print(y)
# => [[0]
#     [1]
#     [2]]
print(x + y)
# => [[1. 1. 1.]
#     [2. 2. 2.]
#     [3. 3. 3.]]
```
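As an aside, if you want an ndarray column slice to stay two-dimensional and avoid this kind of broadcasting surprise, slicing with a range (or inserting a new axis) keeps the shape (a small sketch, not in the original):

```python
import numpy as np

arr = np.zeros((3, 4))
print(arr[:, 0].shape)              # (3,)   -- collapses to 1-D
print(arr[:, 0:1].shape)            # (3, 1) -- range slice stays 2-D
print(arr[:, 0, np.newaxis].shape)  # (3, 1) -- explicit new axis
```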
Another example: with matrix, `*` is the matrix product, but `/` is elementwise division.
```python
a = np.matrix([[2, 4], [6, 8]])
b = np.matrix([[2, 2], [2, 2]])
print(a)
# => [[2 4]
#     [6 8]]
print(b)
# => [[2 2]
#     [2 2]]
print(a * b)
# => [[12 12]
#     [28 28]]  <- this makes sense
print(a / b)
# => [[1. 2.]
#     [3. 4.]]  <- !?
```
This is also unintuitive.
Well, the reason we wanted matrix in the first place was that the matrix product of ndarray was awkward to write as a method call, and we wanted infix notation like `A * B`. However, starting with Python 3.5, the infix operator `@`, which performs matrix multiplication on ndarray, was introduced (PEP 465). Now the singular value decomposition and the reconstruction can be written as follows.
```python
import numpy as np
from scipy import linalg

X = (np.arange(6) + 1).reshape(2, 3)
print(X)
# => [[1 2 3]
#     [4 5 6]]

U, s, V = linalg.svd(X)
S = linalg.diagsvd(s, X.shape[0], X.shape[1])
print(U @ S @ V)
# => [[1. 2. 3.]
#     [4. 5. 6.]]
```
Simple. When $m < n$ for an m-by-n matrix, the last rows of $V$ are not used, so you can also write it with np.diag instead of linalg.diagsvd:
```python
U, s, V = linalg.svd(X)
S = np.diag(s)
print(U @ S @ V[:2, :])
```
A low-rank approximation of rank r after singular value decomposition can likewise be written with np.diag. The following is a sample that decomposes a 200-by-50 matrix and approximates it at low rank. First, let's create a matrix X with 200 rows and 50 columns.
```python
m = 200
n = 50
X = np.arange(m * n).reshape(m, n)
print(X)
```
X is such a matrix.
```
[[   0    1    2 ...   47   48   49]
 [  50   51   52 ...   97   98   99]
 [ 100  101  102 ...  147  148  149]
 ...
 [9850 9851 9852 ... 9897 9898 9899]
 [9900 9901 9902 ... 9947 9948 9949]
 [9950 9951 9952 ... 9997 9998 9999]]
```
To approximate this with rank r = 10, do the following:
```python
U, s, V = linalg.svd(X)
S = np.diag(s)
r = 10
Ur = U[:, :r]
Sr = S[:r, :r]
Vr = V[:r, :]
Y = Ur @ Sr @ Vr
print(np.array(Y, dtype=int))  # np.int is removed in recent NumPy; use plain int
```
Y is a real number, but it looks like this when converted to an integer.
```
[[   0    0    1 ...   46   47   48]
 [  50   50   51 ...   97   98   98]
 [ 100  101  102 ...  146  147  149]
 ...
 [9850 9851 9851 ... 9897 9897 9899]
 [9900 9901 9901 ... 9947 9947 9949]
 [9950 9951 9952 ... 9996 9998 9999]]
```
It feels pretty good.
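How good is "pretty good"? One way to check (a small addition, not in the original) is the relative error in the Frobenius norm. Since every row of this X is an arithmetic progression, its true rank is only 2, so the rank-10 approximation should be almost exact:

```python
import numpy as np
from scipy import linalg

m, n, r = 200, 50, 10
X = np.arange(m * n).reshape(m, n).astype(float)
U, s, V = linalg.svd(X)
Y = U[:, :r] @ np.diag(s[:r]) @ V[:r, :]
# Relative error in the Frobenius norm:
err = np.linalg.norm(X - Y) / np.linalg.norm(X)
print(err)  # tiny: essentially machine precision
```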
With the introduction of the `@` operator, which expresses the matrix product of numpy.ndarray in infix notation, the biggest advantage of using numpy.matrix has disappeared. From now on, let's use numpy.ndarray.