[PYTHON] Notes on studying multidimensional scaling

Overview

Having had my first opportunity to work with data analysis, I organized what I learned about quantitative multidimensional scaling as a memo. I am an amateur at data analysis, Python, and mathematics, so please forgive any mistakes in what follows.

What is multidimensional scaling?

Multidimensional scaling is a method that takes some measure of dissimilarity (such as distance) computed from given data and produces coordinate values in a low-dimensional space, such as two dimensions. This makes it possible to grasp the relative positional relationships even of multidimensional data from a 2D scatter plot. For example, using only the dissimilarities "distance and travel time between cities such as Tokyo and Osaka" as input, you can recover the relative positions of the cities. Distances between cities could of course be computed from low-dimensional data such as latitude and longitude, but the point is that data of any dimensionality can be reduced to a low dimension, letting you see the rough relationships between the data points.

Axiom of distance

There are several kinds of distance that can be used as a measure of dissimilarity, but they must satisfy the following distance axioms.

Name | Meaning | Formula
Non-negativity | A distance is never negative | d_{ij} \geq 0
Identity (non-degeneracy) | The distance is 0 when the start and end points coincide | d_{ii} = 0
Symmetry | The distance is unchanged when the start and end points are swapped | d_{ij} = d_{ji}
Triangle inequality | A direct path is never longer than one through an intermediate point | d_{ij} \leq d_{ik} + d_{kj}
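
As a small sanity check (my own addition, not part of the original notes), the sketch below tests whether a dissimilarity matrix satisfies these four axioms with NumPy; the helper name check_distance_axioms is made up for illustration.

import numpy as np

def check_distance_axioms(d, tol=1e-9):
    """Check the four distance axioms for an n x n dissimilarity matrix d."""
    non_negative = np.all(d >= -tol)                         # d_ij >= 0
    zero_diagonal = np.allclose(np.diag(d), 0)               # d_ii = 0
    symmetric = np.allclose(d, d.T)                          # d_ij = d_ji
    # Triangle inequality: d_ik + d_kj >= d_ij for all i, j, k
    triangle = np.all(d[:, None, :] + d.T[None, :, :] >= d[:, :, None] - tol)
    return non_negative, zero_diagonal, symmetric, triangle

# Example: a small Euclidean-like distance matrix passes all four checks
d = np.array([[0., 1., 2.],
              [1., 0., 1.],
              [2., 1., 0.]])
print(check_distance_axioms(d))   # (True, True, True, True)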

Flow of Quantitative Multidimensional Scaling

Roughly, the procedure is as follows: (1) create a matrix whose elements are the squared dissimilarities (distances); (2) apply the Young-Householder transformation; (3) compute eigenvalues and eigenvectors; (4) perform the spectral decomposition, approximating small eigenvalues by zero; (5) obtain the relative coordinates.

Theoretically

Since I wanted to understand at least a little of the theory, I worked through the formulas below, my first in several years. First, assume the following setting.

Item | Notation
Number of input data points | n
Number of elements (dimensions) of each data point | r (where r \lt n is assumed)
Coordinates of the i-th data point | (x_{i1}, x_{i2}, ... , x_{ir})
Position vector of the i-th data point | X_i

Representation of distance-squared matrix

The square of the distance (Euclidean distance) between the $i$-th and $j$-th data points can be calculated as follows.

d_{ij}^2 = \sum_{k=1}^{r}(x_{ik} - x_{jk})^2

Let $\textbf{D}^2$ be the matrix whose elements are these squared distances; it is the $n \times n$ square matrix shown below. By the non-degeneracy and symmetry axioms, its diagonal components are 0.

\textbf{D}^2 = 
\begin{pmatrix}
  d_{11}^2 & d_{12}^2 & \cdots & d_{1n}^2 \\
  d_{21}^2 & d_{22}^2 & \cdots & d_{2n}^2 \\
  \vdots & \vdots & \ddots &\vdots \\
  d_{n1}^2 & d_{n2}^2 & \cdots & d_{nn}^2
\end{pmatrix}
 = 
\begin{pmatrix}
  0 & d_{12}^2 & \cdots & d_{1n}^2 \\
  d_{21}^2 & 0 & \cdots & d_{2n}^2 \\
  \vdots & \vdots & \ddots &\vdots \\
  d_{n1}^2 & d_{n2}^2 & \cdots & 0
\end{pmatrix}

This matrix $\textbf{D}^2$ can be expressed in terms of the coordinate matrix $\textbf{X}$, whose components are the coordinates $(x_{i1}, x_{i2}, \ldots, x_{ir})$ of the input data (for $1 \leq i \leq n$), as follows. Multidimensional scaling works from this equation ①.

\textbf{D}^2 = diag(\textbf{XX}^T)\textbf{11}^T - 2\textbf{XX}^T + \textbf{11}^Tdiag(\textbf{XX}^T) \hspace{1cm} \cdots \:①

Here, $\textbf{X}$ is the $n \times r$ matrix in which each row corresponds to one data point:

\textbf{X} = 
\begin{pmatrix} 
x_{11} & \cdots & x_{1r} \\
\vdots & \ddots & \vdots \\
x_{n1} & \cdots & x_{nr} \\
\end{pmatrix}

$\textbf{11}^T$ is the $n \times n$ square matrix with all components equal to 1, where $\textbf{1}$ is the column vector with $\textbf{1}^T = \left(1, \cdots, 1\right)$:

\textbf{11}^T = 
\begin{pmatrix} 
1 & \cdots & 1 \\
\vdots & \ddots & \vdots \\
1 & \cdots & 1 \\
\end{pmatrix}

$diag(\,\cdot\,)$ denotes the diagonal matrix obtained by keeping only the diagonal components of a square matrix. Here it is the $n \times n$ diagonal matrix formed from the diagonal components of $\textbf{XX}^T$:

diag(\textbf{XX}^T) = 
\begin{pmatrix} 
\sum_{i=1}^r(x_{1i}^2) & 0 & \cdots & 0 \\
0 & \sum_{i=1}^r(x_{2i}^2) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sum_{i=1}^r(x_{ni}^2) \\
\end{pmatrix}
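
As a quick numerical check of equation ① (a sketch of my own with randomly generated coordinates, not part of the derivation), the right-hand side can be compared against the squared distances computed directly from the definition.

import numpy as np

n, r = 5, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(n, r))                       # coordinate matrix X (n x r)

# Squared Euclidean distances computed directly from the definition
D2_direct = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

# Squared distances from equation ①: diag(XX^T)11^T - 2XX^T + 11^T diag(XX^T)
G = X @ X.T                                       # Gram matrix XX^T
ones = np.ones((n, n))                            # the matrix 11^T
D2_eq1 = np.diag(np.diag(G)) @ ones - 2 * G + ones @ np.diag(np.diag(G))

print(np.allclose(D2_direct, D2_eq1))             # True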

Young-Householder transformation

The Young-Householder transformation is applied to relation ① between the matrices $\textbf{D}^2$ and $\textbf{X}$: the centering matrix is applied to equation ① from both the left and the right. This operation removes the extra terms and lets the result be expressed only in terms of the centered coordinate matrix.

The centering matrix corresponds to the operation of shifting the centroid of the $n$ input data points to the origin. The mean coordinates of the data, $\left(m_1,\ m_2,\ \cdots,\ m_r\right)$, are given by

m_j = \frac{\sum_{i=1}^nx_{ij}}{n}

If the centered coordinate matrix is $\textbf{X}^*$ with elements $x_{ij}^*$, centering corresponds to $x_{ij}^* = x_{ij} - m_j$. Written as a matrix,

\begin{align}
\textbf{X}^* &= 
\begin{pmatrix} 
x_{11}^* & \cdots & x_{1r}^* \\
\vdots & \ddots & \vdots \\
x_{n1}^* & \cdots & x_{nr}^*
\end{pmatrix} \\
&=\begin{pmatrix} 
x_{11} - m_1 & \cdots & x_{1r} - m_r \\
\vdots & \ddots & \vdots \\
x_{n1} - m_1 & \cdots & x_{nr} - m_r
\end{pmatrix} \\
&= \textbf{X}
- \begin{pmatrix} 
m_1 & \cdots & m_r \\
\vdots & \ddots & \vdots \\
m_1 & \cdots & m_r
\end{pmatrix} \\
&= \textbf{X}
- \frac{1}{n}\begin{pmatrix} 
\sum_{i=1}^nx_{i1} & \cdots & \sum_{i=1}^nx_{ir} \\
\vdots & \ddots & \vdots \\
\sum_{i=1}^nx_{i1} & \cdots & \sum_{i=1}^nx_{ir}
\end{pmatrix} \\
&= \textbf{X}
- \frac{1}{n}
\begin{pmatrix} 
1 & \cdots & 1 \\
\vdots & \ddots & \vdots \\
1 & \cdots & 1
\end{pmatrix}
\begin{pmatrix} 
x_{11} & \cdots & x_{1r} \\
\vdots & \ddots & \vdots \\
x_{n1} & \cdots & x_{nr}
\end{pmatrix} \\
&=\left(\textbf{I}-\frac{\textbf{11}^T}{n} \right) \textbf{X}
 \\
\end{align}

Here $\textbf{I}$ is the $n \times n$ identity matrix. Writing the centering matrix as $\textbf{Q}$, we have

\textbf{Q} = \textbf{I}-\frac{\textbf{11}^T}{n} \hspace{1cm} \cdots \: ②

Applying this centering matrix $\textbf{Q}$ from both sides of equation ①, and using $\textbf{Q1} = \textbf{1}^T\textbf{Q} = \textbf{0}$, only the second term of the right-hand side survives.

\begin{align}
\textbf{QD}^2\textbf{Q} &= 
\textbf{Q}\left(
diag(\textbf{XX}^T)\textbf{11}^T - 2\textbf{XX}^T + \textbf{11}^Tdiag(\textbf{XX}^T)
\right)\textbf{Q} \\
&= -2\textbf{QXX}^T\textbf{Q} \\
&= -2\left(\textbf{QX}\right) \left(\textbf{QX}\right)^T \\
&= -2\textbf{X}^* \textbf{X}^{*T}
\end{align}

Therefore, for the centered coordinate matrix $\textbf{X}^*$,

\textbf{P} = \textbf{X}^* \textbf{X}^{*T} = -\frac{1}{2}\textbf{QD}^2\textbf{Q}

holds. Since the matrix $\textbf{P}$ can be calculated from equations ① and ②, concrete numerical values can be obtained. The following shows that $\textbf{P}$ is a symmetric matrix.

\begin{align}
\textbf{P}^T &= \left(\textbf{X}^{*}\textbf{X}^{*T}\right)^T \\
&=\left(\textbf{X}^{*T} \right)^T \left(\textbf{X}^{*}\right)^T \\
&=\textbf{X}^{*}\textbf{X}^{*T} \\
&=\textbf{P}
\end{align}
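
The following sketch (again my own numerical check with random coordinates) confirms that $-\frac{1}{2}\textbf{QD}^2\textbf{Q}$ indeed reproduces the centered Gram matrix $\textbf{X}^* \textbf{X}^{*T}$.

import numpy as np

n, r = 6, 3
rng = np.random.default_rng(1)
X = rng.normal(size=(n, r))

# Squared distance matrix and centering matrix Q = I - 11^T / n
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
Q = np.eye(n) - np.ones((n, n)) / n

# Young-Householder transformation vs. the centered Gram matrix X* X*^T
P = -0.5 * Q @ D2 @ Q
X_star = X - X.mean(axis=0)                       # centered coordinates X*
print(np.allclose(P, X_star @ X_star.T))          # True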

Spectral decomposition

Since the matrix $\textbf{P}$ obtained by the Young-Householder transformation is an $n \times n$ real symmetric matrix, it can be written as follows using the orthogonal matrix $\Gamma$ whose columns are the eigenvectors and the diagonal matrix $\Lambda$ whose diagonal components are the eigenvalues.

\begin{align}
\textbf{P} &=\textbf{X}^{*}\textbf{X}^{*T} \\
&=\Gamma\Lambda\Gamma^T \\
&=\Gamma\Lambda^{1/2}\Lambda^{1/2}\Gamma^T \\
&=\Gamma\Lambda^{1/2}\left(\Gamma\Lambda^{1/2}\right)^T
\end{align}

From this, about the centralized coordinate matrix

\textbf{X}^{*} = \Gamma\Lambda^{1/2} \hspace{1cm} \cdots \: ③

so the centered coordinate matrix can be obtained from the eigenvalues and eigenvectors. Writing the $i$-th eigenvalue as $\lambda_i$ and the corresponding eigenvector as $\gamma_{i} = \left(\gamma_{1i}, \gamma_{2i}, \cdots, \gamma_{ni}\right)^T$, the matrices are as follows.

\Lambda = 
\begin{pmatrix} 
\lambda_1 & & 0 \\
 & \ddots &  \\
0 & & \lambda_n 
\end{pmatrix} \\

\Gamma = 
\begin{pmatrix} 
\gamma_{1} & \cdots & \gamma_{n} 
\end{pmatrix}
=
\begin{pmatrix} 
\gamma_{11} & \cdots & \gamma_{1n} \\
\vdots & \ddots & \vdots \\
\gamma_{n1} & \cdots & \gamma_{nn} \\
\end{pmatrix} \\

In the spectral decomposition, the dimensionality of the output coordinates is reduced by approximating the eigenvalues with a low contribution rate by zero. For example, to obtain a two-dimensional relative positional relationship, keep $\lambda_1$ and $\lambda_2$ and set the eigenvalues from $\lambda_3$ onward to zero. In what follows, assume that the eigenvalues up to $\lambda_m$ (with $m \leq r$) are kept in the spectral decomposition and the rest are approximated by 0. Then, from equation ③, $\textbf{X}^{*}$ becomes:

\begin{align}
\textbf{X}^{*} 
&= \Gamma\Lambda^{1/2} \\
&= 
\begin{pmatrix} 
\gamma_{1} & \cdots & \gamma_{n} 
\end{pmatrix}
\begin{pmatrix} 
\sqrt{\lambda_1} & & 0 \\
 & \ddots &  \\
0 & & \sqrt{\lambda_n}
\end{pmatrix} \\
&=
\begin{pmatrix} 
\sqrt{\lambda_1}\gamma_{1} & \cdots & \sqrt{\lambda_m}\gamma_{m} & \sqrt{\lambda_{m+1}}\gamma_{m+1} & \cdots & \sqrt{\lambda_n}\gamma_{n}
\end{pmatrix} \\
&\sim
 \begin{pmatrix} 
\sqrt{\lambda_1}\gamma_{1} & \cdots & \sqrt{\lambda_m}\gamma_{m} & 0 & \cdots & 0
\end{pmatrix} \\
&=
\begin{pmatrix} 
\sqrt{\lambda_1}\gamma_{11} & \cdots & \sqrt{\lambda_m}\gamma_{1m} & \\
\vdots & \ddots & \vdots & 0\\
\sqrt{\lambda_1}\gamma_{n1} & \cdots & \sqrt{\lambda_m}\gamma_{nm} & 
\end{pmatrix}  \hspace{1cm} \cdots \: ④
\end{align}

From equation ④, the spectral decomposition reduces the number of dimensions of the centered coordinate matrix $\textbf{X}^{*}$ from $r$ to $m$. These reduced coordinates are the output of multidimensional scaling.
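
Putting the whole derivation together, classical multidimensional scaling fits in a few lines of NumPy. This is a minimal sketch of my own (the function name classical_mds is made up), using np.linalg.eigh, which returns the eigenvalues of a symmetric matrix in ascending order; the step-by-step version follows in the next section.

import numpy as np

def classical_mds(D, m=2):
    """Return m-dimensional coordinates from an n x n distance matrix D."""
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n               # centering matrix (equation 2)
    P = -0.5 * Q @ (D ** 2) @ Q                       # Young-Householder transformation
    eig_val, eig_vec = np.linalg.eigh(P)              # ascending eigenvalues
    idx = np.argsort(eig_val)[::-1][:m]               # keep the m largest eigenvalues
    return eig_vec[:, idx] * np.sqrt(np.maximum(eig_val[idx], 0))   # equations 3 and 4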

Writing it in Python

Data analysis means Python, so I wrote the code while touching the language for the first time. It could be implemented easily with sklearn, but I deliberately did not use it here; instead I followed the steps below to get a feel for the flow (a hedged sklearn sketch is shown right after this paragraph for comparison). (1) Create a matrix whose elements are the squared dissimilarities (distances) (2) Young-Householder transformation (3) Calculate eigenvalues and eigenvectors (4) Perform the spectral decomposition and approximate the eigenvalues (5) Get the relative coordinates (6) Plot
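
For comparison, here is a minimal sketch using scikit-learn. Note that sklearn.manifold.MDS with dissimilarity="precomputed" uses the SMACOF stress-minimization algorithm rather than the classical eigendecomposition above, and it assumes a symmetric dissimilarity matrix, so the layout will generally differ (and is determined only up to rotation and reflection).

# Reference sketch with scikit-learn (SMACOF-based metric MDS, not classical MDS)
import pandas as pd
from sklearn.manifold import MDS

url = "http://www.is.ouj.ac.jp/lec/16data/slide/Data/yamatesjis.csv"
df = pd.read_csv(filepath_or_buffer=url, encoding="shift_jis", sep=",", index_col=0)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(df.values)              # (number of stations, 2)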

(1) Create a matrix whose elements are the squares of dissimilarity (distance)

The input data comes from the Open University of Japan lecture "Data Analysis and Knowledge Discovery" (the CSV read in the code below). It is a matrix of travel times between stations on the Yamanote Line, which are treated here as distances.

# -*- coding=utf_8_sig -*-
# Data reading
import pandas as pd
url = "http://www.is.ouj.ac.jp/lec/16data/slide/Data/yamatesjis.csv"
df = pd.read_csv(filepath_or_buffer=url, encoding="shift_jis", sep=",", index_col=0)

# Create a matrix of squared distances
d2 = df * df
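
If the CSV above is no longer reachable, a small hand-made symmetric matrix can stand in so the remaining steps can still be tried; the station names and travel times below are made up purely for illustration.

# Fallback: a tiny made-up travel-time matrix (values are illustrative only)
import pandas as pd
stations = ["A", "B", "C", "D"]
times = [[0, 2, 5, 7],
         [2, 0, 3, 5],
         [5, 3, 0, 2],
         [7, 5, 2, 0]]
df = pd.DataFrame(times, index=stations, columns=stations)
d2 = df * df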

(2) Young-Householder transformation

Here we create the centering matrix and apply it to the squared distance matrix (d2).

import numpy as np

# Identity matrix
e = np.eye(len(d2))

# Centering matrix Q = I - (1/n) * 11^T
# (subtracting the scalar 1/n broadcasts over every element of the identity matrix)
q = e - 1/len(d2)

# Young-Householder transformation
p = np.dot(np.dot(q, d2), q) * (-1/2)
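
As a quick sanity check (my own addition): because Q1 = 0, every row of p should sum to approximately zero, and p should be symmetric as long as the input travel-time matrix itself is symmetric.

# Sanity checks on the double-centered matrix
print(np.allclose(p.sum(axis=1), 0))   # rows sum to ~0 because Q1 = 0
print(np.allclose(p, p.T))             # True if the input matrix is symmetric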

(3) Calculate eigenvalues and eigenvectors

import numpy.linalg as la

# Calculate the eigenvalues and eigenvectors of p
eig_val, eig_vec = la.eig(p)

# la.eig does not sort its results, so order them by descending eigenvalue
order = np.argsort(eig_val.real)[::-1]
eig_val, eig_vec = eig_val[order], eig_vec[:, order]
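
If p is (numerically) symmetric, np.linalg.eigh is a natural alternative: it is meant for real symmetric matrices, returns real eigenvalues in ascending order, and avoids the complex values that la.eig can produce. A hedged sketch, symmetrizing p just in case:

# Alternative for a symmetric p: real eigenvalues, returned in ascending order
eig_val, eig_vec = la.eigh((p + p.T) / 2)
eig_val, eig_vec = eig_val[::-1], eig_vec[:, ::-1]   # reverse to descending order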

(4) Perform spectral decomposition and approximate the eigenvalues

The number of eigenvalues used in the spectral decomposition (that is, the number of dimensions of the output coordinates) is set to 2 by the variable dim. The contribution rate of each eigenvalue is also calculated to check how meaningful the resulting positional relationship is.

# Calculation of the contribution rate of each eigenvalue
con_rate = []
tot_eig_val = np.sum(np.abs(eig_val))
for i in range(len(eig_val)):
    con_rate.append(np.abs(eig_val[i])/tot_eig_val)

# Spectral decomposition
dim = 2

for i in range(dim):
    print('Eigenvalue#' + str(i+1) + ' : ' + str(eig_val[i].real))
    print('Contribution Rate = ' + str(con_rate[i]) + '\n----')

    # To get the coordinates, multiply the square root of the eigenvalue by the eigenvector
    mod_eig_vec = np.sqrt(eig_val[i]) * eig_vec[:,i:i+1]

    # Accumulate the coordinate columns
    if i == 0:
        crd = (mod_eig_vec)
    else:
        crd = np.append(crd, mod_eig_vec, axis=1)
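
To see how much of the structure the two kept dimensions explain, the cumulative contribution rate can also be printed (a small addition of my own).

# Cumulative contribution rate of the dimensions actually used
print('Cumulative contribution rate (dim=' + str(dim) + ') : ' + str(sum(con_rate[:dim])))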

(5) Get relative coordinates

Write the obtained coordinate information to the file result_mds.csv.

# Write each station name followed by its coordinates
# (each row of crd corresponds to one column of df, i.e. one station)
with open("result_mds.csv", encoding="utf_8_sig", mode="w") as outfile:
    for i, name in enumerate(df.columns):
        outfile.write(name)
        for j in range(dim):
            outfile.write(',' + str(crd[i, j].real))
        outfile.write('\n')
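
The same file could also be written in one step with pandas by putting the real parts of the coordinates in a DataFrame indexed by station name (an alternative sketch, not the approach used above; header=False keeps the file format expected by the plotting step).

# Alternative: write the coordinates with pandas
import pandas as pd
result = pd.DataFrame(crd.real, index=df.columns,
                      columns=["X" + str(j + 1) for j in range(dim)])
result.to_csv("result_mds.csv", encoding="utf_8_sig", header=False)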

(6) Plot

Let's draw the obtained relative coordinates as a scatter plot with matplotlib.

import matplotlib.pyplot as plt
import japanize_matplotlib
import numpy as np

# Read the coordinate file
dat = np.loadtxt("result_mds.csv", encoding="utf_8_sig", delimiter=",", dtype="unicode")
rows = dat.shape[0]

# Graph and font size settings
plt.rcParams["font.size"] = 16
plt.figure(figsize=(8.0, 6.0))

# Split into labels and coordinates
label = dat[:, 0:1]
x = dat[:, 1:2].astype(np.float64)
y = dat[:, 2:3].astype(np.float64)

# Label the axes
plt.xlabel("X")
plt.ylabel("Y")

# Create the scatter plot
plt.scatter(x, y, c="b")

# Annotate each point with its station name
for i in range(rows):
    plt.annotate(label[i][0], xy=(x[i][0], y[i][0]))

# Show the plot
plt.show()

Running this produces the following scatter plot (index.png), in which the relative positional relationship between the stations is recovered from the travel-time information alone.

Finally

I mostly relied on information found on the Web, but I feel I now have a rough grasp of the flow and outline of multidimensional scaling. I also touched Python for the first time and enjoyed it, so I would like to use this as a starting point for further study.
