Python: Unsupervised Learning: Principal Component Analysis

Principal component analysis

Principal Component Analysis (PCA) is a powerful method for summarizing data (representing the original data with a smaller number of values).

As an example, consider using principal component analysis to compress the math and language score data (two-dimensional) of 10 students into one dimension, as illustrated below.

[Figure: scores for 10 students, compressed to one dimension along a new axis (left) versus using only the math score (right)]

Rather than explaining each student's scores with only the math score (one dimension), as in the figure on the right, preparing one new axis and creating new one-dimensional data along it, as in the figure on the left, makes it possible to explain the scores with a smaller error.

The figure on the left illustrates data compression by principal component analysis. Principal component analysis first creates the axis that explains all of the data most efficiently (the axis of the first principal component), and then the axis that most efficiently explains the data the first axis cannot explain (the axis of the second principal component).

Because the first principal component expresses the original data well, the data can be compressed efficiently by discarding (not using) the information of the second principal component.
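
As a minimal sketch of this idea, using hypothetical scores for 10 students (the values in the figure are not given, so these numbers are made up), scikit-learn's PCA can find the new axis and produce the one-dimensional data:

# Hypothetical math/language scores for 10 students (made-up values)
import numpy as np
from sklearn.decomposition import PCA

scores = np.array([[71, 64], [34, 48], [58, 59], [41, 51], [69, 56],
                   [64, 65], [16, 45], [59, 60], [57, 54], [46, 54]])

pca = PCA(n_components=1)               # keep only the first principal component
scores_1d = pca.fit_transform(scores)   # one new value per student

print(scores_1d.ravel())                # the compressed one-dimensional data
print(pca.explained_variance_ratio_)    # how much of the variance the new axis explains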

Practical examples of principal component analysis include scoring and comparing products and services (compression to one dimension), data visualization (compression to two or three dimensions), and preprocessing for regression analysis.

Principal component analysis is highly practical and has become one of the important themes in the field of machine learning.

Workflow up to feature conversion

Using principal component analysis, data compression (feature conversion) is performed in the following steps:

1. Standardize the data
2. Calculate the correlation matrix
3. Apply eigenvalue decomposition to the correlation matrix
4. Use the eigenvectors (principal components) to convert the features

The figure below shows the wine dataset after feature conversion, with the data summarized from 13 dimensions to 2 dimensions.

[Figure: wine data compressed from 13 dimensions to 2 dimensions]

Data preparation

We will now perform principal component analysis. The data used is the wine data published in the UCI Machine Learning Repository. It consists of 178 rows of wine samples, each with grape variety data (labels 1-3) and wine chemistry feature data (13 kinds).

Get the data as follows.

import pandas as pd
df_wine = pd.read_csv("./5030_unsupervised_learning_data/wine.csv", header=None)
#Feature data is stored in X and label data is stored in y.
#The first column of df_wine is the label data, and the second and subsequent columns are the feature data.
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values

print(X.shape)

Standardization

The wine data is converted in advance so that each feature has a mean of 0 and a variance of 1. This is called standardization.

Standardization makes it possible to handle various kinds of data with different units and typical value ranges, such as alcohol content and wine hue, in the same way.

import numpy as np
#Standardization
X = (X - X.mean(axis=0)) / X.std(axis=0)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df_wine = pd.read_csv("./5030_unsupervised_learning_data/wine.csv", header=None)
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values

#Visualize data before standardization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.set_title('before')
ax2.set_title('before')
ax1.scatter(X[:, 0], X[:, 1])
ax2.scatter(X[:, 5], X[:, 6])
plt.show()

print("before")
print("mean: ", X.mean(axis=0), "\nstd: ", X.std(axis=0))

#Substitute the standardized data into X
X = (X - X.mean(axis=0)) / X.std(axis=0)

#Visualize data after standardization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.set_title('after')
ax2.set_title('after')
ax1.scatter(X[:, 0], X[:, 1])
ax2.scatter(X[:, 5], X[:, 6])
plt.show()

print("after")
print("mean: ", X.mean(axis=0), "\nstd: ", X.std(axis=0))

Correlation matrix calculation

Calculate the correlation matrix of the data to check the similarity of each feature.

The correlation coefficient is an indicator of the strength of the linear relationship between two variables and takes a value between -1 and 1. When the correlation coefficient is close to 1 (the positive correlation is strong), the data have a linear distribution in which one variable increases as the other increases, as shown on the left of the figure.

When the negative correlation is strong, the distribution is linear but one variable decreases as the other increases.

When the correlation coefficient is close to 0, there is little linear relationship, as shown in the figure for r = 0.

[Figure: scatter plots for strong positive correlation, strong negative correlation, and r = 0]
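
As a small illustration with made-up vectors (not part of the wine data), np.corrcoef returns values near +1, -1, and 0 for these three cases:

# Hypothetical vectors to illustrate the three cases above
import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(100)
y_pos = 2 * x + 0.1 * rng.rand(100)    # increases with x  -> coefficient near +1
y_neg = -2 * x + 0.1 * rng.rand(100)   # decreases with x  -> coefficient near -1
y_zero = rng.rand(100)                 # unrelated to x    -> coefficient near 0

print(np.corrcoef(x, y_pos)[0, 1])
print(np.corrcoef(x, y_neg)[0, 1])
print(np.corrcoef(x, y_zero)[0, 1])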

Here we find the 13x13 correlation matrix that holds the correlation coefficients between each pair of the 13 kinds of wine feature data. The correlation matrix to be obtained has the following form.

$$R = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1,13} \\ r_{21} & 1 & \cdots & r_{2,13} \\ \vdots & \vdots & \ddots & \vdots \\ r_{13,1} & r_{13,2} & \cdots & 1 \end{pmatrix}$$

where $r_{ij}$ is the correlation coefficient between feature $i$ and feature $j$.

Obtain the correlation matrix as follows. By default, the corrcoef() function computes the correlations between rows (each row is treated as a variable) rather than between columns, so if nothing is done we would obtain the correlation matrix between the data samples.

X is therefore transposed with X.T so that the correlation matrix between features, rather than between samples, is obtained.

import numpy as np
R = np.corrcoef(X.T)

As an aside, the correlation matrix itself can be calculated in the same way from X before standardization; we standardized in advance for the later steps.

There is also principal component analysis using the covariance matrix instead of the correlation matrix.
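
As a minimal sketch of that point (assuming the wine data loaded as above): once the data have been standardized, the population covariance matrix and the correlation matrix coincide, so the two approaches agree for standardized data.

# Sketch: after standardization, the covariance matrix equals the correlation matrix
import pandas as pd
import numpy as np

df_wine = pd.read_csv("./5030_unsupervised_learning_data/wine.csv", header=None)
X = df_wine.iloc[:, 1:].values
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

R = np.corrcoef(X_std.T)          # 13x13 correlation matrix
S = np.cov(X_std.T, bias=True)    # population covariance of the standardized data

print(np.allclose(R, S))          # True (up to floating-point error)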

The code to create an identity matrix (a square matrix with ones on the diagonal) is as follows.

import numpy as np

# np.identity(Matrix size)
identity = np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

As a practical example, the following code finds the pair of features with the largest correlation coefficient.

import pandas as pd
import numpy as np

df_wine = pd.read_csv("./5030_unsupervised_learning_data/wine.csv", header=None)
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values

#Create a correlation matrix (13x13)
R = np.corrcoef(X.T)

#Set the diagonal components (the self-correlations of 1) to 0
_R = R - np.identity(13)

#Get the index that takes the maximum correlation coefficient
index = np.where(_R == _R.max())

print(R[index[0][0], index[1][0]])
print(index)

Eigenvalue decomposition

Next, apply a mathematical technique called eigenvalue decomposition to the correlation matrix obtained above to get its eigenvectors and eigenvalues.

After eigenvalue decomposition, the original 13x13 matrix R is decomposed into 13 special 13-dimensional vectors (the eigenvectors v1, ..., v13) and 13 special numbers (the eigenvalues λ1, ..., λ13).

Intuitively, the original matrix has its information concentrated in the directions of the eigenvectors, and the corresponding eigenvalues indicate the degree of that concentration.

You can use numpy to calculate the eigenvalue decomposition as follows: 13 eigenvalues of R and 13 eigenvectors are stored in eigvals and eigvecs, respectively.

import numpy as np
#Get the eigenpair from the correlation matrix. numpy.linalg.eigh returns them in ascending order of eigenvalues
eigvals, eigvecs = np.linalg.eigh(R)

The correlation matrix R, the matrix V in which the eigenvectors are arranged, and the diagonal matrix D in which the eigenvalues are arranged satisfy the following equation:

$$RV = VD$$

Their elements are as follows: $V = (v_1, v_2, \ldots, v_{13})$, where $v_i$ is the eigenvector for the eigenvalue $\lambda_i$, and $D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{13})$.

The eigenvectors of the correlation matrix represent the principal component vectors, and the components of each eigenvector show how strongly each feature contributes to that principal component.

Furthermore, the eigenvectors corresponding to larger eigenvalues are more deeply involved in composing the original matrix.

In other words, by ignoring the eigenvectors (principal component vectors) that correspond to small eigenvalues, the number of features can be reduced while keeping the loss of information small.
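
As a quick check of the relation above, and of how much information each component carries, here is a minimal sketch that continues from the correlation matrix R computed with np.corrcoef earlier:

# Continuing from the correlation matrix R computed above
import numpy as np

eigvals, eigvecs = np.linalg.eigh(R)

# R can be rebuilt from its eigenpairs: R = V D V^T
D = np.diag(eigvals)
print(np.allclose(R, eigvecs.dot(D).dot(eigvecs.T)))   # True

# Proportion of the total variance carried by each component
# (np.linalg.eigh returns eigenvalues in ascending order, so reverse them)
explained_ratio = eigvals[::-1] / eigvals.sum()
print(explained_ratio)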

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df_wine = pd.read_csv("./5030_unsupervised_learning_data/wine.csv", header=None)
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values

#Create a correlation matrix (13x13)
R = np.corrcoef(X.T)

#Eigenvalue decomposition
eigvals, eigvecs = np.linalg.eigh(R)

#Visualization
plt.bar(range(13), eigvals)
plt.title("distribution of eigvals")
plt.xlabel("index")
plt.ylabel("eigvals")
plt.show()

#Please do not erase it. It is used to check the execution result.
print(eigvals)

Feature conversion

In the previous step, we decomposed the correlation matrix into eigenvalues and eigenvectors. We now use the two eigenvectors corresponding to the largest and second-largest eigenvalues.

We create a 13x2 matrix W that converts the 13-dimensional features into 2 dimensions, and convert the wine data X with 13-dimensional features into new wine data X′ that has only the two-dimensional features of the first and second principal components.

Create the transformation matrix W as follows.

#Concatenates the eigenvectors corresponding to the largest and second largest eigenvalues in the column direction
W = np.c_[eigvecs[:,-1], eigvecs[:,-2]]

From the above, we have created a 13x2 matrix W. Furthermore, multiplying the original data X by this matrix W generates the compressed matrix X′:

$$X' = XW$$

The calculation of the matrix product is performed by the following code

import numpy as np
X_pca = X.dot(W)

Using the eigenvectors and eigenvalues, the following equations also hold:

$$Rv_1 = \lambda_1 v_1, \qquad Rv_2 = \lambda_2 v_2$$

Let the two largest eigenvalues be λ1 and λ2 and the corresponding eigenvectors be v1 and v2. These v1 and v2 form the transformation matrix.

Multiplying R by the eigenvector v1 yields new data that extends well (has large variance) in the v1 direction.

Multiplying R by the eigenvector v2 yields new data that extends well (has large variance) in the v2 direction, which is orthogonal to v1 (it captures what v1 cannot explain).

Similarly, multiplying X by the eigenvectors v1 and v2 yields feature data that extends well in two orthogonal directions.
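
As a quick numerical check of this eigenpair relation, here is a minimal sketch that continues from the R, eigvals, and eigvecs computed earlier:

# Continuing from R, eigvals, eigvecs obtained with np.linalg.eigh(R)
import numpy as np

lam1, v1 = eigvals[-1], eigvecs[:, -1]   # largest eigenvalue and its eigenvector
lam2, v2 = eigvals[-2], eigvecs[:, -2]   # second-largest pair

print(np.allclose(R.dot(v1), lam1 * v1))   # True: R v1 = lambda1 v1
print(np.allclose(R.dot(v2), lam2 * v2))   # True: R v2 = lambda2 v2
print(np.isclose(v1.dot(v2), 0))           # True: v1 and v2 are orthogonal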

Principal component analysis using scikit-learn

So far, we have implemented feature conversion using principal component analysis. In fact, you can easily do the same thing with the PCA class in sklearn.decomposition. Use the PCA class as follows.

from sklearn.decomposition import PCA
#Create an instance of PCA by specifying the number of principal components. Specify the number of dimensions after conversion with an argument.
pca = PCA(n_components=2)
#Learn the transformation model from the data and transform.
X_pca = pca.fit_transform(X)

The fit_transform() method automatically generates a transformation matrix internally.
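
As a side note, the fitted PCA instance exposes the learned transformation: components_ holds the principal component vectors and explained_variance_ratio_ the share of variance each one captures. A minimal sketch, continuing from the pca and X of the block above:

# Continuing from the block above: inspect the fitted transformation
print(pca.components_.shape)           # (2, 13): two principal component vectors
print(pca.explained_variance_ratio_)   # proportion of variance each component explains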

Principal component analysis as preprocessing

Applying what you have learned so far, you can use principal component analysis as preprocessing for regression analysis. Compressing the data in advance makes it possible to build a more general regression model that is less affected by disturbances such as outliers.

First, split the data into training data and test data.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

When performing feature conversion, if different transformation matrices are obtained for the training data and the test data, the converted data cannot be compared because the transformation matrices differ.

The same is true for standardization, and this can be inconvenient.

When performing standardization and principal component analysis, therefore, use common criteria (fitted on the training data) for both the training and test data.

When standardizing, it is convenient to use the StandardScaler class as follows.

from sklearn.preprocessing import StandardScaler
#Create an instance for standardization
sc = StandardScaler()
#Learn the transformation model from the training data and apply the same model to the test data
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

When performing principal component analysis, use the PCA class as follows.

from sklearn.decomposition import PCA
#Create an instance of principal component analysis
pca = PCA(n_components=2)
#Learn the transformation model from the training data and apply the same model to the test data
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)

As a review, the regression analysis (here, logistic regression) is done as follows.

from sklearn.linear_model import LogisticRegression
#Instantiate logistic regression
lr = LogisticRegression()
#Learn classification model
lr.fit(X, y)
#Display score
print(lr.score(X, y))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

df_wine = pd.read_csv("./5030_unsupervised_learning_data/wine.csv", header=None)

X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

#Create an instance for standardization
sc = StandardScaler()
#Learn transformation model from training data and apply to test data
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

#Create an instance of principal component analysis
pca = PCA(n_components=2)
#Learn transformation model from training data and apply to test data
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)

#Instantiate logistic regression
lr = LogisticRegression()
#Learn classification model with training data after dimension reduction
lr.fit(X_train_pca, y_train)

#Display score
print(lr.score(X_train_pca, y_train))
print(lr.score(X_test_pca, y_test))

Kernel principal component analysis

Many machine learning algorithms, such as regression analysis, assume that the data they are given can be linearly separated. In practice, however, much data is difficult to separate linearly, that is, it needs to be separated non-linearly.

This section covers a kernelized version of PCA, "kernel PCA", which can handle data that needs to be separated non-linearly.

With kernel PCA, the given NxM data X (number of samples x kinds of features) is first remade into completely new NxM′ data K (the kernel trick).

When the kernel trick is used, the number of feature kinds generally increases (the features are expanded), which makes linear separation easier.

Principal component analysis is known not to work well on highly non-linear data, but by expanding the data into the kernel matrix K, principal component analysis becomes possible.

The figure below plots two-dimensional data distributed in circles after the features have been increased with the kernel trick, principal component analysis has been performed, and the features have been reduced back to two.

Using kernel PCA, the following data, which cannot be linearly separated in two-dimensional space, can be converted into linearly separable data.

[Figure: circular data before and after kernel PCA]

Kernel trick 1

First, calculate the kernel (similarity) matrix K. The matrix below is called a kernel matrix; it holds the similarity computed for every pair of sample data. The kernel matrix of NxM data X (number of samples x kinds of features) is NxN (number of samples x number of samples). The kernel matrix K can be treated like data, and analyses such as regression and classification can be performed on it.

$$K = \begin{pmatrix} k(x_1, x_1) & k(x_1, x_2) & \cdots & k(x_1, x_N) \\ k(x_2, x_1) & k(x_2, x_2) & \cdots & k(x_2, x_N) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_N, x_1) & k(x_N, x_2) & \cdots & k(x_N, x_N) \end{pmatrix}$$

The $k(x_i, x_j)$ shown here is called a "kernel function", and there are several types.

This time, we use the kernel function called the Gaussian kernel, a kind of radial basis function (RBF), given by the following formula. It expresses the similarity between two data points $x_i$ and $x_j$:

$$k(x_i, x_j) = \exp(-\gamma \| x_i - x_j \|^2)$$

Like X, K represents individual data points in its rows and features (similarities to the other data points) in its columns.

Here γ (gamma) is a parameter of the Gaussian kernel: the value is 1 when the two data points coincide and decays toward 0 as the squared distance between them grows.

Increasing γ produces a feature matrix K that focuses only on data points that are very close to each other.
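
A tiny numeric illustration of this (with made-up squared distances): as gamma grows, only very close pairs keep a similarity near 1, while distant pairs drop toward 0.

# Hypothetical squared distances for a close pair and a distant pair
import numpy as np

near_dist_sq = 0.01   # squared distance of a close pair (made-up value)
far_dist_sq = 1.0     # squared distance of a distant pair (made-up value)

for gamma in [1, 15, 100]:
    print(gamma, np.exp(-gamma * near_dist_sq), np.exp(-gamma * far_dist_sq))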

Kernel trick 2

Implement the function used for the kernel trick:

$$k(x_i, x_j) = \exp(-\gamma \| x_i - x_j \|^2)$$

You can calculate the kernel matrix as follows:

#Calculate the square of the distance between data (square Euclidean distance)
M = np.sum((X - X[:, np.newaxis])**2, axis=2)
#Calculate kernel matrix
K = np.exp(-gamma * M)

Here, to obtain the distances between data points, we use NumPy array broadcasting (the arrays are automatically expanded so that their shapes align before the operation is performed).
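
As a minimal sketch of the broadcasting step (with a small made-up array): subtracting an (N, M) array from an (N, 1, M) view produces an (N, N, M) array of all pairwise differences, which is then squared and summed over the last axis.

# Small made-up array to show the shapes involved in the broadcast
import numpy as np

A = np.arange(6).reshape(3, 2)     # 3 samples, 2 features
diff = A - A[:, np.newaxis]        # (3, 2) and (3, 1, 2) broadcast to (3, 3, 2)
M = np.sum(diff ** 2, axis=2)      # (3, 3): squared Euclidean distance of every pair

print(diff.shape)   # (3, 3, 2)
print(M)            # symmetric matrix with zeros on the diagonal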

import numpy as np

np.random.seed(39)

X = np.random.rand(8, 3)

#Calculate the square Euclidean distance for each pair
M = np.sum((X - X[:, np.newaxis])**2, axis=2)

#Calculate kernel matrix
gamma = 15
K = np.exp(-gamma * M)

# ---------------------------
#K is a numpy array.
#You can get and display the shape of the numpy array K as follows:
print(K.shape)
# 
# ---------------------------

print(M)  #Please do not erase it. It is used to check the execution result.
print(K)  #Please do not erase it. It is used to check the execution result.

Feature conversion

If the kernel matrix K is substituted for the original data X, eigenvalue decomposition and feature conversion can be applied to K just as in the standard principal component analysis procedure, converting it into linearly separable data X′.

Since K is essentially an expansion of the features of X, a matrix obtained by converting the features of K can be treated as a converted version of the features of X.

Using what we have learned so far, let us transform circular data with kernel principal component analysis.

[Figure: circular data to be transformed with kernel principal component analysis]

As an extension, the transformation matrix W created from the eigenvectors of the kernel matrix K can also be treated directly as X′, a compressed summary of X.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles

#Get data where the data is distributed in a circle
X, y = make_circles(n_samples=1000, random_state=123, noise=0.1, factor=0.2)

#Calculate the square Euclidean distance for each pair
M = np.sum((X - X[:, np.newaxis])**2, axis=2)

#Calculate symmetric kernel matrix
gamma = 15
K = np.exp(-gamma * M)

#Get the eigenpairs from the kernel matrix. numpy.linalg.eigh returns them in ascending order of eigenvalues
eigvals, eigvecs = np.linalg.eigh(K)
#Collect the top two eigenvectors (the projected samples)
W = np.column_stack((eigvecs[:, -1], eigvecs[:, -2]))

#Find the inner product of K and W to obtain linearly separable data.
X_kpca = K.dot(W)

#Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(X[y == 0, 0], X[y == 0, 1], color="r", marker="^")
ax1.scatter(X[y == 1, 0], X[y == 1, 1], color="b", marker="o")
ax2.scatter(X_kpca[y == 0, 0], X_kpca[y == 0, 1], color="r", marker="^")
ax2.scatter(X_kpca[y == 1, 0], X_kpca[y == 1, 1], color="b", marker="o")
ax1.set_title("circle_data")
ax2.set_title("kernel_pca")
plt.show()

print(M)  #Please do not erase it. It is used to check the execution result.
print(K)  #Please do not erase it. It is used to check the execution result.
print(W)  #Please do not erase it. It is used to check the execution result.
print(X_kpca)  #Please do not erase it. It is used to check the execution result.

Kernel principal component analysis using scikit-learn

Like standard PCA, kernel principal component analysis is easy to implement using sklearn.decomposition.

Usage is almost the same as for standard PCA; the arguments let you specify the number of compressed dimensions and, unlike standard PCA, the kernel type.

from sklearn.decomposition import KernelPCA
#The kernel used this time (the radial basis function) is specified with kernel="rbf".
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15)
X_kpca = kpca.fit_transform(X)

Here, we separate the moon-shaped data shown in the figure below. Get the data as follows:

from sklearn.datasets import make_moons
#Get moon data
X, y = make_moons(n_samples=100, random_state=123)

[Figure: moon-shaped data]

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import make_moons
#Import Kernel PCA
# ---------------------------
from sklearn.decomposition import KernelPCA
# ---------------------------

#Get moon data
X, y = make_moons(n_samples=100, random_state=123)

#Instantiate KernelPCA class
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15)
#Convert data X using Kernel PCA
X_kpca = kpca.fit_transform(X)

#Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(X[y == 0, 0], X[y == 0, 1], c="r")
ax1.scatter(X[y == 1, 0], X[y == 1, 1], c="b")
ax1.set_title("moon_data")
ax2.scatter(X_kpca[y == 0, 0], X_kpca[y == 0, 1], c="r")
ax2.scatter(X_kpca[y == 1, 0], X_kpca[y == 1, 1], c="b")
ax2.set_title("kernel_PCA")
plt.show()

print(X_kpca)  #Please do not erase it. It is used to check the execution result.
