[PYTHON] Principal component analysis

Comparison of probabilistic principal component analysis, Bayesian principal component analysis, and kernel principal component analysis, which are extensions of principal component analysis.

Principal component analysis (PCA)

There are various ways to derive a low-dimensional representation of high-dimensional data, but the quickest route is to interpret it through the singular value decomposition:

X = UDV^T

The dimensionality-reduced data are then obtained as

X_{pca} = XV_{pca}

where $V_{pca}$ consists of the leading columns of $V$, as many as the target dimensionality (for reduction to two dimensions, $V_{pca} = V[:, [0, 1]]$).
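The SVD route above can be sketched in a few lines of numpy (a minimal sketch; the function name `pca_svd` and the random test data are my own, not from the repository linked below):

```python
import numpy as np

def pca_svd(X, n_components=2):
    """Reduce X (n_samples x n_features) via singular value decomposition."""
    X_centered = X - X.mean(axis=0)       # center each feature
    U, D, Vt = np.linalg.svd(X_centered, full_matrices=False)  # X = U D V^T
    V_pca = Vt.T[:, :n_components]        # leading columns of V
    return X_centered @ V_pca             # X_pca = X V_pca

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X_pca = pca_svd(X, 2)
print(X_pca.shape)  # (50, 2)
```

Because the singular values are sorted in decreasing order, the first output column always carries the most variance.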

Probabilistic Principal Component Analysis (Probabilistic PCA)

Dimensionality reduction is formulated probabilistically with a Gaussian latent variable model. There are several ways to fit it; when using the EM algorithm, the E-step computes

M = W^TW+\sigma^2I \\
E[z_n] = M^{-1}W^T(x_n-\bar{x}) \\
E[z_{n}z_{n}^T]=\sigma^2M^{-1}+E[z_n]E[z_n]^T

where $\bar{x} = \frac{1}{N}\sum_{n=1}^{N}x_n$ is the sample mean of the data.

The M-step then updates the parameters:

W = \bigl[\sum_{n=1}^{N}(x_n-\bar{x})E[z_n]^T\bigr]\bigl[\sum_{n=1}^{N}E[z_nz_n^T]\bigr]^{-1}\\
\sigma^{2} = \frac{1}{ND}\sum_{n=1}^{N}\bigl\{||x_n-\bar{x}||^2 - 2E[z_n]^TW^T(x_n-\bar{x}) + Tr(E[z_nz_n^T]W^TW)\bigr\}

where the $W$ appearing in the $\sigma^2$ update is the newly updated value. Iterating the two steps until convergence yields $W$ and $\sigma^2$.
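The E-step and M-step above can be written directly in numpy. This is a minimal sketch (the function name `ppca_em`, the random initialization, and the fixed iteration count are my own choices, not from the linked repository); `Ez` stacks the $E[z_n]^T$ as rows and `Ezz_sum` holds $\sum_n E[z_n z_n^T]$:

```python
import numpy as np

def ppca_em(X, n_components=2, n_iter=100, seed=0):
    """EM for probabilistic PCA, following the E/M updates above."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    W = rng.normal(size=(D, n_components))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: M = W^T W + sigma^2 I, posterior moments of z_n
        M = W.T @ W + sigma2 * np.eye(n_components)
        Minv = np.linalg.inv(M)
        Ez = Xc @ W @ Minv                      # rows are E[z_n]^T
        Ezz_sum = N * sigma2 * Minv + Ez.T @ Ez # sum_n E[z_n z_n^T]
        # M-step: W first, then sigma^2 using the new W
        W = (Xc.T @ Ez) @ np.linalg.inv(Ezz_sum)
        sigma2 = (np.sum(Xc**2)
                  - 2 * np.sum(Ez * (Xc @ W))
                  + np.trace(Ezz_sum @ W.T @ W)) / (N * D)
    return W, sigma2, Ez

rng = np.random.default_rng(1)
Z = rng.normal(size=(50, 2))
W_true = rng.normal(size=(2, 4))
X = Z @ W_true + 0.1 * rng.normal(size=(50, 4))  # rank-2 signal + noise
W, sigma2, Ez = ppca_em(X, 2)
print(W.shape, sigma2)
```

On this synthetic rank-2 data, $\sigma^2$ converges to roughly the noise variance rather than the total data variance, which is the point of the model.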

Bayesian Principal Component Analysis (Bayes PCA)

Bayesian estimation is performed by placing a Gaussian prior with a hyperparameter $\alpha_i$ over each column $w_i$ of $W$.

Compared with probabilistic PCA, only the M-step changes:

\alpha_i = \frac{D}{w_i^Tw_i} \\
W = \bigl[\sum_{n=1}^{N}(x_n-\bar{x})E[z_n]^T\bigr]\bigl[\sum_{n=1}^{N}E[z_nz_n^T] + \sigma^2A \bigr]^{-1}\\
\sigma^{2} = \frac{1}{ND}\sum_{n=1}^{N}\bigl\{||x_n-\bar{x}||^2 - 2E[z_n]^TW^T(x_n-\bar{x}) + Tr(E[z_nz_n^T]W^TW)\bigr\}

where $A = \mathrm{diag}(\alpha_i)$ and $D$ is the data dimensionality.
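The modified M-step differs from the probabilistic-PCA one only by the $\alpha_i$ update and the $\sigma^2 A$ penalty term. A minimal sketch, assuming E-step quantities computed exactly as in probabilistic PCA (the function name `bpca_m_step` and the argument layout are my own):

```python
import numpy as np

def bpca_m_step(Xc, Ez, Ezz_sum, W, sigma2):
    """One Bayesian-PCA M-step.
    Xc: centered data (N, D); Ez: rows are E[z_n]^T (N, M);
    Ezz_sum: sum_n E[z_n z_n^T] (M, M); W, sigma2: current parameters."""
    N, D = Xc.shape
    alpha = D / np.sum(W**2, axis=0)       # alpha_i = D / (w_i^T w_i)
    A = np.diag(alpha)
    W_new = (Xc.T @ Ez) @ np.linalg.inv(Ezz_sum + sigma2 * A)
    sigma2_new = (np.sum(Xc**2)
                  - 2 * np.sum(Ez * (Xc @ W_new))
                  + np.trace(Ezz_sum @ W_new.T @ W_new)) / (N * D)
    return W_new, sigma2_new, alpha

# E-step quantities computed as in probabilistic PCA:
rng = np.random.default_rng(0)
N, D, M_dim = 50, 4, 2
X = rng.normal(size=(N, D))
Xc = X - X.mean(axis=0)
W = rng.normal(size=(D, M_dim))
sigma2 = 1.0
Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(M_dim))
Ez = Xc @ W @ Minv
Ezz_sum = N * sigma2 * Minv + Ez.T @ Ez
W_new, sigma2_new, alpha = bpca_m_step(Xc, Ez, Ezz_sum, W, sigma2)
print(W_new.shape, sigma2_new)
```

Columns of $W$ that the data do not support get a large $\alpha_i$, which shrinks them toward zero, so the effective dimensionality is selected automatically.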

Kernel principal component analysis

The (number of data) x (number of dimensions) data matrix is converted by a kernel function into a (number of data) x (number of data) Gram matrix, which is centered before principal component analysis is applied:

\tilde{K} = K - 1_{N}K - K1_N+1_NK1_N

where $1_N$ denotes the $N \times N$ matrix whose every element is $1/N$, and $K$ is the Gram matrix with $K_{nm} = k(x_n, x_m)$.

For the $\tilde{K}$ obtained in this way, dimensionality reduction proceeds by computing eigenvalues and eigenvectors, just as in ordinary principal component analysis.
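The centering formula and eigendecomposition can be sketched as follows. The RBF kernel and its `gamma` parameter are my own assumptions for illustration (any kernel works); the training-point projections are the leading eigenvectors scaled by the square root of their eigenvalues:

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Kernel PCA with an RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    N = X.shape[0]
    # Gram matrix K_{nm} = k(x_n, x_m)
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # Center in feature space: K~ = K - 1_N K - K 1_N + 1_N K 1_N
    one_N = np.full((N, N), 1.0 / N)
    K_tilde = K - one_N @ K - K @ one_N + one_N @ K @ one_N
    # Eigendecomposition of the symmetric centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(K_tilde)
    idx = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    # Projections of the training points onto the principal components
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
Y = kernel_pca(X)
print(Y.shape)  # (30, 2)
```

Note that the output lives in the space spanned by the data points, so unlike linear PCA there is no explicit $V$ matrix to apply to new data without extra work.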

Experiment

Dimensionality reduction is performed using principal component analysis (PCA), probabilistic principal component analysis (PPCA), Bayesian principal component analysis (BPCA), and kernel principal component analysis (KPCA).

The data set is the iris data: each of 3 plant species is represented by 4-dimensional feature vectors, with 50 samples per species.

The code is available here: https://github.com/kenchin110100/machine_learning

The figure below is a plot after reducing the dimensions to two dimensions.

The boundaries between species are clearer with PPCA and BPCA than with PCA. KPCA gives a rather different-looking embedding, but each species still forms its own cluster.

In closing

I tried four kinds of principal component analysis, and BPCA seemed the easiest to use. PCA has been extended along two axes, probabilistic modeling and kernels, so perhaps the strongest principal component analysis would be one that combines them ...
