PRML Chapter 12 Bayesian Principal Component Analysis Python Implementation

In this article, we implement Bayesian principal component analysis. The main use of principal component analysis (PCA) is to find a projection from the observation space (high-dimensional), where the data live, to a latent space (low-dimensional). For visualization the latent space is set to two (or three) dimensions, but for data preprocessing it is rare to know in advance how many latent dimensions to use. One can compute contribution ratios (explained variance), but in the end a threshold still has to be chosen by hand. Bayesian principal component analysis instead determines the dimensionality of the latent space automatically via automatic relevance determination.

Probabilistic principal component analysis

Interpreting principal component analysis probabilistically makes it possible to treat it in a Bayesian manner later. In probabilistic principal component analysis, an observed data point ${\bf x}$ ($D$-dimensional) is interpreted as a point ${\bf z}$ ($M$-dimensional) sampled from the latent space, projected by the matrix ${\bf W}$ (a $D \times M$ matrix), translated by ${\bf \mu}$ ($D$-dimensional), and perturbed by noise ${\bf \epsilon}$ ($D$-dimensional).

{\bf x} = {\bf Wz} + {\bf \mu} + {\bf \epsilon}

Now assume that ${\bf z}$ and ${\bf \epsilon}$ follow Gaussian distributions. There are three parameters to estimate, ${\bf W}, {\bf \mu}, \sigma^2$ (the variance of the Gaussian followed by ${\bf \epsilon}$), and the likelihood function is as follows.

p({\bf x}|{\bf W},{\bf \mu},\sigma^2) = \int p({\bf x}|{\bf Wz}+{\bf \mu},\sigma^2)p({\bf z}){\rm d}{\bf z}
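
Since both $p({\bf z})$ and the noise are Gaussian, this integral can be evaluated in closed form (PRML eqs. 12.34-12.35), giving a Gaussian marginal:

p({\bf x}) = \mathcal{N}({\bf x}|{\bf \mu},{\bf C}),\quad {\bf C} = {\bf W}{\bf W}^\top + \sigma^2{\bf I}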

PRML introduces two methods for maximizing this likelihood function. The first is a closed-form solution based on eigenvalue decomposition (equivalently, singular value decomposition) of the data covariance matrix. The second uses the EM algorithm, alternating between updating the posterior distribution over ${\bf z}$ (E step) and maximizing the expected complete-data log likelihood of ${\bf x}, {\bf z}$ (M step).
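
For reference, the quantities used in the E and M steps, which the code below follows, are (PRML eqs. 12.41 and 12.54-12.57):

{\bf M} = {\bf W}^\top{\bf W} + \sigma^2{\bf I}

\mathbb{E}[{\bf z}_n] = {\bf M}^{-1}{\bf W}^\top({\bf x}_n - \bar{\bf x}),\quad \mathbb{E}[{\bf z}_n{\bf z}_n^\top] = \sigma^2{\bf M}^{-1} + \mathbb{E}[{\bf z}_n]\mathbb{E}[{\bf z}_n]^\top

{\bf W}_{\rm new} = \left[\sum_n({\bf x}_n - \bar{\bf x})\mathbb{E}[{\bf z}_n]^\top\right]\left[\sum_n\mathbb{E}[{\bf z}_n{\bf z}_n^\top]\right]^{-1}

\sigma^2_{\rm new} = {1\over ND}\sum_{n=1}^N\left\{\|{\bf x}_n - \bar{\bf x}\|^2 - 2\mathbb{E}[{\bf z}_n]^\top{\bf W}_{\rm new}^\top({\bf x}_n - \bar{\bf x}) + {\rm Tr}\left(\mathbb{E}[{\bf z}_n{\bf z}_n^\top]{\bf W}_{\rm new}^\top{\bf W}_{\rm new}\right)\right\}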

Bayesian principal component analysis

In the example above, the dimension M of the latent variable space was fixed. Bayesian principal component analysis uses automatic relevance determination to prune the superfluous dimensions. (The value of M itself does not actually decrease; the corresponding columns of ${\bf W}$ are merely driven to zero.) To this end, the parameter ${\bf W}$ is given the following prior distribution.

p({\bf W}|{\bf \alpha}) = \prod_{i=1}^M\left({\alpha_i\over2\pi}\right)^{D/2}\exp\left\{-{1\over2}\alpha_i{\bf w}_i^\top {\bf w}_i\right\}

where ${\bf w}_i$ is the $i$th column vector of ${\bf W}$, and $\alpha_i$ acts as a precision parameter for the corresponding Gaussian. When ${\bf \alpha}$ is estimated, some of its components take very large values. For those, the precision is so high that the corresponding column vector of ${\bf W}$ is effectively zero, i.e. that dimension has been pruned.
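
With this prior, the EM algorithm changes in two places, which the code below implements: the precision hyperparameters are re-estimated between iterations (PRML eq. 12.62), and the M-step update of ${\bf W}$ acquires a regularization term (PRML eq. 12.63):

\alpha_i^{\rm new} = {D\over{\bf w}_i^\top{\bf w}_i}

{\bf W}_{\rm new} = \left[\sum_n({\bf x}_n - \bar{\bf x})\mathbb{E}[{\bf z}_n]^\top\right]\left[\sum_n\mathbb{E}[{\bf z}_n{\bf z}_n^\top] + \sigma^2{\rm diag}(\alpha_i)\right]^{-1}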

Code

Library

Use only matplotlib and numpy as usual.

import matplotlib.pyplot as plt
import numpy as np

Principal component analysis by maximum likelihood method

As usual, this is the closed-form method using eigenvalue decomposition.

# Class for principal component analysis
class PCA(object):

    def __init__(self, n_component):
        # Specify the dimension of the latent space
        self.n_component = n_component

    # Principal component analysis by maximum likelihood
    def fit(self, X):
        # PRML eq. (12.1): maximum likelihood estimate of mu
        self.mean = np.mean(X, axis=0)

        # PRML eq. (12.2): data covariance matrix
        # (np.cov uses the unbiased 1/(N-1) normalization, a minor difference from the 1/N in the text)
        cov = np.cov(X, rowvar=False)

        # Eigenvalue decomposition (np.linalg.eigh returns eigenvalues in ascending order)
        values, vectors = np.linalg.eigh(cov)
        index = np.size(X, 1) - self.n_component

        # PRML eq. (12.46): maximum likelihood estimate of sigma^2
        # (the mean of the discarded eigenvalues)
        if index == 0:
            self.var = 0
        else:
            self.var = np.mean(values[:index])

        # PRML eq. (12.45): maximum likelihood estimate of the projection matrix W
        self.W = vectors[:, index:].dot(np.sqrt(np.diag(values[index:]) - self.var * np.eye(self.n_component)))
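
For convenience, projecting data into the latent space can use the posterior mean $\mathbb{E}[{\bf z}] = {\bf M}^{-1}{\bf W}^\top({\bf x} - {\bf \mu})$ from PRML eq. (12.54). A minimal sketch of such a helper (my addition; the original class does not define one):

    # Hypothetical helper (not in the original class): posterior mean projection
    # into the latent space, E[z] = M^{-1} W^T (x - mu) (PRML eq. 12.54)
    def transform(self, X):
        M = self.W.T.dot(self.W) + self.var * np.eye(self.n_component)
        return (X - self.mean).dot(self.W).dot(np.linalg.inv(M))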

Bayesian principal component analysis

The methods below are additional methods of the PCA class. For comparison, the parameters are first estimated by maximum likelihood, and those estimates are used as initial values for pruning by automatic relevance determination.

    # Bayesian principal component analysis
    def fit_bayesian(self, X, iter_max=100):
        # Dimensionality of the data space
        self.ndim = np.size(X, 1)

        # Estimate the parameters by maximum likelihood and use them as initial values
        self.fit(X)

        # Initialize the precision parameters (first estimate)
        self.alpha = self.ndim / np.sum(self.W ** 2, axis=0)

        # Zero-center the data
        D = X - self.mean

        # Repeat the EM algorithm the specified number of times
        for i in range(iter_max):
            # E step: sufficient statistics of z
            Ez, Ezz = self.expectation(D)

            # M step: estimates of W and sigma^2
            self.maximize(D, Ez, Ezz)

            # PRML eq. (12.62): hyperparameter update
            # (clip avoids division by zero for pruned columns)
            self.alpha = self.ndim / np.sum(self.W ** 2, axis=0).clip(min=1e-10)

    # E step: compute the sufficient statistics E[z] and E[zz^T]
    def expectation(self, D):
        # PRML eq. (12.41)
        M = self.W.T.dot(self.W) + self.var * np.eye(self.n_component)
        Minv = np.linalg.inv(M)

        # PRML eq. (12.54): E[z]
        Ez = D.dot(self.W).dot(Minv)

        # PRML eq. (12.55): E[zz^T]
        Ezz = self.var * Minv + np.einsum('ni,nj->nij', Ez, Ez)
        return Ez, Ezz

    # M step: estimates of W and sigma^2
    def maximize(self, D, Ez, Ezz):
        # PRML eq. (12.63): estimate of W
        self.W = D.T.dot(Ez).dot(np.linalg.inv(np.sum(Ezz, axis=0) + self.var * np.diag(self.alpha)))

        # PRML eq. (12.57): estimate of sigma^2
        self.var = np.mean(
            np.mean(D ** 2, axis=-1)
            - 2 * np.mean(Ez.dot(self.W.T) * D, axis=-1)
            + np.trace(Ezz.dot(self.W.T).dot(self.W).T) / self.ndim)
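
Because pruning drives columns of W toward zero rather than removing them, the effective dimensionality can be checked after fitting by counting the columns whose norm is non-negligible. A minimal sketch (my addition; the threshold 1e-5 is an arbitrary choice, not from PRML):

# Count the columns of W that survived pruning (hypothetical check; threshold is arbitrary)
effective_dim = np.sum(np.linalg.norm(pca.W, axis=0) > 1e-5)
print(effective_dim)  # should be close to the true latent dimensionality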

Hinton diagram

This time we reproduce PRML Figure 12.14. For that we need a function that draws a Hinton diagram, which represents each element of a matrix as a square whose size is proportional to the element's magnitude. The function below is a slight modification of code found by googling "matplotlib hinton".

def hinton(matrix, max_weight=None, ax=None):
    """Draw Hinton diagram for visualizing a weight matrix."""
    ax = ax if ax is not None else plt.gca()

    if not max_weight:
        max_weight = 2 ** np.ceil(np.log(np.abs(matrix).max()) / np.log(2))

    ax.patch.set_facecolor('gray')
    ax.set_aspect('equal', 'box')
    ax.xaxis.set_major_locator(plt.NullLocator())
    ax.yaxis.set_major_locator(plt.NullLocator())

    for (x, y), w in np.ndenumerate(matrix):
        color = 'white' if w > 0 else 'black'
        size = np.sqrt(np.abs(w) / max_weight)
        rect = plt.Rectangle([y - size / 2, x - size / 2], size, size,
                             facecolor=color, edgecolor=color)
        ax.add_patch(rect)

    ax.autoscale_view()
    ax.invert_yaxis()
    plt.xlim(-0.5, np.size(matrix, 1) - 0.5)
    plt.ylim(-0.5, len(matrix) - 0.5)
    plt.show()
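
As a quick standalone example (mine, not from the original article), the function can be called on any matrix:

# Example: white squares mark positive entries, black squares negative ones
hinton(np.random.uniform(-1, 1, (10, 9)))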

Main function

def create_toy_data(sample_size=100, ndim_hidden=1, ndim_observe=2, std=1.):
    Z = np.random.normal(size=(sample_size, ndim_hidden))
    mu = np.random.uniform(-5, 5, size=(ndim_observe))
    W = np.random.uniform(-5, 5, (ndim_hidden, ndim_observe))

    # PRML eq. (12.33): x = Wz + mu + epsilon
    X = Z.dot(W) + mu + np.random.normal(scale=std, size=(sample_size, ndim_observe))
    return X
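
The toy data are random, so each run produces a different draw; seeding NumPy beforehand makes the figures reproducible (a usage note of mine; the seed value is arbitrary):

np.random.seed(1234)  # arbitrary seed, only for reproducibility
X = create_toy_data(sample_size=100, ndim_hidden=3, ndim_observe=10, std=1.)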


def main():
    # Create 100 data points by projecting from a 3-dimensional latent space into a 10-dimensional space
    X = create_toy_data(sample_size=100, ndim_hidden=3, ndim_observe=10, std=1.)

    # Maximum likelihood PCA with a 9-dimensional latent space
    pca = PCA(9)
    pca.fit(X)
    hinton(pca.W)

    # Bayesian PCA, pruning from the maximum of 9 dimensions
    pca.fit_bayesian(X)
    hinton(pca.W)

Whole code

pca.py


import matplotlib.pyplot as plt
import numpy as np


class PCA(object):

    def __init__(self, n_component):
        self.n_component = n_component

    def fit(self, X):
        self.mean = np.mean(X, axis=0)
        cov = np.cov(X, rowvar=False)
        values, vectors = np.linalg.eigh(cov)
        index = np.size(X, 1) - self.n_component
        if index == 0:
            self.var = 0
        else:
            self.var = np.mean(values[:index])
        self.W = vectors[:, index:].dot(np.sqrt(np.diag(values[index:]) - self.var * np.eye(self.n_component)))

    def fit_bayesian(self, X, iter_max=100):
        self.ndim = np.size(X, 1)
        self.fit(X)
        self.alpha = self.ndim / np.sum(self.W ** 2, axis=0)
        D = X - self.mean
        for i in range(iter_max):
            Ez, Ezz = self.expectation(D)
            self.maximize(D, Ez, Ezz)
            self.alpha = self.ndim / np.sum(self.W ** 2, axis=0).clip(min=1e-10)

    def expectation(self, D):
        M = self.W.T.dot(self.W) + self.var * np.eye(self.n_component)
        Minv = np.linalg.inv(M)
        Ez = D.dot(self.W).dot(Minv)
        Ezz = self.var * Minv + np.einsum('ni,nj->nij', Ez, Ez)
        return Ez, Ezz

    def maximize(self, D, Ez, Ezz):
        self.W = D.T.dot(Ez).dot(np.linalg.inv(np.sum(Ezz, axis=0) + self.var * np.diag(self.alpha)))
        self.var = np.mean(
            np.mean(D ** 2, axis=-1)
            - 2 * np.mean(Ez.dot(self.W.T) * D, axis=-1)
            + np.trace(Ezz.dot(self.W.T).dot(self.W).T) / self.ndim)


def hinton(matrix, max_weight=None, ax=None):
    """Draw Hinton diagram for visualizing a weight matrix."""
    ax = ax if ax is not None else plt.gca()

    if not max_weight:
        max_weight = 2 ** np.ceil(np.log(np.abs(matrix).max()) / np.log(2))

    ax.patch.set_facecolor('gray')
    ax.set_aspect('equal', 'box')
    ax.xaxis.set_major_locator(plt.NullLocator())
    ax.yaxis.set_major_locator(plt.NullLocator())

    for (x, y), w in np.ndenumerate(matrix):
        color = 'white' if w > 0 else 'black'
        size = np.sqrt(np.abs(w) / max_weight)
        rect = plt.Rectangle([y - size / 2, x - size / 2], size, size,
                             facecolor=color, edgecolor=color)
        ax.add_patch(rect)

    ax.autoscale_view()
    ax.invert_yaxis()
    plt.xlim(-0.5, np.size(matrix, 1) - 0.5)
    plt.ylim(-0.5, len(matrix) - 0.5)
    plt.show()


def create_toy_data(sample_size=100, ndim_hidden=1, ndim_observe=2, std=1.):
    Z = np.random.normal(size=(sample_size, ndim_hidden))
    mu = np.random.uniform(-5, 5, size=(ndim_observe))
    W = np.random.uniform(-5, 5, (ndim_hidden, ndim_observe))
    X = Z.dot(W) + mu + np.random.normal(scale=std, size=(sample_size, ndim_observe))
    return X


def main():
    X = create_toy_data(sample_size=100, ndim_hidden=3, ndim_observe=10, std=1.)

    pca = PCA(9)
    pca.fit(X)
    hinton(pca.W)

    pca.fit_bayesian(X)
    hinton(pca.W)


if __name__ == '__main__':
    main()

Result

The projection matrix W estimated by maximum likelihood PCA is shown below.

(Figure: mle.png)

When the dimensions of the latent space are pruned with Bayesian principal component analysis, the projection matrix W becomes the figure below. The six columns on the left have vanished and the corresponding dimensions have been pruned, capturing the fact that the data were projected from a three-dimensional space.

(Figure: bayesian.png)

This reproduces the results shown in PRML Figure 12.14.

In closing

By this point in PRML, it has become common to combine techniques learned in earlier chapters and apply them to new models. Here we combined automatic relevance determination from Chapter 6 with the EM algorithm from Chapter 9, and fitted observed data to a model in which they are projected from a lower-dimensional space. The method can apparently also be applied to data with missing values, so I would like to try that as well.
