PRML Chapter 9 Gaussian Mixture Distribution Python Implementation

Chapter 9 of PRML introduces the EM algorithm. The EM algorithm itself is a technique that can be used in many settings, and I have previously combined it with a classifier to make classification more robust to noise. However, I had never done clustering with a Gaussian mixture distribution, which is the most famous application of the EM algorithm, so this time I implemented **maximum likelihood estimation of a Gaussian mixture distribution with the EM algorithm**.

Maximum likelihood estimation of a Gaussian mixture distribution

Consider data points such as the blue dots in the figure above (blobs.png). Fitting them with an ordinary single Gaussian distribution,

p({\bf x}) = \mathcal{N}({\bf x}|{\bf\mu},{\bf\Sigma})

gives a result like gauss_fitting.png, which shows that a unimodal Gaussian distribution is not a good model in this case. Noticing that the data points fall into three groups, we instead consider a Gaussian mixture distribution made up of three Gaussian components,

p({\bf x}) = \sum_{k=1}^3 \pi_k\mathcal{N}({\bf x}|{\bf\mu}_k,{\bf\Sigma}_k)

which is a more suitable model. Here $\sum_k \pi_k = 1$, and ${\bf\mu}_1$, ${\bf\mu}_2$, and ${\bf\mu}_3$ are the means of the clusters of data points at the top, bottom right, and bottom left, respectively. Fitting each cluster with its own Gaussian distribution would be fine, but while it is obvious to us humans which data points belong to which cluster, the machine does not know that.

Therefore, taking the coordinates $\{{\bf x}_n\}_{n=1}^N$ of the N data points as the observed variables, we simultaneously estimate the latent variables $\{{\bf z}_n\}_{n=1}^N$, which indicate which of the K clusters each data point belongs to, and the parameters $\{\pi_k, {\bf\mu}_k, {\bf\Sigma}_k\}_{k=1}^K$. When there are latent variables like this, the EM algorithm is commonly used.
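Concretely, following PRML Section 9.2, each latent variable ${\bf z}_n$ is a 1-of-K binary vector indicating which component generated ${\bf x}_n$, with

p(z_{nk} = 1) = \pi_k,\qquad p({\bf x}_n|z_{nk} = 1) = \mathcal{N}({\bf x}_n|{\bf\mu}_k,{\bf\Sigma}_k)

so that marginalizing over ${\bf z}_n$ recovers the mixture distribution $p({\bf x}_n)=\sum_k\pi_k\mathcal{N}({\bf x}_n|{\bf\mu}_k,{\bf\Sigma}_k)$ above.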

EM algorithm

The procedure for maximum likelihood estimation of a Gaussian mixture distribution by the EM algorithm (PRML equations (9.23) to (9.27)) is described in Section 9.2.2 of PRML, so the derivation is omitted here.
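For reference, the E step computes the responsibilities and the M step re-estimates the parameters using the following update equations, which the code below refers to by number:

\gamma(z_{nk}) = \frac{\pi_k\mathcal{N}({\bf x}_n|{\bf\mu}_k,{\bf\Sigma}_k)}{\sum_{j=1}^K\pi_j\mathcal{N}({\bf x}_n|{\bf\mu}_j,{\bf\Sigma}_j)} \qquad (9.23)

{\bf\mu}_k^{\rm new} = \frac{1}{N_k}\sum_{n=1}^N\gamma(z_{nk}){\bf x}_n \qquad (9.24)

{\bf\Sigma}_k^{\rm new} = \frac{1}{N_k}\sum_{n=1}^N\gamma(z_{nk})({\bf x}_n - {\bf\mu}_k^{\rm new})({\bf x}_n - {\bf\mu}_k^{\rm new})^{\rm T} \qquad (9.25)

\pi_k^{\rm new} = \frac{N_k}{N} \qquad (9.26)

where $N_k = \sum_{n=1}^N\gamma(z_{nk})$ is the effective number of points assigned to component k (9.27).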

Code

Library

We use NumPy and Matplotlib as usual.

import matplotlib.pyplot as plt
import numpy as np

Gaussian mixture distribution

class GaussianMixture(object):

    def __init__(self, n_component):
        # Number of Gaussian components
        self.n_component = n_component

    # Maximum likelihood estimation by the EM algorithm
    def fit(self, X, iter_max=10):
        # Dimensionality of the data
        self.ndim = np.size(X, 1)
        # Initialize the mixing coefficients
        self.weights = np.ones(self.n_component) / self.n_component
        # Initialize the means
        self.means = np.random.uniform(X.min(), X.max(), (self.ndim, self.n_component))
        # Initialize the covariance matrices
        self.covs = np.repeat(10 * np.eye(self.ndim), self.n_component).reshape(self.ndim, self.ndim, self.n_component)

        # Alternate the E step and the M step
        for i in range(iter_max):
            params = np.hstack((self.weights.ravel(), self.means.ravel(), self.covs.ravel()))
            # E step: compute the responsibilities
            resps = self.expectation(X)
            # M step: update the parameters
            self.maximization(X, resps)
            # Check whether the parameters have converged
            if np.allclose(params, np.hstack((self.weights.ravel(), self.means.ravel(), self.covs.ravel()))):
                break
        else:
            print("parameters may not have converged")

    # Gaussian density of each component, evaluated at every data point
    def gauss(self, X):
        precisions = np.linalg.inv(self.covs.T).T
        diffs = X[:, :, None] - self.means
        assert diffs.shape == (len(X), self.ndim, self.n_component)
        exponents = np.sum(np.einsum('nik,ijk->njk', diffs, precisions) * diffs, axis=1)
        assert exponents.shape == (len(X), self.n_component)
        return np.exp(-0.5 * exponents) / np.sqrt(np.linalg.det(self.covs.T).T * (2 * np.pi) ** self.ndim)

    # E step
    def expectation(self, X):
        # PRML eq. (9.23)
        resps = self.weights * self.gauss(X)
        resps /= resps.sum(axis=-1, keepdims=True)
        return resps

    # M step
    def maximization(self, X, resps):
        # PRML eq. (9.27)
        Nk = np.sum(resps, axis=0)

        # PRML eq. (9.26)
        self.weights = Nk / len(X)

        # PRML eq. (9.24)
        self.means = X.T.dot(resps) / Nk

        diffs = X[:, :, None] - self.means
        # PRML eq. (9.25)
        self.covs = np.einsum('nik,njk->ijk', diffs, diffs * np.expand_dims(resps, 1)) / Nk

    # Compute the probability density p(x)
    def predict_proba(self, X):
        # PRML eq. (9.7)
        gauss = self.weights * self.gauss(X)
        return np.sum(gauss, axis=-1)

    # Clustering: assign each point to the component with the largest joint probability
    def classify(self, X):
        joint_prob = self.weights * self.gauss(X)
        return np.argmax(joint_prob, axis=1)
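As a minimal usage sketch (assuming `X` is an (N, ndim) NumPy array of data points; the variable names here are only illustrative, and the full script in the next section generates toy data and plots the result):

model = GaussianMixture(3)          # mixture of three Gaussian components
model.fit(X, iter_max=100)          # maximum likelihood estimation by EM
labels = model.classify(X)          # hard cluster assignment for each point
density = model.predict_proba(X)    # mixture density p(x) at each point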

Whole code

gaussian_mixture.py


import matplotlib.pyplot as plt
import numpy as np


class GaussianMixture(object):

    def __init__(self, n_component):
        self.n_component = n_component

    def fit(self, X, iter_max=10):
        self.ndim = np.size(X, 1)
        self.weights = np.ones(self.n_component) / self.n_component
        self.means = np.random.uniform(X.min(), X.max(), (self.ndim, self.n_component))
        self.covs = np.repeat(10 * np.eye(self.ndim), self.n_component).reshape(self.ndim, self.ndim, self.n_component)
        for i in range(iter_max):
            params = np.hstack((self.weights.ravel(), self.means.ravel(), self.covs.ravel()))
            resps = self.expectation(X)
            self.maximization(X, resps)
            if np.allclose(params, np.hstack((self.weights.ravel(), self.means.ravel(), self.covs.ravel()))):
                break
        else:
            print("parameters may not have converged")

    def gauss(self, X):
        precisions = np.linalg.inv(self.covs.T).T
        diffs = X[:, :, None] - self.means
        assert diffs.shape == (len(X), self.ndim, self.n_component)
        exponents = np.sum(np.einsum('nik,ijk->njk', diffs, precisions) * diffs, axis=1)
        assert exponents.shape == (len(X), self.n_component)
        return np.exp(-0.5 * exponents) / np.sqrt(np.linalg.det(self.covs.T).T * (2 * np.pi) ** self.ndim)

    def expectation(self, X):
        resps = self.weights * self.gauss(X)
        resps /= resps.sum(axis=-1, keepdims=True)
        return resps

    def maximization(self, X, resps):
        Nk = np.sum(resps, axis=0)
        self.weights = Nk / len(X)
        self.means = X.T.dot(resps) / Nk
        diffs = X[:, :, None] - self.means
        self.covs = np.einsum('nik,njk->ijk', diffs, diffs * np.expand_dims(resps, 1)) / Nk

    def predict_proba(self, X):
        gauss = self.weights * self.gauss(X)
        return np.sum(gauss, axis=-1)

    def classify(self, X):
        joint_prob = self.weights * self.gauss(X)
        return np.argmax(joint_prob, axis=1)


def create_toy_data():
    x1 = np.random.normal(size=(100, 2))
    x1 += np.array([-5, -5])
    x2 = np.random.normal(size=(100, 2))
    x2 += np.array([5, -5])
    x3 = np.random.normal(size=(100, 2))
    x3 += np.array([0, 5])
    return np.vstack((x1, x2, x3))


def main():
    X = create_toy_data()

    model = GaussianMixture(3)
    model.fit(X, iter_max=100)
    labels = model.classify(X)

    x_test, y_test = np.meshgrid(np.linspace(-10, 10, 100), np.linspace(-10, 10, 100))
    X_test = np.array([x_test, y_test]).reshape(2, -1).transpose()
    probs = model.predict_proba(X_test)
    Probs = probs.reshape(100, 100)
    colors = ["red", "blue", "green"]
    plt.scatter(X[:, 0], X[:, 1], c=[colors[int(label)] for label in labels])
    plt.contour(x_test, y_test, Probs)
    plt.xlim(-10, 10)
    plt.ylim(-10, 10)
    plt.show()


if __name__ == '__main__':
    main()

Result

The parameters of the Gaussian mixture distribution were estimated by maximum likelihood using the points as training data, and the resulting probability distribution is shown with contour lines. The color of each point indicates the cluster it belongs to (result.png). This is a successful result, but **the fitting sometimes fails**. As described in PRML, maximizing this log-likelihood function is not a well-posed problem: there are singular solutions in which a Gaussian component collapses onto a single data point. There are heuristics to work around this, but since none are implemented here, the fitting occasionally fails. It should not fail very often, though.
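One common workaround, not implemented above and shown here only as a sketch, is to add a small regularization term to each covariance matrix in the M step so that no component can collapse onto a single data point (the parameter name reg_covar is illustrative):

    def maximization(self, X, resps, reg_covar=1e-6):
        Nk = np.sum(resps, axis=0)
        self.weights = Nk / len(X)
        self.means = X.T.dot(resps) / Nk
        diffs = X[:, :, None] - self.means
        self.covs = np.einsum('nik,njk->ijk', diffs, diffs * np.expand_dims(resps, 1)) / Nk
        # Regularization (sketch): add a small multiple of the identity to every covariance,
        # broadcast over the component axis, to keep the matrices invertible
        self.covs += reg_covar * np.eye(self.ndim)[:, :, None]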

In closing

Unsupervised clustering was performed by fitting a Gaussian mixture distribution. Here, the number of Gaussian components had to be specified by hand. Chapter 10 introduces a method that automatically estimates an appropriate number of mixture components, so next time I will implement the variational Bayes method used there.
