Python implementation of the Bernoulli mixture distribution

We will implement the Bernoulli mixture distribution from PRML section 9.3.3. The most common example of the EM algorithm is maximum likelihood estimation for a Gaussian mixture, formed by summing several Gaussian distributions, but the algorithm applies just as well to maximum likelihood estimation for a mixture of Bernoulli distributions. A Gaussian has two parameters, mean and variance, while a Bernoulli has only one, which makes it somewhat easier to handle. Here, as in PRML, we apply the Bernoulli mixture to MNIST to cluster the digits.

Bernoulli mixture distribution

The model used here is based on the multidimensional Bernoulli distribution, which represents a distribution over D-dimensional binary vectors.

{\rm Bern}({\bf x}|{\bf\mu}) = \prod_{i=1}^D \mu_i^{x_i}(1-\mu_i)^{(1-x_i)}
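As a quick sanity check of this formula, here is a minimal sketch (the values of x and mu below are illustrative, not from the article):

import numpy as np

def bernoulli(x, mu):
    # Product over dimensions of mu_i^x_i * (1 - mu_i)^(1 - x_i)
    return np.prod(mu ** x * (1 - mu) ** (1 - x))

x = np.array([1, 0, 1])
mu = np.array([0.9, 0.2, 0.6])
print(bernoulli(x, mu))  # 0.9 * 0.8 * 0.6 = 0.432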

The Bernoulli mixture distribution is obtained by weighting this distribution with a K-dimensional mixing coefficient ${\bf\pi}$ and summing the K components. Writing the training data as ${\bf X} = \{{\bf x}_1, \dots, {\bf x}_N\}$, the likelihood is

p({\bf X}|{\bf\mu},{\bf\pi}) = \prod_{n=1}^N\left\{\sum_{k=1}^K\pi_k{\rm Bern}({\bf x}_n|{\bf\mu}_k)\right\}

Now introduce a latent variable ${\bf Z} = \{{\bf z}_1, \dots, {\bf z}_N\}$ for each data point. The K-dimensional binary latent vector ${\bf z}$ has exactly one component equal to 1 and all other components equal to 0. Given the complete data ${\bf X, Z}$, the likelihood function is:

p({\bf X, Z}|{\bf\mu,\pi}) = \prod_{n=1}^N\left\{\prod_{k=1}^K\pi_k^{z_{nk}}{\rm Bern}({\bf x}_n|{\bf\mu}_k)^{z_{nk}}\right\}
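
The code below refers to several PRML equations by number. For reference, in the notation above, the E-step responsibilities and M-step updates it computes are the standard Bernoulli-mixture EM formulas:

\gamma(z_{nk}) = \frac{\pi_k{\rm Bern}({\bf x}_n|{\bf\mu}_k)}{\sum_{j=1}^K\pi_j{\rm Bern}({\bf x}_n|{\bf\mu}_j)}

N_k = \sum_{n=1}^N\gamma(z_{nk}),\qquad {\bf\mu}_k = \frac{1}{N_k}\sum_{n=1}^N\gamma(z_{nk}){\bf x}_n,\qquad \pi_k = \frac{N_k}{N}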

code

import

If the multidimensional Bernoulli distribution is used as it is, the likelihood becomes far too small for floating-point arithmetic, so we work with logarithms and use logsumexp.

import numpy as np
from scipy.special import logsumexp  # logsumexp moved from scipy.misc to scipy.special in newer SciPy versions
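
To see why this matters, a small illustration (not from the original article): exponentiating very negative log-probabilities underflows to zero, while logsumexp evaluates the same quantity stably.

log_p = np.array([-800., -801., -802.])  # log-probabilities of three components
print(np.log(np.sum(np.exp(log_p))))     # -inf, because exp underflows to 0
print(logsumexp(log_p))                  # about -799.59, computed stably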

Bernoulli mixture distribution

If you are using Python 2, replace the @ operator (matrix multiplication, available from Python 3.5) with numpy's dot function.

# Bernoulli mixture distribution
class BernoulliMixtureDistribution(object):

    def __init__(self, n_components):
        # Number of clusters
        self.n_components = n_components

    def fit(self, X, iter_max=100):
        self.ndim = np.size(X, 1)

        # Initialize parameters: uniform mixing weights and random means
        # (each mean vector is then normalized to sum to 1 across dimensions)
        self.weights = np.ones(self.n_components) / self.n_components
        self.means = np.random.uniform(0.25, 0.75, size=(self.n_components, self.ndim))
        self.means /= np.sum(self.means, axis=-1, keepdims=True)

        # Repeat E and M steps until the parameters stop changing
        for i in range(iter_max):
            params = np.hstack((self.weights.ravel(), self.means.ravel()))

            # E step: compute responsibilities
            stats = self._expectation(X)

            # M step: update parameters using the responsibilities
            self._maximization(X, stats)
            if np.allclose(params, np.hstack((self.weights.ravel(), self.means.ravel()))):
                break
        self.n_iter = i + 1

    # Logarithm of PRML equation (9.52)
    def _log_bernoulli(self, X):
        np.clip(self.means, 1e-10, 1 - 1e-10, out=self.means)
        return np.sum(X[:, None, :] * np.log(self.means) + (1 - X[:, None, :]) * np.log(1 - self.means), axis=-1)

    def _expectation(self, X):
        # PRML equation (9.56): responsibilities, computed in log space
        log_resps = np.log(self.weights) + self._log_bernoulli(X)
        log_resps -= logsumexp(log_resps, axis=-1)[:, None]
        resps = np.exp(log_resps)
        return resps

    def _maximization(self, X, resps):
        # PRML equation (9.57): effective number of points assigned to each component
        Nk = np.sum(resps, axis=0)

        # PRML equation (9.60): update the mixing coefficients
        self.weights = Nk / len(X)

        # PRML equation (9.58): update the means
        self.means = (X.T @ resps / Nk).T
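
A minimal usage sketch for fitting the model to MNIST follows. The data loading here uses scikit-learn's fetch_openml and takes the first 200 images of each digit rather than a random sample; this is an assumption on my part, and the original notebook may load the data differently.

from sklearn.datasets import fetch_openml

# Download MNIST and keep 200 images of each digit from 0 to 4
mnist = fetch_openml("mnist_784", as_frame=False)
data, labels = mnist.data, mnist.target.astype(int)
X = np.vstack([data[labels == d][:200] for d in range(5)])

# Binarize the pixel values (0-255) into 0/1 vectors
X = (X > 127).astype(float)

model = BernoulliMixtureDistribution(n_components=5)
model.fit(X)
print(model.n_iter, model.weights)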

result

As in this Jupyter notebook (section 9.3.3), applying the Bernoulli mixture distribution to the MNIST dataset (200 randomly picked images of each digit from 0 to 4) gives the means of the individual Bernoulli distributions shown in the figure below. (Figure: index.png, the learned mean image of each Bernoulli component.)
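
A minimal sketch of how such a figure can be drawn with matplotlib, assuming the model was fitted as above and each mean vector is reshaped back into a 28x28 image:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, model.n_components, figsize=(10, 2))
for k, ax in enumerate(axes):
    # Each row of model.means is a 784-dimensional vector of pixel probabilities
    ax.imshow(model.means[k].reshape(28, 28), cmap="gray")
    ax.axis("off")
plt.show()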

At the end

Since EM training converges to a local optimum (and in practice the result may not even be a true local optimum), it does not always produce clusters in which each digit comes out as clearly as in the figure above. I found it particularly hard to learn when the data contain pairs of digits with similar shapes, such as 1 and 7, or 3 and 8.
