PRML Chapter 14 Conditional Mixture Model Python Implementation

When a single model cannot capture the characteristics of the data, a mixture model that combines multiple models can be used. This time, we implement a mixture of linear regression models.

Mixture of linear regression models

Representing an ordinary linear regression model with input $x$ and output $t$ probabilistically, we get

p(t|x,{\bf w},\beta) = \mathcal{N}(t|{\bf w}^\top\phi(x),\beta^{-1})

Here $\phi(\cdot)$ is the feature vector, ${\bf w}$ is the weight vector, and $\beta$ is the precision parameter. In the mixture of linear regression models, we introduce mixing coefficients $\pi_k$ (with $\sum_k\pi_k=1$) and sum $K$ linear regression models, giving

p(t|x,{\bf W},\beta,\pi) = \sum_{k=1}^K\pi_k\mathcal{N}(t|{\bf w}_k^\top\phi(x),\beta^{-1})

As usual, given training data $\{x_n,t_n\}_{n=1}^N$ (hereinafter ${\bf t},{\bf x}$), we estimate the parameters ${\bf W},\pi,\beta$ by maximum likelihood. The log-likelihood function is then

\ln p({\bf t}|{\bf x},{\bf W},\pi,\beta) = \sum_{n=1}^N\ln\left(\sum_{k=1}^K\pi_k\mathcal{N}(t_n|{\bf w}_k^\top\phi(x_n),\beta^{-1})\right)

We now introduce latent variables $z_{nk}$ that indicate which model generated each data point. Each $z_{nk}$ takes the value 0 or 1; $z_{nk}=1$ means that the $n$-th data point was generated from the $k$-th model, and the remaining $z_{n1},\cdots,z_{n,k-1},z_{n,k+1},\cdots,z_{nK}$ are all 0. With these, we can also write down the log-likelihood function of the parameters given the complete data.
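For reference, if the latent variables ${\bf Z}=\{z_{nk}\}$ were observed, the complete-data log-likelihood would take the form (cf. PRML eq. 14.36)

\ln p({\bf t},{\bf Z}|{\bf x},{\bf W},\pi,\beta) = \sum_{n=1}^N\sum_{k=1}^K z_{nk}\ln\left(\pi_k\mathcal{N}(t_n|{\bf w}_k^\top\phi(x_n),\beta^{-1})\right)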

In this way, the EM algorithm can be used for maximum likelihood estimation of the mixture of linear regression models. In the E step, we compute $\mathbb{E}[z_{nk}]$ for each data point and model; in the M step, we maximize, with respect to the parameters, the expectation of the complete-data log-likelihood under the posterior distribution of the latent variables $z$.
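Concretely, the updates used in the code below follow PRML eqs. (14.37), (14.38), (14.42) and (14.44). The E step computes the responsibilities

\gamma_{nk} = \mathbb{E}[z_{nk}] = \frac{\pi_k\mathcal{N}(t_n|{\bf w}_k^\top\phi(x_n),\beta^{-1})}{\sum_{j=1}^K\pi_j\mathcal{N}(t_n|{\bf w}_j^\top\phi(x_n),\beta^{-1})}

and the M step updates

\pi_k = \frac{1}{N}\sum_{n=1}^N\gamma_{nk},\qquad {\bf w}_k = \left(\Phi^\top R_k\Phi\right)^{-1}\Phi^\top R_k{\bf t},\qquad \beta^{-1} = \frac{1}{N}\sum_{n=1}^N\sum_{k=1}^K\gamma_{nk}\left(t_n-{\bf w}_k^\top\phi(x_n)\right)^2

where $\Phi$ is the design matrix and $R_k = \mathrm{diag}(\gamma_{1k},\cdots,\gamma_{Nk})$.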

Code

Import

This time, in addition to the usual numpy and matplotlib, we also import itertools from the Python standard library (it is not really necessary). We also pull in functools.reduce, which the feature transform below relies on.

import itertools
from functools import reduce

import matplotlib.pyplot as plt
import numpy as np

Design matrix

Create a design matrix whose row vectors are the feature vectors of the individual data points.

class PolynomialFeatures(object):

    def __init__(self, degree=2):
        assert isinstance(degree, int)
        self.degree = degree

    def transform(self, x):
        # Ensure x is 2-D: (n_samples, n_features)
        if x.ndim == 1:
            x = x[:, None]
        x_t = x.transpose()
        # Bias term (column of ones)
        features = [np.ones(len(x))]
        # Products of the input features up to the given degree
        for degree in range(1, self.degree + 1):
            for items in itertools.combinations_with_replacement(x_t, degree):
                features.append(reduce(lambda a, b: a * b, items))
        return np.asarray(features).transpose()
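As a quick sanity check (a hypothetical example, not from the original article), for degree 1 the transform just prepends a bias column:

x = np.array([0., 1., 2.])
PolynomialFeatures(degree=1).transform(x)
# array([[1., 0.],
#        [1., 1.],
#        [1., 2.]])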

Conditional mixture model

This is the main part: the code for the mixture of linear regression models.

class ConditionalMixtureModel(object):

    def __init__(self, n_models=2):
        # Number of component models
        self.n_models = n_models

    # Maximum likelihood estimation via the EM algorithm
    def fit(self, X, t, n_iter=100):
        # Initialize the parameters to estimate (weight vectors, variance,
        # mixing coefficients) by perturbing the ordinary least-squares solution
        coef = np.linalg.pinv(X).dot(t)
        self.coef = np.vstack([
            coef + np.random.normal(size=len(coef))
            for _ in range(self.n_models)]).T
        self.var = np.mean(np.square(X.dot(coef) - t))
        self.weight = np.ones(self.n_models) / self.n_models

        # Repeat the EM steps a specified number of times
        for i in range(n_iter):
            # E step: compute the responsibilities E[z_nk] for each data point and model
            resp = self._expectation(X, t)

            # M step: maximize with respect to the parameters
            self._maximization(X, t, resp)

    # Unnormalized Gaussian; the constant cancels when the responsibilities are normalized
    def _gauss(self, x, y):
        return np.exp(-(x - y) ** 2 / (2 * self.var))

    # E step: compute the responsibilities gamma_nk = E[z_nk]
    def _expectation(self, X, t):
        # PRML eq. (14.37)
        responsibility = self.weight * self._gauss(X.dot(self.coef), t[:, None])
        responsibility /= np.sum(responsibility, axis=1, keepdims=True)
        return responsibility

    # M step: maximize with respect to the parameters
    def _maximization(self, X, t, resp):
        # PRML eq. (14.38): mixing coefficients
        self.weight = np.mean(resp, axis=0)

        for m in range(self.n_models):
            # PRML eq. (14.42): weighted least squares for each model's coefficients
            self.coef[:, m] = np.linalg.inv((X.T * resp[:, m]).dot(X)).dot(X.T * resp[:, m]).dot(t)

        # PRML eq. (14.44): variance
        self.var = np.mean(
            np.sum(resp * np.square(t[:, None] - X.dot(self.coef)), axis=1))

    def predict(self, X):
        # One prediction column per component model
        return X.dot(self.coef)
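Once fit has been called, the responsibilities themselves can be inspected. A minimal sketch, assuming the design matrix X_train and targets y_train constructed in the sections below:

resp = model._expectation(X_train, y_train)  # shape (N, n_models); each row sums to 1
assignment = np.argmax(resp, axis=1)         # hard assignment of each point to a model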

Training data

Prepare training data that follows $y=|x|$. Each of the ranges $x\ge0$ and $x\le0$ can be fitted with a single linear regression model, so we prepare two models and fit one to each range.

def create_toy_data(sample_size=20, std=0.1):
    x = np.linspace(-1, 1, sample_size)
    # sign of x: +1 for x > 0, -1 otherwise
    y = (x > 0.).astype(float) * 2 - 1
    y = x * y  # y = |x|
    y += np.random.normal(scale=std, size=sample_size)
    return x, y
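For example (a hypothetical check with the noise turned off):

x, y = create_toy_data(sample_size=5, std=0.)
# x = [-1. , -0.5,  0. ,  0.5,  1. ]
# y = [ 1. ,  0.5, -0. ,  0.5,  1. ]   i.e. y = |x|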

Main function

def main():
    # Create the training data
    x_train, y_train = create_toy_data()

    # Define the feature vector (degree 1)
    feature = PolynomialFeatures(degree=1)

    # Convert to the design matrix
    X_train = feature.transform(x_train)

    # Prepare the conditional mixture model (2 models)
    model = ConditionalMixtureModel(n_models=2)

    # Maximum likelihood estimation of the parameters
    model.fit(X_train, y_train)

    # Plot the results
    x = np.linspace(-1, 1, 100)
    X = feature.transform(x)
    Y = model.predict(X)
    plt.scatter(x_train, y_train, s=50, facecolor="none", edgecolor="g")
    plt.plot(x, Y[:, 0], c="b")
    plt.plot(x, Y[:, 1], c="r")
    plt.show()

Whole code

import itertools
from functools import reduce

import matplotlib.pyplot as plt
import numpy as np


class PolynomialFeatures(object):

    def __init__(self, degree=2):
        assert isinstance(degree, int)
        self.degree = degree

    def transform(self, x):
        if x.ndim == 1:
            x = x[:, None]
        x_t = x.transpose()
        features = [np.ones(len(x))]
        for degree in range(1, self.degree + 1):
            for items in itertools.combinations_with_replacement(x_t, degree):
                features.append(reduce(lambda a, b: a * b, items))
        return np.asarray(features).transpose()


class ConditionalMixtureModel(object):

    def __init__(self, n_models=2):
        self.n_models = n_models

    def fit(self, X, t, n_iter=100):
        coef = np.linalg.pinv(X).dot(t)
        self.coef = np.vstack([
            coef + np.random.normal(size=len(coef))
            for _ in range(self.n_models)]).T
        self.var = np.mean(np.square(X.dot(coef) - t))
        self.weight = np.ones(self.n_models) / self.n_models

        for i in range(n_iter):
            resp = self._expectation(X, t)
            self._maximization(X, t, resp)

    def _gauss(self, x, y):
        return np.exp(-(x - y) ** 2 / (2 * self.var))

    def _expectation(self, X, t):
        responsibility = self.weight * self._gauss(X.dot(self.coef), t[:, None])
        responsibility /= np.sum(responsibility, axis=1, keepdims=True)
        return responsibility

    def _maximization(self, X, t, resp):
        self.weight = np.mean(resp, axis=0)
        for m in range(self.n_models):
            self.coef[:, m] = np.linalg.inv((X.T * resp[:, m]).dot(X)).dot(X.T * resp[:, m]).dot(t)
        self.var = np.mean(
            np.sum(resp * np.square(t[:, None] - X.dot(self.coef)), axis=1))

    def predict(self, X):
        return X.dot(self.coef)


def create_toy_data(sample_size=20, std=0.1):
    x = np.linspace(-1, 1, sample_size)
    y = (x > 0.).astype(float) * 2 - 1
    y = x * y
    y += np.random.normal(scale=std, size=sample_size)
    return x, y


def main():
    x_train, y_train = create_toy_data()
    feature = PolynomialFeatures(degree=1)
    X_train = feature.transform(x_train)
    model = ConditionalMixtureModel(n_models=2)
    model.fit(X_train, y_train)
    x = np.linspace(-1, 1, 100)
    X = feature.transform(x)
    Y = model.predict(X)
    plt.scatter(x_train, y_train, s=50, facecolor="none", edgecolor="g")
    plt.plot(x, Y[:, 0], c="b")
    plt.plot(x, Y[:, 1], c="r")
    plt.show()


if __name__ == '__main__':
    main()

Result

By combining multiple linear models, we have completed the fit to data that follows the nonlinear function $y=|x|$.

(Figure: result.png)

In closing

When the conditional mixture model implemented here is used for prediction, the lines extend even where there are no data points, as shown in the result above. The mixture of experts model, a further development of the conditional mixture model, appears to solve this problem by making the mixing coefficients $\pi$ functions of the input $x$.
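For example, one common choice there (not implemented in this article) is a linear softmax gating model,

\pi_k(x) = \frac{\exp({\bf v}_k^\top\phi(x))}{\sum_{j=1}^K\exp({\bf v}_j^\top\phi(x))}

so that each component dominates in a different region of the input space.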
