[PYTHON] Gaussian mixed model EM algorithm [statistical machine learning]

motivation

Task

Problem setting

I want to estimate the maximum likelihood of what is generated from multiple Gaussian distributions!

Random number generation

import numpy as np 
import math
import matplotlib.pyplot as plt

np.random.seed(0)

def data_generate(n):
  return (np.random.randn(n)
  + np.where(np.random.rand(n) > 0.3, 2., -2.))
aa = np.histogram(data_generate(1000), bins =50)
a_bins = aa[1]
a_hist = aa[0]
X1 = []
for i in range(1, len(a_bins)):
    X1.append((a_bins[i-1]+a_bins[i])/2)
plt.bar(X1,a_hist, width=0.05)

image.png

Definition of normal distribution

def nomal_distribution(x , mu, dev):
    c = (2 * math.pi * math.pow(dev, 2)) ** (-1/2)
    a = (-(x-mu)**2)/(2 * (dev**2) )
    return c * math.exp(a)

EM algorithm

Maximize Q of E

def ita(xi, mu, dev, m = m):
    s = 0
    for i in range(m):
        s += w[i] * nomal_distribution(xi, mu[i], dev[i])
    n = []
    for l in range(m):
        n.append(w[l] * nomal_distribution(xi, mu[l], dev[l])/s)
    return n #m matrix

def E (x, mu, dev, m = m, data_size = data_size):
    Q = 0
    for i in range(data_size):
        for j in m:
            Q += ita(x[i], mu, dev)[j] * ( math.ln(w[j]) + math.ln(nomal_distribution(xi, mu[j], dev[j])) )
    
    return Q

def M(x, mu, dev, d,  m = m, data_size = data_size):
    n = 0
    a = 0
    aa = 0
    for i in range(data_size):
        n += np.array(ita(x[i], mu, dev))
        a += (np.array(ita(x[i], mu, dev)) * x[i])
        aa += np.array(ita(x[i], mu, dev)) * (( x[i] - mu) ** 2) 

    w1 = (1/data_size)*n
    mu1 = [ a[j]/n[j] for j in range(m) ] 
    dev1 = [math.sqrt( aa[j]/(d*n[j]) ) for j in range(m)]  

    return w1, mu1, dev1

Graphing

def probability_distribution(xi, w1, mu1, dev1):
    p = 0
    pro = []
    for j in range(m):
        p +=  w1[j] * nomal_distribution(xi, mu1[j], dev1[j])
    return p
epoc = 30
m = 3
data_size = 1000
mu = np.random.uniform(-1, 1, m)
w = [0.2,0.2,0.6]
dev = np.random.uniform(-0.2, 0.2, m)
x = data_generate(data_size)
w1, mu1, dev1 =  M(x, mu, dev, 1)
for num in range(epoc):
    w1, mu1, dev1 = M(x, mu1, dev1, 1)

xx = np.linspace(-10, 10, 100)
pp0 = [probability_distribution(xx[i] ,w, mu, dev) for i in range(len(xx))]   
pp = [probability_distribution(xx[i] ,w1, mu1, dev1) for i in range(len(xx))]   
plt.scatter(xx,pp, c='red')
#plt.scatter(xx,pp0, c = 'blue')

image.png

Recommended Posts

Gaussian mixed model EM algorithm [statistical machine learning]
Estimating mixed Gaussian distribution by EM algorithm
EM algorithm calculation for mixed Gaussian distribution problem
(Machine learning) I tried to understand the EM algorithm in a mixed Gaussian distribution carefully with implementation.
Machine learning model considering maintainability
About machine learning mixed matrices
EM of mixed Gaussian distribution
Machine learning algorithm (simple perceptron)
Machine learning algorithm (support vector machine)
Machine learning algorithm (logistic regression)
<Course> Machine Learning Chapter 6: Algorithm 2 (k-means)
Machine learning algorithm (support vector machine application)
Machine learning algorithm (multiple regression analysis)
Machine learning algorithm (simple regression analysis)
Inversely analyze a machine learning model
Machine learning algorithm (gradient descent method)
Machine learning algorithm (generalization of linear regression)
Machine learning
Machine learning algorithm (implementation of multi-class classification)
<Course> Machine Learning Chapter 1: Linear Regression Model
Cross Validation improves machine learning model accuracy
Machine learning algorithm classification and implementation summary
<Course> Machine Learning Chapter 2: Nonlinear Regression Model
Machine learning algorithm (linear regression summary & regularization)
[Python] Clustering with an infinitely mixed Gaussian model
Face image dataset sorting using machine learning model (# 3)
Classify machine learning related information by topic model
Dictionary learning algorithm
[Memo] Machine learning
Machine learning classification
Machine Learning sample
[Python] Implementation of clustering using a mixed Gaussian model
Machine Learning with Caffe -1-Category images using reference model
Attempt to include machine learning model in python package
[Machine learning] Text classification using Transformer model (Attention-based classifier)
xgboost: A valid machine learning model for table data
Talk about improving machine learning algorithm bottlenecks with Cython