PRML Figure 1.15: Bias in the Maximum Likelihood Estimate of a Gaussian Distribution

What we want to do

Consider a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, expressed as follows.

\mathcal{N}(x\,\lvert\,\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right)
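
As a quick check of this formula, the density can be evaluated both by hand and with SciPy (a minimal sketch; note that scipy.stats.norm takes the standard deviation, not the variance, as its scale parameter):

import numpy as np
from scipy.stats import norm

mu, sig2 = 0.0, 1.0
x = 0.5

# Density from the formula above
manual = np.exp(-(x - mu)**2 / (2 * sig2)) / np.sqrt(2 * np.pi * sig2)

# Same density via scipy.stats (scale = standard deviation)
assert np.isclose(manual, norm.pdf(x, loc=mu, scale=np.sqrt(sig2)))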

Suppose we observe $N$ data points, $x_1, x_2, \dots, x_N$. Assuming each data point is generated independently according to the Gaussian distribution above, we estimate the values of the mean $\mu$ and variance $\sigma^2$ from the data.

We then evaluate how these estimates compare with the true values.

Estimating mean and variance

The expected value is denoted by $\mathbb{E}[\cdot]$. For the distribution above,

\mathbb{E}[x] = \mu\\
\mathbb{E}[x^2] = \mu^2 + \sigma^2\\
\mathbb{E}[(x-\mu)^2] = \sigma^2
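
These identities are standard, but they can also be confirmed numerically with a quick Monte Carlo check (a minimal sketch; the sample size is arbitrary):

import numpy as np

rng = np.random.default_rng(0)
mu, sig2 = 3.0, 1.0
x = rng.normal(mu, np.sqrt(sig2), size=1_000_000)

print(x.mean())               # close to mu
print((x**2).mean())          # close to mu^2 + sig2
print(((x - mu)**2).mean())   # close to sig2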

We also collect the observed data into the vector

{\bf x} = (x_1, x_2, \dots, x_N)^T

When $N$ data points are generated from the Gaussian distribution, the probability that they coincide with the observed data ${\bf x}$ is the product of the probabilities of generating each point $x_1, \dots, x_N$:

p({\bf x}\,\lvert\,\mu,\sigma^2) = \prod_{n=1}^N\mathcal{N}(x_n\,\lvert\,\mu,\sigma^2)

This is called the likelihood. We would like to find the $\mu$ and $\sigma^2$ that maximize this likelihood, but it is hard to work with directly because it contains $\exp$ and $\prod$.
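
Beyond being awkward to differentiate, the raw product is also numerically fragile: for even moderately large $N$ it underflows to zero in floating point, while the sum of log-densities remains well-behaved. A small illustration (assumed setup):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=2000)

def pdf(x, mu, sig2):
    return np.exp(-(x - mu)**2 / (2 * sig2)) / np.sqrt(2 * np.pi * sig2)

print(np.prod(pdf(x, 0.0, 1.0)))        # underflows to 0.0
print(np.log(pdf(x, 0.0, 1.0)).sum())   # finite log-likelihood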

Therefore, we take the logarithm of both sides to make the calculation easier:

\ln p({\bf x}\,\lvert\,\mu,\sigma^2) = -\frac{1}{2\sigma^2}\sum_{n=1}^N(x_n -\mu)^2 - \frac{N}{2}\ln{\sigma^2} - \frac{N}{2}\ln{2\pi}

$y = \ln x$ is a monotonically increasing function of $x$.

Therefore, maximizing $p({\bf x}\,\lvert\,\mu,\sigma^2)$ and maximizing $\ln p({\bf x}\,\lvert\,\mu,\sigma^2)$ are equivalent.

Differentiating the log-likelihood with respect to $\mu$ and $\sigma^2$ and setting each derivative to zero,

\frac{\partial}{\partial\mu}\ln p = \frac{1}{\sigma^2}\sum_{n=1}^N(x_n-\mu) = 0,\qquad
\frac{\partial}{\partial\sigma^2}\ln p = \frac{1}{2\sigma^4}\sum_{n=1}^N(x_n-\mu)^2 - \frac{N}{2\sigma^2} = 0

and solving gives:

\mu_{ML} = \frac{1}{N}\sum_{n}x_n\\
\sigma^2_{ML} = \frac{1}{N}\sum_{n}(x_n - \mu_{ML})^2
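
These estimators are simply the sample mean and the (biased) sample variance; in NumPy they correspond to np.mean and np.var with the default ddof=0 (a minimal check):

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(3.0, 1.0, size=100)

mu_ml = x.sum() / len(x)
sig2_ml = ((x - mu_ml)**2).sum() / len(x)

assert np.isclose(mu_ml, x.mean())
assert np.isclose(sig2_ml, x.var())   # np.var divides by N (ddof=0) by default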

The subscript distinguishes the estimates obtained this way from the true values, so we write them as $\mu_{ML}$ and $\sigma^2_{ML}$.

Comparison with the true values

Let the true values of the mean and variance be $\mu$ and $\sigma^2$.

When $\mu_{ML}$ and $\sigma^2_{ML}$ are estimated from many different data sets ${\bf x}$, let us examine what values they are likely to take. Specifically, consider the expected values of $\mu_{ML}$ and $\sigma^2_{ML}$.

For the mean, we have

\begin{align}
\mathbb{E}[\mu_{ML}] &= \frac{1}{N}\sum_{n}\mathbb{E}[x_n]\\
&=\mu
\end{align}

so the estimate matches the true mean in expectation. For the variance, on the other hand,

\begin{align}
\mathbb{E}[\sigma^2_{ML}] &= \frac{1}{N}\sum_{n}\mathbb{E}[(x_n - \mu_{ML})^2]\\
&=\frac{1}{N}\sum_{n}\left(\mathbb{E}[x_n^2] - 2\mathbb{E}[x_n\mu_{ML}] + \mathbb{E}[\mu_{ML}^2]\right)
\end{align}

Note that $\mu_{ML}$ is itself computed from the data points, so it is correlated with each $x_n$. Since the data points are independent, $\mathbb{E}[x_n x_m] = \mu^2$ for $n \neq m$ and $\mathbb{E}[x_n^2] = \mu^2 + \sigma^2$, which gives

\begin{align}
\frac{1}{N}\sum_{n}\mathbb{E}[x_n\mu_{ML}] &= \frac{1}{N}\sum_{n}\mathbb{E}[x_n\frac{1}{N}\sum_{m}x_m]\\
&=\frac{1}{N^2}\sum_{n}\sum_{m}\mathbb{E}[x_nx_m]\\
&=\frac{1}{N}\left(\sigma^2 + N\mu^2\right)
\end{align}
\begin{align}
\frac{1}{N}\sum_{n}\mathbb{E}[\mu_{ML}^2] &= \frac{1}{N}\sum_{n}\mathbb{E}[\frac{1}{N}\sum_{l}x_l\frac{1}{N}\sum_{m}x_m]\\
&=\frac{1}{N^3}\sum_{n}\sum_{l}\sum_{m}\mathbb{E}[x_lx_m]\\
&=\frac{1}{N}\left(\sigma^2 + N\mu^2\right)
\end{align}

Substituting these back, the $\mu^2$ terms cancel:

\mathbb{E}[\sigma^2_{ML}] = (\mu^2 + \sigma^2) - \frac{2}{N}\left(\sigma^2 + N\mu^2\right) + \frac{1}{N}\left(\sigma^2 + N\mu^2\right) = \sigma^2 - \frac{\sigma^2}{N}

so that

\mathbb{E}[\sigma^2_{ML}] = \frac{N-1}{N}\sigma^2

It follows that the variance estimate is, on average, smaller than the true value by a factor of $\frac{N-1}{N}$. This systematic underestimation is called bias.
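
This factor can be verified directly by Monte Carlo: estimate the variance many times from small samples and average the estimates (a minimal sketch; $N$ and the number of trials are arbitrary). NumPy's np.var exposes both versions via its ddof argument, with ddof=1 giving the unbiased estimator that divides by $N-1$:

import numpy as np

rng = np.random.default_rng(3)
mu, sig2, N, trials = 3.0, 1.0, 2, 100_000

x = rng.normal(mu, np.sqrt(sig2), size=(trials, N))

print(x.var(axis=1).mean())          # close to (N-1)/N * sig2 = 0.5
print(x.var(axis=1, ddof=1).mean())  # close to sig2 = 1.0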

Implementation 1

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np
import matplotlib.pyplot as plt

# Grid of x values for plotting
X = np.arange(1, 5, 0.001)

# Gaussian probability density function
def gaussian(x, mu, sig2):
    y = np.exp(-(x - mu)**2 / (2 * sig2)) / np.sqrt(2 * np.pi * sig2)
    return y

# True mean and variance
mu = 3
sig2 = 1.0

# Number of data points used for one maximum likelihood estimate
N = 2

# Number of maximum likelihood estimates
M = 10000

# Generate M sets of N data points from the true distribution
x = np.sqrt(sig2) * np.random.randn(N, M) + mu

# Maximum likelihood estimation using N data points, for each of the M sets
mu_ml = x.sum(0) / N
sig2_ml = ((x - mu_ml)**2).sum(0) / N

# True distribution (thick red line)
plt.plot(X, gaussian(X, mu, sig2), 'r', lw=15, alpha=0.5)

# Distribution built from the averages of the M maximum likelihood estimates (green)
plt.plot(X, gaussian(X, mu_ml.mean(), sig2_ml.mean()), 'g')

# Multiply the ML variance estimate by N/(N-1) to remove the bias (blue)
plt.plot(X, gaussian(X, mu_ml.mean(), sig2_ml.mean() * N / (N - 1)), 'b')

plt.show()

Execution result 1

You can see that when the variance of the maximum likelihood estimate (green line) is multiplied by $\frac{N}{N-1}$ (blue line), it matches the true distribution (thick red line).

Implementation 2

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import numpy as np
import matplotlib.pyplot as plt

# Grid of x values for plotting
plotS = 1000
X = np.linspace(-1, 1, plotS)

# Gaussian distribution probability density function
def gaussDist(x, mu, sig2):
    ret = np.exp(-(x - mu)**2 / (2 * sig2)) / np.sqrt(2 * np.pi * sig2)
    return ret

# True distribution
mu_r = 0
sig2_r = 0.05
Y_r = gaussDist(X, mu_r, sig2_r)

np.random.seed(10)
for i in range(3):
    plt.subplot(3, 1, i + 1)
    plt.ylim([0, 7])

    # Training data: N points drawn from the true distribution
    N = 2
    x_t = np.random.randn(N) * np.sqrt(sig2_r) + mu_r
    plt.plot(x_t, np.zeros_like(x_t), 'bo')

    # Distribution fitted by maximum likelihood
    mu_ML = x_t.sum() / N
    sig2_ML = ((x_t - mu_ML)**2).sum() / N
    Y_ml = gaussDist(X, mu_ML, sig2_ML)
    plt.plot(X, Y_ml, 'r')

    # True distribution
    plt.plot(X, Y_r, 'g')

plt.show()

Execution result 2

It can be seen that the estimated distribution (red) tends to have a smaller variance than the true distribution (green).
