After collecting various observational data, we often want to fit a probability distribution model to the data and estimate its parameters, which is convenient for computing statistics and running simulations. Common models include the normal, exponential, binomial, and beta distributions, but this time I will focus on the gamma distribution and implement the MCMC method (Markov Chain Monte Carlo) "simply", from first principles, to see how well it can be fitted. Note that there are excellent tools for running MCMC, such as WinBUGS and PyMC3 (https://docs.pymc.io/), so **you don't have to build it yourself**. However, there is much you can learn by building it yourself, and I think doing so helps deepen understanding.
What is the gamma distribution? It is a distribution characterized by two parameters, the shape parameter $k$ and the scale parameter $q$, with probability density
P(x) = \frac{1}{\Gamma(k)q^k}x^{k-1}e^{-x/q},\ x \geq 0 \ (k>0,\ q>0)
It is a distribution over non-negative real numbers, and it can express the exponential distribution, the chi-square distribution, and the Erlang distribution as special cases. In particular, I like that it can represent a skewed distribution with a long tail. Much observational data in the world is non-negative, such as counts, lengths, weights, time intervals, populations, and amounts of money, so it is useful for modeling the distribution of such variables. In fact, it is used in engineering and economics, for example for waiting-time distributions and income distributions.
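As a quick check of the parameterization (note that $q$ here is a scale parameter, not a rate), the density can be compared against scipy.stats.gamma, which uses the same shape/scale convention. A small sketch, added here as my own illustration and assuming SciPy is available:

import numpy as np
from math import gamma
from scipy.stats import gamma as gamma_dist

k, q = 2.0, 4.0
x = np.linspace(0.1, 20, 5)
# Density exactly as written in the formula above.
manual = x**(k - 1) * np.exp(-x / q) / (gamma(k) * q**k)
print(np.allclose(manual, gamma_dist.pdf(x, a=k, scale=q)))  # expect True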
The MCMC method (Markov Chain Monte Carlo) is a method that efficiently generates samples from a given probability distribution and estimates the properties of that distribution from the samples. The Markov chain part is responsible for generating samples according to the distribution, and the Monte Carlo part is responsible for the estimation from those samples. For details, the video lectures by Professor Yukito Iba of the Institute of Statistical Mathematics are very easy to follow. In this article we will use MCMC to estimate a gamma distribution, so let's first set up the problem and notation.
What we want to do is this: after collecting a lot of data $D_n$, if we can compute the probability distribution $P(\theta|D_n)$ of the parameter $\theta$, then we can compute, for example, the expected value of the parameter $\theta$ of the assumed model (based on the posterior probability). The probability distribution $P(\theta|D_n)$ is given by Bayes' theorem (or, equivalently, by the normalization condition of the probability distribution) as
P(\theta|D_n) = P(D_n | \theta)P(\theta) / \int_{\theta} d\theta P(D_n | \theta)P(\theta)
Here, the integral should be taken over all $\theta$ with a non-zero probability of producing the data $D_n$. One simple way to estimate this distribution is a grid calculation: spread $\theta$ evenly over a suitable grid and represent it by the discrete values $\theta_i \ (i = 1, 2, ..., I)$, so that
P(\theta_i|D_n) \simeq P(D_n | \theta_i)P(\theta_i) / \sum_{j \in I} P(D_n | \theta_j)P(\theta_j)
approximates the integral in the denominator by a sum. Drawn as a diagram, it looks like this.
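To make this concrete, here is a minimal grid-approximation sketch for the gamma model used later in this post; the data array `data`, the grid ranges, and the flat prior are assumptions of mine for illustration:

import numpy as np
from math import gamma

def gamma_pdf(x, k, q):
    # Gamma density (the same function is defined again in the implementation below).
    return x**(k - 1) * np.exp(-x / q) / (gamma(k) * q**k)

def grid_posterior(data, ks, qs):
    # Evaluate the (flat-prior) unnormalized posterior on every grid point,
    # then divide by the total, i.e. the discretized denominator from above.
    logpost = np.array([[np.sum(np.log(gamma_pdf(data, k, q))) for q in qs] for k in ks])
    post = np.exp(logpost - logpost.max())  # subtract the max for numerical stability
    return post / post.sum()

# Usage sketch: a 50x50 grid over (k, q) for some observed data array `data`.
# ks, qs = np.linspace(0.1, 10, 50), np.linspace(0.1, 10, 50)
# post = grid_posterior(np.asarray(data), ks, qs)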
However, to improve the estimation accuracy of $\theta$, the number of discrete values $I$ must be sufficiently large. If we naively represent an $N$-dimensional parameter with $M$ mesh points per dimension, we need $I = M^N$ discrete values, which explodes as the dimension grows: with $M = 100$ points per axis, $N = 2$ parameters already need $10^4$ grid points, and $N = 10$ would need $10^{20}$.
So **MCMC changes the way of thinking: instead of fixing the discrete values $\theta_i$, it keeps moving them**. Let $\theta_i(k)$ denote the $i$-th point ($i = 1, ..., I$) at step $k$ ($k = 1, 2, ...$), and let $P_k(\theta)$ be the probability distribution that $\theta_i(k)$ follows. Also, write $\pi(x \rightarrow x')$ for the map that moves $\theta_i(k)$ to $\theta_i(k+1)$. Then the situation can be illustrated as follows.
If this transition map $\pi(x \rightarrow x')$ is designed well, then after moving the points many times,
P_k(\theta) \rightarrow P(\theta | D_n) \ ({\rm as \ } k \rightarrow \infty)
holds, and the goal is achieved. Next, let's look at the Metropolis algorithm as one way to design this move.
The Metropolis algorithm is an algorithm for sampling from a probability distribution. Let's add a bit more notation to the figure above.
In other words, suppose the points $\theta_i$ are generated from a mixture of a distribution $P_*(\theta)$ that follows the model and distributions $P_{o_k}(\theta)$ that do not. Now assume that the moves made by the map $\pi(x \rightarrow x')$ are restricted as in the figure below.
The restrictions are: a point $\theta_i$ that follows **$P_*(\theta)$ always moves to $P_*(\theta)$** (① in the figure), and a point $\theta_i$ that follows **$P_{o_k}(\theta)$ can always reach $P_*(\theta)$** (② in the figure). Then, as long as moves of type ② occur often enough, the restriction ① implies that $P_k(\theta) \rightarrow P_*(\theta)$ as $k$ increases. Strictly speaking, ① is called stationarity and ② is called irreducibility, and it seems that the convergence of the probability distribution that follows from these two conditions is called ergodicity (for details, see the explainer videos by Professor Yukito Iba).
One mapping $\pi(x \rightarrow x')$ that satisfies these properties is the Metropolis algorithm, which proceeds as follows:

1. Propose a candidate point $x'$ by drawing it from a proposal distribution $Q(x, x')$ around the current point $x$.
2. Compute the ratio $r = P(x')/P(x)$ of the target distribution at the candidate and at the current point.
3. Accept the move $x \rightarrow x'$ with probability $\min(1, r)$; otherwise keep $x$.
For the proposal distribution in step 1, any symmetric distribution works; for example, a normal distribution can be used.
Q(x,x') = \frac{1}{\sqrt{2\pi \sigma^2}}exp\left(-\frac{(x-x')^2}{2\sigma^2}\right)
By the way, the algorithm extended to the non-symmetric case ($Q(x, x') \neq Q(x', x)$) is called the Metropolis-Hastings algorithm. Looking at step 3, only the ratio of the target distribution at $x'$ and $x$ is needed, so when the target is the posterior $P(\theta|D_n)$, the troublesome denominator integral from Bayes' theorem cancels out and never has to be computed.
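Concretely, applying the Bayes formula above to both $\theta'$ and $\theta$, the normalization integral appears in both numerator and denominator and drops out:

\frac{P(\theta'|D_n)}{P(\theta|D_n)} = \frac{P(D_n|\theta')P(\theta') / \int_{\theta}d\theta\, P(D_n|\theta)P(\theta)}{P(D_n|\theta)P(\theta) / \int_{\theta}d\theta\, P(D_n|\theta)P(\theta)} = \frac{P(D_n|\theta')P(\theta')}{P(D_n|\theta)P(\theta)}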
Now, let's implement MCMC in Python and use it to estimate the gamma distribution.
First, import the libraries.
import numpy as np
import matplotlib.pyplot as plt
from math import gamma
Define the gamma distribution density, together with helper functions that generate and display the shape of a probability density.
def gamma_pdf(x, k, q):
p = 1./(gamma(k) * pow(q, k)) * pow(x, k-1) * np.exp( -x/q )
return p
def genPdfHist(f, xmin, xmax, N):
x = np.linspace(xmin,xmax, N)
p = [f(v) for v in x]
return x,p
def showPdf(f, xmin, xmax, N, yscale=1.0):
x, p = genPdfHist(f, xmin, xmax, N)
y = [e*yscale for e in p]
plt.plot(x,y)
plt.show()
Next, let's create observation data using the gamma distribution. Define a function that samples from a given density to produce the observational data.
def PlainSampler(pdf, xmin, xmax, N):
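    # Simple rejection sampling: draw x uniformly on [xmin, xmax] and accept it
    # with probability pdf(x) (this assumes pdf(x) <= 1 on the interval).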
n = 0
xs = []
while n < N:
x = xmin + np.random.rand()*(xmax-xmin)
p = pdf(x)
if np.random.rand() < p:
# accept
xs.append(x)
n = n + 1
return xs
Use the function above to generate the observation data (that is, data that plays the role of experimental observations). First, set the parameters of the true model to $(k^*, q^*) = (2, 4)$.
k_true = 2.0
q_true = 4.0
gp1 = lambda x : gamma_pdf(x, k_true, q_true)
smp1 = PlainSampler(gp1, 0, 50, 1000)
h1 = plt.hist(smp1, bins=50)
showPdf(gp1, 0, 50, 100, h1[0].sum()*(h1[1][1]-h1[1][0]))  # scale the pdf by N * bin width to match the count histogram
plt.show()
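As a quick sanity check (my own addition), the mean and variance of a gamma$(k, q)$ distribution are $k q$ and $k q^2$, i.e. $8$ and $32$ for the true parameters, so the sample moments of smp1 should land in that neighborhood (they vary a bit from run to run):

# Compare the sample moments with the theoretical values k*q = 8 and k*q**2 = 32.
print("sample mean: {:.2f} (theory 8), sample var: {:.2f} (theory 32)".format(np.mean(smp1), np.var(smp1)))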
The last argument of the showPdf call is a scaling factor that aligns the probability density with the Y axis (counts) of the histogram. The data smp1 is the set of observed values, and its histogram looks like this. Of course, since the data is generated with random numbers, the shape of the histogram changes each time the code is run.
Next, we define the functions used in the MCMC calculation.
First, a function for the (unnormalized) posterior probability distribution: with a flat prior, it is proportional to the likelihood of the sample for a given parameter $\theta = (k, q)$.
def pdf_gamma_sample(x, sample):
k = x[0]
q = x[1]
    # Use only 50 randomly chosen observations so that the product of densities does not underflow
sample1 = np.random.choice(sample, 50)
p = [gamma_pdf(s, k, q) for s in sample1]
return np.prod(p)
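The subsampling above keeps the product from underflowing, but it also makes the target distribution noisy. A common alternative, shown here only as a sketch of my own and not used in the rest of this post, is to work with the log likelihood of the full data set:

def log_pdf_gamma_sample(x, sample):
    # Sum of log densities over the full data set; sums of logs do not
    # underflow the way a long product of small densities does.
    k, q = x[0], x[1]
    return np.sum(np.log([gamma_pdf(s, k, q) for s in sample]))

# The Metropolis step would then accept when
# np.random.rand() < np.exp(logp_new - logp_old), instead of pdf(xnew)/pdf(x).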
Next, a function that gives the prior (initial) distribution of the parameter $\theta = (k, q)$. Since there is no prior information, generate it from a uniform distribution, specifying only the maximum value.
def genInit( cdim, xdim, xmax):
x = np.random.rand(cdim, xdim)*xmax
return x
Next is the core part: the map $\pi(x \rightarrow x')$ implemented with the Metropolis algorithm, and a function that runs the MCMC iterations.
def MHmap(pdf, x, xdim, sig):
    # Change only one randomly chosen coordinate per step (Metropolis-within-Gibbs style)
xnew = x + np.eye(1, xdim, np.random.randint(xdim))[0,:] * np.random.normal(loc=0, scale=sig)
    # Limit x to positive values (both k and q must be > 0)
xnew = np.maximum(xnew, 0.01)
    # Accept or reject according to the Metropolis criterion
r = np.random.rand()
if r < pdf(xnew)/pdf(x):
x = xnew
return x
def MCMC_gamma_01(sample, x0, sig, N):
#
pdf1 = lambda x: pdf_gamma_sample(x, sample)
#
x = x0
xs = []
for i in range(N):
        # Move every point in the cloud with the Metropolis map
x = [MHmap(pdf1, e, 2, sig) for e in x]
        # Resample the point cloud at random (with replacement)
s = np.random.randint(0,len(x),(len(x)))
x = [x[j] for j in s]
#
xs.append(x)
return np.array(xs)
These functions are the key points of the coding. **I'm not sure this implementation is optimal**, but I can imagine that the actual libraries (PyMC3, WinBUGS, etc.) apply many refinements here. This is where the gap between theory and implementation becomes visible.
Now, let's run MCMC with the functions defined above to estimate the parameter $\theta = (k, q)$ of the gamma distribution model from the observation data smp1 created earlier. Set the point-cloud size to 500, the parameter dimension to 2, and the maximum initial value to 10. Also, set the standard deviation of the proposal distribution $Q(x, x')$ to 3, and run 100 MCMC iterations.
p0 = genInit( 500, 2, 10)
sig = 3
xs = MCMC_gamma_01(smp1, p0, sig, 100)
After a few minutes the results are ready, so let's plot them.
fig = plt.figure()
ax1 = fig.add_subplot(1,2,1)
ax2 = fig.add_subplot(1,2,2)
ax1.hist(xs[-1,:,0], bins=50)
ax2.hist(xs[-1,:,1], bins=50)
ax1.set_xlabel("k")
ax2.set_xlabel("q")
k_est = xs[-1,:,0].mean()
q_est = xs[-1,:,1].mean()
print("k*:{}, q*:{}".format(k_est, q_est))
plt.show()
The results are shown below. The histograms are not very clean, but the mean values give $(\bar{k}, \bar{q}) = (2.21, 4.46)$.
A comparison of the distribution that generated the data and the distribution with $(\bar{k}, \bar{q}) = (2.21, 4.46)$ is shown in the figure below. It is a little off, but the long-tailed shape of the distribution is fitted reasonably well.
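For reference, here is a minimal sketch of how such a comparison can be plotted with the helpers defined above (the original figure may have been produced differently; this plotting code is my own):

# Overlay the true density (k=2, q=4) and the density with the estimated parameters.
x = np.linspace(0.01, 50, 200)
plt.plot(x, [gamma_pdf(v, k_true, q_true) for v in x], label="true")
plt.plot(x, [gamma_pdf(v, k_est, q_est) for v in x], label="estimated")
plt.xlabel("x")
plt.legend()
plt.show()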
Trying various values of the true parameter $\theta = (k, q)$, the fit seems to work similarly. My impression is that the estimate converges after about 100 iterations.
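One way to eyeball that convergence (again, my own addition) is to plot the mean of the point cloud for each parameter against the iteration number and check that the curves flatten out:

# xs has shape (iterations, points, 2); flat curves suggest the chain has settled.
plt.plot(xs[:, :, 0].mean(axis=1), label="mean of k")
plt.plot(xs[:, :, 1].mean(axis=1), label="mean of q")
plt.xlabel("iteration")
plt.legend()
plt.show()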
In summary, estimating the gamma distribution with this simple MCMC implementation showed that the long-tailed shape can be fitted reasonably well, that the point cloud settles after roughly 100 iterations, and that the estimated parameters land close to, though not exactly on, the true values.