[Python] Estimating the parameters of the gamma distribution with a simple, from-scratch MCMC implementation

Introduction

After collecting observation data of various kinds, you often want to fit a probability distribution to the data and estimate its parameters; a fitted model is convenient for computing statistics and running simulations. Common model distributions include the normal, exponential, binomial, and beta distributions, but this time I would like to see how well I can fit a gamma distribution while implementing the MCMC method (Markov chain Monte Carlo) "simply", starting from the principle. Excellent tools for running MCMC already exist, such as WinBUGS and PyMC3 (https://docs.pymc.io/), so **you do not have to build it yourself**. Still, there is much you can learn by building it yourself, and I think doing so helps deepen understanding.

What is the gamma distribution?

The gamma distribution is a probability distribution characterized by two parameters: the shape parameter $k$ and the scale parameter $q$.

P(x) = \frac{1}{\Gamma(k)q^k}x^{k-1}e^{-x/q},\ x \geq 0 \ (k>0,\ q>0)

It is a distribution over non-negative real numbers, and it can express the exponential distribution, the chi-square distribution, and the Erlang distribution as special cases. In particular, I like that it can represent skewed distributions with a long tail. Much observational data in the world is non-negative (counts, lengths, weights, time intervals, populations, amounts of money, and so on), so the gamma distribution is useful for modeling such variables; in practice it appears in engineering and economics, for example as a waiting-time distribution or an income distribution.
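For example, setting $k = 1$ recovers the exponential distribution, and the mean and variance have simple closed forms:

P(x) = \frac{1}{q}e^{-x/q} \ (k=1), \qquad E[x] = kq, \qquad V[x] = kq^2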

What is MCMC?

The MCMC method (Markov chain Monte Carlo) is a method that efficiently generates samples from a target probability distribution and estimates the properties of that distribution from the samples. The Markov chain is responsible for generating samples according to the distribution, and the Monte Carlo part is responsible for the estimation from those samples. For details, the video lecture by Professor Yukito Iba of the Institute of Statistical Mathematics is very easy to follow. In this article we use MCMC to estimate the gamma distribution, so let us fix the notation: the observed data $D_n = \{X_1, ..., X_n\}$, the parameter $\theta$ of the assumed model (here $\theta = (k, q)$), and the posterior distribution $P(\theta|D_n)$.

What we want to do here: given a lot of data $D_n$, if we can compute the probability distribution $P(\theta|D_n)$ of the parameter $\theta$, then we can compute, for example, the expected value of the parameter $\theta$ of the assumed model under the posterior. By Bayes' theorem (or from the normalization condition of a probability distribution), $P(\theta|D_n)$ is expressed by the following equation.

P(\theta|D_n) = P(D_n | \theta)P(\theta) / \int_{\theta} d\theta P(D_n | \theta)P(\theta)

Here, the integral should run over all $\theta$ with a non-zero probability of producing the data $D_n$. One simple way to estimate this distribution is a grid calculation: spread $\theta$ evenly over a suitable grid and represent it by the discrete values $\theta_i\ (i = 1, 2, ..., I)$, so that

P(\theta_i|D_n) \simeq P(D_n | \theta_i)P(\theta_i) / \sum_{j \in I}  P(D_n | \theta_j)P(\theta_j)

In other words, the integral in the denominator is approximated by the sum over the grid. Drawn as a diagram, it looks like this.

(Figure: P_theta_Dn.jpg)

However, to improve the estimation accuracy of $\theta$, the number of discrete values $I$ must be large enough. If you naively represent an $N$-dimensional parameter with $M$ grid points per axis, you need $I = M^N$ discrete values, which explodes as the dimension increases.
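As a concrete illustration (my own sketch, not code from the article), the grid calculation for a one-dimensional parameter could look like this; `likelihood` is a hypothetical function standing in for $P(D_n | \theta_i)$, and a uniform prior is assumed:

import numpy as np

def grid_posterior(likelihood, theta_grid):
    # unnormalized posterior P(D_n | theta_i) * P(theta_i); the uniform prior cancels
    w = np.array([likelihood(t) for t in theta_grid])
    # the sum over the grid replaces the integral in the denominator
    return w / w.sum()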

So MCMC **changes the way of thinking: instead of fixing the discrete values $\theta_i$, it keeps moving them**. Let $\theta_i(k)$ denote the position of the $i$-th point $(i = 1, ..., I)$ at step $k$ $(k = 1, 2, ...)$, and let $P_k(\theta)$ be the probability distribution that the $\theta_i(k)$ follow. Also, let $\pi(x \rightarrow x')$ be the map that moves $\theta_i(k)$ to $\theta_i(k+1)$. This can be illustrated as follows.

(Figure: P_k_k+1_map1.jpg)

If the map $\pi(x \rightarrow x')$ is designed well, then after moving the points many times,

P_k(\theta) \rightarrow P(\theta | D_n) \  ({\rm as \ } k \rightarrow \infty)

holds, and the goal is achieved. Next, let's look at the Metropolis algorithm as one way of designing this move.

Metropolis algorithm

The Metropolis algorithm is an algorithm for sampling from a probability distribution. Let us add some more notation to the figure above.

In other words, suppose the point cloud $\theta_i$ is a mixture of points that follow the target distribution $P_*(\theta)$ and points that follow some other distribution $P_{o_k}(\theta)$. Now assume that movement by the map $\pi(x \rightarrow x')$ is restricted as shown in the figure below.

(Figure: P_k_k+1_map2.jpg)

In other words, the restriction is that **a point $\theta_i$ that follows $P_*(\theta)$ always moves to $P_*(\theta)$** (① in the figure). Furthermore, assume that **a point $\theta_i$ that follows $P_{o_k}(\theta)$ can always reach $P_*(\theta)$** (② in the figure). Then, as long as ② happens often enough, restriction ① implies that $P_k(\theta) \rightarrow P_*(\theta)$ as $k$ increases. More precisely, ① is called stationarity, ② is called irreducibility, and the convergence of the probability distribution that follows from these two conditions is called ergodicity. (For details, see the explainer video by Professor Yukito Iba.)

One map $\pi(x \rightarrow x')$ that satisfies these properties is given by the Metropolis algorithm, which proceeds as follows.

  1. Generate $x'$ from a symmetric probability distribution $Q(x, x')\ (= Q(x', x))$ centered on the current point $x$.
  2. Generate a uniform random number $0 \leq r < 1$.
  3. If $r < P_*(x') / P_*(x)$, move ($x \rightarrow x'$); if not, stay put ($x \rightarrow x$).
  4. Repeat steps 1 to 3.

Any distribution works for step 1 as long as it is symmetric; for example, a normal distribution can be used.

Q(x,x') = \frac{1}{\sqrt{2\pi \sigma^2}}\exp\left(-\frac{(x-x')^2}{2\sigma^2}\right)

Incidentally, the algorithm extended to the non-symmetric case ($Q(x, x') \neq Q(x', x)$) is called the Metropolis-Hastings algorithm. Looking at step 3, the proposed point is accepted with probability $\min(1, P_*(x')/P_*(x))$: a move toward a higher-probability region is always accepted, while a move toward a lower-probability region is accepted only some of the time.
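As a minimal sketch of these steps (my own code, not from the article), a one-dimensional Metropolis sampler for an unnormalized target density `p_star`, using a normal proposal, might look like this:

import numpy as np

def metropolis(p_star, x0, sigma, n_steps):
    x = x0
    samples = []
    for _ in range(n_steps):
        # step 1: propose x' from a symmetric normal distribution centered at x
        x_new = x + np.random.normal(loc=0.0, scale=sigma)
        # steps 2-3: accept with probability min(1, P*(x')/P*(x));
        # written without a division to avoid dividing by zero
        if np.random.rand() * p_star(x) < p_star(x_new):
            x = x_new
        samples.append(x)
    return samples

For example, `metropolis(lambda v: np.exp(-v*v/2), 0.0, 1.0, 10000)` draws approximate samples from a standard normal; the target only needs to be known up to a normalization constant, which is exactly why the method is useful for the posterior above.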

Try to calculate with Python

Now, let's estimate the gamma distribution while implementing MCMC in Python.

1. Preparation

Import the libraries.

import numpy as np
import matplotlib.pyplot as plt
from math import gamma

Define the gamma distribution density, and helper functions that evaluate a pdf on a grid of points and plot its shape.

def gamma_pdf(x, k, q):
    p = 1./(gamma(k) * pow(q, k)) * pow(x, k-1) * np.exp( -x/q )
    return p

def genPdfHist(f, xmin, xmax, N):
    x = np.linspace(xmin,xmax, N)
    p = [f(v) for v in x]
    return x,p

def showPdf(f, xmin, xmax, N, yscale=1.0):
    x, p = genPdfHist(f, xmin, xmax, N)
    y = [e*yscale for e in p]
    plt.plot(x,y)
    plt.show()
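As a quick check (my addition, not in the original), the helpers can be used to draw the density with, say, $k = 2$ and $q = 4$:

showPdf(lambda x: gamma_pdf(x, 2.0, 4.0), 0, 50, 100)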

2. Prepare observation data

Next, let's create the observation data using the gamma distribution. Define a simple rejection-sampling function for generating it.

def PlainSampler(pdf, xmin, xmax, N):
    n = 0
    xs = []
    while n < N:
        # propose a point uniformly in [xmin, xmax]
        x = xmin + np.random.rand()*(xmax-xmin)
        p = pdf(x)
        # accept with probability pdf(x); assumes pdf(x) <= 1
        if np.random.rand() < p:
            xs.append(x)
            n = n + 1
    return xs
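Note that this sampler accepts with probability pdf(x), which implicitly assumes pdf(x) ≤ 1; that holds for the gamma parameters used below, whose density peaks at roughly 0.09. A variant that makes the bound explicit (my own sketch, with a hypothetical argument pmax) would be:

def PlainSamplerScaled(pdf, xmin, xmax, N, pmax):
    # pmax must be an upper bound of pdf(x) on [xmin, xmax]
    xs = []
    while len(xs) < N:
        x = xmin + np.random.rand()*(xmax - xmin)
        # accept x with probability pdf(x) / pmax
        if np.random.rand()*pmax < pdf(x):
            xs.append(x)
    return xs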

Use the above function to generate the observation data (standing in for real experimental data). First, set the parameters of the true model to $(k^*, q^*) = (2, 4)$.

k_true = 2.0
q_true = 4.0
gp1 = lambda x : gamma_pdf(x, k_true, q_true)

smp1 = PlainSampler(gp1, 0, 50, 1000)
h1 = plt.hist(smp1, bins=50)
# scale the pdf to the count axis of the histogram: counts per bin ≈ N * bin width * pdf
showPdf(gp1, 0, 50, 100, h1[0].sum()*(h1[1][1]-h1[1][0]))
plt.show()

The last argument of the showPdf call is an adjustment factor that scales the probability density to the count axis of the histogram. The data smp1 are the observed values, and their histogram looks like the figure below. Of course, since the data are generated with random numbers, the shape of the histogram changes on each run.

(Figure: pdf_sample_k2.0_q4.0.png)

3. Prepare a function for MCMC

We will define the functions used for the MCMC calculation. First, a function that computes the likelihood $P(D_n |\theta)$. To keep the computed probability value from becoming too small, and to keep the selected values roughly independent, 50 values are drawn from the original sample and the joint probability $P(D_n |\theta) = P(X_1 | \theta)P(X_2 | \theta) \cdots P(X_n | \theta)$ is computed over them.

def pdf_gamma_sample(x, sample):
    k = x[0]
    q = x[1]
    #Prevent P from becoming too small
    sample1 = np.random.choice(sample, 50)
    p = [gamma_pdf(s, k, q) for s in sample1]
    return np.prod(p)
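If the parameters are far from the true values, even the product of 50 densities can underflow. A log-space variant (my own sketch, not part of the original code) sums log densities instead; if it were used, the acceptance test in the Metropolis step below would compare `np.exp(logp_new - logp_old)` with the random number rather than a ratio of probabilities:

def log_pdf_gamma_sample(x, sample):
    k = x[0]
    q = x[1]
    sample1 = np.random.choice(sample, 50)
    # sum of log densities instead of a product of densities
    # (gamma_pdf can return 0 for extreme parameters, giving -inf here)
    logp = np.sum([np.log(gamma_pdf(s, k, q)) for s in sample1])
    return logp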

Next, a function that gives the prior (initial) distribution of the parameter $\theta = (k, q)$. Since there is no prior information, generate it from uniform random numbers, specifying only the maximum value.

def genInit( cdim, xdim, xmax):
    x = np.random.rand(cdim, xdim)*xmax
    return x

Next is the core part: the map $\pi(x \rightarrow x')$ realized by the Metropolis algorithm, and a function that runs the MCMC iterations.

def MHmap(pdf, x, xdim, sig):
    # update only one randomly chosen component at a time (Metropolis-within-Gibbs style)
    xnew = x + np.eye(1, xdim, np.random.randint(xdim))[0,:] * np.random.normal(loc=0, scale=sig)
    # limit the range of x to positive values (k > 0, q > 0)
    xnew = np.maximum(xnew, 0.01)
    # accept or reject according to the Metropolis criterion
    r = np.random.rand()
    if r < pdf(xnew)/pdf(x):
        x = xnew
    return x

def MCMC_gamma_01(sample, x0, sig, N):
    # likelihood of the (subsampled) data for a given parameter vector
    pdf1 = lambda x: pdf_gamma_sample(x, sample)
    #
    x = x0
    xs = []
    for i in range(N):
        # move each point of the cloud according to the Metropolis criterion
        x = [MHmap(pdf1, e, 2, sig) for e in x]
        # random resampling of the point cloud
        s = np.random.randint(0,len(x),(len(x)))
        x = [x[j] for j in s]
        #
        xs.append(x)
    return np.array(xs)

A few notes on coding this up: **I am not sure this implementation is optimal**, but I can imagine that the real libraries (PyMC3, WinBUGS, and so on) incorporate many refinements of this kind. This is where the gap between theory and implementation becomes visible.

4. Try MCMC

Now, let's run MCMC with the functions created in step 3 to estimate the parameters $\theta = (k, q)$ of the gamma-distribution model from the observation data smp1 created in step 2. Set the point-cloud size to 500, the parameter dimension to 2, and the maximum initial value to 10. Also, let the standard deviation of the proposal distribution $Q(x, x')$ be 3, and run 100 MCMC iterations.

p0 = genInit( 500, 2, 10)
sig = 3
xs = MCMC_gamma_01(smp1, p0, sig, 100)

After a few minutes the results come out, so let's plot them.

fig = plt.figure()
ax1 = fig.add_subplot(1,2,1)
ax2 = fig.add_subplot(1,2,2)
ax1.hist(xs[-1,:,0], bins=50)
ax2.hist(xs[-1,:,1], bins=50)
ax1.set_xlabel("k")
ax2.set_xlabel("q")
k_est = xs[-1,:,0].mean()
q_est = xs[-1,:,1].mean()
print("k*:{}, q*:{}".format(k_est, q_est))
plt.show()

The results are shown below. The histograms are not very clean, but taking the mean values gives $(\bar{k}, \bar{q}) = (2.21, 4.46)$.

(Figure: MCMC_sample_k2.0_q4.0.png)

A comparison of the originally generated distribution and the distribution using $(\bar{k}, \bar{q}) = (2.21, 4.46)$ is shown in the figure below. It is a little off, but it seems that the long-tailed distribution can be fitted reasonably well.

(Figure: MCMC_match_k2.0_q4.0.png)
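The comparison can be reproduced with the same helpers as in step 2, plugging in the estimated parameters (a sketch of my own; the article does not show this code):

gp_est = lambda x: gamma_pdf(x, k_est, q_est)
h2 = plt.hist(smp1, bins=50, alpha=0.5)
showPdf(gp_est, 0, 50, 100, h2[0].sum()*(h2[1][1]-h2[1][0]))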

5. Other results

Changing the parameters $\theta = (k, q)$ in various ways, the fit still seems to work. My impression is that it converges after roughly 100 iterations.

(Figure: MCMC_Gamma_ANIM.gif)
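One simple way to see this convergence over roughly 100 iterations is to plot the point-cloud mean of each parameter at every step (my own check; not shown in the article):

k_trace = xs[:, :, 0].mean(axis=1)
q_trace = xs[:, :, 1].mean(axis=1)
plt.plot(k_trace, label="k")
plt.plot(q_trace, label="q")
plt.axhline(k_true, linestyle="--")
plt.axhline(q_true, linestyle="--")
plt.xlabel("MCMC iteration")
plt.legend()
plt.show()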

Consideration

From the above, the following trends were observed when estimating the gamma distribution with this simple MCMC implementation.

Furthermore ...

Reference link

I referred to the following pages.

  1. MCMC Lecture (Yukito Iba)
  2. Bayesian statistical modeling with Python
  3. MH (Metropolis-Hastings) method learned from numerical examples and codes
  4. I tried to organize MCMC.
  5. Bayesian inference for change point detection using PyMC3
  6. Gamma distribution
  7. WinBUGS
  8. PyMC3
  9. Gamma distribution
