I tried to implement Bayesian linear regression by Gibbs sampling in python

Introduction

This article is about Bayesian estimation of a simple linear regression model. Specifically, I implemented the Markov chain Monte Carlo method (commonly known as MCMC) in Python using Gibbs sampling. MCMC in Python is usually done with a module called pymc, but this time I deliberately implemented it with only numpy and scipy, without using pymc. In writing this article, I relied heavily on Kosumi's ["Bayesian Computational Statistics" (Asakura Shoten)](https://www.amazon.co.jp/%E3%83%99%E3%82%A4%E3%82%BA%E8%A8%88%E7%AE%97%E7%B5%B1%E8%A8%88%E5%AD%A6-%E7%B5%B1%E8%A8%88%E8%A7%A3%E6%9E%90%E3%82%B9%E3%82%BF%E3%83%B3%E3%83%80%E3%83%BC%E3%83%89-%E5%8F%A4%E6%BE%84-%E8%8B%B1%E7%94%B7/dp/4254128568).

Theory

Before the implementation, let's briefly review the theory, simple as it is. The linear regression model we want to estimate is

y_i=\boldsymbol{x}_{i}'\boldsymbol{\beta}+u_i, \ \ u_i\sim N(0,\sigma^2) \ \  (i=1,\cdots,n)

(Here $y_i$ is the explained variable, $\boldsymbol{x}_i$ is a $k \times 1$ vector of explanatory variables, and the $u_i$ are error terms that are independent of each other and follow a normal distribution with mean 0 and variance $\sigma^2$.) Since $y_i$ follows a normal distribution with mean $\boldsymbol{x}_{i}'\boldsymbol{\beta}$ and variance $\sigma^2$, the likelihood function is

\begin{align}
f(\boldsymbol{y} \ | \ \boldsymbol{\beta},\sigma^2) &= \prod_{i=1}^{n}\frac{1}{(2\pi\sigma^2)^{1/2}}\mathrm{exp}\left\{-\frac{(y_i-\boldsymbol{x}_{i}'\boldsymbol{\beta})^2}{2\sigma^2}\right\} \\
&\propto \left(\frac{1}{\sigma^2}\right)^{n/2}\mathrm{exp}\left\{-\frac{\sum_{i=1}^{n}(y_i-\boldsymbol{x}_{i}'\boldsymbol{\beta})^2}{2\sigma^2}\right\} 
\end{align}

For the prior distributions of the parameters, we assume

\boldsymbol{\beta} \sim N(\boldsymbol{\beta}_0,\boldsymbol{B}_0), \ \ \sigma^2 \sim IG\left(\frac{n_0}{2},\frac{s_0}{2}\right)

(where $IG$ denotes the inverse gamma distribution). Applying Bayes' theorem to the likelihood and the priors above, the posterior distribution becomes

\pi(\boldsymbol{\beta},\sigma^2 \ | \ \boldsymbol{y}) \propto \left(\frac{1}{\sigma^2}\right)^{n/2}\mathrm{exp}\left\{-\frac{\sum_{i=1}^{n}(y_i-\boldsymbol{x}_{i}'\boldsymbol{\beta})^2}{2\sigma^2}\right\}\mathrm{exp}\left\{-\frac{1}{2}(\boldsymbol{\beta}-\boldsymbol{\beta}_0)'\boldsymbol{B}_{0}^{-1}(\boldsymbol{\beta}-\boldsymbol{\beta}_0)\right\}\left(\frac{1}{\sigma^2}\right)^{n_0/2+1}\mathrm{exp}\left(-\frac{s_0}{2\sigma^2}\right)


Now we have derived the kernel of the posterior distribution, but the integral of the above expression cannot be computed analytically, so the normalizing constant of the posterior cannot be obtained (this is exactly why sampling with MCMC is needed). We therefore generate random numbers that follow the posterior distribution by Gibbs sampling and examine the properties of the target distribution. To perform Gibbs sampling, we must derive the full conditional distribution of each parameter. These can be computed from the posterior kernel derived above, but the derivation is tedious, so I omit it and show only the results (for details, please refer to ["Bayesian Computational Statistics" (Asakura Shoten)](https://www.amazon.co.jp/%E3%83%99%E3%82%A4%E3%82%BA%E8%A8%88%E7%AE%97%E7%B5%B1%E8%A8%88%E5%AD%A6-%E7%B5%B1%E8%A8%88%E8%A7%A3%E6%9E%90%E3%82%B9%E3%82%BF%E3%83%B3%E3%83%80%E3%83%BC%E3%83%89-%E5%8F%A4%E6%BE%84-%E8%8B%B1%E7%94%B7/dp/4254128568) and the like). The full conditional distribution of each parameter is

\begin{align}
\pi(\boldsymbol{\beta} \ | \ \sigma^2,\boldsymbol{y}) &= N(\hat{\boldsymbol{\beta}},\hat{\boldsymbol{B}}) \\
\pi(\sigma^2 \ | \ \boldsymbol{\beta},\boldsymbol{y}) &= IG\left(\frac{n+n_0}{2},\frac{\sum_{i=1}^{n}(y_i-\boldsymbol{x}_{i}'\boldsymbol{\beta})^2+s_0}{2}\right)
\end{align}

where

\hat{\boldsymbol{B}}^{-1}=\sum_{i=1}^{n}\frac{\boldsymbol{x}_i\boldsymbol{x}_i'}{\sigma^2}+\boldsymbol{B}_{0}^{-1}, \ \ \hat{\boldsymbol{\beta}}=\hat{\boldsymbol{B}}\left(\sum_{i=1}^{n}\frac{\boldsymbol{x}_iy_i}{\sigma^2}+\boldsymbol{B}_{0}^{-1}\boldsymbol{\beta}_0\right)

We then simply sample from these full conditional distributions alternately, one after the other.
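
For the curious, here is a quick sketch of where the $\boldsymbol{\beta}$ update comes from (the book spells this out in full): collecting the terms in $\boldsymbol{\beta}$ in the exponent of the posterior kernel and completing the square gives

\begin{align}
\pi(\boldsymbol{\beta} \ | \ \sigma^2,\boldsymbol{y}) &\propto \mathrm{exp}\left\{-\frac{1}{2}\left[\boldsymbol{\beta}'\left(\sum_{i=1}^{n}\frac{\boldsymbol{x}_i\boldsymbol{x}_i'}{\sigma^2}+\boldsymbol{B}_{0}^{-1}\right)\boldsymbol{\beta}-2\boldsymbol{\beta}'\left(\sum_{i=1}^{n}\frac{\boldsymbol{x}_iy_i}{\sigma^2}+\boldsymbol{B}_{0}^{-1}\boldsymbol{\beta}_0\right)\right]\right\} \\
&\propto \mathrm{exp}\left\{-\frac{1}{2}(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})'\hat{\boldsymbol{B}}^{-1}(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})\right\}
\end{align}

which is exactly the kernel of $N(\hat{\boldsymbol{\beta}},\hat{\boldsymbol{B}})$ with $\hat{\boldsymbol{B}}$ and $\hat{\boldsymbol{\beta}}$ as defined above.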

Implementation in Python

We generate pseudo data, run Bayesian estimation on it, and check whether the resulting posterior distributions approach the original parameters. It is very simple, but the Python code for Gibbs sampling is below.

import numpy as np
from numpy.random import normal, multivariate_normal, rand
from scipy.stats import invgamma
import matplotlib.pyplot as plt

##Generate pseudo data
alpha = 2.5   # true intercept
beta = 0.8    # true slope
sigma = 10    # standard deviation of the error term
data = 100    # sample size

#Design matrix: a constant column and one explanatory variable
x = np.c_[np.ones(data), rand(data) * data]

y = alpha * x[:,0] + beta * x[:,1] + normal(0, sigma, data)

plt.plot(x[:,1],y,'o')
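
As a quick sanity check (this snippet is my addition, not part of the original code), we can compute the ordinary least squares estimate up front; with the weak priors used below, the posterior means from Gibbs sampling should come out close to it:

#OLS estimate for later comparison with the posterior means
ols_coef, *_ = np.linalg.lstsq(x, y, rcond=None)
print("OLS (alpha, beta):", ols_coef)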



##Gibbs sampling

#Specify the number of burn-in draws and the number of retained draws
burnin = 10000
it = 2000

#Containers for the draws (note: these overwrite the true beta and sigma above;
#sigma[i] stores draws of the error variance sigma^2)
beta = np.zeros([burnin + it, 2])
sigma = np.zeros(burnin + it)
B_hat = np.zeros([2,2])
beta_hat = np.zeros(2)
sum1 = np.zeros([2,2])   # will hold sum_i x_i x_i'
sum2 = np.zeros(2)       # will hold sum_i x_i y_i

#Specify the hyperparameters
beta0 = np.array([0.0, 0.0])              # prior mean of beta
B0 = np.array([[1.0, 0.0],[0.0, 1.0]])    # prior covariance of beta
n0 = 2    # prior shape parameter (times 2) of sigma^2
s0 = 2    # prior scale parameter (times 2) of sigma^2

#Set the initial values
beta[0,0] = 1
beta[0,1] = 1
sigma[0] = 1


#Precompute sum_i x_i x_i' and sum_i x_i y_i, which do not change across iterations
#(equivalently: sum1 = x.T @ x, sum2 = x.T @ y)
for i in range(data):
    sum1[0,0] += x[i,0] ** 2
    sum1[0,1] += x[i,0] * x[i,1]
    sum1[1,0] += x[i,0] * x[i,1]
    sum1[1,1] += x[i,1] ** 2
    sum2[0] += x[i,0] * y[i]
    sum2[1] += x[i,1] * y[i]

for i in range(burnin + it - 1):
    #Draw beta from its full conditional N(beta_hat, B_hat)
    B_hat = np.linalg.inv(sum1/sigma[i] + np.linalg.inv(B0))
    beta_hat = B_hat @ (sum2/sigma[i] + np.linalg.inv(B0) @ beta0)
    beta[i+1,:] = multivariate_normal(beta_hat, B_hat)
    #Draw sigma^2 from its full conditional IG((n+n0)/2, (RSS+s0)/2),
    #using the beta just drawn
    resid = y - beta[i+1,0]*x[:,0] - beta[i+1,1]*x[:,1]
    sigma[i+1] = invgamma.rvs((data+n0)/2, scale=(np.dot(resid, resid) + s0)/2)
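
The post-processing is not shown in the original, so here is a minimal sketch, assuming we discard the burn-in draws and look at the retained ones:

#Discard the burn-in draws and summarize the rest
beta_post = beta[burnin:, :]
sigma2_post = sigma[burnin:]
print("posterior mean of (alpha, beta):", beta_post.mean(axis=0))
print("posterior mean of sigma^2:", sigma2_post.mean())

#Histogram of the slope coefficient
plt.hist(beta_post[:, 1], bins=50)
plt.xlabel("beta (slope)")
plt.show()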

After sampling, the histogram of the slope coefficient beta looks like this.

(Image: histogram of the sampled beta)

You can see that the MAP estimate is close to the value of 0.8 that we originally set. It worked.

(Details will be added later)
