Bayesian statistics has been getting a lot of attention lately, so I decided to study "Introduction to Bayesian Statistics". The book is easy to follow up to the middle, but because it relies on figures and avoids mathematical formulas as much as possible, I found it hard to follow once continuous distributions appeared. So, as a study exercise, I will use coin tossing as a simple example and look at the posterior distribution after each toss. I don't understand the MCMC method yet, so I will simply evaluate the probability density function on a discretized grid.
Hiroyuki Kojima "Introduction to Bayesian Statistics", Diamond, November 2015
An explanation that uses the same figures as the book above, again with coin tossing as the example: Kita Mathematics 108th Educational Practice Study Group, Saturday, January 26, 2019, "Bayesian Inference Thought"
The same subject, but more advanced content computed with the MCMC method using PyMC; I would like to be able to use PyMC as well: "Look at the posterior distribution of coins on the front and back of each trial (PyMC)"
This covers the first chapter of "Introduction to Bayesian Statistics"; it makes frequent use of figures and is easy to understand: "Bayesian inference for a clear understanding in 5 minutes"
The strength of Bayesian statistics is that the distribution can be updated one after another as new information arrives, and that subjective prior knowledge can be reflected in the prior [^1]. Bayesian inference is based on Bayes' theorem.
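Written out, with $\theta$ the probability that the coin lands on the front side and $x$ the observed data, the theorem reads

$$p(\theta|x) = \frac{p(x|\theta)\,p(\theta)}{p(x)}, \qquad p(x) = \int_0^1 p(x|\theta)\,p(\theta)\,d\theta .$$

The denominator $p(x)$ is only a normalizing constant; the update function defined below approximates it with a simple Riemann sum.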
The posterior distribution $p(\theta|x)$ obtained after incorporating new information can in turn be used as the prior distribution $p(\theta)$ for the next update. Because the order of multiplication does not change the answer, in the coin-tossing example the final posterior distribution is the same whether $x=$ {front, front, back, back} or $x=$ {front, back, front, back}. This is called the sequential rationality of Bayesian inference. Since the results of the data observed so far are reflected in the prior distribution, it can be regarded as a kind of learning function [^2].
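Concretely, for two observations $x_1$ and $x_2$,

$$p(\theta|x_1,x_2) \propto p(x_2|\theta)\,\bigl[p(x_1|\theta)\,p(\theta)\bigr] = p(x_1|\theta)\,\bigl[p(x_2|\theta)\,p(\theta)\bigr],$$

so updating with $x_1$ first and then $x_2$ gives the same posterior as the reverse order.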
import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["font.family"] = "Times New Roman" #Set the entire font
plt.rcParams["xtick.direction"] = "in" #Inward the x-axis scale line
plt.rcParams["ytick.direction"] = "in" #Inward the y-axis scale line
plt.rcParams["xtick.minor.visible"] = True #Addition of x-axis auxiliary scale
plt.rcParams["ytick.minor.visible"] = True #Addition of y-axis auxiliary scale
plt.rcParams["xtick.major.width"] = 1.5 #Line width of x-axis main scale line
plt.rcParams["ytick.major.width"] = 1.5 #Line width of y-axis main scale line
plt.rcParams["xtick.minor.width"] = 1.0 #Line width of x-axis auxiliary scale line
plt.rcParams["ytick.minor.width"] = 1.0 #Line width of y-axis auxiliary scale line
plt.rcParams["xtick.major.size"] = 10 #x-axis main scale line length
plt.rcParams["ytick.major.size"] = 10 #Length of y-axis main scale line
plt.rcParams["xtick.minor.size"] = 5 #x-axis auxiliary scale line length
plt.rcParams["ytick.minor.size"] = 5 #Length of y-axis auxiliary scale line
plt.rcParams["font.size"] = 14 #Font size
plt.rcParams["axes.linewidth"] = 1.5 #Enclosure thickness
If you don't know anything in advance, assume a uniform prior distribution for the time being. This is known as the principle of insufficient reason. Let's follow this principle first.
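In other words, the prior density is constant over the whole range of $\theta$:

$$p(\theta) = 1 \qquad (0 \le \theta \le 1).$$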
N = 1000 # number of grid points for the numerical calculation
p_x_front = np.linspace(0.0,1.0,N) # p(x=front|θ) = θ, evaluated on the grid
p_x_back = 1.0 - p_x_front # p(x=back|θ) = 1 - θ
prior_dist = np.ones(N) # prior distribution p(θ)
fig,axes = plt.subplots(figsize=(8,6))
axes.plot(p_x_front,prior_dist,label="Prior distribution")
axes.scatter(np.sum(prior_dist*p_x_front/N),0.5,s=100,label="expected value")
axes.legend(loc="best")
axes.set_xlabel(r"$\theta$")
axes.set_ylabel("Probability density")
axes.set_ylim(-0.1,2.5)
axes.set_xticks(np.linspace(0.0,1.0,11))
The vertical axis is the probability density, which integrates to 1. The round marker is the expected value; its position on the vertical axis is chosen only so that it is easy to see.
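As a quick sanity check (my own addition, not from the book), the discretized density does integrate to approximately 1, and its expected value is approximately 0.5:

print(np.sum(prior_dist) / N) # ≈ 1.0: the density integrates to 1
print(np.sum(prior_dist * p_x_front) / N) # ≈ 0.5: expected value of θ under the uniform prior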
def update(p_x_theta,p_theta):
    # Bayes' theorem: posterior ∝ likelihood × prior,
    # normalized so that the density integrates to 1 (Riemann sum with spacing 1/N)
    return (p_x_theta * p_theta) / np.sum(p_x_theta * p_theta / N)
fig,axes = plt.subplots(figsize=(8,6))
dist_front1_back0 = update(p_x_front,prior_dist)
axes.plot(p_x_front,dist_front1_back0,label="front=1, back=0")
axes.scatter(np.sum(dist_front1_back0*p_x_front/N),0.5,s=100)
dist_front1_back1 = update(p_x_back,dist_front1_back0)
axes.plot(p_x_front,dist_front1_back1,label="front=1, back=1")
axes.scatter(np.sum(dist_front1_back1*p_x_front/N),0.5,s=100)
dist_front2_back1 = update(p_x_front,dist_front1_back1)
axes.plot(p_x_front,dist_front2_back1,label="front=2, back=1")
axes.scatter(np.sum(dist_front2_back1*p_x_front/N),0.5,s=100)
dist_front2_back2 = update(p_x_back,dist_front2_back1)
axes.plot(p_x_front,dist_front2_back2,label="front=2, back=2")
axes.scatter(np.sum(dist_front2_back2*p_x_front/N),0.2,s=100)
axes.legend(loc="best")
axes.set_xlabel(r"$\theta$")
axes.set_ylabel("Probability density")
axes.set_ylim(-0.1,2.5)
axes.set_xticks(np.linspace(0.0,1.0,11))
The update function is Bayes' theorem itself. The posterior distribution changes each time we learn whether a toss came up front or back. Comparing the orange (front=1, back=1) and red (front=2, back=2) curves also shows that the more data we have, the more sharply the distribution concentrates, i.e. the more precise the estimate becomes.
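As a check (my own addition; it uses scipy, which the rest of this article does not need), we can update with the same data in a different order to confirm sequential rationality, and compare the numerical posterior with the analytic result that a uniform prior plus 2 fronts and 2 backs gives a Beta(3, 3) distribution:

from scipy import stats

# same data in a different order: front, front, back, back
dist_alt = update(p_x_back,update(p_x_back,update(p_x_front,update(p_x_front,np.ones(N)))))
print(np.allclose(dist_alt,dist_front2_back2)) # True: the order of the data does not matter

# with a uniform prior, 2 fronts and 2 backs give a Beta(3, 3) posterior analytically
print(np.max(np.abs(dist_front2_back2 - stats.beta(3,3).pdf(p_x_front)))) # small (grid discretization error only)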
Let's look at the posterior distribution after the coin has come up front 100 times and back 100 times.
def updates(front,back):
    # apply Bayes' theorem repeatedly, starting from the (global) prior distribution
    p_theta = prior_dist.copy()
    for i in range(front):
        p_theta = update(p_x_front,p_theta)
    for i in range(back):
        p_theta = update(p_x_back,p_theta)
    return p_theta
p_theta = updates(100,100)
fig,axes = plt.subplots(figsize=(8,6))
axes.plot(p_x_front,p_theta)
axes.scatter(np.sum(p_theta*p_x_front/N),0.5,s=100)
axes.set_xlabel(r"$\theta$")
axes.set_ylabel("Probability density")
axes.set_ylim(-0.1,12.0)
axes.set_xticks(np.linspace(0.0,1.0,11))
The variance has become much smaller, and the expected value is 0.5.
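To put a number on "much smaller" (again my own check), we can compute the standard deviation of the discretized posterior; analytically this posterior is a Beta(101, 101) distribution, whose standard deviation is about 0.035:

mean = np.sum(p_theta * p_x_front) / N
std = np.sqrt(np.sum(p_theta * (p_x_front - mean)**2) / N)
print(mean, std) # ≈ 0.5 and ≈ 0.035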
Bayesian inference can also reflect subjectivity. This time, let's assume the subjective belief that the coin is more likely to land on the front side, and use a beta distribution with $\alpha = 3, \beta = 2$ as the prior distribution.
a = 3.0
b = 2.0
x = np.linspace(0.0,1.0,N)
prior_dist = x**(a-1.0) * (1.0-x)**(b-1.0) # unnormalized Beta(a,b) density
prior_dist /= np.sum(prior_dist) / N # normalize so that the density integrates to 1
fig,axes = plt.subplots(figsize=(8,6))
axes.plot(p_x_front,prior_dist,label="Prior distribution")
axes.scatter(np.sum(prior_dist*p_x_front/N),0.5,s=100,label="expected value")
axes.legend(loc="best")
axes.set_xlabel(r"$\theta$")
axes.set_ylabel("Probability density")
axes.set_ylim(-0.1,2.5)
axes.set_xticks(np.linspace(0.0,1.0,11))
The expected value is around 0.6.
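This matches the analytic value: the mean of a Beta distribution is $\alpha/(\alpha+\beta)$, which here is $3/(3+2) = 0.6$ (up to the grid discretization).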
fig,axes = plt.subplots(figsize=(8,6))
dist_front1_back0 = update(p_x_front,prior_dist)
axes.plot(p_x_front,dist_front1_back0,label="front=1, back=0")
axes.scatter(np.sum(dist_front1_back0*p_x_front/N),0.5,s=100)
dist_front1_back1 = update(p_x_back,dist_front1_back0)
axes.plot(p_x_front,dist_front1_back1,label="front=1, back=1")
axes.scatter(np.sum(dist_front1_back1*p_x_front/N),0.5,s=100)
dist_front2_back1 = update(p_x_front,dist_front1_back1)
axes.plot(p_x_front,dist_front2_back1,label="front=2, back=1")
axes.scatter(np.sum(dist_front2_back1*p_x_front/N),0.5,s=100)
dist_front2_back2 = update(p_x_back,dist_front2_back1)
axes.plot(p_x_front,dist_front2_back2,label="front=2, back=2")
axes.scatter(np.sum(dist_front2_back2*p_x_front/N),0.2,s=100)
axes.legend(loc="best")
axes.set_xlabel(r"$\theta$")
axes.set_ylabel("Probability density")
axes.set_ylim(-0.1,2.5)
axes.set_xticks(np.linspace(0.0,1.0,11))
Compared with the uniform prior, the expected value is higher because of the subjective belief that the front is more likely to appear.
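Incidentally, because the beta distribution is conjugate to this likelihood, the Beta(3, 2) prior updated with 2 fronts and 2 backs gives a Beta(3+2, 2+2) = Beta(5, 4) posterior, whose expected value is $5/9 \approx 0.556$, still above 0.5.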
Again, let's look at the posterior distribution after the coin has come up front 100 times and back 100 times.
p_theta = updates(100,100)
fig,axes = plt.subplots(figsize=(8,6))
axes.plot(p_x_front,p_theta)
axes.scatter(np.sum(p_theta*p_x_front/N),0.5,s=100)
axes.set_xlabel(r"$\theta$")
axes.set_ylabel("Probability density")
axes.set_ylim(-0.1,12.0)
axes.set_xticks(np.linspace(0.0,1.0,11))
The expected value is 0.502, slightly larger than 0.5. As the amount of data increases, the influence of the initial subjectivity gradually fades, and an accurate distribution is obtained.
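This again matches the analytic result: the Beta(3, 2) prior combined with 100 fronts and 100 backs gives a Beta(103, 102) posterior, whose expected value is $103/205 \approx 0.502$.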
I now have a rough idea of Bayesian inference. I still don't understand the MCMC method, so next I hope to read [Introduction to Data Analysis with Bayesian Statistical Modeling Beginning with R and Stan](https://logics-of-blue.com/r-stan-bayesian-model-intro-book-support/) and write about more advanced content than this time.
[^1]: Hiroyuki Kojima, "Complete Self-Study: An Introduction to Bayesian Statistics", Diamond, November 2015, p. 6.
[^2]: Hiroyuki Kojima, "Complete Self-Study: An Introduction to Bayesian Statistics", Diamond, November 2015, p. 145.