I had only been talking about Bayesian updating in vague terms, so I decided to study it properly and summarize it here. This is yet another rehash of a well-covered topic. I mainly referred to the following:
- Bayesian statistics (range) in 5 minutes (https://www.slideshare.net/matsukenbook/15-59154892)
- Bayesian estimation of probability distribution (http://arduinopid.web.fc2.com/P19.html)
- Bayesian estimation of the winning probability of Cocoichi's spoon (https://rmizutaa.hatenablog.com/entry/2019/02/15/200700)
To summarize Bayesian estimation roughly: instead of point-estimating the parameters of the likelihood (= model) by maximum likelihood estimation, we estimate the distribution of the parameters using Bayes' theorem. Bayes' theorem is expressed by the following equation.
p(w|x) = \frac{p(x|w)p(w)}{p(x)}
Here $p(w|x)$ is the posterior distribution, $p(x|w)$ the likelihood, $p(w)$ the prior distribution, and $p(x)$ the normalizing constant. Bayesian updating is a mechanism that uses Bayes' theorem to update the posterior distribution sequentially. The following derivation is based on here. First, we transform the formula of Bayes' theorem.
\begin{align}
&p(A, B, C) = p(A|B, C)p(B, C) = p(B, C|A)p(A) \\
\Rightarrow& p(A|B, C) = \frac{p(B, C|A)p(A)}{p(B, C)}
\end{align}
Everything up to this point holds in general.
Next, suppose B and C are independent and conditionally independent given A. Then we get the following equation:
\begin{align}
p(A|B, C) &= \frac{p(B, C|A)p(A)}{p(B, C)} \\
&= \frac{p(B|A)p(C|A)p(A)}{p(B)p(C)} \\
&= \frac{p(C|A)}{p(C)}\frac{p(B|A)p(A)}{p(B)} \\
&= \frac{p(C|A)}{p(C)}p(A|B)
\end{align}
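The identity can also be checked numerically. Below is a minimal sketch using a small discrete distribution whose probability tables I made up for illustration (none of these numbers come from the article or its references); they are chosen so that B and C are both marginally independent and conditionally independent given A.

```python
import numpy as np

# Made-up example: A takes 3 values uniformly; B and C are binary.
# Values picked by hand for illustration, not from the article.
p_A = np.array([1/3, 1/3, 1/3])
p_B1_given_A = np.array([0.2, 0.5, 0.8])  # p(B=1 | A=a)
p_C1_given_A = np.array([0.7, 0.1, 0.7])  # p(C=1 | A=a)

# Joint p(a, b, c) = p(a) p(b|a) p(c|a), so B and C are conditionally independent given A
p_B_given_A = np.stack([1 - p_B1_given_A, p_B1_given_A], axis=1)  # shape (3, 2)
p_C_given_A = np.stack([1 - p_C1_given_A, p_C1_given_A], axis=1)  # shape (3, 2)
joint = p_A[:, None, None] * p_B_given_A[:, :, None] * p_C_given_A[:, None, :]

# Marginals
p_B = joint.sum(axis=(0, 2))
p_C = joint.sum(axis=(0, 1))
p_BC = joint.sum(axis=0)

# These particular numbers also make B and C marginally independent
assert np.allclose(p_BC, np.outer(p_B, p_C))

# Left-hand side: p(A | B=1, C=1)
lhs = joint[:, 1, 1] / joint[:, 1, 1].sum()

# Right-hand side: p(C=1 | A) / p(C=1) * p(A | B=1)
p_A_given_B1 = joint[:, 1, :].sum(axis=1) / p_B[1]
rhs = p_C1_given_A / p_C[1] * p_A_given_B1

assert np.allclose(lhs, rhs)  # the two sides agree
```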
Now substitute $w$ for A, the first observed data point $x_1$ for B, and the second observed data point $x_2$ for C. Saying that "$x_1$ and $x_2$ are independent and conditionally independent given $w$" means that the observed data are generated independently from the distribution, and that the likelihood (= model) is also built so that $x_1$ and $x_2$ are independent. Note that this property does not hold for data with time-series correlation. Under these assumptions, the following equation holds.
\begin{align}
p(w|x_1, x_2) = \frac{p(x_2|w)}{p(x_2)}p(w|x_1)
\end{align}
In other words, the posterior distribution of $w$ given the data $x_1$ and $x_2$ is just Bayes' theorem applied to $x_2$ and $w$, with $p(w|x_1)$ playing the role of the prior distribution. Using this relationship, we can keep updating the distribution: give an appropriate prior distribution → compute the posterior distribution once data arrive → treat that posterior as the new prior → compute the posterior when the next data arrive → and so on. This is how Bayesian updating works.
Let's work through a Bayesian update calculation, referring to here.
Consider a biased coin, and let's estimate the probability that it comes up heads in a Bayesian way. Assume that the outcome of each toss is obtained independently. First, consider the likelihood, that is, the model of a coin toss. Since heads/tails is a binary variable, it follows a Bernoulli distribution, so the likelihood is a Bernoulli distribution. Let $x = 1$ correspond to heads and $x = 0$ to tails. $w$ is a real number between 0 and 1 and is exactly the probability of heads. Therefore, the distribution of $w$ obtained by Bayesian updating is itself the distribution of the probability of heads.
The Bernoulli distribution can be written as:
\begin{align}
p(x|w) = w^x(1-w)^{1-x}
\end{align}
Let's take the prior distribution to be the non-informative prior $p(w) = 1$. $p(x)$ is unknown, but as a function of $w$ it is just a normalizing constant, so it can be determined automatically from the condition that $p(w|x)$ integrates to 1.
Now, let's toss the coin. Suppose the first toss comes up heads ($x_1 = 1$).
\begin{align}
p(w|x_1=1) \propto p(x_1=1|w)p(w) = w
\end{align}
Integrating this from 0 to 1 gives 1/2, so after normalization we get the following.
\begin{align}
p(w|x_1=1) = 2w
\end{align}
This gives the second prior distribution $p(w) = p(w|x_1=1) = 2w$. Suppose the second toss also comes up heads ($x_2 = 1$).
\begin{align}
p(w|x_1=1, x_2=1) \propto p(x_2=1|w)p(w) = 2w^2
\end{align}
Integrating this from 0 to 1 gives 2/3, so after normalization we get the following.
\begin{align}
p(w|x_1=1, x_2=1) = 3w^2
\end{align}
This gives the third prior distribution $p(w) = p(w|x_1=1, x_2=1) = 3w^2$. Suppose the third toss comes up tails ($x_3 = 0$).
\begin{align}
p(w|x_1=1, x_2=1, x_3=0) \propto p(x_3=0|w)p(w) = 3(1-w)w^2
\end{align}
Integrating this from 0 to 1 gives 1/4, so after normalization we get the following.
\begin{align}
p(w|x_1=1, x_2=1, x_3=0) = 12(1-w)w^2
\end{align}
In this way, we can find the distribution of the probability $w$ that the coin comes up heads.
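As a sanity check, these hand calculations agree with the closed form for this model: starting from the uniform prior, observing $n_1$ heads and $n_0$ tails gives a Beta$(n_1+1, n_0+1)$ posterior,

\begin{align}
p(w|x_1, \dots, x_n) = \frac{w^{n_1}(1-w)^{n_0}}{B(n_1+1, n_0+1)},
\end{align}

and for $n_1 = 2$, $n_0 = 1$ we have $B(3, 2) = 1/12$, which reproduces the $12(1-w)w^2$ obtained above.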
Let's implement the example seen above in Python. For the implementation, I referred to here.
```python
import sympy
import numpy as np
import matplotlib.pyplot as plt

# Sequence of coin tosses (1 = heads, 0 = tails)
xs = [1, 1, 0]  # heads, heads, tails

# The prior is the non-informative (uniform) prior p(w) = 1
prior_prob = 1

# Symbol to integrate over
w = sympy.Symbol('w')

# Initialize
# Needed when re-running the cell repeatedly in a Jupyter notebook
posterior_prob = None

# Sequential computation of the posterior distribution
for x in xs:
    # Unnormalized posterior: likelihood times prior
    if x == 1:
        posterior_prob = w * prior_prob
    else:
        posterior_prob = (1 - w) * prior_prob
    # Normalization
    Z = sympy.integrate(posterior_prob, (w, 0, 1))
    posterior_prob = posterior_prob / Z
    # The posterior becomes the next prior
    prior_prob = posterior_prob

# Plot the final posterior distribution of w
plt.figure(figsize=(5, 4))
X = np.linspace(0, 1, 100)
plt.plot(X, [posterior_prob.subs(w, i) for i in X])
plt.xlabel("w")
plt.show()
```
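As a small check that is not in the referenced implementation, you can print the difference between the final symbolic posterior and the hand-calculated $12(1-w)w^2$; for the sequence heads, heads, tails it should be zero.

```python
# Compare the computed posterior with the hand calculation p(w | 1, 1, 0) = 12 w^2 (1 - w)
print(sympy.simplify(posterior_prob - 12 * w**2 * (1 - w)))  # expect 0
```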
Let's try the estimation with a bit more data. Using a coin whose probability of coming up heads is 0.35, we increase the number of trials to 30 and estimate again. As the data increase, the estimate becomes more accurate and the distribution becomes sharper.
```python
import sympy
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # fix the random seed for reproducibility

def bernoulli_sampler(w, n):
    """w is the probability of getting 1; n is the number of samples to generate."""
    xs = np.random.rand(n)
    xs = xs < w
    return xs.astype("int")

# Sequence of coin tosses (1 = heads, 0 = tails)
xs = bernoulli_sampler(0.35, 30)

# The prior is the non-informative (uniform) prior p(w) = 1
prior_prob = 1

# Symbol to integrate over
w = sympy.Symbol('w')

# Initialize
# Needed when re-running the cell repeatedly in a Jupyter notebook
posterior_prob = None

# Sequential computation of the posterior distribution
for x in xs:
    # Unnormalized posterior: likelihood times prior
    if x == 1:
        posterior_prob = w * prior_prob
    else:
        posterior_prob = (1 - w) * prior_prob
    # Normalization
    Z = sympy.integrate(posterior_prob, (w, 0, 1))
    posterior_prob = posterior_prob / Z
    # The posterior becomes the next prior
    prior_prob = posterior_prob

# Plot the final posterior distribution of w
plt.figure(figsize=(5, 4))
X = np.linspace(0, 1, 100)
plt.plot(X, [posterior_prob.subs(w, i) for i in X])
plt.xlabel("w")
plt.show()
```
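To see how close the estimate gets to the true value 0.35, you can also compute the posterior mean symbolically; this check is my addition and is not part of the referenced code. With the uniform prior, the posterior mean equals (number of heads + 1) / (n + 2).

```python
# Posterior mean of w under the uniform prior: (heads + 1) / (n + 2)
posterior_mean = sympy.integrate(w * posterior_prob, (w, 0, 1))
print(float(posterior_mean))           # should land near the true value 0.35
print((xs.sum() + 1) / (len(xs) + 2))  # same value computed directly from the counts
```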
The key point is that B and C are independent and conditionally independent given A.