This article explains Bayesian inference for readers interested in probability and statistics, at roughly the university-to-graduate level.
I wrote this article because the robot problem that appeared in Mathematics I·A of the 2020 Center Test struck me as difficult. I don't remember learning conditional probability in high school, so I was intrigued that today's high school students handle such material.
In fact, merely producing the correct answer to this question is very easy; the robot problem itself is probably easy for a flexible-minded high school student.
If you work through the question from the top, you find that statements 0 and 2 are correct, so statement 3, the robot problem, must automatically be incorrect. If I were sitting the exam, I wouldn't even read it.
The problem statement is as follows.
③ There are two robots that look at a coin and say only "front" or "back". Each robot reports the side that actually appeared correctly with probability 0.9 and incorrectly with probability 0.1, and the two robots speak without being influenced by each other. Now, a person tosses a coin. Given that both robots, having seen the exposed side, said "front", the probability $p$ that the coin actually landed front satisfies $p \leq 0.9$.
First, let's solve it using only mathematics available to high school students. (I worked this out just by looking at reference sites for high school students, so I don't actually know how it is taught in class.)
Conditional probability, written intuitively without fear of mathematical imprecision, is
\frac{\text{Probability that the event and the condition occur together}}{\text{Probability that the condition occurs}}
Written mathematically, this is
\frac{P(A \cap B)}{P(B)}
Let $A$ be the event that the coin actually landed front, and $B$ the event that both robots said "front" at the same time. Then
\begin{align}
& \frac{P(A \cap B)}{P(B)} \\
=& \frac{P(\text{coin is actually front} \cap \text{both robots say front})}{P(\text{both robots say front})} \\
=& \frac{0.5*0.9*0.9}{0.5*0.1*0.1+0.5*0.9*0.9} \\
=& 0.9878\cdots
\end{align}
So the answer is that this statement is "false", since $0.9878\dots > 0.9$. The denominator is the hard part: the probability that both robots say "front" is obtained by adding the case where the coin is front and the case where it is back, i.e., by summing the coin out. High school students, who have not learned about marginalization, may find the denominator difficult. One can sense the examiner's intent: "solve the other statements and deduce the answer to this one."
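As a sanity check, the arithmetic above can be reproduced in a few lines of Python (a minimal sketch; the variable names are my own):

```python
# Prior and robot accuracy from the problem statement.
p_front = 0.5    # probability the coin lands front
p_correct = 0.9  # probability a robot reports the side it saw correctly

# P(front | both say "front") = P(front and both say front) / P(both say front)
numerator = p_front * p_correct**2
denominator = p_front * p_correct**2 + (1 - p_front) * (1 - p_correct)**2
p = numerator / denominator
print(p)  # 0.9878..., which exceeds 0.9, so the statement is false
```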
The discussion from here on is at the university level and beyond.
Let us present the problem as a graphical model with conditional probability tables (CPTs).
Let $C$ be the random variable for the coin's face, with $C=0$ for back and $C=1$ for front. Then
P(C=0) = 0.5 \\
P(C=1) = 0.5
Similarly, let $R_n$ be the random variable for the $n$-th robot's statement, with $R_n=0$ for "back" and $R_n=1$ for "front". Then
P(R_n=0|C=0) = 0.9\\
P(R_n=1|C=0) = 0.1\\
P(R_n=0|C=1) = 0.1\\
P(R_n=1|C=1) = 0.9
Moreover, since the robots speak without being influenced by each other, their statements are conditionally independent given the coin's face:
P(R_1, R_2|C) = P(R_1|C)P(R_2|C)
holds. These are the premises given in the question; it is a very simple example of a Bayesian network.
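These CPTs can be written down directly as arrays (a sketch; the array layout and names are my own choice):

```python
import numpy as np

# P(C): prior over the coin face, index 0 = back, 1 = front.
P_C = np.array([0.5, 0.5])

# P(R_n | C): rows indexed by C, columns by R_n.
P_R_given_C = np.array([[0.9, 0.1],   # C=0: says back w.p. 0.9, front w.p. 0.1
                        [0.1, 0.9]])  # C=1: says back w.p. 0.1, front w.p. 0.9

# Marginal P(R_n=1): sum out C (marginalization).
P_R1 = (P_C * P_R_given_C[:, 1]).sum()
print(P_R1)  # 0.5
```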
Here, by marginalization, $P(R_n=1)$
\begin{align}
& P(R_n=1) \\
=& P(R_n=1|C=0)P(C=0) + P(R_n=1|C=1)P(C=1) \quad (marginalization)\\
=& 0.1\cdot0.5 +0.9\cdot0.5 \\
=& 0.5
\end{align}
can easily be obtained.
What we want is $P(C=1|R_1=1, R_2=1)$. Using Bayes' theorem and marginalization, it can be expressed entirely in terms of known quantities.
\begin{align}
& P(C=1|R_1=1, R_2=1) \\
=& \frac{P(C=1, R_1=1, R_2=1)}{P(R_1=1, R_2=1)} \quad (Bayes' theorem)\\
=& \frac{P(R_1=1, R_2=1|C=1)P(C=1)}{P(R_1=1, R_2=1)} \\
=& \frac{P(R_1=1|C=1)P(R_2=1|C=1)P(C=1)}{P(R_1=1, R_2=1)} \quad (conditional independence) \\
=& \frac{P(R_1=1|C=1)P(R_2=1|C=1)P(C=1)}{P(R_1=1, R_2=1|C=0)P(C=0) + P(R_1=1, R_2=1|C=1)P(C=1)} \quad (marginalization) \\
=& \frac{0.5\cdot0.9\cdot0.9}{0.5\cdot0.1\cdot0.1+0.5\cdot0.9\cdot0.9} \\
=& 0.9878\cdots
\end{align}
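The same derivation in code: multiply the prior by the two likelihoods and normalize (a sketch; the variable names are my own):

```python
import numpy as np

P_C = np.array([0.5, 0.5])            # P(C=0), P(C=1)
P_say_front = np.array([0.1, 0.9])    # P(R_n=1|C=0), P(R_n=1|C=1)

# Unnormalized posterior: P(C=c) * P(R_1=1|c) * P(R_2=1|c) (conditional independence).
joint = P_C * P_say_front**2
# The normalizing constant is exactly the marginal P(R_1=1, R_2=1).
posterior = joint / joint.sum()
print(posterior[1])  # 0.9878...
```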
The high school answer is simply this derivation expressed in words. If anything, verbal answers can harbor linguistic ambiguity, so reasoning rigorously in this way is ideal.
Now for the main topic. Let me share something I found a little interesting while solving the problem.
These robots have no effect on each other. **No matter what Robot 2 says, Robot 1's statement should not change.** Yet under certain conditions, the robots **appear to influence each other**. As with the Monty Hall problem, human intuition and the theoretical answer diverge.
[Monty Hall Problem](https://ja.wikipedia.org/wiki/%E3%83%A2%E3%83%B3%E3%83%86%E3%82%A3%E3%83%BB%E3%83%9B%E3%83%BC%E3%83%AB%E5%95%8F%E9%A1%8C)
Did you feel uneasy about the expansion of the denominator in the "easy" solution above? Since the robots do not affect each other,
\begin{align}
P(R_1=1, R_2=1) = P(R_1=1)P(R_2=1)
\end{align}
seems to hold, which would make the calculation easy.
**But this transformation is wrong: $P(R_1=1, R_2=1) \neq P(R_1=1)P(R_2=1)$.** Indeed,
\begin{align}
&P(R_1=1, R_2=1) \\
=& P(R_1=1|R_2=1)P(R_2=1) \\
\end{align}
and $P(R_1=1|R_2=1)$ is
\begin{align}
& P(R_1=1|R_2=1) \\
=& P(R_1=1|R_2=1, C=0)P(C=0|R_2=1) + P(R_1=1|R_2=1, C=1)P(C=1|R_2=1) \\
=& P(R_1=1|C=0)P(C=0|R_2=1) + P(R_1=1|C=1)P(C=1|R_2=1) \\
=& 0.1\cdot0.1 +0.9\cdot0.9 \\
=& 0.82
\end{align}
Can be calculated.
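The two steps above, computing $P(C|R_2=1)$ and then summing over $C$, can be traced in code (a sketch; names are my own):

```python
P_C = [0.5, 0.5]        # P(C=0), P(C=1)
P_front = [0.1, 0.9]    # P(R_n=1|C=0), P(R_n=1|C=1)

# P(C=c | R_2=1) by Bayes' theorem.
p_r2 = sum(P_front[c] * P_C[c] for c in (0, 1))         # marginal P(R_2=1) = 0.5
post_C = [P_front[c] * P_C[c] / p_r2 for c in (0, 1)]   # [0.1, 0.9]

# P(R_1=1 | R_2=1) = sum over c of P(R_1=1|C=c) P(C=c|R_2=1).
p = sum(P_front[c] * post_C[c] for c in (0, 1))
print(p)  # 0.82
```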
This is a startling result. Since $P(R_1=1)=0.5$, the probability that Robot 1 says "front" has risen from 0.5 to 0.82 once Robot 2 says "front". It looks as though Robot 2's statement changed Robot 1's. The two robots should be unrelated, so this feels strange.
This paradox arises because the coin is hidden. As explained earlier, given the coin's face, the robots' statements are conditionally independent and do not affect each other. But what happens when Robot 2 speaks while the coin cannot be seen? If Robot 2 says "front", the probability that the coin is front increases. Indeed, by Bayes' theorem, $P(C=1|R_2=1)$ is
\begin{align}
& P(C=1|R_2=1) \\
=& \frac{P(R_2=1|C=1)P(C=1)}{P(R_2=1)} \\
=& \frac{0.9\cdot0.5}{0.5} \\
=& 0.9
\end{align}
When Robot 2 says "front", the probability that the coin is front rises from 0.5 to 0.9. So Robot 2 saying "front" raises the probability that the coin is front, which in turn raises the probability that Robot 1 says "front".
Still not intuitive? Then increase the number of robots. With $N$ robots, the result comes much closer to human intuition.
Consider $N$ robots, each judging the coin's face separately.
Since the robots are conditionally independent given the coin's face,
P(R_1, R_2, \dots, R_N|C) = \prod_{i=1}^N P(R_i|C)
holds. When all $N$ robots say "front", the probability that the coin is actually front can be computed just as in the two-robot case:
\begin{align}
& P(C=1|R_1=1, R_2=1, \dots, R_N=1) \\
= & \frac{P(C=1)\prod_{i=1}^N P(R_i=1|C=1)}{P(R_1=1, R_2=1, \dots, R_N=1)} \\
= & \frac{P(C=1)\prod_{i=1}^N P(R_i=1|C=1)}{P(R_1=1, R_2=1, \dots, R_N=1 | C=0)P(C=0) + P(R_1=1, R_2=1, \dots, R_N=1 | C=1)P(C=1)} \\
= & \frac{P(C=1)\prod_{i=1}^N P(R_i=1|C=1)}{P(C=0)\prod_{i=1}^N P(R_i=1|C=0) + P(C=1)\prod_{i=1}^N P(R_i=1|C=1)} \\
= & \frac{0.5\cdot0.9^N}{0.5\cdot0.1^N + 0.5\cdot0.9^N} \\
= & \frac{0.9^N}{0.1^N + 0.9^N}
\end{align}
This is a very clean formula. Indeed, substituting $N=2$ recovers the two-robot answer. In the limit $N \rightarrow \infty$ the expression tends to 1: if infinitely many robots all say "front", you can tell the coin is front without even looking at it.
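The closed form above is easy to evaluate numerically (a sketch; the function name is my own):

```python
def posterior_front(N, p=0.9):
    """P(C=1 | all N robots say "front"), for robot accuracy p."""
    return p**N / ((1 - p)**N + p**N)

print(posterior_front(2))   # 0.9878..., the two-robot answer
print(posterior_front(10))  # already very close to 1
```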
Next, the probability $P(R_1=1|R_2=1, R_3=1, \dots, R_N=1)$ that Robot 1 says "front" when robots 2 through $N$ all say "front" is
\begin{align}
& P(R_1=1|R_2=1, R_3=1, \dots, R_N=1) \\
=& P(R_1=1|C=0)P(C=0|R_2=1, R_3=1, \dots, R_N=1) \\
&+ P(R_1=1|C=1)P(C=1|R_2=1, R_3=1, \dots, R_N=1) \\
=& P(R_1=1|C=0) \frac{0.1^{N-1}}{0.1^{N-1} + 0.9^{N-1}} + P(R_1=1|C=1) \frac{0.9^{N-1}}{0.1^{N-1} + 0.9^{N-1}} \\
=& \frac{0.1^N}{0.1^{N-1} + 0.9^{N-1}} + \frac{0.9^N}{0.1^{N-1} + 0.9^{N-1}} \\
=& \frac{0.1^N + 0.9^N}{0.1^{N-1} + 0.9^{N-1}} \\
\end{align}
This is also a clean formula, and substituting $N=2$ again matches the two-robot case. In the limit $N \rightarrow \infty$ it tends to 0.9: with infinitely many robots all saying "front", the coin is certainly front even without looking, so it is reasonable that Robot 1 says "front" with probability 0.9. The results for different numbers of robots are as follows.
\begin{align}
&P(R_1=1) &=& 0.5 \\
&P(R_1=1|R_2=1) &=& 0.820 \\
&P(R_1=1|R_2=1, R_3=1) &=& 0.890 \\
&P(R_1=1|R_2=1, R_3=1, R_4=1) &=& 0.899 \\
\end{align}
Here the intuition works: "I can't see the coin, but since many other robots said front, the coin is probably front; therefore the probability that Robot 1 says front should be higher than 0.5."
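The numbers in the list above can be reproduced from the closed form (a sketch; the function name is my own):

```python
def p_robot1_front(N, p=0.9):
    """P(R_1=1 | robots 2..N all say "front"), from the closed form above."""
    q = 1 - p
    return (q**N + p**N) / (q**(N - 1) + p**(N - 1))

for N in range(1, 5):
    print(N, p_robot1_front(N))  # 0.5, 0.82, 0.890..., 0.899...
```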
Perhaps the two-robot case feels wrong because the brain treats the two robots as equals: we insist that Robot 1 cannot be affected by Robot 2's statement, yet readily accept that the statements of many robots would affect Robot 1. It may be the instinct of an animal that values majority votes.
Why the two-robot case is counterintuitive is related to the Monty Hall problem: even if the Monty Hall problem itself is not intuitive, its 100-door version is easy to grasp.
Of course, none of this discussion is original to me. It is just a special case of graphical models, a field also known as Bayesian networks; specifically, this robot problem is an example of inference in a tail-to-tail graph. I'm tired, so I'll stop the explanation here. If you are interested, search for "graphical model".
Let's run a Monte Carlo simulation to check whether the discussion so far is correct. I tossed a coin 100,000 times in Python.
```python
# -*- coding: utf-8 -*-
# %%
import numpy as np
import pandas as pd

# %%
N = 100000       # number of coin tosses
robot_num = 2    # number of robots

# %%
coin = np.zeros((N, 1), dtype=int)
robot = np.zeros((N, robot_num), dtype=int)
for i in range(N):
    coin[i] = np.random.randint(0, 2)
    robot_rand = np.random.random(robot_num)
    for n_robot, tmp in enumerate(robot_rand):
        if tmp <= 0.9:
            robot[i, n_robot] = coin[i]      # robot reports the true side
        else:
            robot[i, n_robot] = coin[i] ^ 1  # robot reports the wrong side

# %%
result = pd.DataFrame(np.concatenate([coin, robot], axis=1),
                      columns=["coin", "robot1", "robot2"])
all_1 = result[(result.robot1 == 1) & (result.robot2 == 1)]
p = (all_1.coin == 1).sum() / all_1.shape[0]

# %%
p_truth = (0.5*0.9*0.9) / (0.5*0.9*0.9 + 0.1*0.1*0.5)
print("Conditional probability of coins when two robots say front (theory):{}".format(p_truth))
print("Conditional probability of coins when two robots say front (experiment):{}".format(p))

# %%
r12 = result[result.robot2 == 1]
p12 = (r12.robot1 == 1).sum() / r12.shape[0]
p12_truth = 0.9*0.9 + 0.1*0.1
print("Conditional probability (theory) of robot 1 when robot 2 says front:{}".format(p12_truth))
print("Conditional probability of robot 1 when robot 2 says front (experiment):{}".format(p12))
```
The program output is as follows. With 100,000 trials there is still some error, but the theoretical and experimental values agree reasonably well.
Conditional probability of coins when two robots say front (theory): 0.9878048780487805
Conditional probability of coins when two robots say front (experiment): 0.988986730708585
Conditional probability (theory) of robot 1 when robot 2 says front: 0.8200000000000001
Conditional probability of robot 1 when robot 2 says front (experiment): 0.8214371114299378
Finally, let me write down a few things that occurred to me.
The problem dealt with a coin, i.e., a Bernoulli distribution, but by considering something like an $M$-sided die, the same argument works for a categorical distribution. The probability that face $m^\prime$ appeared, $P(C=m^\prime|R_1=r_1, R_2=r_2, \dots, R_N=r_N)$, is
\begin{align}
& P(C=m^\prime|R_1=r_1, R_2=r_2, \dots, R_N=r_N) \\
= & \frac{P(C=m^\prime)\prod_{i=1}^N P(R_i=r_i|C=m^\prime)}{P(R_1=r_1, R_2=r_2, \dots, R_N=r_N)} \\
= & \frac{P(C=m^\prime)\prod_{i=1}^N P(R_i=r_i|C=m^\prime)}{\sum_{m=1}^M P(R_1=r_1, R_2=r_2, \dots, R_N=r_N | C=m)P(C=m)} \\
= & \frac{P(C=m^\prime)\prod_{i=1}^N P(R_i=r_i|C=m^\prime)}{\sum_{m=1}^M P(C=m)\prod_{i=1}^N P(R_i=r_i|C=m)}
\end{align}
This has exactly the same form as the two-outcome case.
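A small numerical sketch of this categorical case (all values and names here are illustrative, not from the problem):

```python
import numpy as np

M = 3                                    # number of die faces
prior = np.full(M, 1.0 / M)              # P(C=m), uniform for illustration
# P(R=r | C=m): report the true face w.p. 0.9, each wrong face w.p. 0.05.
P_R_given_C = np.full((M, M), 0.05) + 0.85 * np.eye(M)
reports = [0, 0, 1]                      # observed r_1, ..., r_N

# Posterior: prior times product of likelihoods, then normalize (the formula above).
likelihood = np.prod([P_R_given_C[:, r] for r in reports], axis=0)
posterior = prior * likelihood
posterior /= posterior.sum()
print(posterior)  # face 0 gets most of the mass
```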
If you think of adding a robot as adding a data point, you can see that this is exactly the same argument as ordinary Bayesian inference: estimating the parameters of a regression model from data points with Gaussian noise and estimating the front or back of a coin from whimsical robots involve the same calculation.
\begin{align}
& P(C=m^\prime|R_1=r_1, R_2=r_2, \dots, R_{N+1}=r_{N+1}) \\
\propto & P(C=m^\prime) \prod_{i=1}^{N+1} P(R_i=r_i|C=m^\prime) \\
= & P(C=m^\prime) \prod_{i=1}^N P(R_i=r_i|C=m^\prime) \times P(R_{N+1}=r_{N+1}|C=m^\prime) \\
\propto & P(C=m^\prime|R_1=r_1, R_2=r_2, \dots, R_N=r_N) \times P(R_{N+1}=r_{N+1}|C=m^\prime)
\end{align}
can be written. If $P(C=m^\prime|R_1=r_1, R_2=r_2, \dots, R_N=r_N)$ has already been computed, the overall posterior is obtained simply by multiplying in the likelihood of the additional robot $N+1$, without recomputing everything. This is the same as the discussion of adding data points when inferring parameters.
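The incremental update can be sketched as a loop; updating robot by robot ends at the same posterior as the batch calculation (names and structure are my own):

```python
import numpy as np

P_front = np.array([0.1, 0.9])   # P(R_n=1|C=0), P(R_n=1|C=1)

def update(posterior, likelihood):
    # One Bayesian step: multiply in the new robot's likelihood and renormalize.
    posterior = posterior * likelihood
    return posterior / posterior.sum()

post = np.array([0.5, 0.5])      # prior over the coin face
for _ in range(2):               # two robots, both saying "front"
    post = update(post, P_front)
print(post[1])  # 0.9878..., matching the batch two-robot answer
```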
Good luck to all the examinees.
Below are topics I may write about if I find the motivation.
Two coins and one robot: a discussion of head-to-head models. It would be harder than this problem.
Unknown coin and robot probabilities: estimating the probabilities with MCMC. The explanation is quite involved, so I will probably not get around to it.