[Python] Do you understand confidence intervals correctly? How do they differ from credible intervals?

The **confidence interval** is a concept that comes up early in the study of statistics, but because of its name it is often misunderstood. For an intuitive interpretation, the **credible interval** from Bayesian statistics is more appropriate, yet many people do not understand the difference, or do not really understand confidence intervals in the first place.

In this article, we clarify once more the positions of the **frequentist** and **Bayesian** schools that divide statistics in two, understand the difference between confidence intervals and credible intervals, and put that understanding to use in the statistical analysis of data.

Concept of Confidence Interval

There are various ways to express the variation in data, such as the variance and the standard deviation, and the **confidence interval** is one of them.

The confidence interval for a parameter of interest $\theta$, calculated from $n$ data points $X$, is computed using the standard deviation $\sigma$ of the population as follows:


(\theta - z\frac{\sigma}{\sqrt{n}},\hspace{5pt} \theta + z\frac{\sigma}{\sqrt{n}})

Here $z$ is determined by the confidence level; for a two-sided 95% interval we use $z = 1.96$, the value that bounds 95% of a standard normal distribution. In many cases the standard deviation of the population is not known, so we use the statistical device of **"if the variation of the normally distributed population is unknown, use the sample standard deviation $s$ instead of the population standard deviation $\sigma$, and assume that the statistic follows a t distribution"**. The confidence interval formula is then rewritten as follows.


(\theta - t\frac{s}{\sqrt{n}},\hspace{5pt} \theta + t\frac{s}{\sqrt{n}})

Here $t$ is the t-value with $n-1$ degrees of freedom at $\alpha = (1-C)/2$ for confidence level $C$. For example, for a 95% confidence interval, $C = 0.95$, so $\alpha = 0.025$.
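As a quick illustration, here is a minimal Python sketch of the t-based interval above; the sample values are made up purely for illustration.

```python
# Minimal sketch of the t-based confidence interval above.
# The sample values are hypothetical, chosen only for illustration.
import numpy as np
from scipy import stats

x = np.array([4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7])  # hypothetical sample
n = len(x)
theta_hat = x.mean()        # point estimate (sample mean)
s = x.std(ddof=1)           # sample standard deviation

# t-value with n-1 degrees of freedom at alpha = (1 - C)/2, C = 0.95
t = stats.t.ppf(1 - 0.025, df=n - 1)
ci = (theta_hat - t * s / np.sqrt(n), theta_hat + t * s / np.sqrt(n))
print('95% confidence interval:', ci)

# the same interval in one call
print(stats.t.interval(0.95, df=n - 1, loc=theta_hat, scale=s / np.sqrt(n)))
```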

This confidence interval is a frequentist idea, but thanks to its name it is often mistakenly given Bayesian-style interpretations such as the following.

"Confidence interval? For example, if you have a 95% confidence interval, 95% of the data you get is in that range, right?"

"Confidence interval? For example, if it is a 95% confidence interval, it shows the range of 95% of the variation range of the parameters of interest calculated from the obtained data, right?"

"Confidence interval? For example, if you have a 95% confidence interval, then when you repeat the experiment many times, there is a 95% chance that the parameter you are interested in will fall within that range, right?"

…… Unfortunately, all of these are wrong.

**A 95% confidence interval means that if the same experiment were repeated many times, say 100 times, and a confidence interval were computed from the data of each experiment, then in roughly 95 of those 100 experiments the computed interval would contain the true value of the parameter of interest.**

The name invites an intuitive reading, but it is actually a rather difficult concept to interpret. I suspect few people can read the definition above and immediately think, "Oh, I see!"

The reason is that frequentists think as follows.

"There is a true value for the parameter of interest" "The true value of that parameter comes from the population." "But the experiment yields only part of the population." "Therefore, the parameters calculated in each experiment may or may not fit within a confidence interval."

In other words, the philosophy behind the confidence interval is: **the parameter has a single fixed value, but let us somehow quantify the variation that arises from repeating the experiment**. It is a concept that rests on many assumptions and is hard to interpret.
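To make the repeated-experiment picture concrete, here is a rough simulation sketch. The population (normal, true mean 5.0, known standard deviation 2.0), the sample size and the number of repetitions are all assumptions chosen purely for illustration.

```python
# Rough simulation of the repeated-experiment interpretation.
# All numbers here are illustrative assumptions, not part of the article's example.
import numpy as np

rng = np.random.default_rng(0)
true_mu, sigma, n = 5.0, 2.0, 30     # hypothetical population and sample size
n_experiments = 10_000

hits = 0
for _ in range(n_experiments):
    sample = rng.normal(true_mu, sigma, size=n)
    half_width = 1.96 * sigma / np.sqrt(n)   # 95% interval, sigma assumed known
    lower, upper = sample.mean() - half_width, sample.mean() + half_width
    hits += (lower <= true_mu <= upper)

# roughly 95% of the intervals contain the true mean
print('coverage:', hits / n_experiments)
```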

Bayesian credible interval

What we would really like is a more intuitive statement about the range of the parameter of interest, something like "there is a 95% probability that this parameter lies in this range!" This is where the **Bayesian credible interval** comes into play.

The key is the Bayesian philosophy. Unlike frequentists, Bayesians do not think that "the parameter has a single true value." Instead, they think that **the parameter $\theta$ can take various values, and which value it takes is described by a probability distribution $P(\theta)$**. For a model with parameter $\theta$ that generates the data $y$, the following Bayes' theorem holds.


P(\theta|y) = \frac{P(y|\theta)P(\theta)}{\sum_{\theta'}P(y|\theta')P(\theta')}

Here $P(\theta)$ is called the **prior distribution** and represents the probability distribution of $\theta$ before the data are observed.

$P(y|\theta)$ is called the **likelihood** and represents the probability distribution of the data $y$ given the model parameter $\theta$.

Finally, $P(\theta|y)$ is called the **posterior distribution** and represents the probability distribution of the model parameter $\theta$ after the data $y$ have been observed.

The nice thing about Bayesian statistics is that the possible range of the quantity of interest $\theta$ is expressed as a probability distribution after seeing the data (the posterior distribution), so, for example, the range containing 95% of $\theta$ can be computed directly from the posterior distribution $P(\theta|y)$.

Therefore, under the Bayesian philosophy, by computing the Bayesian credible interval we can draw the intuitive conclusion that "the probability that the parameter of interest lies in this range is $p$!"
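As a small illustration of computing such a range directly from the posterior, here is a minimal grid-approximation sketch. The model (a binomial likelihood), the flat prior and the data (7 successes in 10 trials) are assumptions chosen purely for illustration; the worked example below uses a beta prior instead.

```python
# Minimal grid-approximation sketch of Bayes' theorem.
# Binomial likelihood, flat prior and the data (7 successes in 10 trials)
# are illustrative assumptions only.
import numpy as np
from scipy.stats import binom

theta = np.linspace(0, 1, 1001)         # grid of candidate parameter values
prior = np.ones_like(theta)             # flat prior P(theta)
likelihood = binom.pmf(7, 10, theta)    # P(y | theta) for y = 7, N = 10

posterior = likelihood * prior          # numerator of Bayes' theorem
posterior /= posterior.sum()            # normalise by the denominator

# central 95% region read directly off the posterior
cdf = np.cumsum(posterior)
lower = theta[np.searchsorted(cdf, 0.025)]
upper = theta[np.searchsorted(cdf, 0.975)]
print('approximate 95% credible interval:', (lower, upper))
```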

(Example) Confidence interval vs. Bayesian credible interval

Let's solve a simple example to reinforce the understanding.

Mr. A played rock-paper-scissors 100 times and won 42 of the games. "Am I weak at rock-paper-scissors?" he wonders.

Let's answer Mr. A's question from the frequentist standpoint and from the Bayesian standpoint.

"Frequencyist's answer"

According to the record, Mr. A's win rate is 42/100, but that is the result of a limited trial of 100 games, in which only part of the population is seen; **Mr. A's win rate $\theta$ has a single true value.** What is that true value? I don't know. But I can still tell whether Mr. A is weak at rock-paper-scissors. All I have to do is compute a **confidence interval**!

This time the population has two outcomes under repeated independent trials, winning with probability $\theta$ and losing with probability $1-\theta$, so it follows a **binomial distribution**. By the **central limit theorem**, the number of wins in 100 games of rock-paper-scissors approximately follows a normal distribution with mean $100\theta$ and variance $100\theta(1-\theta)$! Treating the variance as known, the 95% confidence interval can be computed with $z = 1.96$ as follows.


(100\theta - 1.96\sqrt{100\theta(1-\theta)}, \hspace{5pt} 100\theta + 1.96\sqrt{100\theta(1-\theta)})

Substituting the observed win rate $\theta = 42/100$, the confidence interval works out to $(32.3,\ 51.7)$. In other words, if the experiment of finding the number of wins in 100 games of rock-paper-scissors were repeated many times and an interval like this computed each time, about 95% of those intervals would contain the true value $100\theta$. This range includes 50 wins, which is neither strong nor weak, so I am 95% confident that Mr. A is not weak at rock-paper-scissors!
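For reference, this calculation can be reproduced in a few lines (the full script at the end of the article performs the same computation):

```python
# Normal-approximation confidence interval for the number of wins.
import numpy as np
from scipy.stats import norm

theta_hat, n = 42 / 100, 100
mu = n * theta_hat                                  # mean of the approximation
sigma = np.sqrt(n * theta_hat * (1 - theta_hat))    # its standard deviation
print(norm.interval(0.95, loc=mu, scale=sigma))     # roughly (32.3, 51.7)
```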

"Bayesian Answers"

Since each game of rock-paper-scissors is an independent trial that is either won or lost, the probability distribution that generates the data $y$ is the **binomial distribution**, right? With the win rate $\theta$ as the model parameter, $N$ trials and $y$ wins, this distribution can be written as follows.


P(y|\theta) = \binom{N}{y}\theta^{y}(1-\theta)^{N-y}

Viewed as a function of the model parameter $\theta$ with the data fixed, the above expression can be treated as the likelihood in Bayes' theorem. The posterior for $\theta$ is then obtained by multiplying it by the prior distribution $P(\theta)$.
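As a quick check, the formula above can be evaluated by hand and compared with `scipy.stats.binom.pmf`; the value of $\theta$ here is arbitrary, chosen only for illustration.

```python
# Evaluate the binomial likelihood both by the explicit formula and with scipy.
from scipy.special import comb
from scipy.stats import binom

N, y, theta = 100, 42, 0.5           # theta chosen arbitrarily for illustration
by_hand = comb(N, y) * theta**y * (1 - theta)**(N - y)
print(by_hand, binom.pmf(y, N, theta))   # the two values agree
```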

Now, let's take a **beta distribution** as the prior distribution $P(\theta)$ of the model parameter. Since the likelihood $P(y|\theta)$ is a binomial distribution, if the prior $P(\theta)$ is a beta distribution, then the posterior $P(\theta|y)$ we want is also a beta distribution, which makes the calculation easy. A prior distribution chosen so that the posterior belongs to the same family as the prior is called a **conjugate prior**, but you don't need to worry about the terminology.

The beta distribution has two parameters, $\alpha$ and $\beta$.


\theta \sim Beta(\alpha, \beta)

Let's choose $\alpha$ and $\beta$ using the facts that the mean of the beta distribution is $\frac{\alpha}{\alpha+\beta}$ and that $\alpha+\beta$ plays the role of an effective sample size.

A game of rock-paper-scissors is either won or lost, so we want the mean of the prior distribution $P(\theta)$ to be $0.5$. Taking an effective sample size of $100$, $\alpha = \beta = 50$ will do.
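As a small sketch, the prior parameters can be derived from the desired mean and effective sample size like this:

```python
# Derive the beta prior parameters from the choices above:
# prior mean 0.5 and effective sample size alpha + beta = 100.
from scipy.stats import beta

prior_mean = 0.5
prior_sample_size = 100

alpha = prior_mean * prior_sample_size   # from alpha / (alpha + beta) = mean
beta_param = prior_sample_size - alpha
print(alpha, beta_param)                 # 50.0 50.0
print(beta.mean(alpha, beta_param))      # 0.5, as intended
```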

Since the likelihood $P(y|\theta)$ is a binomial distribution, the conjugacy of the prior means the posterior distribution $P(\theta|y)$ can be written as follows.


\theta|y \sim Beta(\alpha + y, \beta + N - y)

Substituting the parameter values, the prior distribution is $Beta(50, 50)$ and the posterior distribution is $Beta(92, 108)$. The graph of each distribution is as follows.

(Figure: beta.png — the prior $Beta(50, 50)$ and posterior $Beta(92, 108)$ distributions)

Finally, let's find the 95% Bayesian credible interval we are after. In short, we just compute the range that contains 95% of the probability mass of the posterior distribution, which gives $(0.392, 0.530)$. In the graph, the red line marks the lower bound of the credible interval and the purple line marks the upper bound.

(Figure: beta_bci.png — the posterior distribution with the bounds of the 95% credible interval marked)

From this result, Mr. A's win rate lies in the range $(0.392, 0.530)$ with 95% probability. He does tend to be on the slightly weak side, but $\theta = 0.5$, an even win rate, is included in this range, so it seems safe to say that Mr. A is not weak at rock-paper-scissors.
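Both numbers can be verified quickly (the plotting script at the end of the article performs the same computation):

```python
# Conjugate update: Beta(50, 50) prior plus 42 wins out of 100 games
# gives the Beta(92, 108) posterior.
from scipy.stats import beta

alpha0, beta0 = 50, 50
N, y = 100, 42
alpha_post, beta_post = alpha0 + y, beta0 + N - y
print(alpha_post, beta_post)                       # 92 108
print(beta.interval(0.95, alpha_post, beta_post))  # roughly (0.392, 0.530)
```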

Summary

- A C% confidence interval means that if the experiment were repeated N times and a confidence interval computed from each experiment's data, the true value of the parameter of interest would fall inside the computed interval in roughly C% of those N experiments.
- In Bayesian statistics, the probability that the parameter falls within a given range, called a credible interval, can be computed directly.

In conclusion

Confidence intervals are a basic concept taught early in statistics, but they are actually difficult to interpret. Bayesian statistics allows a more intuitive interpretation, but tends to make the calculation more involved. In the end it comes down to a difference in philosophy, so use whichever suits your purpose and taste.

Below is the code to solve this example and draw a graph.

ci.py



# library
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Mr. A's data: 42 wins out of 100 games
p = 42/100
n = 100

# Frequentist -------------------------------
# normal approximation to the binomial distribution (central limit theorem)
mu = 100*p
sigma = np.sqrt(100*p*(1-p))
ci = norm.interval(0.95, loc=mu, scale=sigma)
print('95% confidence interval:' + str(ci))

# Bayesian ----------------------------------
from scipy.stats import beta

# plot beta distribution
fig, ax = plt.subplots(1,1)

def plotBetaPDF(a,b,ax):    
    # range
    x = np.linspace(beta.ppf(0.01, a, b), beta.ppf(0.99, a, b), 100)
    
    # visualize
    ax.plot(x, beta.pdf(x, a, b), lw=5, alpha=0.5)
    ax.set_xlabel('theta')
    ax.set_ylabel('P(theta)')

# prior beta distribution
a = 50
b = 50
plotBetaPDF(a,b,ax)
ax.text(0.54,6,'Prior')

# posterior beta distribution
a = 50+42
b = 50+100-42
plotBetaPDF(a,b,ax)
ax.text(0.48,10,'Posterior')

#plt.close('all')

# 95% Bayesian credible interval
bci = beta.interval(0.95,a,b)
print('95% Bayesian credible interval:' + str(bci))
ax.plot(np.array([bci[0],bci[0]]),np.array([0,12]))
ax.plot(np.array([bci[1],bci[1]]),np.array([0,12]))

# display the figure
plt.show()
