Introduction

This series is a brief explanation of "Basics of Modern Mathematical Statistics" by Tatsuya Kubokawa, and let's implement the contents in python. I used Google Colaboratory (hereinafter referred to as Colab) for implementation. If you have any suggestions, I would appreciate it if you could write them in the comment section. It may not be suitable for those who want to understand all the contents of the book properly because it is written with a stance that it would be nice if it could be output by touching only the part that I thought needed explanation. Please note that if the formula numbers and proposition / definition indexes are written according to the book, the numbers may be skipped in this article.

Overview of Chapter 3

Probability distribution was the function for which the probability can be obtained by giving a variable. Each of the various types of probability distributions has its own characteristics and uses. It is important to know what the characteristics of each probability distribution are, because if you make a mistake in the assumed probability distribution, you will make a mistake. You can find the expected value and variance of the probability distribution using the probability generation function, product factor generation function, and characteristic function in the previous chapter, but I think you should remember. You may remember it while using it. At the end of the chapter we touch on Stein's equations and Stirling's formula. If you google, you will find many probability distributions that are not introduced in the article. I will write an article on "Probability generating function, Moment generating function, Characteristic function" at another time to prove the proposition using the probability generating function, so I would like to introduce it at that time.

Discrete probability distribution

$ $ We dealt with expected value and variance in Chapter 2, but did not touch on the relationship between expected value and variance. Let $ E [X] = \ mu , and $ V (X) = E [(X- \ mu) ^ 2] = E [X ^ 2-2 \ mu X + \ mu ^ 2] = E [X ^ 2]-(E [X]) ^ 2 $ $=E[X(X-1)]+E[X]-(E[X])^2 $$ The relationship is derived. Keep in mind that many of these relational expressions will appear in the future. The discrete probability distribution (variable $ X $ is a discrete probability distribution) introduced in this book is as follows. ・ Discrete uniform distribution ・ Binomial distribution ・ Poisson distribution ・ Geometric distribution ・ Negative binomial distribution ・ Hypergeometric distribution Let's pick it up and take a look.

Binomial distribution

$ $　 Before the binomial distribution, let me explain the Bernoulli trial. Let me quote the expression in the book,

A Bernoulli trial is an experiment in which a $ p $ probability of'success', a $ 1-p $ probability of'failure', and a random variable $ X $ is'successful', $ 1 $,' Take $ 0 $ on failure'.

The binomial distribution is a distribution in which the variable $ X $ is the "number of'successes'" when this Bernoulli trial is performed independently (the previous trial does not affect the next trial) $ n $. The probability of failing $ k $ times and failing $ nk $ times is expressed by the following formula ('success','failure' is a binary opposition such as'get sick','do not get', etc. Anything you do). $P(k)={}_nC_kp^k(1-p)^{n-k}, \ k=0,1,2,...,n $ The reason why there is $ {} _nC_k $ is that $ n $ trials are done independently, so you can choose $ k $'success' out of $ n $ times. The binomial distribution, where the number of trials and the probability are $ n and p $, respectively, is represented by $ Bin (n, p) $. In this book, it is written in this notation until the end, so you have to get used to it.

As an example, let's draw the probability distribution of the number of times the table appears when the coin is thrown 30 times and 1000 times.

Poisson distribution

The Poisson distribution is a special case of the binomial distribution, and when "rare phenomena" can be "observed (tried) in large numbers" (example: distribution of the number of traffic accidents that occur in one day), the binomial distribution Use the Poisson distribution instead. In other words, if we take the limit of $ n \ to \ infinity, \ p \ to 0 $ in the binomial distribution above, it will converge to the Poisson distribution. There is also a formula for the probability distribution of the Poisson distribution itself, but I will omit it in this article. When $ np = \ lambda $, the Poisson distribution is expressed as $ Po (\ lambda) $. For example, if $ n = 10 and p = 0.1 $, then $ \ lambda = 1 $ (which happens about once every 10 times).

Now, let's check the binomial distribution and Poisson distribution with python.

%matplotlib inline
import matplotlib.pyplot as plt
from scipy.special import comb#Function to calculate the combination
import pandas as pd

#Graph drawing of binomial distribution
def Bin(n,p,x_min,x_max,np):
  prob = pd.Series([comb(float(n),k)*p**k*(1-p)**(float(n)-k) for k in range(0,n+1)]) #Calculate the probability at each k
  plt.bar(prob.index,prob,label=np)#Bar graph (y value,x value)
  plt.xlim(x_min,x_max)
  plt.legend()
  plt.show()

Bin(1000,0.5,0,30,"n=30,p=0.5")#30 coins
Bin(10000,0.5,4500,5500,"n=1000,p=0.5")#1000 coins
Bin(40000,0.00007,0,15,"n=40000,p=0.00007")#Try increasing n and decreasing p

If you do this, you will get the following three graphs.

How about the same function, but with a little distortion, you could draw something like a Poisson distribution.

The remaining three discrete probability distributions also have their own unique ideas, but I think you can read them if you are aware of what the discrete random variable $ X $ represents.

Continuous distribution

The continuous distribution introduced in the book is as follows ・ Uniform distribution ·normal distribution ・ Gamma distribution, chi-square distribution ・ Exponential distribution, hazard distribution ・ Beta distribution Let's pick it up here as well.

normal distribution

$ $　 The normal distribution is the most important probability distribution because it has a symmetrical shape centered on the mean and is easy to handle. When the random variable $ X $ follows a normal distribution with mean $ \ mu, $ variance $ \ sigma ^ 2 $, the probability density function of $ X $ is $ f_X (x | \ mu, \ sigma ^ 2) = \ frac {1} {\ sqrt {2 \ pi} \ sigma} \ exp (-\ frac {(x- \ mu) ^ 2} {2 \ sigma ^ 2}) Given by $, this distribution is $ \ mathcal It is represented by {N} (\ mu, \ sigma ^ 2) $. The standardized $ \ mathcal {N} (0,1) $ is called the standard normal distribution, and $ \ phi (z) = \ frac {1} {\ sqrt {2 \ pi}} \ exp (-\ Write frac {z ^ 2} {2}) $ (the graph is in the previous article). The cumulative distribution function (integral value = probability) of the standard normal distribution is represented by $ \ Phi (z) = \ int_ {-\ infty} ^ z \ phi (t) dt $, and the hypothesis that will appear in a later chapter. It is used when dealing with tests and confidence intervals.

Gamma distribution, chi-square distribution

$　$ There is a chi-square distribution as a special case of the gamma distribution, but the chi-square distribution is more important in statistics. As we will see in later chapters, the chi-square distribution is used for interval estimation of population variance, goodness-of-fit test, independence test, and so on. As for the chi-square distribution, the properties that appear in Chapters 4 and 5 are more important than the formula expressed using the gamma function, so only the shape of the chi-square distribution is drawn here. The chi-square distribution with $ n $ degrees of freedom is represented by $ \ chi_n ^ 2 $. I will omit the degree of freedom because it will be better understood in the following chapters.

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

x1 = np.arange(0,15,0.1)
y1 = stats.chi2.pdf(x=x1,df=1)#df=degree of freedom(Degree of freedom)is
y2 = stats.chi2.pdf(x=x1,df=2)
y3 = stats.chi2.pdf(x=x1,df=3)
y4 = stats.chi2.pdf(x=x1,df=5)
y5 = stats.chi2.pdf(x=x1,df=10)
y6 = stats.chi2.pdf(x=x1,df=12)

plt.figure(figsize=(7,5))
plt.plot(x1,y1, label='n=1')
plt.plot(x1,y2, label='n=2')
plt.plot(x1,y3, label='n=3')
plt.plot(x1,y4, label='n=5')
plt.plot(x1,y5, label='n=10')
plt.plot(x1,y6, label='n=12')

plt.ylim(0,0.7); plt.xlim(0,15)
plt.legend()
plt.show()

When you do this, you get:

Exponential distribution, hazard distribution

$ $　 The probability density function of the exponential distribution is given by the following formula and is expressed as $ Ex (\ lambda) . $ f_X (x | \ lambda) = \ lambda e ^ {-\ lambda x}, \ x> 0 $$ Exponential distributions and hazard distributions are used as distributions such as survival time and the period until a machine fails, and the random variable $ X $ often indicates the time / period. Exponential distribution expected value, variance is $ E [X] = \ frac {1} {\ lambda} = \ theta $ $ V (X) = \ frac {1} {\ lambda ^ 2} $ And the exponential distribution of $ \ theta = 2 $ matches the chi-square distribution with $ n = 2 $ degrees of freedom (above). $ P (X> s) $ represents the probability of survival over time $ s $, which is $ P (X> s) = 1-F_X (s) = e ^ {-\ lambda s} $ (1 is the total probability, 2 items are the cumulative distribution function up to time $ s $). The conditional probability of surviving more than $ t $ in time under the condition of surviving in hour $ s $ is $ P (X \ ge s + t | X \ ge s) = ... = \ frac {e ^ {-\ lambda (s + t)}} {e ^ {-\ lambda s}} = e ^ {^ \ lambda t} = P (X \ get) $, time $ s $ survival You can see that it doesn't depend on the condition that you were doing (remember the conditional probability and try to derive it). This property is called memorability, and it also holds for geometric distributions, which are discrete probability distributions. Even if you've never been lucky, good luck may come tomorrow.

Beta distribution

In the beta distribution, the random variable $ X $ takes a value on the interval (0,1), and its probability density function is $ f_X (x | a, b) = \ frac {1} {B (a, b)} x ^ {a-1} (1-x) ^ {b-1} $ Given by, represented by $ Beta (a, b) $. $ B (a, b) $ is a beta function $B(a,b)=\int_{0}^1 x^{a-1}(1-x)^{b-1} dx$ is. You can easily confirm that it becomes 1 by integrating the probability density function. Beta functions appear in Chapter 6, Bayesian method.

I've only introduced a few, but that's all for Chapter 3. Thank you very much.

References

"Basics of Modern Mathematical Statistics" by Tatsuya Kubokawa

[Basics of Modern Mathematical Statistics with python] Chapter 3: Typical Probability Distribution