[Basics of Modern Mathematical Statistics with python] Chapter 3: Typical Probability Distribution

Introduction

This series is a brief explanation of "Basics of Modern Mathematical Statistics" by Tatsuya Kubokawa, implementing its contents in Python. I used Google Colaboratory (hereinafter, Colab) for the implementation. If you have any suggestions, I would appreciate it if you could leave them in the comment section. This article covers only the parts I felt needed explanation, so it may not be suitable for readers who want to understand every part of the book thoroughly. Please also note that since the equation numbers and proposition/definition indexes follow the book, some numbers may be skipped in this article.

Overview of Chapter 3

As we saw earlier, a probability distribution is a function that returns a probability when given a value of the variable. Each of the various probability distributions has its own characteristics and uses, and it is important to know them: if you assume the wrong probability distribution, your analysis will go wrong. You can derive the expected value and variance of each distribution using the probability generating function, moment generating function, and characteristic function from the previous chapter, but I think the results are worth memorizing; you will also come to remember them as you use them. At the end of the chapter, the book touches on Stein's identity and Stirling's formula. If you search online, you will find many probability distributions that are not introduced in this article. I plan to write a separate article on "Probability generating function, Moment generating function, Characteristic function" that proves the propositions using the probability generating function, so I would like to introduce those distributions there.

Discrete probability distribution

We dealt with expected value and variance in Chapter 2, but did not touch on the relationship between them. Let $E[X]=\mu$. Then

$$V(X)=E[(X-\mu)^2]=E[X^2-2\mu X+\mu^2]=E[X^2]-(E[X])^2=E[X(X-1)]+E[X]-(E[X])^2$$

is derived (see the short numerical check after the list below). Keep this in mind, as relational expressions like this appear many times from here on. The discrete probability distributions (distributions whose random variable $X$ is discrete) introduced in the book are as follows.

・Discrete uniform distribution
・Binomial distribution
・Poisson distribution
・Geometric distribution
・Negative binomial distribution
・Hypergeometric distribution

Let's pick a few of them and take a look.
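Here is that check: a minimal sketch of my own (not from the book) that computes both expressions of the variance for a Poisson distribution, whose true variance is $\lambda=3$; scipy.stats.poisson is just used here as a convenient source of a discrete pmf.

import numpy as np
from scipy import stats

# Small discrete example: Poisson distribution with lambda = 3,
# truncated at a large k so the tail is negligible.
k = np.arange(0, 60)
p = stats.poisson.pmf(k, mu=3)

E_X = np.sum(k * p)                       # E[X]
E_XX1 = np.sum(k * (k - 1) * p)           # E[X(X-1)]
var_direct = np.sum((k - E_X)**2 * p)     # E[(X - mu)^2]
var_via_factorial = E_XX1 + E_X - E_X**2  # E[X(X-1)] + E[X] - (E[X])^2

print(var_direct, var_via_factorial)      # both approximately 3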

Binomial distribution

Before the binomial distribution, let me explain the Bernoulli trial. Quoting the book's expression:

A Bernoulli trial is an experiment in which 'success' occurs with probability $p$ and 'failure' with probability $1-p$, and the random variable $X$ takes the value $1$ on 'success' and $0$ on 'failure'.

The binomial distribution is the distribution of the variable $X$ defined as the "number of 'successes'" when this Bernoulli trial is performed $n$ times independently (the previous trial does not affect the next). The probability of succeeding $k$ times and failing $n-k$ times is given by the following formula ('success' and 'failure' can be any binary opposition, such as 'gets sick' / 'does not get sick').

$$P(k)={}_nC_k\,p^k(1-p)^{n-k}, \quad k=0,1,2,...,n$$

The factor ${}_nC_k$ appears because the $n$ trials are independent, so there are ${}_nC_k$ ways to choose which $k$ of the $n$ trials are 'successes'. The binomial distribution with number of trials $n$ and success probability $p$ is written $Bin(n,p)$. The book uses this notation to the end, so you will have to get used to it.
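As a small sanity check (my own sketch, not from the book), the formula above can be compared against scipy's built-in binomial pmf; the values of $n$, $p$, and $k$ below are arbitrary.

from scipy.special import comb
from scipy import stats

n, p, k = 10, 0.3, 4
manual = comb(n, k) * p**k * (1 - p)**(n - k)  # nCk * p^k * (1-p)^(n-k)
library = stats.binom.pmf(k, n, p)
print(manual, library)  # both approximately 0.2001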

As an example, let's draw the probability distribution of the number of heads when a coin is tossed 30 times and 1000 times.

Poisson distribution

The Poisson distribution is a limiting case of the binomial distribution: when a "rare phenomenon" is "observed (tried) a large number of times" (example: the distribution of the number of traffic accidents occurring in one day), we use the Poisson distribution instead of the binomial distribution. In other words, if we take the limit $n\to\infty,\ p\to 0$ in the binomial distribution above while keeping $np=\lambda$ fixed, it converges to the Poisson distribution. There is also an explicit formula for the Poisson probability function, but I will omit it in this article. The Poisson distribution with $np=\lambda$ is written $Po(\lambda)$. For example, if $n=10$ and $p=0.1$, then $\lambda=1$ (the event happens about once every 10 trials).

Now, let's check the binomial distribution and Poisson distribution with python.

%matplotlib inline
import matplotlib.pyplot as plt
from scipy.special import comb  # Function to calculate combinations (nCk)
import pandas as pd

# Graph drawing of the binomial distribution
def Bin(n, p, x_min, x_max, label):
  prob = pd.Series([comb(float(n), k) * p**k * (1 - p)**(float(n) - k) for k in range(0, n + 1)])  # probability at each k
  plt.bar(prob.index, prob, label=label)  # bar graph (x values, y values)
  plt.xlim(x_min, x_max)
  plt.legend()
  plt.show()

Bin(30, 0.5, 0, 30, "n=30,p=0.5")  # coin tossed 30 times
Bin(1000, 0.5, 450, 550, "n=1000,p=0.5")  # coin tossed 1000 times
Bin(40000, 0.00007, 0, 15, "n=40000,p=0.00007")  # try increasing n and decreasing p

Running this produces the following three graphs. [Figures: binomial pmf bar plots for the three calls above]

What do you think? Using the same binomial function, but with a large $n$ and a small $p$, we could draw something that looks like a (slightly skewed) Poisson distribution.
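To make this concrete, here is a minimal sketch of my own (not from the book) that overlays the Poisson pmf with $\lambda=np=2.8$ on the binomial pmf from the third graph; the two essentially coincide.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

n, p = 40000, 0.00007
lam = n * p  # lambda = np = 2.8
k = np.arange(0, 16)

binom_pmf = stats.binom.pmf(k, n, p)
poisson_pmf = stats.poisson.pmf(k, mu=lam)

plt.bar(k, binom_pmf, label='Bin(40000, 0.00007)')
plt.plot(k, poisson_pmf, 'ro', label='Po(2.8)')  # the points land almost exactly on the bars
plt.legend()
plt.show()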

The remaining three discrete probability distributions each have their own ideas behind them, but I think you can read those sections on your own if you keep in mind what the discrete random variable $X$ represents in each case.

Continuous distribution

The continuous distributions introduced in the book are as follows.

・Uniform distribution
・Normal distribution
・Gamma distribution, chi-square distribution
・Exponential distribution, hazard distribution
・Beta distribution

Let's pick a few of them here as well.

Normal distribution

The normal distribution is the most important probability distribution; it has a symmetric shape centered on the mean and is easy to handle. When the random variable $X$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$, the probability density function of $X$ is given by

$$f_X(x|\mu,\sigma^2)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

and this distribution is denoted $\mathcal{N}(\mu,\sigma^2)$. The standardized $\mathcal{N}(0,1)$ is called the standard normal distribution, and its density is written $\phi(z)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right)$ (its graph appears in the previous article). The cumulative distribution function (integrated density = probability) of the standard normal distribution is written $\Phi(z)=\int_{-\infty}^z \phi(t)dt$, and it is used when dealing with the hypothesis tests and confidence intervals that appear in later chapters.
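As a small sketch of my own (not in the book), $\phi(z)$ and $\Phi(z)$ can be drawn and evaluated directly with scipy.stats.norm:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

z = np.linspace(-4, 4, 200)
plt.plot(z, stats.norm.pdf(z))  # phi(z): standard normal density
plt.show()

# Phi(z): cumulative distribution function of the standard normal
print(stats.norm.cdf(0))     # 0.5
print(stats.norm.cdf(1.96))  # approximately 0.975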

Gamma distribution, chi-square distribution

The chi-square distribution is a special case of the gamma distribution, and of the two, the chi-square distribution is the more important in statistics. As we will see in later chapters, the chi-square distribution is used for interval estimation of the population variance, goodness-of-fit tests, tests of independence, and so on. For the chi-square distribution, the properties that appear in Chapters 4 and 5 matter more than its density formula written in terms of the gamma function, so here I only draw its shape. The chi-square distribution with $n$ degrees of freedom is written $\chi_n^2$. I will not explain degrees of freedom here, because it is better understood in the later chapters.

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

x1 = np.arange(0,15,0.1)
y1 = stats.chi2.pdf(x=x1,df=1)  # df = degrees of freedom
y2 = stats.chi2.pdf(x=x1,df=2)
y3 = stats.chi2.pdf(x=x1,df=3)
y4 = stats.chi2.pdf(x=x1,df=5)
y5 = stats.chi2.pdf(x=x1,df=10)
y6 = stats.chi2.pdf(x=x1,df=12)

plt.figure(figsize=(7,5))
plt.plot(x1,y1, label='n=1')
plt.plot(x1,y2, label='n=2')
plt.plot(x1,y3, label='n=3')
plt.plot(x1,y4, label='n=5')
plt.plot(x1,y5, label='n=10')
plt.plot(x1,y6, label='n=12')

plt.ylim(0,0.7); plt.xlim(0,15)
plt.legend()
plt.show()

Running this produces the following graph. [Figure: chi-square densities for n = 1, 2, 3, 5, 10, 12]

Exponential distribution, hazard distribution

The probability density function of the exponential distribution is given by the following formula, and the distribution is written $Ex(\lambda)$.

$$f_X(x|\lambda)=\lambda e^{-\lambda x}, \quad x>0$$

Exponential distributions and hazard distributions are used for quantities such as survival time or the period until a machine fails, so the random variable $X$ often represents a time or period. The expected value and variance of the exponential distribution are

$$E[X]=\frac{1}{\lambda}=\theta, \quad V(X)=\frac{1}{\lambda^2}$$

and the exponential distribution with $\theta=2$ coincides with the chi-square distribution with $n=2$ degrees of freedom (drawn above). $P(X>s)$ is the probability of surviving beyond time $s$, and $P(X>s)=1-F_X(s)=e^{-\lambda s}$ (the first term is the total probability, the second term is the cumulative distribution function up to time $s$). The conditional probability of surviving a further time $t$ or more, given survival up to time $s$, is

$$P(X\ge s+t|X\ge s)=\cdots=\frac{e^{-\lambda(s+t)}}{e^{-\lambda s}}=e^{-\lambda t}=P(X\ge t)$$

so you can see that it does not depend on the condition of having survived up to time $s$ (recall conditional probability and try to derive it). This property is called memorylessness, and it also holds for the geometric distribution, which is a discrete probability distribution. Even if you have never been lucky, good luck may come tomorrow.
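Here is a minimal sketch of my own (not from the book) that checks memorylessness numerically with scipy.stats.expon; note that scipy parameterizes the exponential distribution by scale $=1/\lambda=\theta$, and the values of $\lambda$, $s$, $t$ below are arbitrary.

from scipy import stats

lam = 0.5
s, t = 3.0, 2.0
expon = stats.expon(scale=1/lam)  # scale = 1/lambda = theta

# P(X >= s + t | X >= s) = P(X >= s + t) / P(X >= s)
conditional = expon.sf(s + t) / expon.sf(s)  # sf(x) = 1 - cdf(x) = P(X > x)
unconditional = expon.sf(t)                  # P(X >= t)

print(conditional, unconditional)  # both approximately 0.3679 (= e^{-1})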

Beta distribution

In the beta distribution, the random variable $X$ takes values on the interval $(0,1)$, and its probability density function is given by

$$f_X(x|a,b)=\frac{1}{B(a,b)}x^{a-1}(1-x)^{b-1}$$

and the distribution is written $Beta(a,b)$. Here $B(a,b)$ is the beta function

$$B(a,b)=\int_{0}^1 x^{a-1}(1-x)^{b-1}dx$$

You can easily confirm that the density integrates to 1. The beta distribution appears again in Chapter 6, on Bayesian methods.
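As a small sketch of my own (not in the book), here is how to draw a few beta densities with scipy.stats.beta and confirm numerically that one of them integrates to 1; the $(a,b)$ pairs are arbitrary.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.integrate import quad

x = np.linspace(0.001, 0.999, 500)
for a, b in [(0.5, 0.5), (2, 2), (2, 5)]:
    plt.plot(x, stats.beta.pdf(x, a, b), label=f'Beta({a},{b})')
plt.legend()
plt.show()

# Check that the density integrates to 1 (here for a=2, b=5)
integral, _ = quad(lambda u: stats.beta.pdf(u, 2, 5), 0, 1)
print(integral)  # approximately 1.0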

I've only introduced a few, but that's all for Chapter 3. Thank you very much.

References

"Basics of Modern Mathematical Statistics" by Tatsuya Kubokawa
