[PYTHON] [Statistical test 2nd grade] Discrete probability distribution

Introduction

Various probability distributions appear in Grade 2 of the Statistical Test. In this post I briefly summarize the discrete ones, and I also draw each distribution in Python to deepen understanding. (Detailed explanations of the code are omitted.)

Symbol  Meaning
P(A)    Probability of event A
X       Random variable
E[X]    Expected value of the random variable X
V[X]    Variance of the random variable X
n       Number of trials

Binomial distribution

When $ n $ Bernoulli trials, each with success probability $ p $, are performed, the distribution followed by the number of successes $ X $ is called the binomial distribution. The probability of exactly $ x $ successes, that is, the probability function, is

P(X=x)≡f(x)={}_n C_{x}p^x(1-p)^{n-x}, \quad x = 0, 1, \dots, n

Calculating the expected value and variance of the binomial distribution from their definitions gives

E[X]=np\\
V[X]=E[X^2]-(E[X])^2=np(1-p)

In particular, the distribution when $ n = 1 $ is called the Bernoulli distribution.
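The formulas above can be checked numerically. The following is a minimal sketch, not part of the original post: `binom_pmf` is a hypothetical helper that writes the probability function out directly, and $ n = 40 $, $ p = 0.25 $ are chosen only to match one of the cases plotted in this article.

```python
from math import comb

from scipy.stats import binom

n, p = 40, 0.25  # illustrative values; any 0 <= p <= 1 works


def binom_pmf(x, n, p):
    """Probability function written directly from nCx * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


support = range(n + 1)  # the number of successes can be 0, 1, ..., n
pmf = [binom_pmf(x, n, p) for x in support]

total = sum(pmf)
mean = sum(x * f for x, f in zip(support, pmf))
second_moment = sum(x**2 * f for x, f in zip(support, pmf))
variance = second_moment - mean**2  # V[X] = E[X^2] - (E[X])^2

print(total)                       # should be very close to 1
print(mean, n * p)                 # E[X] versus np
print(variance, n * p * (1 - p))   # V[X] versus np(1-p)

# Cross-check the hand-written formula against scipy on the whole support.
max_diff = max(abs(f - binom.pmf(x, n, p)) for x, f in zip(support, pmf))
print(max_diff)
```

Setting `n = 1` in the same sketch gives the Bernoulli case, where the mean and variance reduce to $ p $ and $ p (1-p) $.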

Let's draw binomial distributions ($ n = 40 $, $ p = 0.25, 0.5, 0.75 $) using Python's scipy library.

from scipy.stats import binom
import numpy as np
import matplotlib.pyplot as plt

# The number of successes ranges over 0, 1, ..., n, so include both endpoints.
x = np.arange(0, 41, 1)
y1 = binom.pmf(x, 40, 0.25)
y2 = binom.pmf(x, 40, 0.5)
y3 = binom.pmf(x, 40, 0.75)

# Overlay the three distributions as semi-transparent bars.
plt.bar(x, y1, width=0.5, color="r", alpha=0.5, label="Binom p= {}".format(0.25))
plt.bar(x, y2, width=0.5, color="g", alpha=0.5, label="Binom p= {}".format(0.5))
plt.bar(x, y3, width=0.5, color="b", alpha=0.5, label="Binom p= {}".format(0.75))

plt.legend(loc=8)  # loc=8 places the legend at the lower center
plt.show()

[Figure: binomial distributions with n = 40 and p = 0.25, 0.5, 0.75]

Poisson distribution

The Poisson distribution is the probability distribution obtained from the binomial distribution by keeping the expected value $ np = λ $ fixed and taking the limit $ n → ∞ $, $ p → 0 $ in the number of trials and the probability of success. The probability function is

P(X=x)≡f(x)=\frac{e^{-λ}λ^x}{x!}, \quad x = 0, 1, 2, \dots
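As a sketch of why this limit produces the Poisson probability function (a standard textbook argument, added here for illustration rather than taken from the original post), substitute $ p = λ/n $ into the binomial probability function, hold $ x $ fixed, and let $ n → ∞ $:

```latex
\begin{aligned}
{}_n C_{x}\,p^x(1-p)^{n-x}
&= \frac{n(n-1)\cdots(n-x+1)}{x!}\left(\frac{λ}{n}\right)^{x}\left(1-\frac{λ}{n}\right)^{n-x} \\
&= \frac{λ^x}{x!}\cdot\frac{n(n-1)\cdots(n-x+1)}{n^x}\cdot\left(1-\frac{λ}{n}\right)^{n}\cdot\left(1-\frac{λ}{n}\right)^{-x} \\
&\xrightarrow[n\to\infty]{}\ \frac{λ^x}{x!}\cdot 1\cdot e^{-λ}\cdot 1 = \frac{e^{-λ}λ^x}{x!}
\end{aligned}
```

The second factor is a product of $ x $ terms that each tend to 1, the third is the usual limit defining $ e^{-λ} $, and the last tends to 1 because $ x $ is fixed.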

The expected value and variance of this distribution are

E[X]=λ\\
V[X]=λ

This follows directly from the limit described above: the expected value and variance of the binomial distribution are $ np $ and $ np (1-p) $, and with $ np = λ $ fixed and $ p → 0 $ they become $ λ $ and $ λ (1-p) → λ $.
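The two statements above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: `poisson_pmf` is a hypothetical helper, $ λ = 10 $ is an illustrative choice, and the infinite support is truncated at $ x = 100 $, beyond which the remaining probability mass is negligible for this $ λ $.

```python
from math import exp, factorial

from scipy.stats import poisson

lam = 10  # illustrative value of λ


def poisson_pmf(x, lam):
    """Probability function written directly from e^(-λ) λ^x / x!."""
    return exp(-lam) * lam**x / factorial(x)


# Truncated support (assumption: the tail beyond 100 is negligible for λ = 10).
support = range(101)
pmf = [poisson_pmf(x, lam) for x in support]

mean = sum(x * f for x, f in zip(support, pmf))
variance = sum(x**2 * f for x, f in zip(support, pmf)) - mean**2
print(mean, variance)  # both should be very close to λ

# scipy reports the same moments in closed form.
print(poisson.mean(lam), poisson.var(lam))

# Binomial moments with np = λ held fixed: np stays at λ while np(1-p) approaches λ.
binomial_variances = []
for n in (10, 100, 1000, 10000):
    p = lam / n
    binomial_variances.append(n * p * (1 - p))
    print(n, n * p, n * p * (1 - p))
```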

Let's draw Poisson distributions with $ λ = 10, 20, 30 $.

from scipy.stats import poisson
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)

# The Poisson distribution is supported on 0, 1, 2, ..., so start the axis at 0.
x = np.arange(0, 50, 1)
y1 = poisson.pmf(x, 10)
y2 = poisson.pmf(x, 20)
y3 = poisson.pmf(x, 30)

# Overlay the three distributions as semi-transparent bars.
ax.bar(x, y1, width=0.5, color="r", alpha=0.5, label="Poisson λ= {}".format(10))
ax.bar(x, y2, width=0.5, color="g", alpha=0.5, label="Poisson λ= {}".format(20))
ax.bar(x, y3, width=0.5, color="b", alpha=0.5, label="Poisson λ= {}".format(30))

ax.legend()
plt.show()

[Figure: Poisson distributions with λ = 10, 20, 30]

The Poisson distribution is obtained as a limit in the binomial parameters $ n $ and $ p $. Let's see how large $ n $ actually has to be before the two distributions overlap.

Fix $ λ = np = 10 $, vary $ n $ and $ p $, and see how the binomial distribution changes.

from scipy.stats import binom, poisson
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

x = np.arange(0, 30, 1)
y_poisson = poisson.pmf(x, 10)

# Keep np = 10 fixed: (n, p) = (10, 1), (100, 0.1), (1000, 0.01).
# Note that n = 10 forces p = 1, so every trial succeeds and X = 10 with certainty.
for ax, n in zip(axes, [10, 100, 1000]):
    p = 10 / n
    ax.bar(x, y_poisson, width=0.5, color="r", alpha=0.3, label="Poisson λ= {}".format(10))
    ax.bar(x, binom.pmf(x, n, p), width=0.5, color="b", alpha=0.3, label="Binom n= {}".format(n))
    ax.set_title("n={}".format(n))
    ax.legend()

plt.show()

[Figure: Poisson (λ = 10) versus binomial distributions in three panels: n=10, n=100, n=1000]

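Comparing the graphs, the approximation is poor at $ n = 10 $ (there $ p = 1 $, so the binomial distribution puts all of its mass on $ X = 10 $), but the two distributions are almost identical at $ n = 100 $ and $ n = 1000 $. In other words, at least for $ λ = 10 $, once the number of Bernoulli trials reaches three or more digits, the number of successes can reasonably be treated as following the Poisson distribution.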
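The visual comparison can also be put into numbers. The sketch below is an addition, not the original author's method: it uses the total variation distance (half the sum of absolute differences between the two probability functions) as one possible measure of how far apart the distributions are, with $ λ = 10 $ and the same three values of $ n $.

```python
import numpy as np
from scipy.stats import binom, poisson

lam = 10
# 0..1000 covers the full binomial support for every n used here;
# the Poisson mass beyond 1000 is negligible for λ = 10.
x = np.arange(0, 1001)

poisson_pmf = poisson.pmf(x, lam)

tv = {}  # total variation distance for each n
for n in (10, 100, 1000):
    p = lam / n  # keep np = λ fixed
    binom_pmf = binom.pmf(x, n, p)  # evaluates to 0 for x > n
    tv[n] = 0.5 * np.abs(binom_pmf - poisson_pmf).sum()
    print(n, p, tv[n])
```

A distance near 1 means the two distributions barely overlap, and a distance near 0 means they are almost the same, so the printed values should shrink as $ n $ grows.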

Geometric distribution

The probability distribution of the number of trials $ X $ needed when a Bernoulli trial with success probability $ p $ is repeated until the first success is called the geometric distribution. The probability function of this distribution is

P(X=x)≡f(x)=p(1-p)^{x-1}, \quad x = 1, 2, \dots

The expected value and variance are

E[X]=\frac{1}{p}\\
V[X]=\frac{1-p}{p^2}

Drawing the geometric distribution with $ p = 0.1 $ in Python looks like this:

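from scipy.stats import geom
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)

# X counts trials up to and including the first success, so the support starts at 1.
x = np.arange(1, 30, 1)
y = geom.pmf(x, 0.1)

ax.bar(x, y, width=0.5, color="g", alpha=0.5, label="Geom p= {}".format(0.1))

ax.legend()
plt.show()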

[Figure: geometric distribution with p = 0.1]
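The definition of the geometric distribution can also be tried out directly by simulation. The following is a rough sketch, not part of the original post: `trials_until_first_success` is a hypothetical helper, and the seed and the sample size of 50,000 are arbitrary choices, so the sample values will only be close to the theoretical ones.

```python
import numpy as np

p = 0.1
rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def trials_until_first_success(p, rng):
    """Repeat Bernoulli trials and return the number of the first successful one."""
    count = 1
    while rng.random() >= p:  # a trial fails with probability 1 - p
        count += 1
    return count


samples = np.array([trials_until_first_success(p, rng) for _ in range(50_000)])

print(samples.mean(), 1 / p)           # sample mean versus E[X] = 1/p
print(samples.var(), (1 - p) / p**2)   # sample variance versus V[X] = (1-p)/p^2
print((samples == 1).mean(), p)        # fraction with X = 1 versus f(1) = p
```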

Finally

Next, I would like to summarize the continuous probability distributions. I have registered for the statistical test, which is two weeks away, so I will do my best to study!

References

"Statistics Basics" for the Level 2 Statistical Test, revised edition, officially certified by the Japan Statistical Society
Understand the Poisson distribution carefully and draw it in Python
