1. Statistics learned with Python 2-1. Probability distribution [discrete variable]

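#Import numerical libraries
import numpy as np
import scipy as sp
import scipy.stats  #load the stats submodule explicitly; older SciPy does not load it with "import scipy" alone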
import pandas as pd
from pandas import Series, DataFrame
#Import visualization library
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
%matplotlib inline
#Japanese display module of matplotlib
!pip install japanize-matplotlib
import japanize_matplotlib

⑴ Bernoulli distribution

x = np.array([0,0,1,1,0,1,0,0])

#Calculate the probability distribution
p = len(x[x==1]) / len(x)
pmf_bernoulli = sp.stats.bernoulli.pmf(x, p)

#Visualization
plt.vlines(x, 0, pmf_bernoulli, 
           colors='blue', lw=50)
plt.xticks([0,1])
plt.xlim([0 - 0.5, 1 + 0.5])
plt.grid(True)


Event (0 or 1)    Probability
0                 0.625
1                 0.375
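
As a cross-check of the table above, the Bernoulli probabilities can be written out by hand. This is a minimal sketch, assuming the standard form P(X = x) = p^x (1 - p)^(1 - x); the names `values` and `pmf_manual` are introduced here for illustration only.

```python
import numpy as np
from scipy import stats

x = np.array([0, 0, 1, 1, 0, 1, 0, 0])
p = x.mean()  # share of 1s in the data: 3/8 = 0.375

# Bernoulli pmf by hand: P(X = x) = p**x * (1 - p)**(1 - x)
values = np.array([0, 1])
pmf_manual = p**values * (1 - p)**(1 - values)

# The same probabilities from scipy.stats
pmf_scipy = stats.bernoulli.pmf(values, p)

print(pmf_manual)  # [0.625 0.375]
print(pmf_scipy)
```

Evaluating the pmf on the two possible outcomes, rather than on every observation, gives one probability per event, which is what the table lists.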

⑵ Binomial distribution

sp.stats.binom.pmf(n=5, p=0.5, k=2)

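The call above can be checked against the binomial formula directly. A small sketch, assuming the usual form P(X = k) = C(n, k) p^k (1 - p)^(n - k); `pmf_manual` is a name used only in this example.

```python
from math import comb
from scipy import stats

n, p, k = 5, 0.5, 2

# C(5, 2) * 0.5**2 * 0.5**3 = 10 * 0.25 * 0.125
pmf_manual = comb(n, k) * p**k * (1 - p)**(n - k)
pmf_scipy = stats.binom.pmf(k=k, n=n, p=p)

print(pmf_manual)  # 0.3125
print(pmf_scipy)
```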

#Generate pseudo-random numbers
np.random.seed(1)
rvs_binom = sp.stats.binom.rvs(n=10, p=0.2, size=10000)

#Get the probability distribution
m = np.arange(0, 10+1, 1)
pmf_binom = sp.stats.binom.pmf(n=10, p=0.2, k=m)

#Visualization
#sns.distplot is deprecated; histplot with discrete=True centers each bar on an integer
sns.histplot(rvs_binom, discrete=True,
             stat='probability', label='rvs')
plt.plot(m, pmf_binom, label='pmf')
plt.xticks(m)
plt.legend()
plt.grid()


Number of successes (heads)    Probability
0                              0.107374182
1                              0.268435456
2                              0.301989888
3                              0.201326592
4                              0.088080384
5                              0.026424115
6                              0.005505024
7                              0.000786432
8                              0.000073728
9                              0.000004096
10                             0.000000102
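
The plot compares simulated draws with the theoretical pmf; the same comparison can be made numerically. This is a sketch under assumptions: it passes `random_state=1` to `rvs` instead of calling `np.random.seed(1)`, so the draws need not match the ones plotted above, and `freq` is a name introduced here.

```python
import numpy as np
from scipy import stats

n, p, size = 10, 0.2, 10000

# Simulate 10000 binomial outcomes with a fixed seed
rvs = stats.binom.rvs(n=n, p=p, size=size, random_state=1)

# Relative frequency of each outcome 0..n
freq = np.bincount(rvs, minlength=n + 1) / size

# Theoretical probabilities for the same outcomes
pmf = stats.binom.pmf(np.arange(n + 1), n=n, p=p)

for k in range(n + 1):
    print(f"{k:2d}  simulated={freq[k]:.4f}  pmf={pmf[k]:.4f}")
```

With 10000 draws the simulated frequencies should sit close to the pmf, with small sampling noise in each row.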

⑶ Poisson distribution

sp.stats.poisson.pmf(k=2, mu=5)

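The value returned above can be reproduced from the Poisson formula. A minimal sketch, assuming the standard form P(X = k) = mu^k e^(-mu) / k!; `pmf_manual` is an illustrative name.

```python
from math import exp, factorial
from scipy import stats

k, mu = 2, 5

# 5**2 * e**(-5) / 2!
pmf_manual = mu**k * exp(-mu) / factorial(k)
pmf_scipy = stats.poisson.pmf(k=k, mu=mu)

print(round(pmf_manual, 4))  # 0.0842
print(round(pmf_scipy, 4))
```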

#Generate pseudo-random numbers
np.random.seed(1)
rvs_poisson = sp.stats.poisson.rvs(mu=2, size=10000)

#Get the probability distribution
m = np.arange(0, 10+1, 1)
pmf_poisson = sp.stats.poisson.pmf(mu=2, k=m)

#Visualization
#sns.distplot is deprecated; histplot with discrete=True centers each bar on an integer
sns.histplot(rvs_poisson, discrete=True,
             stat='probability', label='rvs')
plt.plot(m, pmf_poisson, label='pmf')
plt.xticks(m)
plt.legend()
plt.grid()


Number of occurrences    Probability
0                        0.135335283
1                        0.270670566
2                        0.270670566
3                        0.180447044
4                        0.090223522
5                        0.036089409
6                        0.012029803
7                        0.003437087
8                        0.000859272
9                        0.000190949
10                       0.000038190

#Specify parameters: very large n and very small p, with n*p = 2 to match mu=2 above
n = 100000000
p = 0.00000002

#Calculate the probability distribution of the binomial distribution
num = np.arange(0, 10+1, 1)
pmf_binom_2 = sp.stats.binom.pmf(n=n, p=p, k=num)

#Visualization
plt.plot(num, pmf_poisson,
         color='lightgray', lw=10, label='poisson')
plt.plot(num, pmf_binom_2,
         color='black', linestyle='dotted', label='binomial')
plt.xticks(num)
plt.legend()
plt.grid()

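The two curves overlap because a binomial distribution with very large n and very small p approaches a Poisson distribution with mu = n*p. The sketch below puts a number on that, comparing the n=10, p=0.2 case used earlier with the large-n case; `gap_small_n` and `gap_large_n` are names introduced only for this example.

```python
import numpy as np
from scipy import stats

k = np.arange(0, 11)
pmf_poisson = stats.poisson.pmf(k, mu=2)

# Both binomials have mean n*p = 2; only n differs
pmf_small_n = stats.binom.pmf(k, n=10, p=0.2)
pmf_large_n = stats.binom.pmf(k, n=100000000, p=2e-8)

# Largest absolute gap to the Poisson pmf over k = 0..10
gap_small_n = np.max(np.abs(pmf_small_n - pmf_poisson))
gap_large_n = np.max(np.abs(pmf_large_n - pmf_poisson))

print(gap_small_n, gap_large_n)
```

The gap for n=10 is visible by eye in a plot, while the gap for n=100000000 is far too small to see, which is why the dotted line sits on top of the gray one.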

⑷ Geometric distribution

%precision 3
sp.stats.geom.pmf(k=1, p=1/6)


#Specify the number of trials
num = np.arange(1, 11, 1)

#Calculate the probability distribution
prob = sp.stats.geom.pmf(k=num, p=1/6)  #pmf accepts an array of k values, so no loop is needed

#Visualization
plt.bar(num, prob)
plt.xticks(num)
plt.xlabel('Number of rolls until a 1 first appears')
plt.ylabel('probability')
plt.show()


Number of trials    Probability    Formula
1                   0.167          ⅙
2                   0.139          ⅚ · ⅙
3                   0.116          (⅚)² · ⅙
4                   0.096          (⅚)³ · ⅙
5                   0.080          (⅚)⁴ · ⅙
6                   0.067          (⅚)⁵ · ⅙
7                   0.056          (⅚)⁶ · ⅙
8                   0.047          (⅚)⁷ · ⅙
9                   0.039          (⅚)⁸ · ⅙
10                  0.032          (⅚)⁹ · ⅙
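
The formula column can be evaluated directly. A short sketch, assuming the geometric pmf P(X = k) = (1 - p)^(k - 1) p for the first success on trial k; `pmf_manual` and `within_10` are illustrative names.

```python
import numpy as np
from scipy import stats

p = 1 / 6
k = np.arange(1, 11)

# k - 1 failures followed by one success
pmf_manual = (1 - p)**(k - 1) * p
pmf_scipy = stats.geom.pmf(k, p)

# Probability that the first 1 appears within 10 rolls
within_10 = pmf_manual.sum()

print(np.round(pmf_manual, 3))
print(round(within_10, 3))  # 0.838
```

Summing the ten rows gives the chance of seeing a 1 at least once in ten rolls, which equals 1 - (5/6)^10.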

⑸ Discrete uniform distribution

#Specify all events
num = np.arange(1, 7, 1)

#Calculate the probability distribution
prob = np.full(len(num), 1 / len(num))  #every face is equally likely

#Visualization
plt.bar(num, prob)
plt.xticks(num)
plt.xlabel('Dice roll')
plt.ylabel('probability')
plt.show()


Dice roll    Probability
1            0.167
2            0.167
3            0.167
4            0.167
5            0.167
6            0.167
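
The same table can also be produced with `scipy.stats.randint`, which models a discrete uniform distribution over integers. This is an alternative sketch, not the approach used in the code above; note that the upper bound of `randint` is exclusive, so a six-sided die is `randint(1, 7)`.

```python
import numpy as np
from scipy import stats

num = np.arange(1, 7)

# Discrete uniform on the integers 1..6 (upper bound 7 is exclusive)
dice = stats.randint(1, 7)
prob = dice.pmf(num)

print(np.round(prob, 3))
print(dice.mean())  # expected value of one roll
```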

⑹ Hypergeometric distribution

#Specify parameters
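M = 20 #Total number of items in the population
n = 7  #Number of hits in the population
N = 12 #Number of items drawn (without replacement)

#Possible values of the random variable (hits among the N draws)
k = np.arange(0, n+1)

#Create a frozen distribution with these parameters
hgeom = sp.stats.hypergeom(M, n, N)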
#Calculate the probability distribution
pmf_hgeom = hgeom.pmf(k)

#Visualization
plt.bar(k, pmf_hgeom)
plt.xticks(k)
plt.xlabel('Number of hits')
plt.ylabel('probability')
plt.show()


Number of hits    Probability
0                 0.00010
1                 0.00433
2                 0.04768
3                 0.19866
4                 0.35759
5                 0.28607
6                 0.09536
7                 0.01022
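
These probabilities follow from counting combinations. A minimal sketch, assuming the standard hypergeometric form P(X = k) = C(n, k) C(M - n, N - k) / C(M, N) with the same M, n, N as above; `pmf_manual` is a name used only here.

```python
from math import comb
import numpy as np
from scipy import stats

M, n, N = 20, 7, 12  # population size, hits in population, draws

# Choose k of the n hits and N - k of the M - n misses, out of all ways to draw N
pmf_manual = np.array([
    comb(n, k) * comb(M - n, N - k) / comb(M, N)
    for k in range(n + 1)
])
pmf_scipy = stats.hypergeom.pmf(np.arange(n + 1), M, n, N)

print(np.round(pmf_manual, 5))
```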

⑺ Negative binomial distribution

#Specify parameters
N = 12  #Number of failure counts to evaluate (0 to N-1)
p = 0.5 #Probability of success
k = 3   #Number of successes

#Calculate the probability of each number of failures before the k-th success
pmf_nbinom = sp.stats.nbinom.pmf(range(N), k, p)

#Visualization
plt.bar(range(N), pmf_nbinom)
plt.xlabel('Number of failures')
plt.ylabel('probability')
plt.xticks(range(N))
plt.show()


Number of failures    Probability
0                     0.125
1                     0.188
2                     0.188
3                     0.156
4                     0.117
5                     0.082
6                     0.055
7                     0.035
8                     0.022
9                     0.013
10                    0.008
11                    0.005
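
The table can be reproduced from the negative binomial formula. A sketch under the parameterization that `scipy.stats.nbinom` uses, where the random variable is the number of failures f before the r-th success: P(X = f) = C(f + r - 1, f) p^r (1 - p)^f. The names `r`, `failures`, and `pmf_manual` are introduced for this example; `r` plays the role of `k` in the code above.

```python
from math import comb
import numpy as np
from scipy import stats

r, p = 3, 0.5            # number of successes, probability of success
failures = np.arange(12)  # failure counts 0..11, as in the table

# Arrange f failures among the first f + r - 1 trials; the last trial is the r-th success
pmf_manual = np.array([
    comb(f + r - 1, f) * p**r * (1 - p)**f
    for f in failures
])
pmf_scipy = stats.nbinom.pmf(failures, r, p)

print(np.round(pmf_manual, 3))
```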

Summary

We have looked at the main discrete probability distributions. The list below summarizes them with attention to what the random variable is, in other words what goes on the x-axis.

Probability distribution         Random variable (x-axis)                     Parameters
Bernoulli distribution           Event (0 or 1)                               Probability of occurrence p
Binomial distribution            Number of occurrences k                      Probability of occurrence p, number of trials n
Poisson distribution             Number of occurrences k                      Average number of occurrences mu
Geometric distribution           Number of trials k until the first success   Probability of success p
Discrete uniform distribution    Event type (e.g. dice roll)                  ※ scipy.stats.uniform is continuous only; scipy.stats.randint covers the discrete case
Hypergeometric distribution      Number of successes among the selections     Total number M, number of successes in the whole n, number of selections N
Negative binomial distribution   Number of failures                           Probability of success p, number of successes k (N in the code only sets the plotted range)
