Probability distributions for Statistical Test Grade 2, learned in Python ①

Introduction

While studying for the statistical test you encounter many probability distributions, and I think it is hard to build an intuition from the mathematical formulas alone. In this article I draw each probability distribution in Python while varying its parameters, and attach the resulting images.

References

For the explanations of the probability distributions, I referred to the following.

- Statistics Time
- Introduction to Statistics (Basic Statistics I), Department of Statistics, Faculty of Liberal Arts, University of Tokyo

Various probability distributions

This article does not go into details such as deriving the formulas; it focuses on grasping the shape of each distribution and what that distribution means. The following two distributions are covered.

- Binomial distribution
- Poisson distribution

Binomial distribution

The distribution followed by the number of successes $ X $ in $ n $ independent trials (Bernoulli trials) that have only two outcomes, such as a tossed coin landing heads or tails, is called the **binomial distribution**.

- The number of 1s when a die is rolled 10 times
- The number of heads when a coin is tossed 5 times
- The number of wins when a baseball team with a 70% winning percentage plays 144 games

Etc. follow the binomial distribution.

The formula for the probability mass function of the binomial distribution is expressed as follows.


P(X = k) = {}_n C_k \, p^k (1-p)^{n-k}

$ n $ is the number of trials, $ p $ is the probability of success of the trial, and $ k $ is the number of successful trials.
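As a quick sanity check, this probability mass function can be computed directly with the standard library (a small sketch; `binom_pmf` is a helper name introduced here for illustration):

```python
import math

def binom_pmf(n, p, k):
    # nCk * p^k * (1-p)^(n-k)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of rolling exactly one 1 in 10 dice rolls (p = 1/6)
prob = binom_pmf(10, 1/6, 1)

# The pmf sums to 1 over k = 0..n, as any probability distribution must
total = sum(binom_pmf(10, 1/6, k) for k in range(11))
```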

Also, when the random variable $ X $ follows the binomial distribution, the expected value $ E (X) $ and the variance $ V (X) $ are as follows.


E(X) = np


V(X) = np(1 - p)

The expected value is the product of the number of trials $ n $ and the success probability $ p $, which matches intuition.
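These formulas can be verified numerically by computing the mean and variance directly from the probability mass function (a quick sketch using only the standard library):

```python
import math

n, p = 50, 0.1
# Full pmf over k = 0..n
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = sum(k * q for k, q in enumerate(pmf))                # E(X) = np = 5.0
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))   # V(X) = np(1-p) = 4.5
```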

Now let's draw the probability distribution in Python. Let's check the distribution of the number of successes when making $ 50 $ trials with a success probability of $ 10\% $.

import math
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()

def comb_(n, k):
    # Number of combinations nCk
    result = math.factorial(n) / (math.factorial(n - k) * math.factorial(k))
    return result


def binomial_dist(p, n, k):
    # Binomial pmf: nCk * p^k * (1-p)^(n-k)
    result = comb_(n, k) * (p**k) * ((1 - p) ** (n - k))
    return result

p = 0.1
x = np.arange(1, 50, 1)

y = [binomial_dist(p, 50, i) for i in x]

plt.bar(x, y, align="center", width=0.4, color="blue",
        alpha=0.5, label="binomial p= " + "{:.1f}".format(p))

plt.legend()
plt.ylim(0, 0.3)
plt.xlim(0, 50)

plt.savefig('binomial_dist_sample.png')
plt.show()

binomial_dist_sample.png

Since the success probability is $ 10\% $, you can see that the probability is highest around $ 4 $ or $ 5 $ successes, which matches the expected value $ np = 50 × 0.1 = 5 $. You can also see that more than $ 10 $ successes is very unlikely, and $ 20 $ would be miracle level.

Now let's see how the distribution changes as we increase the probability of success (change $ p $).

import math
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
%matplotlib notebook

def comb_(n, k):
    # Number of combinations nCk
    result = math.factorial(n) / (math.factorial(n - k) * math.factorial(k))
    return result


def binomial_dist(p, n, k):
    # Binomial pmf: nCk * p^k * (1-p)^(n-k)
    result = comb_(n, k) * (p**k) * ((1 - p) ** (n - k))
    return result

fig = plt.figure()

def update(a):
    plt.cla()  # clear the previous frame
    x = np.arange(1, 50, 1)
    y = [binomial_dist(a, 50, i) for i in x]

    plt.bar(x, y, align="center", width=0.4, color="blue",
            alpha=0.5, label="binomial p= " + "{:.1f}".format(a))

    plt.legend()
    plt.ylim(0, 0.3)
    plt.xlim(0, 50)


ani = animation.FuncAnimation(fig,
                              update,
                              interval=1000,
                              frames=np.arange(0.1, 1, 0.1),
                              blit=False)  # update() redraws the whole axes, so blitting is disabled
ani.save('Binomial_dist.gif', writer='pillow')
plt.show()

Binomial_dist.gif

You can see that the closer $ p $ is to $ 0.5 $ (a $ 50\% $ success probability), the wider the distribution spreads, and the closer it is to $ 0 $ or $ 1 $, the sharper its shape. Looking at the formula $ V(X) = np(1-p) $, the variance is largest when $ p $ is close to $ 0.5 $. When success and failure are equally likely, the outcomes vary the most, which matches intuition.
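This can be confirmed with a one-line check of $ V(X) = np(1-p) $ for a few values of $ p $ (a trivial sketch):

```python
n = 50
variance = {p: n * p * (1 - p) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
# The variance peaks at p = 0.5 and is symmetric: V(p) == V(1 - p)
```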

Poisson distribution

Next is the Poisson distribution. The probability distribution of the probability that an event occurring an average of $ \lambda $ times per unit time occurs exactly $ k $ times is called the Poisson distribution.

- The number of vehicles passing through a specific intersection in one hour
- The number of visits to a website in one hour
- The number of emails received per day
- The number of visitors to a store within a certain period of time

Etc. are said to follow the Poisson distribution.

The formula for the Poisson distribution probability mass function is expressed as follows.

P(X=k) = \frac{\lambda^k \mathrm{e}^{-\lambda}}{k!}

It looks like a daunting formula, but if you want to see the detailed derivation, please see the previous article. The expected value $ E(X) $ and the variance $ V(X) $ when the random variable $ X $ follows the Poisson distribution are as follows.


E(X) = \lambda


V(X) = \lambda

Since we are talking about an event that occurs an average of $ \lambda $ times, it makes sense that the expected value is simply $ \lambda $.
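The identity $ E(X) = V(X) = \lambda $ can also be checked numerically from the probability mass function (a sketch; the sum is truncated at $ k = 100 $, far enough out that the remaining tail is negligible for $ \lambda = 5 $):

```python
import math

lam = 5.0
# Poisson pmf: λ^k * e^(-λ) / k!, truncated at k = 100
pmf = [lam**k * math.exp(-lam) / math.factorial(k) for k in range(100)]

mean = sum(k * q for k, q in enumerate(pmf))                # ≈ λ
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))   # ≈ λ
```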

Now let's draw the probability distribution in Python. Let's overlay the Poisson distributions of events that occur an average of $ 5 $, $ 15 $, and $ 30 $ times per unit time.


import math
import numpy as np
import matplotlib.pyplot as plt

def poisson(k, lambda_):
    # Poisson pmf: λ^k * e^(-λ) / k!
    k = int(k)
    result = (lambda_**k) * (np.exp(-lambda_)) / math.factorial(k)
    return result

x = np.arange(1, 50, 1)
y1 = [poisson(i, 5) for i in x]
y2 = [poisson(i, 15) for i in x]
y3 = [poisson(i, 30) for i in x]

plt.bar(x, y1, align="center", width=0.4, color="red",
        alpha=0.5, label="Poisson λ= %d" % 5)

plt.bar(x, y2, align="center", width=0.4, color="green",
        alpha=0.5, label="Poisson λ= %d" % 15)

plt.bar(x, y3, align="center", width=0.4, color="blue",
        alpha=0.5, label="Poisson λ= %d" % 30)

plt.legend()
plt.savefig('Poisson_sample.png')
plt.show()

Poisson_sample.png

Since both the expected value and the variance equal $ \lambda $, the larger the value of $ \lambda $, the further to the right the distribution shifts and the wider it spreads. The change in the distribution as $ \lambda $ increases looks like this.


import matplotlib.pyplot as plt
import matplotlib.animation as animation
import numpy as np
from scipy.stats import poisson

fig = plt.figure()

def update(a):
    plt.cla()  # clear the previous frame

    x = np.arange(1, 50, 1)

    y = [poisson.pmf(i, a) for i in x]

    plt.bar(x, y, align="center", width=0.4, color="red",
            alpha=0.5, label="Poisson λ= %d" % a)

    plt.legend()
    plt.ylim(0, 0.3)
    plt.xlim(0, 50)

ani = animation.FuncAnimation(fig,
                              update,
                              interval=500,
                              frames=np.arange(1, 31, 1),
                              blit=False)  # update() redraws the whole axes, so blitting is disabled
ani.save('Poisson_distribution.gif', writer='pillow')
plt.show()

Poisson_distribution.gif

You can see the distribution spreading out as the value of $ \lambda $ increases: the larger the average number of occurrences per unit time $ \lambda $, the more the actual number of occurrences varies.

Binomial and Poisson distributions

The Poisson distribution is actually derived from the binomial distribution. It is obtained by taking the limit $ n → ∞ $, $ p → 0 $ of the binomial distribution while keeping $ np = \lambda $ constant. (This is known as the Poisson limit theorem, or the law of small numbers. The derivation is covered in the [previous article](https://qiita.com/g-k/items/836820b826775feb5628), so if you are interested, please have a look there.)

In other words, among events that follow the binomial distribution, **events with many trials that only rarely occur** follow the Poisson distribution.
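This convergence can be checked numerically: with $ np = \lambda $ held fixed, the largest pointwise gap between the two probability mass functions shrinks as $ n $ grows (a sketch using only the standard library):

```python
import math

def binom_pmf(n, p, k):
    # nCk * p^k * (1-p)^(n-k)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    # λ^k * e^(-λ) / k!
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 1.0
# Largest pointwise gap over k = 0..20, for increasing n with np = λ fixed
gap = {n: max(abs(binom_pmf(n, lam / n, k) - poisson_pmf(lam, k)) for k in range(21))
       for n in (10, 100, 1000)}
```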

As a concrete example, let's overlay the binomial distribution with $ n = 100 $, $ p = 0.01 $ and the Poisson distribution with $ \lambda = 1 $.

import math
import numpy as np
import matplotlib.pyplot as plt

def poisson(k, lambda_):
    # Poisson pmf: λ^k * e^(-λ) / k!
    result = (lambda_**k) * (np.exp(-lambda_)) / math.factorial(k)
    return result

def comb_(n, k):
    # Number of combinations nCk
    result = math.factorial(n) / (math.factorial(n - k) * math.factorial(k))
    return result

def binomial_dist(p, n, k):
    # Binomial pmf: nCk * p^k * (1-p)^(n-k)
    result = comb_(n, k) * (p**k) * ((1 - p) ** (n - k))
    return result

x = np.arange(1, 100, 1)
y1 = [poisson(i, 1) for i in x]
y2 = [binomial_dist(0.01, 100, i) for i in x]

plt.xlim(0, 30)

plt.bar(x, y1, align="center", width=0.4, color="red",
        alpha=0.5, label="Poisson λ= %d" % 1)

plt.bar(x, y2, align="center", width=0.4, color="blue",
        alpha=0.5, label="binomial p= " + "{:.2f}".format(0.01))

plt.legend()
plt.savefig('bino_poisson.png')
plt.show()

bino_poisson.png

You can see that the two distributions overlap almost exactly. Drawing the distributions like this makes it easier to understand how they relate to each other.

NEXT: Next time, we will cover the geometric distribution, the exponential distribution, and the negative binomial distribution.
