[PYTHON] I tried to summarize the relationship between probability distributions starting from the Bernoulli distribution

Introduction

For the purpose of understanding the relationship between probability distributions, we have implemented and summarized functions that generate various probability distributions in Python, starting from the Bernoulli distribution. This article is for the following people.

The implemented Python functions are as follows. Function names include bernolli and binom.

kankei1.png

For example, a uniform distribution can be generated by following the following order. Bernoulli distribution-> geometric distribution-> exponential distribution-> gamma distribution-> beta distribution-> uniform distribution The definition of function name and probability distribution follows scipy.stats.   Now, let's explain each function in turn.

Bernoulli distribution

First, generate a sample that follows the Bernoulli distribution of the starting point. The Bernoulli distribution is a probability distribution that follows "the number of wins when the $ p $ lottery is drawn once". As you can see from the definition, the possible values are 0 or 1. The probability function is given by the following equation.

f(x;p) = p^x (1-p)^{1-x}\ \ (x=0,1)

Proceed to Python implementation. Import numpy and scipy.stats in advance.

import numpy as np
from scipy import stats

Below is a function that generates sampleSize samples that follow a Bernoulli distribution with a probability of $ p $.

def bernoulli(p, sampleSize):
    return stats.bernoulli.rvs(p, size=sampleSize)

Let's try to generate 20 samples from this function.

bernoulli(0.2, 20)
> array([1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1])

According to the setting, 1 appears with a probability of about 0.2. This is the last time I use scipy.stats, and after that I will generate a probability distribution based on this bernoulli function.

From Bernoulli distribution to binomial distribution

The binomial distribution is a probability distribution that follows "the number of wins when a lottery with a probability of winning $ p $ is drawn $ n $ times". The probability function is given by the following equation.

f(x;n,p) = \left(
\begin{matrix}
n\\
x
\end{matrix}
\right) p^x(1-p)^{n-x}\ \ (x=0,1,\cdots,n)

The binomial distribution is the Bernoulli distribution's "number of lottery draws" multiplied by $ n $. sampleSize "samples generated from a binomial distribution with $ n $ number of trials and $ p $ probability" are made from $ n \ times $ sampleSize "samples generated from a Bernoulli distribution with a probability $ p $" can do. The implementation is as below.

def binom(n, p, sampleSize):
    x = bernoulli(p, n * sampleSize)
    x = x.reshape([n, sampleSize])
    return np.sum(x, axis=0)

Binomial to Poisson distribution

The Poisson distribution is a probability distribution that follows "the number of lottery tickets that win $ \ mu $ per unit time". The probability function is given by the following equation.

f(x;\mu) = \frac{\mu^x}{x!}\exp(-\mu)\ \ (x=0,1,\cdots)

As you can see below, the Poisson distribution is very similar to the binomial distribution.

Binomial distribution Poisson distribution
What do you see Number of hits Number of hits
Under what conditions Constant number of trials Constant time

Assuming that $ n $ in the binomial distribution with $ n $ in number of trials and $ \ frac {\ mu} {n} $ in probability is $ n \ to \ infinity $, it converges to a Poisson distribution with an average of $ \ mu $. The implementation is as below.

def poisson(mu, sampleSize):
    n = 1000 #Large enough number
    p = mu / n
    return binom(n, p, sampleSize)

Theoretically, the larger the $ n $, the better, but the amount of calculation increases. A sample that follows a Poisson distribution was generated from the binom function. The binom function is based on the bernoulli function. In other words, we generated a Poisson distribution from the Bernoulli distribution. We will continue to use this method in the future.

Geometric distribution from Bernoulli distribution

The geometric distribution is a probability distribution that follows "the number of trials when the probability of winning is drawn until the lottery of $ p $ is won". The probability function is given by the following equation.

f(x;p)= p(1-p)^{x-1}\ \ (x=1,2,\cdots)

Samples generated from a geometric distribution with a probability of $ p $ can be made from a Bernoulli distribution with a probability of $ p $. The implementation is as below.

def geom(p, sampleSize):
    x = []
    for _ in range(sampleSize):
        t = 1
        while bernoulli(1 - p, 1):
            t += 1
        x.append(t)
    return np.array(x)
def geom(p, sampleSize):
    n = 1000  
    x = []
    s = np.array([-1], dtype=int)
    while len(x) < sampleSize:
        x_ = bernoulli(p, n) #Simultaneously generate n samples that follow the Bernoulli distribution.
        x_ = np.where(x_ == 1)[0]
        x_ = np.concatenate([s, x_])
        x.extend(x_[1:] - x_[:-1])
        s = np.array([x_[-1] - n])
    return np.array(x[:sampleSize])

The two functions do the same thing, but I think the former is easier to understand and the latter is faster.

Negative binomial distribution from geometric distribution

The negative binomial distribution is a probability distribution that follows "the number of misses when the lottery with a probability of winning $ p $ is drawn until it wins $ n $ times". The probability function is given by the following equation.

f(x;n,p) = \left(
\begin{matrix}
n+x-1\\
x
\end{matrix}
\right) p^n(1-p)^{x}\ \ (x=0,1,\cdots,n)


The only essential difference from the geometric distribution is "how many times you win the lottery". As shown below, a negative binomial distribution can be created from a geometric distribution.

def nbinom(n, p, sampleSize):
    x = geom(p, sampleSize * n) - 1
    x = x.reshape([sampleSize, n])
    return np.sum(x, axis=1)

From geometric distribution to exponential distribution

The exponential distribution is a probability distribution that follows "the time it takes for a lottery to win $ \ frac {1} {\ lambda} $ per unit time on average". The probability density function is given by the following equation.

f(x,\lambda) = \lambda\exp(-\lambda x)

The exponential distribution is very similar to the geometric distribution as follows:

Geometric distribution Exponential distribution
What do you see Number of trials until one hit Time to get one hit
What kind of lottery? The probability of winning each trial is constant The average number of hits per unit time is constant

The property that the probability of winning does not change regardless of time or trial is called ** memorylessness **. Both geometric and exponential distributions have this property. You can create an exponential distribution by setting $ p \ to0 $ in the geometric distribution.

def expon(lam, sampleSize):
    p = 0.01  #Small enough
    x = geom(p, sampleSize)
    return x * p * lam

Theoretically, the smaller $ p $ is, the better, but the amount of calculation increases.

From exponential distribution to Poisson distribution

Poisson distributions can be created from exponential distributions as well as binomial distributions. The slide in The bad relationship between the exponential distribution and the Poisson distribution is very easy to understand. The slide shows the implementation of R, but here is the implementation in Python.

def poisson_(mu, sampleSize):
    x = []
    while len(x) < sampleSize:
        t = 0
        n = -1
        while t <= 1:
            t += expon(1 / mu, 1)
            n += 1
        x.append(n)
    return np.array(x)

From exponential distribution to gamma distribution

The gamma distribution is a probability distribution that follows "the time it takes to win $ \ alpha $ times in a lottery that averages $ \ frac {1} {\ lambda} $ per unit time". The probability density function is given by the following equation.

f(x;\alpha,\lambda)=\frac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}\exp(-\lambda x)\ \ (x\ge0)

However, $ \ Gamma (\ alpha) $ is given by the following formula.

\Gamma(\alpha)=\int_0^\infty t^{\alpha-1}\exp(-t)dt

The only essential difference from the exponential distribution is "how many hits do you want to see?" The gamma distribution can be made from an exponential distribution.

def gamma(alpha, lam, sampleSize):
    x = expon(lam, sampleSize * alpha)
    x = x.reshape([sampleSize, alpha])
    return np.sum(x, axis=1)

The argument $ \ alpha $ of the above gamma function must be a natural number. Anything is OK as long as $ \ alpha $ of the general gamma distribution is a positive number, but if it is $ \ alpha $ other than a natural number, it cannot be made from the Bernoulli distribution and it is difficult to explain, so I will limit it to natural numbers here. did.

Gamma distribution from negative binomial distribution

The gamma distribution can also be created from the negative binomial distribution. The gamma distribution and the negative binomial distribution are very similar, as follows:

Negative binomial distribution Gamma distribution
What do you see nNumber of trials until a hit \alphaTime to get a hit
What kind of lottery? The probability of winning each trial is constant The average number of hits per unit time is constant

Creating a gamma distribution from a negative binomial distribution is exactly the same as creating an exponential distribution from a geometric distribution.

def gamma_(alpha, lam, sampleSize):
    p = 0.01  #Small enough
    x = nbinom(alpha, p, sampleSize)
    return x * p * lam

Again, the argument $ \ alpha $ must be a natural number.

Binomial to normal distribution

Any distribution can be added infinitely to a normal distribution (central limit theorem). There is an example in my first article "Checking the asymptotic nature of probability distributions in Python". The binomial distribution was the sum of $ n $ of the Bernoulli distribution. The binomial distribution follows a normal distribution by setting $ n \ to \ infinty $. The probability density function of the normal distribution with mean $ \ mu $ and variance $ \ sigma ^ 2 $ is given by the following equation.

f(x;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\biggl\{-\frac{(x-\mu)^2}{2\sigma^2}\biggr\}

Samples that follow a normal distribution with mean $ \ mu $ and standard deviation $ \ sigma $ can be generated in the following ways:

def norm(mu, sigma, sampleSize):
    n = 1000 #Large enough number
    p = 0.5
    x = binom(n, p, sampleSize)
    sd = (n * p * (1 - p)) ** 0.5
    x = (x - n * p) / sd * sigma + mu
    return x

From normal distribution to chi-square distribution

I can't explain it like a lottery, but the $ \ chi ^ 2 $ distribution is often used in hypothesis testing. When $ X_1, X_2, \ cdots, X_ {df} $ independently follow a standard normal distribution (mean 0, variance 1 normal), $ X_1 ^ 2 + X_2 ^ 2 + \ cdots + X_ {df} ^ 2 $ is called the $ \ chi ^ 2 $ distribution with $ df $ of degrees of freedom. The implementation is as below.

def chi2(df, sampleSize):
    x = norm(0, 1, sampleSize * df) ** 2
    x = x.reshape([sampleSize, df])
    return np.sum(x, axis=1)

Chi-square distribution to gamma distribution

Again, like a lottery, the $ \ chi ^ 2 $ distribution with 1 degree of freedom matches the gamma distribution with $ \ alpha = 1/2 $ and $ \ lambda = 1/2 $. Due to the ** regeneration ** property of the gamma distribution, it is possible to generate a gamma distribution when $ \ alpha $ is $ n $ times ($ n $ is a natural number) of $ \ frac {1} {2} $. Reproducibility will be explained at the end.

def gamma__(alpha, lam, sampleSize):
    df = int(np.round(alpha * 2))
    x = chi2(1, sampleSize * df)
    x = x.reshape([sampleSize, df])
    x = x * lam / 2
    return np.sum(x, axis=1)

The argument $ \ alpha $ of this function must be $ n $ times ($ n $ is a natural number) of $ \ frac {1} {2} $.

From gamma distribution to beta distribution

This cannot be explained like a lottery, but based on "gamma distribution with parameters $ (\ alpha, \ lambda) $" and "gamma distribution with parameters $ (\ beta, \ lambda) $" Can generate a beta distribution of $ (\ alpha, \ beta) $. The density function of the beta distribution is given by the following equation.

f(x;\alpha,\beta)=x^{\alpha-1}(1-x)^{\beta-1}\ \ (0\le x\le 1)

Since the range is 0 to 1, it is often used as a prior distribution of parameters representing probabilities in Bayesian statistics.

def beta(alpha, beta, sampleSize):
    x1 = gamma(alpha, 1, sampleSize)
    x2 = gamma(beta, 1, sampleSize)
    return x1 / (x1 + x2)

Beta distribution to uniform distribution

As is clear from the density function of the beta distribution, if $ \ alpha, \ beta = 1 $, the beta distribution matches the uniform distribution. Samples that follow a uniform distribution can be made from a beta distribution.

def uniform(sampleSize):
    return beta(1, 1, sampleSize)

Summary

kankei2.png

The relationship between each other is clear at a glance.

Now let's talk about ** reproducibility **. Reproductiveness is the property that the addition of distributions can be replaced by the addition of parameters. For example

  1. X the binomial distribution with $ n $ trials and $ p $ probability
  2. Y for the binomial distribution with $ m $ number of trials and $ p $ probability

Then, $ X + Y $ follows a binomial distribution with $ (n + m) $ in trial count and $ p $ in probability. Since 1. is equal to the sum of $ n $ Bernoulli distributions and 2. is equal to the sum of $ m $ Bernoulli distributions, $ X + Y $ is the sum of $ (n + m) $ Bernoulli distributions. That is, the number of trials matches the binomial distribution of $ (n + m) $. According to the same theory, the distribution at the tip of the pluralization arrow in the above figure always has reproductive properties (binomial distribution, negative binomial distribution, gamma distribution, $ \ chi ^ 2 $ distribution). And, of course, the normal distribution and Poisson distribution, which are the limits of the binomial distribution, are also reproducible.

Finally Python experiment

Try the Bernoulli distribution-> geometric distribution-> exponential distribution-> gamma distribution-> beta distribution-> uniform distribution-> Bernoulli distribution.

Define the function for plotting in advance.

import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='whitegrid')

def plot_descreteDitribution(sample, truePf, xmin, xmax):
    sampleSize = len(sample)
    x = np.arange(xmin, xmax + 1)
    pf = np.array([np.sum(sample == i) for i in x]) / sampleSize  #Distribution obtained by experiment
    
    #Plot the distribution obtained in the experiment in blue
    plt.plot(x - 0.1, pf, 'bo', ms=8)
    plt.vlines(x - 0.1, 0, pf, colors='b', lw=5, alpha=0.5)

    #Plot the true distribution in red
    plt.plot(x + 0.1, truePf(x), 'ro', ms=8)
    plt.vlines(x + 0.1, 0, truePf(x), colors='r', lw=5, alpha=0.5)

def plot_continuousDitribution(sample, truePf, xmin, xmax):
    sampleSize = len(sample)
    x = np.linspace(xmin, xmax, 100)

    #Plot the distribution obtained in the experiment in blue
    th = np.linspace(xmin, xmax, 30)
    hi = np.array([np.sum(np.logical_and(th[i] < sample, sample <= th[i + 1])) for i in range(30 - 1)]) / (sampleSize * (th[1] - th[0]))
    plt.bar((th[:-1] + th[1:]) / 2, hi, width=(th[1] - th[0]))

    #Plot the true distribution in red
    plt.plot(x, truePf(x), 'r', linewidth=4)

    plt.xlim(xmin, xmax)

Geometric distribution (from Bernoulli distribution)

p = 0.2
xmin = 1
xmax = 20

sample = geom(p, sampleSize)
truePf = lambda x: stats.geom.pmf(x, p)
plot_descreteDitribution(sample, truePf, xmin, xmax)

--Result

幾何分布.png

Red is the distribution created by scipy.stats, and blue is the distribution created from the function created this time. The sampleSize was set to 10000. The same applies thereafter.

Exponential distribution (from Bernoulli distribution-> geometric distribution)

lam = 2
xmin = 0
xmax = 10

sample = expon(lam, sampleSize)
truePf = lambda x: stats.expon.pdf(x, scale=lam)
plot_continuousDitribution(sample, truePf, xmin, xmax)

--Result

指数分布.png

Gamma distribution (from Bernoulli distribution-> geometric distribution-> exponential distribution)

alpha = 3
lam = 2
xmin = 0
xmax = 20

sample = gamma(alpha, lam, sampleSize)
truePf = lambda x: stats.gamma.pdf(x, alpha, scale=lam)
plot_continuousDitribution(sample, truePf, xmin, xmax)

--Result

ガンマ分布.png

Beta distribution (from Bernoulli distribution-> ...-> gamma distribution)

alpha = 3
bet = 2
xmin = 0
xmax = 1

sample = beta(alpha, bet, sampleSize)
truePf = lambda x: stats.beta.pdf(x, alpha, bet)
plot_continuousDitribution(sample, truePf, xmin, xmax)

--Result

ベータ分布.png

Uniform distribution (from Bernoulli distribution-> ...-> beta distribution)

xmin = 0
xmax = 1

sample = uniform(sampleSize)
truePf = lambda x: stats.uniform.pdf(x)
plot_continuousDitribution(sample, truePf, xmin, xmax)

--Result

一様分布.png

Bernoulli distribution (from Bernoulli distribution-> ...-> uniform distribution)! ??

p = 0.2
xmin = -2
xmax = 4

def bernoulli_(p, sampleSize):
    x = uniform(sampleSize)    
    return (x < p).astype(int)

sample = bernoulli_(p, sampleSize)
truePf = lambda x: stats.bernoulli.pmf(x, p)
plot_descreteDitribution(sample, truePf, xmin, xmax)

--Result

ベルヌーイ分布.png

You can see that it worked.

References

-Various probability distributions and their relationships -Summary of characteristics of typical probability distribution

Recommended Posts

I tried to summarize the relationship between probability distributions starting from the Bernoulli distribution
I tried to summarize the umask command
I tried to summarize the graphical modeling.
LeetCode I tried to summarize the simple ones
I tried to detect the iris from the camera image
I tried to summarize the basic form of GPLVM
I tried to summarize the string operations of Python
I tried to summarize SparseMatrix
[First COTOHA API] I tried to summarize the old story
I tried to summarize the code often used in Pandas
[Python] I tried to visualize the follow relationship of Twitter
I tried to summarize the commands often used in business
[Machine learning] I tried to summarize the theory of Adaboost
I tried to enumerate the differences between java and python
I tried changing the python script from 2.7.11 to 3.6.0 on windows10
I tried to get various information from the codeforces API
I tried to summarize how to use the EPEL repository again
I tried to summarize the languages that beginners should learn from now on by purpose
I tried to move the ball
I tried to estimate the interval.
[Linux] I tried to summarize the command of resource confirmation system
I tried to summarize the commands used by beginner engineers today
I tried to cut out a still image from the video
I tried to summarize the frequently used implementation method of pytest-mock
I tried to summarize Python exception handling
[Python] I tried to summarize the set type (set) in an easy-to-understand manner.
I tried to summarize until I quit the bank and became an engineer
I tried to execute SQL from the local environment using Looker SDK
I tried to visualize the age group and rate distribution of Atcoder
I tried to summarize the general flow up to service creation by self-education.
I tried to recognize the wake word
Python3 standard input I tried to summarize
I tried to summarize various sentences using the automatic summarization API "summpy"
I tried to summarize the logical way of thinking about object orientation.
I tried to learn the angle from sin and cos with chainer
I tried to estimate the pi stochastically
I tried to touch the COTOHA API
I tried to summarize the Linux commands used by beginner engineers today-Part 1-
I tried to summarize Ansible modules-Linux edition
I tried to summarize the settings for various databases of Django (MySQL, PostgreSQL)
I tried to summarize the operations that are likely to be used with numpy-stl
[IBM Cloud] I tried to access the Db2 on Cloud table from Cloud Funtions (python)
[Python] I tried to get the type name as a string from the type function
I didn't understand the Resize of TensorFlow so I tried to summarize it visually.
I tried to pass the G test and E qualification by training from 50
I tried web scraping to analyze the lyrics.
I tried hitting the Qiita API from go
I tried to optimize while drying the laundry
I tried to save the data with discord
I tried to touch the API of ebay
I tried to correct the keystone of the image
Qiita Job I tried to analyze the job offer
I tried to implement the traveling salesman problem
I tried to predict the price of ETF
I tried to vectorize the lyrics of Hinatazaka46!
[LPIC 101] I tried to summarize the command options that are easy to make a mistake
[CodeIQ] I wrote the probability distribution of dice (from CodeIQ math course for machine learning [probability distribution])
I tried to sort out the objects from the image of the steak set meal-④ Clustering
I tried to summarize the new coronavirus infected people in Ichikawa City, Chiba Prefecture
I tried to uniquely determine the malware name from the report information obtained from Virus Total
[Python] I tried to summarize the array, dictionary generation method, loop method, list comprehension notation