The probability distribution calculated from the data is called ** "experience distribution" **. On the other hand, the function that generates the ** probability distribution is called the "theoretical distribution" **.
First, you can estimate the population distribution by first determining the ** distribution shape (type of probability distribution) **, and then the ** parameter (probability distribution parameter) **.
The module that collects such statistical functions is ** scipy.stats **. Its basic usage is ** unified notation ** as follows.

(1) Types of probability distribution

Probability functions are roughly divided into ** "discrete type" ** and ** "continuous type" **.
** Discrete type ** is a variable ** that takes discrete values, such as a dice roll. In addition, ** continuous type ** refers to those that take ** continuous values such as weight and temperature **.
Below is a list of 15 types you should know from the probability distributions implemented in ** scipy.stats **.

	Probability distribution	probability distribution	Method	data
1	Hypergeometric distribution	Hypergeometric distribution	scipy.stats.hypergeom	Discrete type
2	Bernoulli distribution	Bernoulli distribution	scipy.stats.bernoulli	Discrete type
3	Binomial distribution	binomial distribution	scipy.stats.binom	Discrete type
4	Poisson distribution	Poisson distribution	scipy.stats.poisson	Discrete type
5	Geometric distribution	Geometric Distribution	scipy.stats.geom	Discrete type
6	Negative binomial distribution	Negative Binomial Distribution	scipy.stats.nbinom	Discrete type
7	Uniform distribution	uniform distribution	scipy.stats.uniform	Discrete type
8	normal distribution	normal distribution	scipy.stats.norm	Continuous type
9	Exponential distribution	exponential distribution	scipy.stats.expon	Continuous type
10	Gamma distribution	gamma distribution	scipy.stats.gamma	Continuous type
11	Beta distribution	beta distribution	scipy.stats.betabinom	Continuous type
12	Cauchy distribution	Cauchy distribution	scipy.stats.cauchy	Continuous type
13	Lognormal distribution	lognormal distribution	scipy.stats.lognorm	Continuous type
14	Pareto distribution	Pareto distribution	scipy.stats.pareto	Continuous type
15	Weibull distribution	Weibull distribution	scipy.stats.dweibull	Continuous type

These probability distributions implement almost ** common methods **.
Therefore, if you understand the method, you can operate various probability distributions.

⑵ Functions of various methods

Here, we will look at the main methods using the most frequently used ** normal distribution scipy.stats.norm ** as an example.
Arguments are ** input data x **, parameters ** expected value (mean value) loc ** and ** standard deviation scale **, as well as ** elements. Number size , ** Random number generation seedrandom_state = integer, etc.

#Library for numerical calculations
import numpy as np
from scipy import stats

#Library for graph drawing
import matplotlib.pyplot as plt
%matplotlib inline

#Module to make matplotlib support Japanese display
!pip install japanize-matplotlib
import japanize_matplotlib

➀ rvs (Random variates) Random variables

Notation: rvs (loc = 0, scale = 1, size = 1, random_state = None)
** Random variables ** are values that change according to the law of probability **.
Based on the specified parameters, ** randomly specify the number of random variables (possible values) ** that follow a normal distribution **.

#Generate pseudo-random numbers that follow a normal distribution with rvs
norm_rvs = stats.norm.rvs(loc=50, scale=20, size=1000, random_state=0) #Expected value=50,standard deviation=20,Quantity=1000

#Visualization (expressed in histogram)
plt.hist(norm_rvs, bins=10, alpha=0.3, ec='blue')
plt.xlabel("class", fontsize=13)
plt.ylabel("frequency", fontsize=13)

plt.show()

➁ pdf (Probability density function) Probability density function

Notation: pdf (x, loc = 0, scale = 1)
** Probability density ** represents the ** relative likelihood ** of the value of the random variable X within the defined region.
To put it plainly, the ** probability density function is a function ** that calculates the probability density by taking continuous type data as an argument.

#Generate arithmetic progression
X = np.arange(start=1, stop=7.1, step=0.1)

#Generate probability density function in pdf
norm_pdf = stats.norm.pdf(x=X, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8 

#Visualization
plt.plot(X, norm_pdf)
plt.xlabel("Random variable X", fontsize=13)
plt.ylabel("Probability density pdf", fontsize=13)
plt.show()

As an attempt, let's check the probability density when the random variable x = 5.

#Calculate the probability density when the random variable 5
x = 5
y = stats.norm.pdf(x=x, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8
print("Random variable x=Probability density at 5:", y)

#Plot on the graph of probability density function
plt.plot(X, norm_pdf)
plt.plot(x, y, 'bo') #Place the blue dots

plt.vlines(x, 0.0, y, lw=2, linestyles='dashed') #Vertical line
plt.hlines(y, 1.0, x, lw=2, linestyles='dashed') #Horizon

plt.xlabel("Random variable X", fontsize=13)
plt.ylabel("Probability density pdf", fontsize=13)

plt.show()

** Probability density function pdf is a method for continuous data, and the following probability mass function pfm is used for discrete data. **

➂ pmf (Probability mass function) Probability mass function

Notation: pmf (k, n, p, loc = 0)
** Probability mass ** represents ** relative ease ** for each discrete element of random variable X.
To put it plainly, the ** probability mass function is a function ** that calculates the probability mass by taking discrete data as an argument.

Here, using ** binomial distribution ** as an example, "the probability of each success (0-4 times) when 5 trials with a success probability of 40% are performed" is shown.

#Probability of success
p = 0.4
#Number of trials
n = 5
#Number of successes
k = np.arange(0, 5) # array([0, 1, 2, 3, 4])

#Calculate the probability for each success with pmf
binom_pmf = stats.binom.pmf(k, n, p) #Number of successes=0～4,Number of trials=5,Probability of success=0.4

#Visualization
plt.plot(k, binom_pmf, 'bo', ms=8)
plt.vlines(k, 0, binom_pmf, colors='b', lw=3, alpha=0.5)

plt.xticks(k) #x-axis scale

plt.xlabel("Random variable X", fontsize=13)
plt.ylabel("Probability mass function pmph", fontsize=13)

plt.show()

➃ logpdf (Log of the probability density function) log probability density function

Notation: logpdf (x, loc = 0, scale = 1)
** The log probability density function ** is the logarithm of the probability density ** with the base e.
The value of $ a $ when the number of Napiers $ e = 2.7182818284 ... $ is multiplied by the transcendental number followed by $ a $ to obtain the probability density. The formula is $ e ^ a = $ pdf.

#Generate arithmetic progression
X = np.arange(start=1, stop=7.1, step=0.1)

#Generate logarithm of probability density function with logpdf
norm_logpdf = stats.norm.logpdf(x=X, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8

#Visualization
plt.plot(X, norm_logpdf)
plt.xlabel("Random variable X", fontsize=13)
plt.ylabel("Log probability density function logpdf", fontsize=13)
plt.show()

As an attempt, let's check the random variable x = 4, which shows the maximum logarithm of the probability density in the above figure.
As a procedure, first obtain the value of the log probability density ** of the random variable ** x = 4, then calculate the probability density of ** x = 4, then take the logarithm ** and compare both. I will.

#Get maximum log probability density
logpdf_max = np.max(norm_logpdf) 
print('x=Log probability density of 4:', logpdf_max)

#Calculate probability density
x = 4
v_pdf = stats.norm.pdf(x=x, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8

#Calculate the logarithm of probability density
v_pdf_log = np.log(v_pdf)
print('x=Logarithm of probability density of 4:', v_pdf_log)

#Visualization
plt.plot(X, norm_logpdf)
plt.plot(4, logpdf_max, 'bo') #Place the blue dots

logpdf_min = np.min(norm_logpdf) #Minimum log probability density(For drawing)
plt.vlines(x, logpdf_max, logpdf_min, lw=2, linestyles='dashed') #Vertical line
plt.hlines(logpdf_max, 1.0, x, lw=2, linestyles='dashed') #Horizon

plt.xlabel("Random variable X", fontsize=13)
plt.ylabel("Log probability density function logpdf", fontsize=13)

plt.show()

➄ cdf (Cumulative distribution function) Cumulative distribution function

Notation: cdf (x, loc = 0, scale = 1)
** Cumulative distribution function ** calculates the probability that the random variable $ X $ is less than or equal to a certain value $ x $ **.
For example, the probability that "the number of rolls is 3 or less" when throwing a dice is the sum of all the probability densities from 1 to 3.

#Generate x-axis arithmetic progression
X = np.linspace(start=1, stop=7, num=100)

#Generate cumulative distribution function with cdf
norm_cdf = stats.norm.cdf(x=X, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8

#x with cdf=Calculate cumulative probabilities of 5 or less
under_5 = stats.norm.cdf(x=5, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8
print('Probability of becoming 5 or less:', under_5)

#Visualization
plt.plot(X, norm_cdf)
plt.plot(5, under_5, 'bo') #Place the blue dots

plt.vlines(5, 0.0, under_5, lw=2, linestyles='dashed') #Vertical line
plt.hlines(under_5, 1.0, 5, lw=2, linestyles='dashed') #Horizon

plt.xlabel("Random variable X", fontsize=13)
plt.ylabel("Cumulative probability density cdf", fontsize=13)

plt.show()

The figure below shows the points of the random variable x = 5 on the probability density function with exactly the same parameters.
The cumulative distribution function corresponds to the light blue area in this figure.

➅ sf (Survival function) Survival function

Notation: sf (x, loc = 0, scale = 1)
** Survival function **, contrary to the cumulative distribution function cdf, ** Random variable $ X $ has a certain value $ x $ or more probability ** Calculate.

** Sometimes defined as "1-cdf" **, but ** sf may be more accurate ** scipy.org https://docs.scipy.org/ You can find it in doc / scipy / reference / generated / scipy.stats.norm.html.

#Generate x-axis arithmetic progression
X = np.linspace(start=1, stop=7, num=100)

#Generate survival function with sf
norm_sf = stats.norm.sf(x=X, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8

#x with sf=Calculate cumulative probabilities of 5 or more
upper_5 = stats.norm.sf(x=5, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8
print('Probability of 5 or more:', upper_5)

#Visualization
plt.plot(X, norm_sf)
plt.plot(5, upper_5, 'bo') #Place the blue dots

plt.vlines(5, 0.0, upper_5, lw=2, linestyles='dashed') #Vertical line
plt.hlines(upper_5, 1.0, 5, lw=2, linestyles='dashed') #Horizon

plt.xlabel("Random variable X", fontsize=13)
plt.ylabel("Survival function sf", fontsize=13)

plt.show()

The figure below shows the points of the random variable x = 5 on the probability density function with exactly the same parameters.
The survival function corresponds to the light blue area in the figure.

➆ ppf (Percent point function) Percent point function

Notation: ppf (q, loc = 0, scale = 1)
** The percentage point function ** is the inverse of the cumulative distribution function cdf, and if you specify the ** cumulative distribution function as q%, it returns a variable ** that takes that value.
Therefore, ppf (0.25) corresponds to the first quartile, ppf (0.5) corresponds to the median second quartile, and ppf (0.75) corresponds to the third quartile.

#Generate x-axis arithmetic progression
X = np.arange(start=1, stop=7, step=0.1)

#Generate probability density function in pdf
norm_pdf = stats.norm.pdf(x=X, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8

#Get the variable corresponding to the cumulative distribution function 75% with ppf
q_75 = stats.norm.ppf(q=0.75, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8
print('Cumulative distribution function 75% point random variable:', q_75)

#Get the probability density of the variable with pdf
v = stats.norm.pdf(x=q_75, loc=4, scale=0.8)
print('Cumulative distribution function 75% point probability density:', v)

#Visualization
plt.plot(X, norm_pdf)
plt.plot(q_75, v, 'bo') #Place the blue dots

plt.vlines(q_75, 0.0, v, lw=2, linestyles='dashed') #Vertical line
plt.hlines(v, 1.0, q_75, lw=2, linestyles='dashed') #Horizon

plt.xlabel("Random variable X", fontsize=13)
plt.ylabel("Probability density function pdf", fontsize=13)

plt.show()

➇ isf (Inverse survival function)

Notation: ʻisf (q, loc = 0, scale = 1) `
** The inverse survival function ** is the inverse of the survival function sf, but what you do is the opposite of the percentage point function ppf. ** If the remainder of the survival function is specified as q%, a variable ** that takes that value is returned.
Therefore, the quartile is the opposite of the percentage point function ppf, isf (0.25) is the third quartile, isf (0.5) is also the second quartile, and isf (0.75) is the first. Corresponds to the quartile.

#Generate x-axis arithmetic progression
X = np.arange(start=1, stop=7, step=0.1)

#Generate probability density function in pdf
norm_pdf = stats.norm.pdf(x=X, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8

#Get the variable corresponding to the reverse survival function 25% with isf
q_25 = stats.norm.isf(q=0.25, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8
print('Random variable with 25% reverse survival function:', q_25)

#Get the probability density of the variable with pdf
v = stats.norm.pdf(x=q_25, loc=4, scale=0.8)
print('Probability density of 25% point of inverse survival function:', v)

#Visualization
plt.plot(X, norm_pdf)
plt.plot(q_25, v, 'bo') #Place the blue dots

plt.vlines(q_25, 0.0, v, lw=2, linestyles='dashed') #Vertical line
plt.hlines(v, 1.0, q_25, lw=2, linestyles='dashed') #Horizon

plt.xlabel("Random variable X", fontsize=13)
plt.ylabel("Probability density function pdf", fontsize=13)

plt.show()

➈ interval (Endpoints of the range that contains alpha percent of the distribution) Interval estimation

Notation: ʻinterval (alpha, loc = 0, scale = 1)`
In the ** empirical distribution ** calculated from actual data, neither the mean nor the standard deviation is always known. I don't really know if the mean obtained from the sample can be approximated to the ** true population parameter $ θ $ ** in the population.
Therefore, ** Finding the interval including parameter $ θ $ with a certain probability is called interval estimation **.

#Generate x-axis arithmetic progression
X = np.arange(start=1, stop=7, step=0.1)

#Generate probability density function in pdf
norm_pdf = stats.norm.pdf(x=X, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8

#Get the variable corresponding to the confidence interval 95% with interval
lower, upper = stats.norm.interval(alpha=0.95, loc=4, scale=0.8) #Confidence factor=0.95,Expected value=4,standard deviation=0.8
print('Lower limit of 95% confidence interval:', lower)
print('Upper limit of 95% confidence interval:', upper)

#Get the probability density of each variable with pdf
v_lower = stats.norm.pdf(x=lower, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8
v_upper = stats.norm.pdf(x=upper, loc=4, scale=0.8) #Expected value=4,standard deviation=0.8

#Visualization
plt.plot(X, norm_pdf)
plt.plot(lower, 0.0, 'k|')
plt.plot(upper, 0.0, 'k|')

plt.vlines(lower, 0.0, v_lower, lw=0.8) #Lower limit vertical line
plt.vlines(upper, 0.0, v_upper, lw=0.8) #Upper vertical line

plt.xlabel("Random variable X", fontsize=13)
plt.ylabel("Probability density function pdf", fontsize=13)

plt.show()

In interval estimation, ** first specify the probability of the interval including the parameter $ θ $ **. In this example, it is ʻalpha = 0.95`, which is called the ** confidence factor **. Often 99%, 95% and 90% are used, but this is optional.
Under the specified probability, the lower and upper limits of the interval containing the parameter $ θ $ are calculated. In this example, there is a 95% chance that the parameter $ θ $ indicates that the random variable is between 2.43 and 5.57.

Afterword

I have omitted the log cumulative distribution function logcdf, log survival function logsf, etc., but I think that the ones that seem to be used frequently have been suppressed.
Using these, I would like to unravel the 15 types of probability distributions listed above one by one.

1. Statistics learned with Python 2. Probability distribution [Thorough understanding of scipy.stats]