Checking the asymptotic properties of probability distributions in Python

Introduction

I visualized, with Python, the asymptotic properties of probability distributions such as the central limit theorem. The asymptotic results are mainly excerpted from the textbook for Level 1 of the Japan Statistical Society Certified Statistics Test.

Central limit theorem

Suppose the random variables $X_i\ (i = 1, \cdots, n)$ are independent and identically distributed with mean $\mu$ and variance $\sigma^2$. Then, as $n \to \infty$, $\frac{\sum_{i=1}^n X_i}{n}$ follows a normal distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$.

Experiment

Since we are at it, we carry out the experiment with a skewed distribution. Let the density function of $X_i\ (i = 1, \cdots, n)$ be

f(x) = 11 x^{10}\ \ \ (0\leq x\leq1)
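The mean and variance used in the code below follow directly from this density:

\mu = \int_0^1 11x^{11}dx = \frac{11}{12},\ \ \ \sigma^2 = \int_0^1 11x^{12}dx - \mu^2 = \frac{11}{13} - \Big(\frac{11}{12}\Big)^2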

The blue histogram shows the numerically obtained density of $\frac{\sum_{i=1}^n X_i}{n}$, and the solid orange line shows the density of the limiting normal distribution.

Here is the Python source code for $n = 1$.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Probability density function: mean 11/12, variance 11/13 - (11/12)^2
def f(x):
    return 11 * (x ** 10)
mu = 11 / 12
sigma2 = 11 / 13 - (11 / 12) ** 2

# Cumulative distribution function
def F(x):
    return x ** 11

# Inverse of the cumulative distribution function (for inverse-transform sampling)
def F_inv(x):
    return x ** (1 / 11)

n = 1
xmin = 0.6
xmax = 1.2
meanX = []
for _ in range(100000):
    # Draw n samples by applying F_inv to uniform random numbers, then take the mean
    meanX.append(np.sum(F_inv(np.random.rand(n))) / n)
plt.hist(meanX, bins=200, density=True)
s = np.linspace(xmin, xmax, 100)
# Density of the limiting normal distribution N(mu, sigma2 / n)
plt.plot(s, norm.pdf(s, mu, (sigma2 / n) ** 0.5), linewidth=4)
plt.xlim(xmin, xmax)
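The post only lists the code for $n = 1$; as a minimal sketch (reusing `F_inv`, `mu`, `sigma2`, and the imports from the block above, with the helper name `run_clt` chosen just for illustration), the same experiment can be repeated for larger sample sizes:

def run_clt(n, trials=100000):
    # Draw `trials` sample means, each from n inverse-transform samples
    meanX = F_inv(np.random.rand(trials, n)).mean(axis=1)
    plt.hist(meanX, bins=200, density=True)
    # Overlay the density of the limiting normal distribution N(mu, sigma2 / n)
    s = np.linspace(mu - 4 * (sigma2 / n) ** 0.5, mu + 4 * (sigma2 / n) ** 0.5, 100)
    plt.plot(s, norm.pdf(s, mu, (sigma2 / n) ** 0.5), linewidth=4)

run_clt(100)  # with n = 100 the histogram should hug the normal curve much more closely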

Test statistic of the goodness-of-fit test (Part 1)

Suppose $(X_1, X_2, \cdots, X_m)$ follows a multinomial distribution with $n$ trials and cell probabilities $(p_1, p_2, \cdots, p_m)$. Then the statistic

\sum_{i=1}^m\frac{(X_i-np_i)^2}{np_i}

follows the $\chi^2$ distribution with $(m-1)$ degrees of freedom as $n \to \infty$.

Experiment

Consider a multinomial distribution with probabilities $\big(\frac{1}{16}, \frac{1}{4}, \frac{1}{4}, \frac{7}{16}\big)$. The numerically obtained distribution of

\sum_{i=1}^4\frac{(X_i-np_i)^2}{np_i}

is shown by the blue histogram, and the limiting $\chi^2$ distribution with 3 degrees of freedom is shown by the solid orange line.

Here is the Python source code for $n = 4$.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

p = [1/16, 1/4, 1/4, 7/16]
n = 4
xmin = 0
xmax = 15
xx = []
for _ in range(100000):
    # Draw n uniforms and count how many fall into each of the 4 cells
    r = np.random.rand(1, n)
    x = [0] * 4
    x[0] = np.sum(r < sum(p[:1]))
    x[1] = np.sum(np.logical_and(sum(p[:1]) <= r, r < sum(p[:2])))
    x[2] = np.sum(np.logical_and(sum(p[:2]) <= r, r < sum(p[:3])))
    x[3] = np.sum(sum(p[:3]) <= r)
    # Pearson's goodness-of-fit statistic
    xx.append(sum([(x[i] - n * p[i]) ** 2 / (n * p[i]) for i in range(4)]))
plt.hist(xx, bins=int(max(xx)) * 5, density=True)
s = np.linspace(xmin, xmax, 100)
# Density of the limiting chi-square distribution with 3 degrees of freedom
plt.plot(s, chi2.pdf(s, 3), linewidth=4)
plt.xlim(xmin, xmax)
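As a cross-check, the four cell counts can also be drawn directly with NumPy's multinomial sampler instead of binning uniform random numbers by hand. A minimal sketch, reusing `p` and `n` from the block above:

counts = np.random.multinomial(n, p, size=100000)   # one row of cell counts per replication
expected = n * np.array(p)
stat = ((counts - expected) ** 2 / expected).sum(axis=1)  # Pearson statistic per replication
plt.hist(stat, bins=int(stat.max()) * 5, density=True)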

Test statistic of the goodness-of-fit test (Part 2)

Suppose the true probabilities $p_i\ (i = 1, 2, \cdots, m)$ are unknown, but it is known that they can be expressed with an $l$-dimensional parameter $\boldsymbol{\theta}\ (l \leq m-2)$. Let $\hat{p}_i$ be the maximum likelihood estimator of $p_i$. Then the statistic

\sum_{i=1}^m\frac{(X_i-n\hat{p}_i)^2}{n\hat{p}_i}

follows the $\chi^2$ distribution with $(m-l-1)$ degrees of freedom as $n \to \infty$.

Experiment

Consider probabilities $\big(\frac{1}{16}, \frac{1}{4}, \frac{1}{4}, \frac{7}{16}\big)$. The true $p$ is unknown, but it is known that $p_2 = p_3$, so the probabilities can be written as $p = [q, r, r, 1-2r-q]$; $q$ and $r$ are found by maximum likelihood estimation. The numerically obtained distribution of

\sum_{i=1}^m\frac{(X_i-n\hat{p}_i)^2}{n\hat{p}_i}

is shown by the blue histogram, and the limiting $\chi^2$ distribution with 1 degree of freedom is shown by the solid orange line.
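The constrained maximum likelihood estimates have a closed form: maximizing the multinomial likelihood $q^{X_1}r^{X_2+X_3}(1-2r-q)^{X_4}$ gives

\hat{q} = \frac{X_1}{n},\ \ \ \hat{r} = \frac{X_2+X_3}{2n}

and these are exactly the values `p_` computed in the code below.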

Here is the Python source code for $n = 4$.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

p = [1/16, 1/4, 1/4, 7/16]  # true cell probabilities (p[1] == p[2] by assumption)
n = 4
xmin = 0
xmax = 3
xx = []
for _ in range(100000):
    # Draw n uniforms and count how many fall into each of the 4 cells
    r = np.random.rand(1, n)
    x = [0] * 4
    x[0] = np.sum(r < sum(p[:1]))
    x[1] = np.sum(np.logical_and(sum(p[:1]) <= r, r < sum(p[:2])))
    x[2] = np.sum(np.logical_and(sum(p[:2]) <= r, r < sum(p[:3])))
    x[3] = np.sum(sum(p[:3]) <= r)
    # Maximum likelihood estimates under the constraint p_2 = p_3
    p_ = [0] * 4
    p_[0] = x[0] / sum(x)
    p_[1] = (x[1] + x[2]) / (2 * sum(x))
    p_[2] = (x[1] + x[2]) / (2 * sum(x))
    p_[3] = x[3] / sum(x)
    # Goodness-of-fit statistic; the true p[i] is kept in the denominator to avoid
    # division by zero when a cell count is 0 (asymptotically equivalent to using p_[i])
    xx.append(sum([(x[i] - n * p_[i]) ** 2 / (n * p[i]) for i in range(4)]))
plt.hist(xx, bins=int(max(xx)) * 20, density=True)
s = np.linspace(xmin, xmax, 100)
# Density of the limiting chi-square distribution with 1 degree of freedom
plt.plot(s, chi2.pdf(s, 1), linewidth=4)
plt.xlim(xmin, xmax)

Asymptotic normality of maximum likelihood estimator (1)

Consider random variables $X_i\ (i = 1, \cdots, n)$ whose distribution is characterized by a parameter $\theta$. Let $\theta_0$ be the true parameter, $\hat\theta$ the maximum likelihood estimator, and define the Fisher information $J_n(\theta)$ by

J_n(\theta)=E_\theta\Big[\Big(\frac{\partial}{\partial\theta}\log f(X_1,...,X_n;\theta)\Big)^2\Big]

Then, as $n \to \infty$, $\hat\theta$ follows a normal distribution with mean $\theta_0$ and variance $J_n(\theta_0)^{-1}$.

Experiment

Let the density function of $X_i\ (i = 1, \cdots, n)$ be

f(x) = (\theta+1) x^{\theta}\ \ \ (0\leq x\leq1)

and let the true $\theta$ be 10.
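For this model the log likelihood is $\ell(\theta) = n\log(\theta+1) + \theta\sum_{i=1}^n\log X_i$, so the maximum likelihood estimator and the Fisher information are

\hat\theta = -\frac{n}{\sum_{i=1}^n\log X_i} - 1,\ \ \ J_n(\theta) = \frac{n}{(\theta+1)^2}

With $\theta_0 = 10$, the limiting variance is $J_n(\theta_0)^{-1} = \frac{11^2}{n}$; the code below uses its square root as the scale of `norm.pdf`.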

The experimentally obtained distribution of $\hat\theta$ is shown by the blue histogram, and the limiting normal distribution is shown by the solid orange line.

Here is the Python source code for $n = 1$.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Probability density function (true theta = 10): mean 11/12, variance 11/13 - (11/12)^2
def f(x):
    return 11 * (x ** 10)
mu = 11 / 12
sigma2 = 11 / 13 - (11 / 12) ** 2

# Cumulative distribution function
def F(x):
    return x ** 11

# Inverse of the cumulative distribution function
def F_inv(x):
    return x ** (1 / 11)

n = 1
theta_min = -20
theta_max = 40
theta = []
for _ in range(100000):
    x = F_inv(np.random.rand(n))
    # Maximum likelihood estimator: theta_hat = -n / sum(log x_i) - 1
    theta.append(- n / sum(np.log(x)) - 1)
theta = np.array(theta)

# Histogram of theta, built manually over [theta_min, theta_max]
# (normalized by the total sample count; estimates outside the range are not binned)
th = np.linspace(theta_min, theta_max, 100)
hi = np.array([np.sum(np.logical_and(th[i] < theta, theta <= th[i + 1])) for i in range(100 - 1)]) / (len(theta) * (th[1] - th[0]))
plt.bar((th[:-1] + th[1:]) / 2, hi, width=(th[1] - th[0]))
# Density of the limiting normal distribution N(10, 11^2 / n)
s = np.linspace(theta_min, theta_max, 100)
plt.plot(s, norm.pdf(s, 10, (11 ** 2 / n) ** 0.5), 'tab:orange', linewidth=4)
plt.xlim(theta_min, theta_max)
plt.ylim(0.0, 0.1)

Asymptotic normality of maximum likelihood estimator (2)

Let $g$ be a function that is differentiable at $\theta_0$. Then, as $n \to \infty$, $g(\hat\theta)$ follows a normal distribution with mean $g(\theta_0)$ and variance $g^\prime(\theta_0)^2 J_n(\theta_0)^{-1}$. The definitions of $\theta$, $\hat\theta$, and $\theta_0$ are the same as in Part 1.

Experiment

We continue the experiment of Part 1. Define $g$ by the following formula.

g(\theta)=\frac{1}{\theta}
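Since $g^\prime(\theta) = -\frac{1}{\theta^2}$, the limiting distribution of $g(\hat\theta)$ in this experiment is

N\Big(\frac{1}{10},\ \frac{1}{10^4}\cdot\frac{11^2}{n}\Big)

which is the normal density drawn in orange below.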

The experimentally obtained distribution of $g(\hat\theta)$ is shown by the blue histogram, and the limiting normal distribution is shown by the solid orange line.

Here is the Python source code for $n = 1$.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Probability density function (true theta = 10): mean 11/12, variance 11/13 - (11/12)^2
def f(x):
    return 11 * (x ** 10)
mu = 11 / 12
sigma2 = 11 / 13 - (11 / 12) ** 2

# Cumulative distribution function
def F(x):
    return x ** 11

# Inverse of the cumulative distribution function
def F_inv(x):
    return x ** (1 / 11)

n = 1
theta_min = -0.2
theta_max = 0.6
theta = []
for _ in range(100000):
    x = F_inv(np.random.rand(n))
    # g(theta_hat) = 1 / theta_hat, where theta_hat is the maximum likelihood estimator
    theta.append(1 / (- n / sum(np.log(x)) - 1))
theta = np.array(theta)

# Histogram of g(theta_hat), built manually over [theta_min, theta_max]
th = np.linspace(theta_min, theta_max, 100)
hi = np.array([np.sum(np.logical_and(th[i] < theta, theta <= th[i + 1])) for i in range(100 - 1)]) / (len(theta) * (th[1] - th[0]))
plt.bar((th[:-1] + th[1:]) / 2, hi, width=(th[1] - th[0]))
# Density of the limiting normal distribution N(1/10, 11^2 / (10^4 n))
s = np.linspace(theta_min, theta_max, 100)
plt.plot(s, norm.pdf(s, 1/10, (11 ** 2 / n / 10 ** 4) ** 0.5), 'tab:orange', linewidth=4)
plt.xlim(theta_min, theta_max)

Test statistic of the likelihood ratio test

Suppose there is a probability distribution $f(x;\theta)$ characterized by a parameter $\theta$. For the hypothesis test problem

H_0:\theta\in\Theta_0\ \ \mathrm{vs.}\ \ H_1:\theta\in\Theta_1

the likelihood ratio $L$ is defined by the following equation.

L=\frac{\sup_{\theta\in\Theta_1} f(x_1,\cdots,x_n;\theta)}{\sup_{\theta\in\Theta_0} f(x_1,\cdots,x_n;\theta)}

Under the null hypothesis, $2\log L$ follows the $\chi^2$ distribution with $p$ degrees of freedom as $n \to \infty$, where $p = \dim(\Theta_1)-\dim(\Theta_0)$.

Experiment

Let the density function be

f(x;\theta) = (\theta+1) x^{\theta}\ \ \ (0\leq x\leq1)

and take the hypothesis test problem

H_0:\theta=10\ \ \mathrm{vs.}\ \ H_1:\theta\neq10
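Under $H_1$ the supremum is attained at the maximum likelihood estimator, so the test statistic used in the code reduces to

2\log L = 2\Big(\sum_{i=1}^n\log f(x_i;\hat\theta) - \sum_{i=1}^n\log f(x_i;10)\Big),\ \ \ \hat\theta = -\frac{n}{\sum_{i=1}^n\log x_i} - 1

with $p = \dim(\Theta_1) - \dim(\Theta_0) = 1 - 0 = 1$ degree of freedom.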

Under the null hypothesis, the numerically obtained density of $2\log L$ is shown by the blue histogram, and the density of the limiting $\chi^2$ distribution with 1 degree of freedom is shown by the solid orange line.

Here is the source code for $n = 1$.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

# True distribution (theta = 10): mean 11/12, variance 11/13 - (11/12)^2
def f_true(x):
    return 11 * (x ** 10)

# Parametric model
def f(x, theta):
    return (theta + 1) * (x ** theta)

# Cumulative distribution function of the true distribution
def F(x):
    return x ** 11

# Inverse of the cumulative distribution function
def F_inv(x):
    return x ** (1 / 11)

n = 1
l_min = 0
l_max = 5
l = []
for _ in range(100000):
    x = F_inv(np.random.rand(n))
    theta = - n / sum(np.log(x)) - 1  # maximum likelihood estimator
    # 2 log L: twice the log likelihood ratio of the unrestricted MLE against H0 (theta = 10)
    l.append(2 * np.log(np.prod(f(x, theta)) / np.prod(f(x, 10))))
l = np.array(l)

# Histogram of 2 log L, built manually over [l_min, l_max]
th = np.linspace(l_min, l_max, 100)
hi = np.array([np.sum(np.logical_and(th[i] < l, l <= th[i + 1])) for i in range(100 - 1)]) / (len(l) * (th[1] - th[0]))
plt.bar((th[:-1] + th[1:]) / 2, hi, width=(th[1] - th[0]))
# Density of the limiting chi-square distribution with 1 degree of freedom
s = np.linspace(l_min, l_max, 100)
plt.plot(s, chi2.pdf(s, 1), 'tab:orange', linewidth=4)
plt.xlim(l_min, l_max)
plt.ylim(0, 2)
