[PYTHON] Understanding the meaning of the complicated, odd-looking normal distribution formula

The normal distribution plays a very important role in statistics. The mathematical formula (density function) that expresses the normal distribution is


\Phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp \left( -\frac{(x-\mu)^2}{2\sigma^2} \right)

But this is a very complicated formula ...

It looks a little simpler for the standard normal distribution, which has variance $\sigma^2 = 1$ and mean $\mu = 0$.


\phi(x) = \frac{1}{\sqrt{2\pi}}\exp \left( -\frac{x^2}{2} \right)

The graph looks like this: the so-called bell curve. (From here on, I will draw the graphs with Python as needed.)

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-4,4, 100)
y = (1/np.sqrt(2*np.pi))*np.exp(-x**2/2)

plt.ylim(0,0.45)
plt.plot(x,y)
plt.show()
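As a quick sanity check on the two formulas above, the sketch below evaluates the general density with $\mu = 0$ and $\sigma^2 = 1$ and compares it with the standard form. The function names `normal_pdf` and `standard_normal_pdf` are made up for this illustration and do not come from any library.

```python
import numpy as np

def normal_pdf(x, mu, sigma2):
    """General normal density with mean mu and variance sigma2 (illustrative helper)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def standard_normal_pdf(x):
    """Standard normal density (mean 0, variance 1)."""
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

x = np.linspace(-4, 4, 100)
same = np.allclose(normal_pdf(x, mu=0.0, sigma2=1.0), standard_normal_pdf(x))
peak = standard_normal_pdf(0.0)

print(same)  # whether the two formulas agree on the grid
print(peak)  # height of the bell at x = 0, i.e. 1/sqrt(2*pi)
```

The peak height $1/\sqrt{2\pi}$ is a little under 0.4, which is why an upper limit of 0.45 on the y-axis fits the plot comfortably.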

To begin with, the normal distribution is smooth and symmetric, and I think the idea behind it is that we want to express probability with a function that is concentrated around a single point. So let's be bold and start from the quadratic function


f(x) = x^2

Plotted as a graph, it looks like this:

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10,10, 100)
y = x**2

plt.plot(x,y)
plt.show()

Hmm, that's no good. This does not look like a distribution at all, so multiply it by minus one to flip it over, so that the peak is at the top.


f(x) = -x^2


x = np.linspace(-1,1, 100)
y = -x**2

plt.xlim(-1.2,1.2)
plt.ylim(-1,0.2)
plt.plot(x,y)
plt.show()

It's starting to look the part. To stretch out the tails and make it bell-shaped, put it in the exponent of $e$.


f(x) = e^{-x^2}

# use a wider range than [-1, 1] so that the flattening tails are visible
x = np.linspace(-3,3, 100)
y = np.exp(-x**2)

plt.xlim(-3,3)
plt.ylim(0,1.2)
plt.plot(x,y)
plt.show()

The shape now looks just like a normal distribution. The bell shape of the normal distribution originates from $e^{-x^2}$.
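To make the step from $-x^2$ to $e^{-x^2}$ a bit more concrete, here is a minimal sketch that checks the properties we wanted: the function is positive everywhere, symmetric, peaked at $x = 0$, and its tails shrink toward 0. The grid and variable names are arbitrary choices for this illustration.

```python
import numpy as np

x = np.linspace(-5, 5, 1001)  # symmetric grid that includes x = 0
parabola = -x ** 2            # the flipped quadratic
bell = np.exp(-x ** 2)        # the same quadratic placed in the exponent of e

parabola_never_positive = bool(np.all(parabola <= 0))  # cannot act as a probability
bell_always_positive = bool(np.all(bell > 0))          # exp never goes negative
symmetric = bool(np.allclose(bell, bell[::-1]))        # f(-x) equals f(x)
peak_x = x[np.argmax(bell)]                            # location of the maximum
tail = bell[-1]                                        # value at x = 5

print(parabola_never_positive, bell_always_positive, symmetric)
print(peak_x, tail)
```

The parabola dives to minus infinity, while the exponential version stays above 0 and flattens out, which is what produces the tails of the bell.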

Next, scale $x$ by $1/\sqrt{2}$ so that the factor of 2 disappears when the function is differentiated, which makes calculations easier. That is, change the variable to $y = \sqrt{2}x$.


g(y) = \exp \left(-\frac{y^2}{2} \right)

x = np.linspace(-3,3, 500)
y1 = np.exp(-(x**2))
y2 = np.exp(-(x**2)/2)

plt.xlim(-3,3)
plt.ylim(0,1.1)
plt.plot(x,y1,"b", label="exp(-x^2)")
plt.plot(x,y2,"g", label="exp(-(x^2)/2)")
plt.legend()
plt.show()

The curve has spread out a little sideways.
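One way to see why the factor $1/2$ is convenient: the derivative of $e^{-x^2}$ is $-2x\,e^{-x^2}$, while the derivative of $\exp(-y^2/2)$ is simply $-y\exp(-y^2/2)$. The sketch below checks both with a central finite difference. The helper name `central_diff` and the step size `h` are arbitrary choices for this illustration.

```python
import numpy as np

def central_diff(func, t, h=1e-5):
    """Approximate the derivative of func at t with a central finite difference."""
    return (func(t + h) - func(t - h)) / (2 * h)

def f(x):
    return np.exp(-x ** 2)      # before the change of variables

def g(y):
    return np.exp(-y ** 2 / 2)  # after the change of variables

t = np.linspace(-3, 3, 61)
ok_f = bool(np.allclose(central_diff(f, t), -2 * t * f(t), atol=1e-8))
ok_g = bool(np.allclose(central_diff(g, t), -t * g(t), atol=1e-8))

print(ok_f, ok_g)
```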

Next, this function has to be normalized so that the area under it (its integral) is 1. It represents a probability, so adding up all possible events must give 100%.

From the [Gaussian integral (see Wikipedia)](http://ja.m.wikipedia.org/wiki/Gauss integral), the integral over the entire range of $x$ is


\int_{-\infty}^{\infty} e^{-x^2}dx = \sqrt{\pi}

So applying the change of variables $y = \sqrt{2}x$ (so that $dy = \sqrt{2}\,dx$) gives


\int_{-\infty}^{\infty} \exp \left( {-\frac{y^2}{2}}\right)dy = \sqrt{2\pi}

Dividing both sides by $\sqrt{2\pi}$ gives


\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp \left( {-\frac{y^2}{2}}\right)dy = 1

This gives the formula for the standard normal distribution :blush:


x = np.linspace(-4,4, 100)
# include the 1/sqrt(2*pi) factor so that the area under the curve is 1
y = (1/np.sqrt(2*np.pi))*np.exp(-(x**2)/2)

plt.xlim(-4,4)
plt.ylim(0,0.45)
plt.plot(x,y)
plt.show()
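The normalization steps above can also be checked numerically. The following is a rough sketch that approximates each integral with a simple sum over a fine grid. The grid limits of $\pm 10$ and the variable names are assumptions made for this example, standing in for the infinite range.

```python
import numpy as np

# A wide, fine grid stands in for the interval (-inf, inf); the tails beyond
# |x| = 10 are far too small to matter at double precision.
x = np.linspace(-10, 10, 200001)
dx = x[1] - x[0]

area_raw = np.sum(np.exp(-x ** 2)) * dx         # should be close to sqrt(pi)
area_scaled = np.sum(np.exp(-x ** 2 / 2)) * dx  # should be close to sqrt(2*pi)
area_pdf = area_scaled / np.sqrt(2 * np.pi)     # should be close to 1

print(area_raw, np.sqrt(np.pi))
print(area_scaled, np.sqrt(2 * np.pi))
print(area_pdf)
```

Each printed pair should agree to many decimal places, and the last value should be very close to 1.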

In short, this formula starts from $e^{-x^2}$ and is adjusted so that its integral, the area under the curve, equals 1.
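The derivation here stops at the standard normal distribution. As a hedged aside, assuming the usual substitution $x \to (x - \mu)/\sigma$ together with an extra factor $1/\sigma$ to keep the area at 1, the general formula from the beginning can be recovered. The values `mu = 2.0` and `sigma = 1.5` below are arbitrary examples, and the function names are illustrative only.

```python
import numpy as np

def standard_normal_pdf(z):
    """Standard normal density (mean 0, variance 1)."""
    return np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)

def shifted_scaled_pdf(x, mu, sigma):
    """Standard normal shifted by mu and stretched by sigma; 1/sigma keeps the area at 1."""
    return standard_normal_pdf((x - mu) / sigma) / sigma

def general_formula(x, mu, sigma):
    """The general density written out directly, with variance sigma**2."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

mu, sigma = 2.0, 1.5  # arbitrary example values
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200001)
dx = x[1] - x[0]

matches = bool(np.allclose(shifted_scaled_pdf(x, mu, sigma), general_formula(x, mu, sigma)))
area = np.sum(shifted_scaled_pdf(x, mu, sigma)) * dx  # simple sum approximating the integral

print(matches, area)
```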