[PYTHON] [Statistics] Let's visualize the relationship between the normal distribution and the chi-square distribution.

This is another installment in my series on statistics and visualization.

The chi-square distribution is often used in the chi-square test, for example when evaluating A/B tests. It is written $\chi^2$. Its density has the shape shown below, and that shape changes with the value of $k$, which is called the degrees of freedom.

chi2_dist2-compressor.gif

(The graph drawing code is here)
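As a rough illustration of how the shape depends on $k$, the sketch below evaluates the chi-square density for a few degrees of freedom and reports where each curve peaks. It is not the author's drawing code; the grid `xx`, the dictionary `curves`, and the chosen values of $k$ are assumptions made for this example.

```python
import numpy as np
from scipy.stats import chi2

# Grid of x values; start slightly above 0 because the k=1 density diverges at x=0.
xx = np.linspace(0.01, 25, 2500)

# Density curves for a few degrees of freedom (the k values are illustrative).
curves = {k: chi2.pdf(xx, df=k) for k in (1, 2, 3, 5, 10)}

for k, pdf in curves.items():
    peak_x = xx[np.argmax(pdf)]
    print(f"k={k:2d}: peak of the density near x={peak_x:.2f}")
```

For $k \le 2$ the density is highest at the left edge of the grid, while for larger $k$ the peak moves to the right, which is the change in shape seen in the figure. Each array in `curves` can be passed to `plt.plot(xx, ...)` to draw it.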

When I looked up the definition of the chi-square distribution on Wikipedia, it said:

Take $k$ independent random variables $X_1, \dots, X_k$ that each follow the standard normal distribution. The distribution followed by the statistic $Z = \sum_{i=1}^{k} X_i^2$ is then called the chi-square distribution with $k$ degrees of freedom.

Hmm, what does that mean? Do you square the density function of the normal distribution? Apparently not.
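One way to read the definition is that the random values themselves are squared and summed, not the density function. The following is a minimal sketch of that reading, not the author's code: `chi2_samples` is a hypothetical helper, and the sample size and seed are arbitrary choices. It relies on the standard fact that a chi-square variable with $k$ degrees of freedom has mean $k$ and variance $2k$.

```python
import numpy as np

def chi2_samples(k, n=30000, seed=0):
    """Draw n values of Z = X_1^2 + ... + X_k^2 with each X_i ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(size=(n, k))  # n rows, each holding k independent draws
    return (x ** 2).sum(axis=1)           # square every draw, then sum across each row

for k in (1, 2, 5):
    z = chi2_samples(k)
    # Theory: mean k and variance 2k for k degrees of freedom
    print(f"k={k}: sample mean={z.mean():.3f}, sample variance={z.var():.3f}")
```

The sample mean and variance should land close to $k$ and $2k$, which is a quick sanity check that the sum of squares behaves like a chi-square variable.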

First, since the definition starts from "$k$ random variables that independently follow the standard normal distribution", I will draw a histogram of random numbers that follow the standard normal distribution. These are 30,000 random numbers with $X \sim N(\mu, \sigma^2)$, where $\mu = 0$ and $\sigma = 1$.

norm_dist.png

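import numpy as np
# plot_dist is assumed to be a helper from the full graph-drawing code (not shown here)
x = np.random.normal(0, 1, 30000)
plot_dist(x, bins=80, title="normal distribution.")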

(The full code for drawing the graph is here)

If you square each of these random numbers, the resulting values follow the chi-square distribution. In code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chi2

# Generate 30,000 random numbers that follow a standard normal distribution
x = np.random.normal(0, 1, 30000)
# Square the generated random numbers (this is the key step!)
x2 = x**2

# Draw the histogram (density=True replaces normed=True, which newer Matplotlib no longer accepts)
plt.figure(figsize=(7,5))
plt.title("chi2 distribution.[k=1]")
plt.hist(x2, 80, color="lightgreen", density=True)

# Draw the density of the chi-square distribution with 1 degree of freedom
xx = np.linspace(0, 25, 1000)
plt.plot(xx, chi2.pdf(xx, df=1), linewidth=2, color="r")

The resulting graph is shown below. Because the values are squared, none of them is negative, so all of the data now lies to the right of $x = 0$.

The density function of the chi-square distribution with 1 degree of freedom is drawn on top of the histogram, and the two are almost identical! The histogram is a plot of $X_1^2$, because it squares random numbers that follow the standard normal distribution and plots them as they are. Since there is only one $X$, the result follows the chi-square distribution with 1 degree of freedom.

chi2_hist_dist-compressor.png
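The match for $k = 1$ can also be checked without random numbers. For $X \sim N(0, 1)$ and $z \ge 0$, $P(X^2 \le z) = P(-\sqrt{z} \le X \le \sqrt{z}) = 2\Phi(\sqrt{z}) - 1$, where $\Phi$ is the standard normal CDF. The sketch below compares that expression with SciPy's chi-square CDF; the grid `z` and the variable names are choices made for this example.

```python
import numpy as np
from scipy.stats import chi2, norm

z = np.linspace(0.0, 25.0, 501)

# P(X^2 <= z) = P(-sqrt(z) <= X <= sqrt(z)) = 2 * Phi(sqrt(z)) - 1 for X ~ N(0, 1)
cdf_from_normal = 2.0 * norm.cdf(np.sqrt(z)) - 1.0

# CDF of the chi-square distribution with 1 degree of freedom
cdf_chi2 = chi2.cdf(z, df=1)

print("max abs difference:", np.max(np.abs(cdf_from_normal - cdf_chi2)))
```

The two curves should agree up to floating-point rounding, which is the exact counterpart of the histogram and the red line lying on top of each other.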

Next, an animation running from $X_1^2 + X_2^2$, with 2 degrees of freedom, up to $\sum_{i=1}^{10} X_i^2$ looks like this.

chi2_hist_dist-compressor.gif

This is also a perfect match! The "square" in chi-square can be understood as the square of random numbers that follow the standard normal distribution. Drawing the histograms gave me a concrete picture of what the definition means.
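To put a number on "a perfect match", one option is the Kolmogorov-Smirnov statistic, the largest gap between the empirical CDF of the samples and the chi-square CDF. This is a sketch added for illustration, not part of the original analysis; the seed and the dictionary `ks_stats` are assumptions of the example.

```python
import numpy as np
from scipy.stats import chi2, kstest

rng = np.random.default_rng(0)
n = 30000

ks_stats = {}
for k in range(1, 11):
    # Sum of k squared standard normal draws, as in the histograms
    z = (rng.standard_normal(size=(n, k)) ** 2).sum(axis=1)
    # KS statistic: largest gap between the empirical CDF and the chi-square CDF
    ks_stats[k] = kstest(z, chi2(df=k).cdf).statistic
    print(f"k={k:2d}: KS statistic = {ks_stats[k]:.4f}")
```

With 30,000 samples the statistic should stay small for every $k$, meaning the simulated sums are hard to tell apart from the chi-square distribution with the same degrees of freedom.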

Below is the code for drawing the animation of the graph with 1 to 10 degrees of freedom.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as ani
from scipy.stats import chi2

def animate(nframe):
    n = 30000
    k = nframe + 1
    cum = np.zeros(n)
    for i in range(k):
        # Generate 30,000 random numbers that follow a standard normal distribution
        x = np.random.normal(0, 1, n)
        # Square the generated random numbers (this is the key step!)
        x2 = x**2
        # The number of squared terms added together is the degrees of freedom
        cum += x2

    # Draw the histogram
    plt.clf()
    #plt.figure(figsize=(9,7))
    plt.ylim(0, 0.6)
    plt.xlim(0, 25)
    plt.title("chi2 histogram & pdf [k=%d]"%k)
    plt.hist(cum, 80, color="lightgreen", density=True)

    # Draw the density of the chi-square distribution with k degrees of freedom
    xx = np.linspace(0, 25, 1000)
    plt.plot(xx, chi2.pdf(xx, df=k), linewidth=2, color="r")


fig = plt.figure(figsize=(10,8))
# blit=False because animate() redraws the whole figure and returns no artists
anim = ani.FuncAnimation(fig, animate, frames=10, blit=False)
anim.save('chi2_hist_dist.gif', writer='imagemagick', fps=1, dpi=64)

Code supplement

Because ImageMagick is used to write the GIF animation, please install ImageMagick and PythonMagick, referring to the official ImageMagick site and the PythonMagick download page.

However, installing ImageMagick and PythonMagick can be difficult depending on the environment. If you just want an easy way to create animations, you can save them as mp4 as shown below, without ImageMagick or PythonMagick (Matplotlib's default movie writer relies on ffmpeg being installed).

anim.save('filename.mp4', fps=13)
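Which writers actually work depends on what is installed on the machine. The sketch below asks Matplotlib's writer registry, `matplotlib.animation.writers`, which of a few candidates is available. `pick_writer` is a hypothetical helper written for this example, and the candidate order is only a suggestion.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the check runs without a display
import matplotlib.animation as ani

def pick_writer(preferred=("imagemagick", "pillow", "ffmpeg")):
    """Return the name of the first available writer in `preferred`, or None."""
    for name in preferred:
        if ani.writers.is_available(name):
            return name
    return None

print("registered and usable writers:", ani.writers.list())
print("writer chosen by this sketch:", pick_writer())
```

If the returned name is not `None`, it can be passed as the `writer` argument of `anim.save`. Note that the file extension should suit the writer: for example a `.gif` file for `imagemagick` or `pillow`, and `.mp4` for `ffmpeg`.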
