[PYTHON] A hands-on look at the chi-square distribution

https://bellcurve.jp/statistics/course/9208.html

According to the page above, the chi-square distribution is the distribution of the sum of squares of independent random variables that each follow the standard normal distribution. Just looking at a plot of the distribution doesn't really give you a feel for it, though, so I tried it out in Jupyter.
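In symbols, with k the number of degrees of freedom (this is the usual textbook notation, not copied from the linked page):

```latex
Q = \sum_{i=1}^{k} Z_i^2, \qquad Z_1, \dots, Z_k \ \text{i.i.d.} \sim N(0, 1)
\quad \Longrightarrow \quad Q \sim \chi^2(k)
```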

Following the definition, I generated random variables following N(0,1), took their sum of squares, repeated this over many trials, and looked at the resulting distribution. The left side of the figure below is a KDE (kernel density estimate) plot; the right side is a histogram.
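The author's actual notebook is linked at the end of this post; the snippet below is only a minimal sketch of the same experiment, not that code. It assumes NumPy for sampling and, for the plot only, matplotlib and SciPy's `gaussian_kde`. The function names, the degrees of freedom shown, the trial count, and the x-axis range are my own choices.

```python
import numpy as np

def chi_square_samples(k, n_trials=100_000, seed=0):
    """Return n_trials draws of Z_1^2 + ... + Z_k^2, with Z_i i.i.d. N(0, 1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(size=(n_trials, k))  # one row = one trial of k N(0,1) draws
    return (z ** 2).sum(axis=1)                  # sum of squares within each trial

def plot_kde_and_hist(dofs=(1, 2, 3, 5, 10), n_trials=100_000):
    """KDE on the left, histogram on the right, one curve per degree of freedom."""
    # Imported here so chi_square_samples() works without these packages installed.
    import matplotlib.pyplot as plt
    from scipy.stats import gaussian_kde

    xs = np.linspace(0, 20, 400)
    fig, (ax_kde, ax_hist) = plt.subplots(1, 2, figsize=(12, 4))
    for k in dofs:
        q = chi_square_samples(k, n_trials)
        ax_kde.plot(xs, gaussian_kde(q)(xs), label=f"k={k}")
        # density=True rescales bar heights so the total area is 1, like a pdf
        ax_hist.hist(q, bins=100, range=(0, 20), density=True,
                     histtype="step", label=f"k={k}")
    for ax, title in ((ax_kde, "KDE"), (ax_hist, "Histogram")):
        ax.set_title(title)
        ax.set_xlabel("sum of squares")
        ax.legend()
    fig.tight_layout()
    plt.show()
```

In a notebook, calling `plot_kde_and_hist()` draws the two panels side by side. Sampling a whole `(n_trials, k)` matrix at once keeps each trial's k draws independent while avoiding a Python loop.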

In the figure on the left, the shape of the distribution is reproduced almost exactly. The histogram on the right also has nearly the same shape. It looks slightly off, but it would probably get closer with more samples or with adjusted axis ranges.

[Figure: カイ二乗分布.png (chi-square distribution). Left: KDE; right: histogram.]
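To check numerically the guess that the histogram would match better with more samples, here is a sketch assuming NumPy. It compares a normalized histogram (`np.histogram` with `density=True`, so bar heights are on the same scale as a pdf) against the exact chi-square density from the standard formula f(x) = x^(k/2-1) e^(-x/2) / (2^(k/2) Γ(k/2)). The helper names, k = 3, the bin count, and the range are arbitrary illustrative choices.

```python
import math
import numpy as np

def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom (valid for x > 0)."""
    return x ** (k / 2 - 1) * np.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def hist_vs_pdf_error(k, n_trials, bins=40, x_max=20.0, seed=0):
    """Mean absolute gap between a normalized histogram and the exact pdf."""
    rng = np.random.default_rng(seed)
    q = (rng.standard_normal(size=(n_trials, k)) ** 2).sum(axis=1)
    # density=True divides counts by (in-range total * bin width); values above
    # x_max are dropped, a negligible fraction for k=3 and x_max=20
    heights, edges = np.histogram(q, bins=bins, range=(0.0, x_max), density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    return float(np.abs(heights - chi2_pdf(centers, k)).mean())

for n in (1_000, 10_000, 100_000):
    print(n, round(hist_vs_pdf_error(3, n), 4))
```

The gap should shrink as the number of trials grows. Normalizing matters too: a histogram of raw counts has a different vertical scale from the pdf, which alone can make the two look mismatched.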

From this, I got some feel for the statement "the chi-square distribution is the distribution of the sum of squares of random variables that follow the standard normal distribution". With 1 degree of freedom the value is usually close to 0, but as the degrees of freedom increase, more squared terms are added together, so the peak of the distribution gradually shifts to the right. The mean for 1 degree of freedom is 1 (hard to read from the figure, but it follows from E[Z²] = Var(Z) + (E[Z])² = 1 + 0 = 1), and the number of degrees of freedom is the number of independent standard normal variables being summed, so it makes sense that the expected value equals the number of degrees of freedom.
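A quick numerical check that the expected value tracks the degrees of freedom k. This is a sketch assuming NumPy; the function name `sample_means`, the seed, and the trial count are arbitrary choices, not from the original notebook.

```python
import numpy as np

def sample_means(dofs, n_trials=200_000, seed=42):
    """Average of n_trials simulated sums of k squared N(0,1) draws, for each k."""
    rng = np.random.default_rng(seed)
    means = {}
    for k in dofs:
        q = (rng.standard_normal(size=(n_trials, k)) ** 2).sum(axis=1)
        means[k] = float(q.mean())
    return means

for k, m in sample_means((1, 2, 5, 10)).items():
    print(f"k={k:2d}  sample mean={m:.3f}")  # each should land close to k
```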

That still leaves the question "So what?" After some digging, I found the following page easy to understand.

https://atarimae.biz/archives/13511

For example, suppose you throw a die 120 times and get nothing but 1s and 6s. From the sample mean alone, you cannot conclude that "this is hard to explain as a coincidence": if 1s and 6s come up about equally often, the mean is close to 3.5, which is exactly what a fair die gives.

Naturally, the sample "mean" alone cannot capture how biased a sample is, so it cannot flag results that "look fine on average but are clearly biased." The idea for solving this is to "look at the distribution of the sum of squared deviations (≈ variance) of the sample", and the chi-square distribution can be seen as the tool for that check.
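To make the dice example concrete, here is a sketch of Pearson's chi-square goodness-of-fit statistic: it sums the squared deviations of observed counts from expected counts, each divided by the expected count. Each scaled deviation is roughly standard normal, which is why the sum is compared against a chi-square distribution, here with 6 - 1 = 5 degrees of freedom. The counts are made up for illustration (not taken from the linked article), and 11.07 is the standard table value for the 5% level at 5 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def die_mean(counts):
    """Sample mean of die faces 1..6 given how many times each face came up."""
    return sum(face * c for face, c in zip(range(1, 7), counts)) / sum(counts)

# Hypothetical counts for faces 1..6 over 120 throws
only_1_and_6 = [60, 0, 0, 0, 0, 60]
fair_looking = [18, 22, 20, 19, 21, 20]
expected = [120 / 6] * 6  # a fair die: 20 of each face

CRITICAL_5PCT_DF5 = 11.07  # tabulated chi-square 95th percentile, df = 6 - 1 = 5

for name, counts in (("only 1 and 6", only_1_and_6), ("fair-looking", fair_looking)):
    stat = chi_square_statistic(counts, expected)
    verdict = "reject fairness" if stat > CRITICAL_5PCT_DF5 else "no evidence of bias"
    print(f"{name:13s} mean={die_mean(counts):.3f}  chi2={stat:.1f}  -> {verdict}")
```

Both samples have a mean of about 3.5, yet the first gives a statistic of 240 and the second only 0.5. If SciPy is available, `scipy.stats.chisquare(only_1_and_6)` computes the same statistic against a uniform expectation and also returns a p-value.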

Until now my understanding of the chi-square distribution was only superficial, but I feel it has deepened.

The notebook I used is here:

https://github.com/takotaketako/public-notebook/blob/master/%E3%82%AB%E3%82%A4%E4%BA%8C%E4%B9%97%E5%88%86%E5%B8%83.ipynb
