Yesterday I explained the "law of large numbers" and the "central limit theorem", two probability theorems that matter when dealing with large amounts of data.

Today, let's actually use a computer to simulate the distribution of many stochastic trials.

Pseudo-random number generation

A computer cannot produce truly random numbers by calculation alone, so we use pseudo-random numbers instead. (Incidentally, "pseudo" appears to be the official term for them.) Random numbers are a deep subject, and there is no end to studying them.

Here we will use numpy.random.randint, which is part of NumPy. It returns integers drawn from a discrete uniform distribution.
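As a minimal sketch of what that call returns, the snippet below draws ten fair coin tosses. The seed value and the variable name tosses are assumptions made here only so that the example is repeatable.

```python
import numpy as np

# Fix the seed so the run is repeatable (the seed value is an arbitrary assumption)
np.random.seed(0)

# randint(2, size=10) draws 10 integers uniformly from {0, 1};
# the upper bound 2 is exclusive
tosses = np.random.randint(2, size=10)

print(tosses)        # an array of 0s and 1s
print(tosses.sum())  # number of 1s, i.e. heads
```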

Scatter plot the results of 40,000 coin tosses

Yesterday I explained that, according to the central limit theorem, 40,000 coin tosses will practically never give 20,400 or more heads. The theorem proves this, but for an engineer it can be hard to accept without actually running it on a computer.
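To make that claim concrete, here is a rough sketch of the normal approximation behind it. It assumes a fair coin (p = 0.5), and the variable names are chosen here for illustration.

```python
import math

n = 40000   # tosses per experiment
p = 0.5     # probability of heads, assuming a fair coin

mean = n * p                      # expected number of heads
sd = math.sqrt(n * p * (1 - p))   # standard deviation of the number of heads

# How many standard deviations above the mean is 20,400?
z = (20400 - mean) / sd

# Upper-tail probability of a standard normal variable beyond z
tail = 0.5 * math.erfc(z / math.sqrt(2))

print(mean, sd, z)    # 20000.0 100.0 4.0
print(tail)           # roughly 3e-05
print(tail * 10000)   # expected number of such runs out of 10,000, well below 1
```

Under this approximation, 20,400 heads is four standard deviations away from the mean, so seeing it zero times or once in 10,000 runs is about what one would expect.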

So let's simulate it with the following code.

import numpy as np

def coin_toss(lim):
    """Simulate lim coin tosses and return the number of heads"""
    # Each element is 1 for heads and 0 for tails
    _randomized = np.random.randint(2, size=lim)
    # Return the total number of heads
    return int(_randomized.sum())

X = []
Y = []
lim = 10000

# Repeat the 40,000-toss experiment 10,000 times
for i in range(lim):
    X.append(i)
    Y.append(coin_toss(lim = 40000))

print(X)
print(Y)
# Number of runs with 20,400 or more heads
_over_lim = [i for i in Y if i >= 20400]
print(len(_over_lim))
# Number of runs with 19,600 or fewer heads
_under_lim = [i for i in Y if i <= 19600]
print(len(_under_lim))

This means that a single experiment tosses a coin 400 million times (40,000 tosses repeated 10,000 times).
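Generating 400 million individual tosses takes a while. As an alternative sketch, the number of heads in each run can be drawn directly from a binomial distribution. Note that this swaps in a different technique from the code above (numpy.random.default_rng and Generator.binomial rather than randint), and the seed is an arbitrary assumption.

```python
import numpy as np

# Seeded generator so the sketch is repeatable (the seed is an arbitrary choice)
rng = np.random.default_rng(12345)

trials = 10000   # number of runs
tosses = 40000   # coin tosses per run

# Each entry is the number of heads in one 40,000-toss run
heads = rng.binomial(tosses, 0.5, size=trials)

print(heads.mean())             # close to 20,000
print(heads.std())              # close to 100
print((heads >= 20400).sum())   # runs with 20,400 or more heads
print((heads <= 19600).sum())   # runs with 19,600 or fewer heads
```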

Results of the first experiment

Here is the result of the first experiment.

Naively, 40,000 coin tosses should give 20,000 heads. This graph shows what happens when those 40,000 tosses are repeated 10,000 times.

The points cluster around the center. This time, the number of heads was never 20,400 or more, nor 19,600 or fewer.
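A scatter plot like the one described can be produced along the following lines. This is only a sketch, assuming matplotlib is installed. The original post does not show its plotting code, so the function name plot_heads, the output file name and the reduced number of runs are all assumptions made here.

```python
import matplotlib
matplotlib.use("Agg")  # draw to a file instead of opening a window
import matplotlib.pyplot as plt
import numpy as np


def plot_heads(counts, path="coin_toss.png"):
    """Scatter-plot the number of heads per run (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.scatter(range(len(counts)), counts, s=2)
    # Mark the two thresholds discussed in the text
    ax.axhline(20400, color="red", linestyle="--")
    ax.axhline(19600, color="red", linestyle="--")
    ax.set_xlabel("run")
    ax.set_ylabel("number of heads in 40,000 tosses")
    fig.savefig(path)
    plt.close(fig)
    return path


# 1,000 runs instead of 10,000 to keep the sketch quick
counts = [int(np.random.randint(2, size=40000).sum()) for _ in range(1000)]
print(plot_heads(counts))
```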

Results of the second experiment

When I ran the experiment again, only 1 of the 10,000 runs gave 19,600 or fewer heads.

You can just about make it out in the graph.

Results of the third experiment

This is the result of the third experiment.

There was just one run in which the number of heads came close to 20,400 without exceeding it. There was also just one run with 19,600 or fewer heads.

Results of the fourth experiment

This is the result of the fourth experiment.

As expected, the number of heads never exceeded 20,400.

Results of the fifth experiment

As a follow-up test, I tried the same experiment again on a different day.

There was one run with 20,400 or more heads and one run with 19,600 or fewer heads.

Hypothesis test

Now let's examine statistically whether the hypothesis that 40,000 coin tosses yield 20,400 heads holds up.

The hypotheses can be stated as follows.

Hypothesis	Description
Null hypothesis	In 40,000 coin tosses, heads appears 20,400 times
Alternative hypothesis	In 40,000 coin tosses, heads does not appear 20,400 times (heads appears 20,000 times)

Chi-square test

**Pearson's chi-square test** is the most basic and widely used of the chi-square tests (http://en.wikipedia.org/wiki/%E3%82%AB%E3%82%A4%E4%BA%8C%E4%B9%97%E5%88%86%E5%B8%83). The formula looks like this:

X^2 = \sum\frac {(O-E)^2} {E}
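Here O is the observed count in each category, E is the expected count, and the sum runs over the categories (heads and tails in this case). As a sketch of how the formula works, the statistic for 204 heads and 196 tails against an expected 200:200 can be computed by hand. The function name pearson_x2 is made up for this example, and the p-value shortcut through math.erfc only holds for two categories (one degree of freedom).

```python
import math


def pearson_x2(observed, expected):
    """Sum (O - E)^2 / E over all categories (hypothetical helper)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


x2 = pearson_x2([204, 196], [200, 200])

# With one degree of freedom, the upper-tail probability of the
# chi-square distribution is erfc(sqrt(x2 / 2))
p = math.erfc(math.sqrt(x2 / 2))

print(x2)  # 0.16
print(p)   # about 0.689
```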

Implementation of chi-square test

The chi-square test is easy to implement using SciPy.

Let's start with a smaller case and check whether 400 coin tosses can plausibly give 204 heads. To find out, we test whether a result of 204 heads and 196 tails differs significantly from the 200:200 split expected of a fair coin.

# -*- coding:utf-8 -*-

import numpy as np
import scipy.stats

s = 204 # Number of heads
f = 196 # Number of tails
e = 200 # Expected count for each side

# Observed frequencies (204:196)
observed = np.array([s,f])
# Expected frequencies (200:200)
expected = np.array([e,e])

# Perform the chi-square test
x2, p = scipy.stats.chisquare(observed, expected)

print("Chi-square value is %s" % x2)
print("Probability is %.12g" % p)

# Check whether the p-value is below the 0.05 significance level
if p < 0.05:
    print("Significant")
else:
    print("Not significant")

The result is as follows. The p-value of 0.69 is higher than 0.05, so the difference from 200:200 is not significant. In other words, a result like this can easily happen.

Chi-square value is 0.16
Probability is 0.689156516779
Not significant

Now let's scale this up to 2,040 heads and 1,960 tails.

Chi-square value is 1.6
Probability is 0.205903210732
Not significant

The difference is still not significant. This can easily happen too.

Then what about 20,400 heads and 19,600 tails?

Chi-square value is 16.0
Probability is 6.33424836662e-05
Significant

This time the p-value is extremely small (note the exponent notation of the floating-point number), so the difference is significant. A result like this is very unlikely to happen.

What about 20,100 heads in 40,000 coin tosses?

Chi-square value is 1.0
Probability is 0.317310507863
Not significant

The difference is not significant. So it is entirely possible to get around 20,100 heads.
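The four cases above differ only in the observed counts, so they can be run in one loop. A small sketch, assuming SciPy is installed. The helper name check_heads is hypothetical, and it takes an even split as the expected frequencies, as in the code above.

```python
import scipy.stats


def check_heads(heads, tosses):
    """Chi-square test of a heads count against an even split (hypothetical helper)."""
    observed = [heads, tosses - heads]
    expected = [tosses / 2, tosses / 2]
    x2, p = scipy.stats.chisquare(observed, expected)
    return x2, p


for heads, tosses in [(204, 400), (2040, 4000), (20400, 40000), (20100, 40000)]:
    x2, p = check_heads(heads, tosses)
    print(heads, tosses, x2, p)
```

In the first three cases the proportion of heads is 51% every time, yet the chi-square value grows from 0.16 to 16.0, because the same relative deviation becomes far less likely as the number of tosses grows.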

Discussion

With a computer you can carry out a large number of calculations, run simulations, and visualize the results. We also saw that a hypothesis test can check whether the observed relative frequencies of events are consistent with an expected frequency distribution.

[PYTHON] Asymptotic theory and its simulation (2)