[Python] [Uncorrelated test] I tried plotting the boundary between rejecting and not rejecting the null hypothesis

Introduction

While studying statistics and working through various formulas, I could not get the **uncorrelated test** (the test of no correlation) to stick, so I kept staring at the formula. Something about it made me curious, so I had Python run the calculation and plot the result.

--Environment

Uncorrelated test

It tests whether the correlation coefficient obtained from a sample lets us say that the population is correlated as well.

- **Null hypothesis H0**: the population correlation coefficient is 0 (no correlation in the population)

- **Alternative hypothesis H1**: the population correlation coefficient is not 0

Compute the statistic $t$ from the formula below and obtain the $p$-value from it. Here $r$ is the sample correlation coefficient and $n$ is the sample size; the statistic $t$ has $\nu = n - 2$ degrees of freedom.

$$
t = \frac{|r| \sqrt{n - 2}}{\sqrt{1 - r^2}}
$$

If the significance level $\alpha$ is 0.05, a two-sided test only needs to check whether the one-sided (upper-tail) $p$-value is below 0.025, that is, $\alpha / 2$.
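As a quick sanity check of the formula, here is a minimal sketch for a single pair of values. The numbers r = 0.5 and n = 20 are hypothetical and chosen only for illustration, and the helper name `uncorrelated_t` is my own. It assumes SciPy is installed and uses `stats.t.sf` for the upper-tail probability.

```python
import math
from scipy import stats

def uncorrelated_t(r, n):
    """Return (t, df) for the test of no correlation."""
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1 - r**2)
    return t, df

r, n = 0.5, 20                       # hypothetical sample values
t, df = uncorrelated_t(r, n)
p_one_sided = stats.t.sf(t, df)      # upper-tail probability P(T > t)
p_two_sided = 2 * p_one_sided

alpha = 0.05
reject = p_one_sided < alpha / 2     # same decision as p_two_sided < alpha
print(f"t = {t:.3f}, df = {df}, two-sided p = {p_two_sided:.4f}, reject H0: {reject}")
```

Comparing the one-sided probability with 0.025 and comparing the two-sided probability with 0.05 always give the same decision, which is why checking the 0.025 point is enough.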


...I rarely use this formula, so I can't remember it. Still, looking at it I thought, "If n (the sample size) is large, the t-value gets large, so **in the end it all comes down to sample size!**" So I took every combination of sample size and correlation coefficient and **checked how far the null hypothesis survives without being rejected**.
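The intuition that the t-value grows with the sample size can be checked for one fixed correlation before running the full grid. This is only a sketch: r = 0.2 and the list of sample sizes are hypothetical values picked for illustration.

```python
import numpy as np
from scipy import stats

r = 0.2                                   # hypothetical, fixed sample correlation
ns = np.array([10, 50, 100, 500, 1000])   # sample sizes to compare

# Same formula as above, evaluated for all sample sizes at once
t = abs(r) * np.sqrt(ns - 2) / np.sqrt(1 - r**2)
p_upper = stats.t.sf(t, ns - 2)           # one-sided upper-tail probability

for n_i, t_i, p_i in zip(ns, t, p_upper):
    verdict = "reject H0" if p_i < 0.025 else "do not reject H0"
    print(f"n = {int(n_i):4d}  t = {t_i:6.3f}  upper-tail p = {p_i:.4f}  -> {verdict}")
```

With r held fixed, t is proportional to the square root of n - 2, so even a weak correlation is eventually rejected once the sample is large enough.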

Preparation

#Used for data creation
import pandas as pd
import numpy as np
import math
from scipy import stats
import itertools

#Used for graph drawing
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D


%matplotlib inline


# On Matplotlib 3.6 and later this style is named 'seaborn-v0_8-darkgrid'
plt.style.use('seaborn-darkgrid')
plt.rcParams['font.family'] = 'Yu Gothic'
plt.rcParams['font.size'] = 20


# Given a correlation coefficient (coef) and a sample size (n),
# return the t-value (t), degrees of freedom (df), and p-value (p).
def Uncorrelated(coef, n):
    t = (np.abs(coef) * math.sqrt(n - 2)) / math.sqrt(1 - coef**2)
    df = n - 2
    # One-sided (upper-tail) p-value, rounded to 3 decimal places
    p = np.round(1 - stats.t.cdf(np.abs(t), df), 3)
    return coef, n, t, df, p


# Sample sizes from 10 to 1000 in steps of 10
samplesizes = np.arange(10, 1001, 10)

# Correlation coefficients from -0.99 to 0.99 in steps of 0.01
coefficients = np.linspace(-0.99, 0.99, 199)
#print(coefficients)

# Cross join (Cartesian product) of the two arrays above
c_s = list(itertools.product(coefficients, samplesizes))

# Pass each (correlation coefficient, sample size) pair to Uncorrelated
# and collect the returned tuples for conversion to a pandas DataFrame.
df_prelist = [Uncorrelated(coef, n) for coef, n in c_s]

# Preparation is complete
df = pd.DataFrame(df_prelist, columns=['coef', 'sample_size', 't', 'df', 'p_value'])

The resulting df looks like this:

df


df.sample(10)


It holds the t-value, degrees of freedom, and p-value of the uncorrelated test for every combination of correlation coefficients from -0.99 to 0.99 and sample sizes from 10 to 1000.
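The same table can also be built without a Python-level loop. The following is a sketch of one alternative, not the method used in this article: it uses `np.meshgrid` and NumPy broadcasting, and `stats.t.sf` in place of `1 - cdf`. The name `df_vec` is my own, and the rounded p-values should agree with the loop version up to floating-point effects.

```python
import numpy as np
import pandas as pd
from scipy import stats

samplesizes = np.arange(10, 1001, 10)
coefficients = np.linspace(-0.99, 0.99, 199)

# Build the full grid; indexing='ij' keeps coef as the slow axis,
# matching the row order of itertools.product(coefficients, samplesizes).
coef_grid, n_grid = np.meshgrid(coefficients, samplesizes, indexing='ij')
coef_flat, n_flat = coef_grid.ravel(), n_grid.ravel()

dof = n_flat - 2
t = np.abs(coef_flat) * np.sqrt(dof) / np.sqrt(1 - coef_flat**2)
p = np.round(stats.t.sf(t, dof), 3)   # upper-tail probability, rounded to 3 decimals

df_vec = pd.DataFrame({'coef': coef_flat, 'sample_size': n_flat,
                       't': t, 'df': dof, 'p_value': p})
print(df_vec.shape)
```

For a grid of this size the loop is fast enough, so this is mainly useful if the grid is made much finer.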

Graph drawing

fig = plt.figure( figsize=(16, 12) )
# Axes3D(fig) no longer attaches itself to the figure on recent Matplotlib
ax = fig.add_subplot(projection='3d')
cm = plt.get_cmap('RdYlBu')
mappable = ax.scatter( np.array(df['coef']), np.array(df['sample_size']), np.array(df['p_value']), c=np.array(df['p_value']), cmap=cm)
fig.colorbar(mappable, ax=ax)
ax.set_xlabel('Correlation coefficient', labelpad=15)
ax.set_ylabel('sample size', labelpad=15)
ax.set_zlabel('p-value', labelpad=15)
plt.savefig('3D graph.png', bbox_inches='tight', pad_inches=0.3)
plt.show()
[Figure: 3D graph.png, a 3D scatter plot of p-value against correlation coefficient and sample size]

...The closer a point is to blue, the higher its p-value, meaning H0 is not rejected... **but this is hard to read.**

So I added a judge column to the DataFrame, labeling every row with a p-value greater than 0.025 as "Do not reject H0", and drew the graph again.

# Label rows whose p_value is greater than 0.025 as 'Do not reject H0'
df['judge'] = 'Reject H0'
df.loc[df['p_value'] > 0.025, 'judge'] = 'Do not reject H0'


#Graph redraw
grid = sns.FacetGrid( df, hue = 'judge', height=10 )
grid.map(plt.scatter, 'coef', 'sample_size')
grid.add_legend(title='Judgment')
plt.ylabel('sample size')
plt.xlabel('Correlation coefficient')
plt.title('Correlation coefficient x sample size: rejection or non-rejection in the uncorrelated test', size=30)

# Draw red dashed lines at the largest and smallest coefficients that are not rejected
not_rejected = df.loc[df['judge'] == 'Do not reject H0', 'coef']
plt.vlines(not_rejected.max(), -50, 50, color='red', linestyles='dashed')
plt.vlines(not_rejected.min(), -50, 50, color='red', linestyles='dashed')
plt.annotate('Outside |' + str(not_rejected.max().round(2)) + '|, H0 is rejected for every n >= 10',
            xy=(not_rejected.max(), 80), size=15, color='black')
plt.savefig('2D graph.png', bbox_inches='tight', pad_inches=0.3)
plt.show()
[Figure: 2D graph.png, a scatter plot of sample size against correlation coefficient, colored by judgment]

Sure enough, **on this grid, once the absolute value of the sample correlation coefficient exceeds 0.62, the null hypothesis H0 is rejected even at n = 10, and "the population correlation coefficient is not 0" is adopted** ($\alpha = 0.05$)!
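The boundary can also be computed directly instead of being read off the grid. Solving the t formula for $|r|$ at the critical value $t_c$ gives $|r| = t_c / \sqrt{t_c^2 + n - 2}$. The sketch below assumes a two-sided test and uses `stats.t.ppf` for the critical value; the helper name `critical_r` is my own. For n = 10 it gives about 0.632, slightly above the grid's 0.62, most likely because the grid moves in steps of 0.01 and the p-values are rounded to three decimals.

```python
import math
from scipy import stats

def critical_r(n, alpha=0.05):
    """Boundary |r| of the two-sided test: larger |r| rejects H0 at level alpha."""
    df = n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # upper alpha/2 point of t(df)
    return t_crit / math.sqrt(t_crit**2 + df)

for n in (10, 30, 100, 1000):
    print(f"n = {n:4d}  critical |r| = {critical_r(n):.3f}")
```

The critical value shrinks as n grows, which is the same boundary curve that separates the two colors in the 2D graph.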



...By the way, writing this article also took care of my original goal of memorizing the formula 🙃
