[PYTHON] Examine the relationship between two variables (2)

Suppose I stand in the kitchen today to make stir-fried cabbage and a Chinese soup. I stir well, then taste a little to judge how it is cooking.

Or, in a job interview, a face-to-face meeting of only a few tens of minutes decides whether the candidate is suitable as an employee.

Or, after only a few months to a year of dating, I decide to marry someone as my companion for the rest of my life.

In each case, we infer something about the whole (the population) from a small part of it (a sample). This is the essence of inferential statistics.

Sampling

When we collect values of the two variates, we are extracting a sample from the population. As explained before, there are various sampling methods.

In the previous example, we took 10 students from one high school class and extracted their physical fitness test results.

This alone does not show us the results of all high school students. However, the statistics of such a sample let us infer the whole population with a certain degree of accuracy. In other words, sampling is not an end in itself but a means of grasping the whole.
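The idea that a sample lets us infer the whole can be checked by simulation. Below is a minimal sketch using only the standard library. The "population" is entirely hypothetical: 10,000 grip-strength values drawn from a normal distribution with mean 35 kg and standard deviation 6 kg, chosen for illustration and not taken from the earlier post. The sketch repeatedly draws simple random samples of size n and measures how widely the sample means scatter around the population mean.

```python
import random
import statistics

# Hypothetical population: grip strength (kg) of 10,000 students.
# The normal distribution with mean 35 and sd 6 is an assumption for illustration.
rng = random.Random(0)
population = [rng.gauss(35, 6) for _ in range(10_000)]
population_mean = statistics.fmean(population)

def sample_mean(pop, n, rng):
    """Draw a simple random sample of size n (without replacement) and return its mean."""
    return statistics.fmean(rng.sample(pop, n))

spreads = {}
for n in (10, 100, 1000):
    # Repeat the sampling 200 times and measure how much the sample means scatter.
    means = [sample_mean(population, n, rng) for _ in range(200)]
    spreads[n] = statistics.pstdev(means)
    print(f"n={n:4d}  spread of sample means: {spreads[n]:.2f} kg")

print(f"population mean: {population_mean:.2f} kg")
```

The spread shrinks roughly in proportion to 1/sqrt(n). This is the "certain degree of accuracy" that a larger sample buys.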

Correlation

In the previous example, the scatter plot of grip strength against ball throw distance sloped somewhat upward to the right.

Its correlation coefficient was 0.53, which indicates some positive correlation.

The correlation coefficient r(x, y) takes values from -1 to 1. The closer its absolute value is to 1, the stronger the correlation.
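As a reminder of how r(x, y) is computed, here is a minimal standard-library sketch of the Pearson correlation coefficient: the covariance divided by the product of the two standard deviations. Population (divide-by-n) statistics are used throughout, so the factor 1/n cancels. The function name `correlation` and the toy data are my own, for illustration only.

```python
import statistics

def correlation(xs, ys):
    """Pearson correlation r(x, y) = cov(x, y) / (sigma(x) * sigma(y)).

    Raises ZeroDivisionError if either variate is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    # Population covariance (divide by n), matching pstdev below.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

xs = [1, 2, 3, 4, 5]
print(round(correlation(xs, [2, 4, 6, 8, 10]), 3))  # 1.0  (perfect positive)
print(round(correlation(xs, [10, 8, 6, 4, 2]), 3))  # -1.0 (perfect negative)
print(round(correlation(xs, [2, 1, 4, 3, 5]), 3))   # 0.8  (strong, not perfect)
```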

Regression line

Now consider again the two variates x and y.

Variate  Values
x        x_1, x_2, ..., x_n
y        y_1, y_2, ..., y_n

Among the straight lines passing through the center O'(\overline{x}, \overline{y}) of the scatter plot (correlation diagram) of the variates x and y,

y=a(x-\overline{x})+\overline{y}

consider the one that lies closest (in the least-squares sense) to the n points

P_1(x_1,y_1), P_2(x_2,y_2), ..., P_n(x_n,y_n)

This line is called the regression line of y on x, and it is given by:

\frac {y-\overline{y}} {\sigma(y)} = r(x,y) \frac {x-\overline{x}} {\sigma(x)}
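The step from "closest line through the center" to this formula can be filled in as follows. This sketch assumes σ(x) and σ(y) are population standard deviations (dividing by n). It also writes σ(x,y) for the covariance; that notation is mine and may differ from the book's.

```latex
S(a) = \sum_{i=1}^{n} \left\{ (y_i - \overline{y}) - a (x_i - \overline{x}) \right\}^2
     = n \left( \sigma(y)^2 - 2a\,\sigma(x,y) + a^2 \sigma(x)^2 \right),
\quad \sigma(x,y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \overline{x})(y_i - \overline{y})

\frac{dS}{da} = n \left( -2\,\sigma(x,y) + 2a\,\sigma(x)^2 \right) = 0
\;\Longrightarrow\;
a = \frac{\sigma(x,y)}{\sigma(x)^2} = r(x,y)\,\frac{\sigma(y)}{\sigma(x)},
\quad r(x,y) = \frac{\sigma(x,y)}{\sigma(x)\,\sigma(y)}

y - \overline{y} = r(x,y)\,\frac{\sigma(y)}{\sigma(x)}\,(x - \overline{x})
\;\Longleftrightarrow\;
\frac{y - \overline{y}}{\sigma(y)} = r(x,y)\,\frac{x - \overline{x}}{\sigma(x)}
```

S(a) is a quadratic in a with positive leading coefficient nσ(x)^2, so this stationary point is the minimum.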

I explained linear regression in an earlier post. Let's recall the least squares method once more.
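The least-squares line of y on x can be computed directly from the formula above. Below is a minimal standard-library sketch. The helper names `regression_line` and `sum_sq_residuals` and the toy data are hypothetical and are not the grip-strength data from the earlier post. The slope is computed as cov(x, y)/var(x), which equals r(x, y)σ(y)/σ(x). The sketch then checks that nudging the slope either way increases the sum of squared residuals S(a).

```python
import statistics

def regression_line(xs, ys):
    """Least-squares line of y on x, returned as (slope, intercept).

    slope = cov(x, y) / var(x) = r(x, y) * sigma(y) / sigma(x);
    the line passes through the center (mean x, mean y).
    """
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    slope = cov / statistics.pvariance(xs)
    intercept = my - slope * mx
    return slope, intercept

def sum_sq_residuals(xs, ys, slope):
    """S(a): sum of squared vertical distances to y = a(x - mean x) + mean y."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((y - my - slope * (x - mx)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
a, b = regression_line(xs, ys)
print(f"y = {a:.2f} x + {b:.2f}")  # y = 0.80 x + 0.60

# Any other slope through the center gives a larger S(a).
for trial in (a - 0.1, a, a + 0.1):
    print(f"a = {trial:.2f}  S(a) = {sum_sq_residuals(xs, ys, trial):.3f}")
```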


When the correlation coefficient approaches ±1 (|r(x, y)| → 1), the minimum of the sum of squared residuals,

S_0 = n\,\sigma(y)^2\left(1 - r(x,y)^2\right)

approaches 0. The points in the scatter plot therefore line up more and more closely along a straight line.
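The relation between S_0 and r(x, y) can be checked numerically. This is a minimal sketch on hypothetical data: y = 2x plus Gaussian noise at several scales, with the noise levels and seed chosen arbitrarily. For each noise level it computes S_0 directly from the residuals of the least-squares line and compares it with nσ(y)^2(1 - r^2). The helper name `fit_stats` is my own.

```python
import random
import statistics

def fit_stats(xs, ys):
    """Return (r, S0): correlation and minimum sum of squared residuals of y on x."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    r = cov / (sx * sy)
    a = cov / sx ** 2  # least-squares slope
    s0 = sum((y - my - a * (x - mx)) ** 2 for x, y in zip(xs, ys))
    return r, s0

rng = random.Random(1)
xs = [float(i) for i in range(50)]
results = {}
for noise in (20.0, 5.0, 1.0, 0.0):
    # Hypothetical data: y = 2x plus Gaussian noise of the given scale.
    ys = [2 * x + rng.gauss(0, noise) for x in xs]
    r, s0 = fit_stats(xs, ys)
    closed_form = len(xs) * statistics.pvariance(ys) * (1 - r ** 2)
    results[noise] = (r, s0, closed_form)
    print(f"noise={noise:5.1f}  r={r:.4f}  S0={s0:10.2f}  n*var(y)*(1-r^2)={closed_form:10.2f}")
```

As the noise shrinks, r approaches 1 and S_0 falls toward 0. The two columns agree up to floating-point rounding.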

