[PYTHON] [Multivariate analysis] About quantification type I (001)

Analysis workflow for quantification type I

  1. Convert each qualitative variable into dummy variables and, treating the dummy variables as quantitative variables, assume a multiple regression model.
  2. Compute the contribution ratio adjusted for degrees of freedom (adjusted R^2) and evaluate the performance of the fitted regression equation.
  3. Perform variable selection so that only the useful explanatory variables remain.
  4. Examine the residuals and the leverage, and judge the validity of the fitted regression equation.
  5. Using the fitted regression equation, estimate the population regression at arbitrarily specified values of the explanatory variables, and predict the values of data to be obtained in the future.
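Step 2 can be illustrated with a minimal sketch. It assumes the standard definition of the adjusted contribution ratio, 1 - (S_e / (n - p - 1)) / (S_yy / (n - 1)), where p is the number of explanatory variables in the model. The function name `adjusted_r2` is only illustrative. The data are the ten overall grades used in the practical example later in this article, and the fitted values are taken to be the category means, which is what least squares gives when the only explanatory variable is a single qualitative variable.

```python
def adjusted_r2(y, y_hat, p):
    """Return (R^2, adjusted R^2) for observations y, fitted values y_hat
    and p explanatory variables (the intercept is not counted in p)."""
    n = len(y)
    y_bar = sum(y) / n
    s_yy = sum((yi - y_bar) ** 2 for yi in y)              # total sum of squares
    s_e = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    r2 = 1 - s_e / s_yy
    r2_adj = 1 - (s_e / (n - p - 1)) / (s_yy / (n - 1))
    return r2, r2_adj


grades = ["excellent"] * 4 + ["good"] * 3 + ["acceptable"] * 3
y = [96, 88, 77, 89, 80, 71, 77, 78, 70, 62]

# Fitted value of each sample = mean overall grade of its category.
means = {}
for g in set(grades):
    values = [yi for gi, yi in zip(grades, y) if gi == g]
    means[g] = sum(values) / len(values)
y_hat = [means[g] for g in grades]

# Two dummy variables enter the model (one category is the baseline), so p = 2.
r2, r2_adj = adjusted_r2(y, y_hat, p=2)
print(round(r2, 3), round(r2_adj, 3))
```

The adjusted value is always smaller than the plain contribution ratio when p > 0 and the fit is not perfect, which is why it is the one used to compare models with different numbers of variables.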

How to handle qualitative variables

A qualitative variable is a variable that is not originally numerical, such as a grade of "excellent", "good", or "acceptable". In quantification type I, such a variable is quantified using 0/1 values.

Here, instead of assigning ordinal scores such as

Qualitative value  Quantitative value
Excellent          3
Good               2
Acceptable         1

we define one 0/1 dummy variable per category:

x_{1\left(1\right)}=\left\{\begin{array}{ll}
1 & \text{if the grade is "excellent"}\\
0 & \text{otherwise}
\end{array}\right.
x_{1\left(2\right)}=\left\{\begin{array}{ll}
1 & \text{if the grade is "good"}\\
0 & \text{otherwise}
\end{array}\right.
x_{1\left(3\right)}=\left\{\begin{array}{ll}
1 & \text{if the grade is "acceptable"}\\
0 & \text{otherwise}
\end{array}\right.

Dummy variables are used because the difference between "excellent" and "good", the difference between "excellent" and "acceptable", and the difference between "good" and "acceptable" cannot be expressed quantitatively. Scoring the categories as 3, 2, 1 would assume that adjacent grades are exactly one unit apart.
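The dummy coding described here can be sketched in plain Python as follows. The names `CATEGORIES` and `dummy_code` are illustrative and not taken from any library.

```python
CATEGORIES = ["excellent", "good", "acceptable"]


def dummy_code(value, categories=CATEGORIES):
    """Return one 0/1 indicator per category: (x_1(1), x_1(2), x_1(3))."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return tuple(1 if value == c else 0 for c in categories)


for grade in CATEGORIES:
    print(grade, dummy_code(grade))
# excellent (1, 0, 0)
# good (0, 1, 0)
# acceptable (0, 0, 1)
```

Each grade maps to a tuple with exactly one 1, so no ordering or spacing between the categories is implied.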

Practical example of quantification type I

The following data are used as a concrete example.

Original data

No  Math grade  Overall grade
1   Excellent   96
2   Excellent   88
3   Excellent   77
4   Excellent   89
5   Good        80
6   Good        71
7   Good        77
8   Acceptable  78
9   Acceptable  70
10  Acceptable  62

Data after converting the qualitative variable into dummy variables

No  Math grade  x_{1(1)}  x_{1(2)}  x_{1(3)}  Overall grade
1   Excellent   1         0         0         96
2   Excellent   1         0         0         88
3   Excellent   1         0         0         77
4   Excellent   1         0         0         89
5   Good        0         1         0         80
6   Good        0         1         0         71
7   Good        0         1         0         77
8   Acceptable  0         0         1         78
9   Acceptable  0         0         1         70
10  Acceptable  0         0         1         62
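As a quick check on the converted table, the sketch below uses the three dummy columns to compute the mean overall grade of each category. The helper name `category_mean` is illustrative.

```python
# Rows of the converted table: (x_1(1), x_1(2), x_1(3), overall grade)
rows = [
    (1, 0, 0, 96), (1, 0, 0, 88), (1, 0, 0, 77), (1, 0, 0, 89),
    (0, 1, 0, 80), (0, 1, 0, 71), (0, 1, 0, 77),
    (0, 0, 1, 78), (0, 0, 1, 70), (0, 0, 1, 62),
]

# Exactly one dummy variable is 1 in every row.
assert all(x1 + x2 + x3 == 1 for x1, x2, x3, _ in rows)


def category_mean(rows, k):
    """Mean overall grade of the rows whose k-th dummy variable (0-based) is 1."""
    total = sum(r[k] * r[3] for r in rows)
    count = sum(r[k] for r in rows)
    return total / count


means = [category_mean(rows, k) for k in range(3)]
print(means)                                     # [87.5, 76.0, 70.0]
print(means[0] - means[1], means[1] - means[2])  # 11.5 6.0
```

The gap between "excellent" and "good" (11.5 points) differs from the gap between "good" and "acceptable" (6.0 points), which equally spaced scores of 3, 2, 1 could not represent.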

Perform multiple regression analysis

The formulas are listed below, but honestly, I don't think you need to force yourself to understand them. The calculations themselves are done in Python, and after working through about 20 problems you will get an intuitive feel for them.

  1. Multiple regression model y_{i}=\beta_{0}+\beta_{1\left(2\right)}x_{i1\left(2\right)}+\beta_{1\left(3\right)}x_{i1\left(3\right)}+\epsilon_{i}
     The dummy variable x_{1\left(1\right)} is left out because x_{i1\left(1\right)}+x_{i1\left(2\right)}+x_{i1\left(3\right)}=1 for every sample, so "excellent" serves as the baseline category and is absorbed into \beta_{0}.
  2. Error (assumed to follow a normal distribution) \epsilon_{i}\sim N\left(0,\ \sigma^{2}\right)
  3. Predicted value \hat{y_{i}}=\hat{\beta_{0}}+\hat{\beta_{1\left(2\right)}}x_{i1\left(2\right)}+\hat{\beta_{1\left(3\right)}}x_{i1\left(3\right)}
  4. Estimated regression coefficients \displaystyle \left[\begin{array}{l} \hat{\beta_{1\left(2\right)}}\\\\ \hat{\beta_{1\left(3\right)}} \end{array}\right]=\frac{1}{S_{11}S_{22}-S_{12}^{2}}\left[\begin{array}{l} S_{22}S_{1y}-S_{12}S_{2y}\\\\ -S_{12}S_{1y}+S_{11}S_{2y} \end{array}\right]
  5. Sums of squares and sums of products of deviations

S_{11}=\displaystyle \sum_{i=1}^{n}x_{i1\left(2\right)}^{2}-\frac{1}{n}\left(\sum_{i=1}^{n}x_{i1\left(2\right)}\right)^{2}

S_{22}=\displaystyle \sum_{i=1}^{n}x_{i1\left(3\right)}^{2}-\frac{1}{n}\left(\sum_{i=1}^{n}x_{i1\left(3\right)}\right)^{2}

S_{12}=\displaystyle \sum_{i=1}^{n}x_{i1\left(2\right)}x_{i1\left(3\right)}-\frac{1}{n}\sum_{i=1}^{n}x_{i1\left(2\right)}\sum_{i=1}^{n}x_{i1\left(3\right)}

S_{1y}=\displaystyle \sum_{i=1}^{n}x_{i1\left(2\right)}y_{i}-\frac{1}{n}\sum_{i=1}^{n}x_{i1\left(2\right)}\sum_{i=1}^{n}y_{i}

S_{2y}=\displaystyle \sum_{i=1}^{n}x_{i1\left(3\right)}y_{i}-\frac{1}{n}\sum_{i=1}^{n}x_{i1\left(3\right)}\sum_{i=1}^{n}y_{i}

  6. Intercept, obtained from the normal equations \hat{\beta_{0}}=\overline{y}-\hat{\beta_{1\left(2\right)}}\overline{x_{1\left(2\right)}}-\hat{\beta_{1\left(3\right)}}\overline{x_{1\left(3\right)}}
  7. Means of each variable

\displaystyle \overline{y}=\frac{1}{n}\sum_{i=1}^{n}y_{i}

\displaystyle \overline{x_{1\left(2\right)}}=\frac{1}{n}\sum_{i=1}^{n}x_{i1\left(2\right)}

\displaystyle \overline{x_{1\left(3\right)}}=\frac{1}{n}\sum_{i=1}^{n}x_{i1\left(3\right)}

Calculation of various constants
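
As a minimal sketch, the constants S_{11}, S_{22}, S_{12}, S_{1y}, S_{2y} and the resulting coefficients can be computed in plain Python directly from the formulas of the multiple regression section. The helper name `s` and the variable names are illustrative. The data are the ten samples of the example, with "excellent" as the baseline category.

```python
y = [96, 88, 77, 89, 80, 71, 77, 78, 70, 62]
x2 = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]  # x_{i1(2)}: 1 if the math grade is "good"
x3 = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # x_{i1(3)}: 1 if the math grade is "acceptable"
n = len(y)


def s(u, v):
    """Sum of products of deviations: sum(u*v) - sum(u)*sum(v)/n."""
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / len(u)


S11, S22, S12 = s(x2, x2), s(x3, x3), s(x2, x3)
S1y, S2y = s(x2, y), s(x3, y)

# Coefficients of the two dummy variables (2x2 system solved in closed form).
det = S11 * S22 - S12 ** 2
b2 = (S22 * S1y - S12 * S2y) / det
b3 = (-S12 * S1y + S11 * S2y) / det

# Intercept from the means of y, x_{1(2)} and x_{1(3)}.
b0 = sum(y) / n - b2 * sum(x2) / n - b3 * sum(x3) / n

print([round(v, 4) for v in (S11, S22, S12, S1y, S2y)])
print([round(v, 4) for v in (b0, b2, b3)])
```

Up to floating-point rounding, this gives S_{11} = S_{22} = 2.1, S_{12} = -0.9, S_{1y} = -8.4, S_{2y} = -26.4, and the coefficients 87.5, -11.5, -17.5. The intercept equals the mean overall grade of the "excellent" group, and the other two coefficients are the differences of the "good" and "acceptable" group means from that baseline.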

References

Introduction to Multivariate Analysis (Library New Math), Yasushi Nagata and Masahiko Munechika
