--Those who do not yet understand simple regression analysis or multiple regression analysis and want to understand them
When I looked up multiple regression analysis on Wikipedia, it said the following:

Multiple regression analysis is a type of multivariate analysis: regression analysis with two or more independent variables (two or more dimensions). Regression with a single independent variable is called simple regression analysis. The commonly used least squares method and the multiple regression of generalized linear models are mathematically a kind of linear analysis, and are mathematically similar to analysis of variance. By selecting multiple appropriate variables, it is possible to create a prediction formula that is easy to calculate and has small error. The coefficient of each explanatory variable in a multiple regression model is called a partial regression coefficient. The partial regression coefficient itself does not indicate the degree of influence on the objective variable; the standardized partial regression coefficient indicates the degree of influence on the objective variable.

...I'm not sure I follow.
To make it easier to understand, I will first explain simple regression analysis with examples.
For example, suppose I want to predict a person's height from their weight. For that purpose, we collect weight and height data for 10 people and draw a straight line like the one below.
(Vertical axis: height (cm), horizontal axis: weight (kg))
Mathematically, such a straight line has the form

\hat{y} = ax + b
Finding a and b in the above equation is called simple regression analysis. The ^ symbol on top of the y is called a hat and marks the predicted value of y. The least squares method is used to obtain a and b. Since the least squares method will be explained in the multiple regression section, it is omitted here. Also, someone explains the least squares method in an easy-to-understand way at the following URL, so you can refer to that.
If you can find a and b, you can predict y (height) by substituting a weight for x.
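To make this concrete, here is a minimal sketch of simple regression in Python (the code later in this article uses numpy as well). The ten weight/height pairs below are values I made up for this sketch, not data from the article.

```python
import numpy as np

# Hypothetical weight (kg) and height (cm) data for 10 people (invented for this sketch)
weight = np.array([45, 50, 55, 58, 60, 65, 70, 72, 80, 85])
height = np.array([150, 155, 158, 160, 163, 167, 170, 172, 176, 178])

# np.polyfit with degree 1 fits a straight line y-hat = a*x + b by least squares
a, b = np.polyfit(weight, height, 1)
print(f"a (slope) = {a:.3f}, b (intercept) = {b:.3f}")

# Predict the height of a person who weighs 62 kg
print(f"predicted height for 62 kg: {a * 62 + b:.1f} cm")
```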
Before explaining multiple regression analysis, here are the formal names of the quantities in this example.
--Weight: **explanatory variable**
--Height: **objective variable**
In this example, height was predicted from weight data. The value you want to find is called the objective variable, and the value used to predict it (the objective variable) is called the explanatory variable.
--There were 10 weight data points (explanatory variable data); this count is called the **number of samples**.
--Regression analysis that predicts the objective variable from a single explanatory variable is called simple regression analysis.
**Regression analysis that predicts the objective variable from multiple explanatory variables** is called multiple regression analysis.
As an example: in the simple regression analysis above, height was predicted from weight alone. But if you think about it realistically, predicting height from weight alone seems difficult, because some people are skinny and some are heavier, so the earlier simple regression analysis is unlikely to predict height very accurately.
So what should we do?

The idea of multiple regression analysis is that a more accurate prediction might be possible by also using, as explanatory variables, quantities that seem related to height, such as waist and foot size.

Did you get the picture? ❓
From here, I'll talk a little mathematically.
The general formula for multiple regression analysis is as follows:

\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots
--The objective variable $\hat{y}$ is the predicted value (height), just as in simple regression analysis.
--$\beta_0$ is called the bias, which corresponds to b (the intercept) in simple regression analysis.
--The regression coefficients correspond to a (the slope) in simple regression analysis. Because there are multiple explanatory variables, multiple regression finds one coefficient for each explanatory variable.
--For example, with the three explanatory variables weight, waist, and foot size, the formula above becomes:

\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3

(where $x_1$ is weight, $x_2$ is waist, and $x_3$ is foot size)
Next, I will explain how to find these coefficients.
For the sake of clarity, we will use the following example.
[Example] Height (the objective variable) is predicted from three explanatory variables: weight, waist, and foot size. The number of samples is 5. Specifically:

--Mr. A: weight 50 kg, waist 60 cm, foot size 23 cm → height 155 cm
--Mr. B: weight 60 kg, waist 70 cm, foot size 25 cm → height 162 cm
--Mr. C: weight 70 kg, waist 78 cm, foot size 27 cm → height 173 cm
--Mr. D: weight 30 kg, waist 50 cm, foot size 19 cm → height 135 cm
--Mr. E: weight 80 kg, waist 95 cm, foot size 27 cm → height 175 cm

The goal is to find good regression coefficients $\beta_0, \beta_1, \beta_2, \beta_3$ from the data of these 5 people and build a nice function $\hat{y}$ that can predict the height of a person we have not seen.
There are a lot of symbols from here on, so take a short break before reading further.
First, as a procedure, once you have found candidate values of $\beta$, you need to evaluate whether they are actually good. The method used for that evaluation is called **least squares**. If you read the URL in the simple regression section you may already understand it, but I will explain it briefly here as well.
What is the least squares method? Suppose you draw a nice-looking line through some data, as in the image below. Each point then has some error with respect to the line (the red lines in the image are the errors). If you add up the errors of all the points and the total is small, you have drawn a line that fits well.

However, some points lie below the line (negative error) and some above it (positive error), so the signs differ. If you add the errors as they are, positives and negatives can cancel out to roughly zero, and a line whose errors are actually large could be judged to have a small error. We want to avoid that, so we want to unify the signs somehow.

Taking absolute values would unify the signs, but absolute values are awkward to handle because they require case-by-case analysis, so we would like to avoid them. Another way to remove the sign is to square each error, and that is the least squares method.
The general formula for the least squares method is as follows:

E(D) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2

In this example, it becomes:

E(D) = \sum_{i=1}^{5} \left\{ y_i - (\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3}) \right\}^2

(Supplement) In this example, $y_i$ is the actual height data for Mr. A through Mr. E. In short, E(D) is the sum of the squared errors between the actual heights of Mr. A through Mr. E and their predicted heights. $n$ is 5 in this example.
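To make the error function concrete, here is a small sketch that computes $E(D)$ for the five people in the example, given some candidate coefficients. The β values in it are placeholders I picked just to show the calculation, not the optimal ones.

```python
import numpy as np

# Heights (cm) of Mr. A through Mr. E
y = np.array([155, 162, 173, 135, 175])
# Explanatory variables: weight (kg), waist (cm), foot size (cm)
x = np.array([[50, 60, 23],
              [60, 70, 25],
              [70, 78, 27],
              [30, 50, 19],
              [80, 95, 27]])

# Placeholder coefficients (bias, weight, waist, foot size), for illustration only
beta = np.array([90.0, 0.8, -0.3, 1.9])

y_hat = beta[0] + x @ beta[1:]   # predicted heights
E = np.sum((y - y_hat) ** 2)     # sum of squared errors E(D)
print(E)
```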
We want to find the coefficients that minimize this error function $E(D)$.

So how do we do that? ❓

What comes to mind when you want to find a minimum or a maximum? Differentiate and set the derivative to 0; the points where the derivative is 0 are candidates for the maximum or minimum value. (I couldn't think of anything myself at first.)
In conclusion, the steps are as follows.
① Partially differentiate $E(D)$ with respect to each of $\beta_0, \beta_1, \beta_2, \beta_3$. The notation looks like this: $\frac{\partial E(D)}{\partial \beta_0}, \frac{\partial E(D)}{\partial \beta_1}, \frac{\partial E(D)}{\partial \beta_2}, \frac{\partial E(D)}{\partial \beta_3}$
② Set each partial derivative equal to 0. At the point where the slope is 0, that is, the point where E(D) takes its minimum value, each β can be obtained, which is why we set the derivatives to 0. With this explanation alone, though, you might wonder whether the point where the slope is 0 could be a maximum of E(D) rather than a minimum ❓. I will omit the detailed explanation, but if you look at E(D) above as a function of any one coefficient (with the others fixed), its highest-order term is of degree 2 (a quadratic function), and since the coefficient of the squared term is positive, it is a downwardly convex quadratic function. Therefore the point obtained by setting the partial derivative to 0 is indeed the minimum. I think this part is quite hard to grasp, so I will post the URL of an easy-to-understand video by Yobinori.
[University Mathematics] Least Squares (Regression Analysis) [Probability Statistics]
③ Since we partially differentiated with respect to each of $\beta_0, \beta_1, \beta_2, \beta_3$, we now have four equations.

If you do your best and solve those simultaneous equations...

you can find the $\beta_0, \beta_1, \beta_2, \beta_3$ that minimize $E(D)$.
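If you want to check steps ① to ③ without grinding through the algebra by hand, a computer algebra system can do the partial differentiation and solve the simultaneous equations for you. This is just a sketch using sympy (which is not used elsewhere in this article), with the five people from the example as data:

```python
import sympy as sp

b0, b1, b2, b3 = sp.symbols('beta0 beta1 beta2 beta3')

# (weight, waist, foot size, height) for Mr. A through Mr. E
data = [(50, 60, 23, 155), (60, 70, 25, 162), (70, 78, 27, 173),
        (30, 50, 19, 135), (80, 95, 27, 175)]

# Error function E(D): sum of squared errors
E = sum((y - (b0 + b1 * x1 + b2 * x2 + b3 * x3)) ** 2 for x1, x2, x3, y in data)

# (1) partial derivatives, (2) set each to zero, (3) solve the four simultaneous equations
equations = [sp.diff(E, b) for b in (b0, b1, b2, b3)]
solution = sp.solve(equations, (b0, b1, b2, b3))
print(solution)
```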
The derivation above used three explanatory variables, but even if that number increases, the derivation is basically the same; the only difference is that the simultaneous equations become harder to solve.
In fact, there is a formula for the result of this derivation (so you don't have to do the tedious derivation above, lol). It is shown below, but first I will explain how to express the calculation described so far with matrices, because the formula is written in matrix form.
When the above error function is expressed with matrices, it becomes

E(D) = \| y - X\beta \|^2

If you expand the inside of $\|\ \|$, that is, $y - X\beta$...
\begin{align}
& = \left(
\begin{matrix}
y_1 \\
y_2 \\
y_3 \\
y_4 \\
y_5 \\
\end{matrix}
\right)-
\left(
\begin{matrix}
1 & x_{11} & x_{12} & x_{13} \\
1 & x_{21} & x_{22} & x_{23} \\
1 & x_{31} & x_{32} & x_{33} \\
1 & x_{41} & x_{42} & x_{43} \\
1 & x_{51} & x_{52} & x_{53} \\
\end{matrix}
\right)
\left(
\begin{matrix}
\beta_0 \\
\beta_1 \\
\beta_2 \\
\beta_3
\end{matrix}
\right)\\
& =\left(
\begin{matrix}
y_1 \\
y_2 \\
y_3 \\
y_4 \\
y_5 \\
\end{matrix}
\right)-
\left(
\begin{matrix}
\beta_0+\beta_1x_{11}+\beta_2x_{12}+\beta_3x_{13} \\
\beta_0+\beta_1x_{21}+\beta_2x_{22}+\beta_3x_{23} \\
\beta_0+\beta_1x_{31}+\beta_2x_{32}+\beta_3x_{33} \\
\beta_0+\beta_1x_{41}+\beta_2x_{42}+\beta_3x_{43} \\
\beta_0+\beta_1x_{51}+\beta_2x_{52}+\beta_3x_{53} \\
\end{matrix}
\right)
\end{align}
You can see that this has the same shape as $E(D)$: each row is the difference between an actual height and its predicted value.
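As a quick numerical check that the matrix form really is the same thing as $E(D)$, here is a small sketch that computes the error both ways for the example data (the β values are arbitrary placeholders):

```python
import numpy as np

# X laid out as in the matrices above (1, weight, waist, foot size), y = heights
X = np.array([[1, 50, 60, 23],
              [1, 60, 70, 25],
              [1, 70, 78, 27],
              [1, 30, 50, 19],
              [1, 80, 95, 27]], dtype=float)
y = np.array([155, 162, 173, 135, 175], dtype=float)
beta = np.array([90.0, 0.8, -0.3, 1.9])   # arbitrary placeholder coefficients

# E(D) written as a sum of squared errors
E_sum = sum((y[i] - X[i] @ beta) ** 2 for i in range(len(y)))

# E(D) written with the matrix/vector form ||y - X beta||^2
E_matrix = np.sum((y - X @ beta) ** 2)

print(E_sum, E_matrix)   # the two values agree
```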
When each component of the above formula is represented by a matrix, it looks like this.
\beta = \left(
\begin{matrix}
\beta_0 \\
\beta_1 \\
\beta_2 \\
\beta_3
\end{matrix}
\right)
y = \left(
\begin{matrix}
y_1 \\
y_2 \\
y_3 \\
y_4 \\
y_5 \\
\end{matrix}
\right)
X = \left(
\begin{matrix}
1 & x_{11} & x_{12} & x_{13} \\
1 & x_{21} & x_{22} & x_{23} \\
1 & x_{31} & x_{32} & x_{33} \\
1 & x_{41} & x_{42} & x_{43} \\
1 & x_{51} & x_{52} & x_{53} \\
\end{matrix}
\right)
Shown all at once like this, it probably doesn't make much sense, so I will explain each piece one by one.

$\beta$: These are the values we ultimately want in order to draw a good prediction line. There are (number of explanatory variables + 1) of them, including the bias, so in this example four values are stacked in a column.

$y$: This is the height data for Mr. A through Mr. E given as samples, listed from the top, so 5 values are stacked in a column.

$X$: This is probably the hardest one to understand. The first column is all 1s. This is because the bias $\beta_0$ is not multiplied by any explanatory variable, so it is treated as being multiplied by 1, which makes the first column all 1s. If you look at the expanded formula above, you can confirm this because every row starts with a 1. (In other literature the bias is sometimes handled separately, in which case there may be no column of 1s.) The second column is the weight data of Mr. A through Mr. E from the top, the third column is the waist data, and the fourth column is the foot size data. The number of rows is the number of samples, and the number of columns is the number of β values to be found (number of explanatory variables + 1). For example, $x_{23}$ holds Mr. B's foot size, 25 cm.
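In code, $X$ can be built by putting a column of ones in front of the explanatory variable data. A minimal sketch (the variable name `features` is my own):

```python
import numpy as np

# Explanatory variables (weight, waist, foot size) for Mr. A through Mr. E
features = np.array([[50, 60, 23],
                     [60, 70, 25],
                     [70, 78, 27],
                     [30, 50, 19],
                     [80, 95, 27]])

# Prepend a column of ones for the bias term beta_0
X = np.hstack([np.ones((features.shape[0], 1)), features])
print(X)

# X[1, 3] corresponds to x_23 in the text: Mr. B's foot size, 25 cm
print(X[1, 3])
```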
Now that you know how to express things with matrices, here is the formula for finding $\beta$:

\beta = (X^T X)^{-1} X^T y

$X^T$ is the transposed matrix of $X$; transposing a matrix swaps its rows and columns.
Let's actually use the formula to find it.
Here is the example data again:

--Mr. A: weight 50 kg, waist 60 cm, foot size 23 cm → height 155 cm
--Mr. B: weight 60 kg, waist 70 cm, foot size 25 cm → height 162 cm
--Mr. C: weight 70 kg, waist 78 cm, foot size 27 cm → height 173 cm
--Mr. D: weight 30 kg, waist 50 cm, foot size 19 cm → height 135 cm
--Mr. E: weight 80 kg, waist 95 cm, foot size 27 cm → height 175 cm

The following program uses the formula to calculate the regression coefficients and then predicts the height of a person with [weight: 80 kg, waist: 90 cm, foot size: 27 cm]. Part of the program referred to the following URL: [[Understanding in 5 minutes] Simple explanation of multiple regression analysis [with example]](https://mathmatical22.xyz/2019/09/13/ [Understanding in 5 minutes] Easy-to-understand and easy solution of multiple regression analysis /)
```python
# coding=utf-8
import numpy as np

# Multiple regression analysis: compute the partial regression coefficient vector
def Multiple_regression(X, y):
    A = np.dot(X.T, X)           # X^T * X
    A_inv = np.linalg.inv(A)     # (X^T * X)^(-1)
    B = np.dot(X.T, y)           # X^T * y
    beta = np.dot(A_inv, B)      # beta = (X^T * X)^(-1) * X^T * y
    return beta

# Explanatory variable matrix (1, weight, waist, foot size)
X = np.array([[1, 50, 60, 23],
              [1, 60, 70, 25],
              [1, 70, 78, 27],
              [1, 30, 50, 19],
              [1, 80, 95, 27]])

# Objective variable vector (height)
y = np.array([[155], [162], [173], [135], [175]])

# Partial regression coefficient vector
beta = Multiple_regression(X, y)
print(beta)

# Data to predict: [bias term, weight, waist, foot size]
predict_data = [1, 80, 90, 27]

def predict(beta, predict_data):
    # Compute the predicted height from the regression coefficients
    predict_tall = beta[0] * predict_data[0] + beta[1] * predict_data[1] + \
        beta[2] * predict_data[2] + beta[3] * predict_data[3]
    return predict_tall

tall = predict(beta, predict_data)
print(tall)
```
The results obtained are as follows.
β = [[90.85638298]
[ 0.76276596]
[-0.28723404]
[ 1.86702128]]
Predicted height (for a person with weight: 80 kg, waist: 90 cm, foot size: 27 cm)
y = 176.43617021 cm
The β values are $\beta_0, \beta_1, \beta_2, \beta_3$ from the top.

Substituting them into $\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$ and calculating $90.856 + 0.763 \times 80 - 0.287 \times 90 + 1.867 \times 27$ gives a predicted height of about 176 cm. That seems like a reasonably good prediction.
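As a side note, explicitly computing the inverse as in the formula is fine for a small example like this, but numpy also provides a least squares solver that avoids forming the inverse, which is generally preferred numerically. A rough equivalent using the same X and y:

```python
import numpy as np

# Same X and y as in the program above
X = np.array([[1, 50, 60, 23], [1, 60, 70, 25], [1, 70, 78, 27],
              [1, 30, 50, 19], [1, 80, 95, 27]])
y = np.array([[155], [162], [173], [135], [175]])

# np.linalg.lstsq solves the least squares problem without forming the inverse explicitly
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # should match the beta computed with the formula above
```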
This is the end of the explanation. Thank you for reading this far. I hope you were able to get at least a rough feel for multiple regression analysis.
To summarize the story this time ...
○ Multiple regression analysis is a powered-up version of simple regression analysis that predicts the objective variable from multiple explanatory variables.

○ In multiple regression analysis, the most important thing is to find the optimal regression coefficients, and the least squares method is used for that.

Find the regression coefficients that minimize the error function E(D). Doing the partial differentiation carefully yields the following formula:

\beta = (X^T X)^{-1} X^T y
I would like to add a few supplementary notes about multiple regression analysis beyond this example.
Supplement ○ Multiple regression analysis can only be illustrated in a graph when there are at most two explanatory variables, because we can only draw an x-axis, a y-axis, and a z-axis. Beyond that it becomes a world we have to imagine, but the idea is basically the same as for simple regression analysis.
○ This time, height was predicted by drawing a nice straight line, but there are relationships in the world that cannot be expressed by a straight line.
X = \left(
\begin{matrix}
1 & x_{11} & x_{12} & x_{13} \\
1 & x_{21} & x_{22} & x_{23} \\
1 & x_{31} & x_{32} & x_{33} \\
1 & x_{41} & x_{42} & x_{43} \\
1 & x_{51} & x_{52} & x_{53} \\
\end{matrix}
\right)
This time, the explanatory variables entered the model as simple linear (first-order) terms, so the prediction line was a straight line. However, in multiple regression analysis there are cases where you want to draw a non-linear (curved) prediction line. In such cases, the entries of the explanatory variable matrix can be expressed with quadratic or cubic terms; a rough sketch of the idea follows below. When you actually fit a curve, the next problem that appears is overfitting. Overfitting means that the prediction line fits the given data too closely and can no longer make proper predictions on new data. See the image below for a picture. (Explanation of the image: the light blue dots are the data, the blue line is the prediction line you actually want to draw, and the red line is overfitting, fitting too closely to the given data.)
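As a rough sketch of what "expressing the explanatory variable matrix with quadratic terms" could look like, you can add squared columns to X and then apply exactly the same formula; the data below are made up purely for illustration and are not part of this article's example.

```python
import numpy as np

# Made-up data with a curved relationship between x and y (for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.9, 10.2, 16.8, 26.0])

# Add a squared column to the explanatory variable matrix; the model is still linear in beta
X = np.column_stack([np.ones_like(x), x, x ** 2])

# Same formula as before: beta = (X^T X)^(-1) X^T y
beta = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta)  # coefficients for the constant, x, and x^2 terms
```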
If you fit a curve like this, the problem of overfitting arises. A method called **regularization** is used to prevent it, so next time I will summarize regularization. Once that article exists, I will post the link below, so please check it out.