[PYTHON] Regression with linear model

Hello everyone. I am currently studying [Introduction to Machine Learning with Python](https://www.amazon.co.jp/dp/4873117984) by Andreas C. Müller.

An interesting concept called linear regression came up, so I thought I would summarize it a little.

What is regression in the first place?

In statistics, regression means fitting a model Y = f(X) to data where Y is a continuous value. In other words, it fits a model between a continuous dependent variable (objective variable) Y and independent variables (explanatory variables) X. If X is one-dimensional, it is called simple regression; if X has two or more dimensions, it is called multiple regression. (Wikipedia)

In other words, the goal of regression comes down to **predicting continuous values**. For example: predicting a person's **annual income** (objective variable) from their educational background, age, and address (explanatory variables), or predicting a corn farmer's **yield** (objective variable) from the previous year's yield, the weather, and the number of employees (explanatory variables).

So what is regression with a linear model?

Regression with a linear model, as the name suggests, uses a linear function to predict the objective variable.

The general prediction formula for a linear model in a regression problem can be expressed as:

y = w[0] \times x[0] + w[1] \times x[1] + \dots + w[p] \times x[p] + b
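This prediction is just a dot product between the weights and the features, plus an intercept. A minimal sketch in NumPy (the parameter and feature values below are made up purely for illustration):

```python
import numpy as np

# Hypothetical learned parameters (made-up values for illustration)
w = np.array([0.5, -1.2, 3.0])   # weights w[0], w[1], w[2]
b = 0.1                          # intercept
x = np.array([1.0, 2.0, 0.5])    # features of one data point

# Prediction: w[0]*x[0] + w[1]*x[1] + w[2]*x[2] + b
y = np.dot(w, x) + b
print(y)  # 0.5*1.0 + (-1.2)*2.0 + 3.0*0.5 + 0.1 = -0.3
```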

Here, x[0] ... x[p] are the **features** of a single data point, w and b are the **parameters** of the learned model, and y is the **prediction** made by the model.

In other words, the task comes down to learning the optimal w and b from the training data, so that when a new x[0] ... x[p] is given as input, the model outputs a **y** that is as accurate as possible.

There are various algorithms for regression with a linear model. The differences between these models lie in how **w** and **b** are found (learned) and in how the complexity of the model is controlled.

Today I would like to touch on the simplest and most classical linear regression method, **ordinary least squares (OLS)**.

Ordinary least squares (OLS)

Let's consider one explanatory variable for simplicity. In other words, the formula is

y = wx + b

In this linear regression, w and b are computed so that the mean squared error between the true values and the predicted values is minimized on the training data. This is hard to grasp from text alone, so picturing it as an image gives the following.

IMG_2226.jpeg

In other words, we look for **w** and **b** that make the sum of the squared lengths of the blue lines (the errors) as small as possible; the result is the red straight line.

Written as a formula:

\text{Mean squared error} = \frac{1}{n}\sum_{i=1}^{n} (y_i - y'_i)^2

where y_i is the value of the i-th training sample and y'_i is the i-th predicted value.

We choose w and b so that this mean squared error becomes small. This is OLS. Simple and beautiful.
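As a sanity check, for a single explanatory variable the OLS solution has a closed form: w = cov(x, y) / var(x) and b = ȳ − w·x̄. Here is a minimal NumPy sketch of this (the synthetic data below is my own, not from the book):

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.uniform(-3, 3, size=50)                    # synthetic explanatory variable
y = 0.4 * x - 0.03 + rng.normal(0, 0.3, size=50)   # noisy linear target

# Closed-form OLS for y = w*x + b:
#   w = cov(x, y) / var(x),  b = mean(y) - w * mean(x)
w = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b = y.mean() - w * x.mean()

# Mean squared error of the fitted line on this data
mse = np.mean((y - (w * x + b)) ** 2)
print("w = {:.3f}, b = {:.3f}, MSE = {:.3f}".format(w, b, mse))
```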

Sample code

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

import mglearn

# Synthetic one-dimensional regression dataset from the mglearn library
X, y = mglearn.datasets.make_wave(n_samples=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit ordinary least squares on the training data
lr = LinearRegression().fit(X_train, y_train)

print("lr.coef_(Slope OR weight) : {}".format(lr.coef_))
print("lr.intercept_(Intercept) : {}".format(lr.intercept_))

# score() returns the coefficient of determination R^2
print("Training set score: {:.2f}".format(lr.score(X_train, y_train)))
print("test set score: {:.2f}".format(lr.score(X_test, y_test)))
```

Here are the results:

```
lr.coef_(Slope OR weight) : [0.39390555]
lr.intercept_(Intercept) : -0.031804343026759746
Training set score: 0.67
test set score: 0.66
```
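Note that score here is the coefficient of determination R², not an accuracy percentage. To get actual predictions for new data you would call predict; a minimal usage sketch continuing the code above:

```python
# Predict targets for the held-out test set (continues the code above)
y_pred = lr.predict(X_test)
print(y_pred[:3])  # first three predicted values
```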

Let's plot this straight line.

Figure_1.png
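A plot like this can be produced with matplotlib; here is a minimal sketch continuing the code above (the plotting details are my own, not from the book):

```python
import numpy as np
import matplotlib.pyplot as plt

# Evaluation points spanning roughly the range of the wave dataset
line = np.linspace(-3, 3, 100).reshape(-1, 1)

plt.plot(X_train, y_train, 'o', label="training data")
plt.plot(line, lr.predict(line), label="model prediction")
plt.xlabel("Feature")
plt.ylabel("Target")
plt.legend()
plt.show()
```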

However, a score (R²) of 0.66 is not a very good prediction. Since the training and test scores are close together and both low, this is probably underfitting rather than overfitting.

Next time, I will talk about two different regression models, Ridge and Lasso. Have a nice night, everyone. Good night.
