[PYTHON] Linear regression method using Numpy

Introduction

The purpose of this article is to review what I learned in the course at https://www.udemy.com/share/1013lqB0AedFdUR34=0.

What this article covers

- A light derivation of the least squares method
- Converting a pandas DataFrame to a NumPy array and calculating the regression line

Derivation of least squares method

Suppose you are given a dataset [x, y], where x is the explanatory variable and y is the target (response) variable. For example, taller people tend to weigh more, so we could take x = height and y = weight.

We want to predict y from the given x. Let $\hat{y}$ denote the predicted value, and assume the following linear relationship:

\hat{y} = ax + b

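Here, the goal is to make $\hat{y}$ as close as possible to the observed value $y$. The error for a single data point is

\text{Error} = y - \hat{y} = y - (ax + b)

In general no line makes this error exactly zero for every point at once, so the least squares method looks for the a and b that minimize the sum of squared errors over all n data points:

E(a, b) = \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2

See the following link for the rest of the derivation: http://arduinopid.web.fc2.com/P7.html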
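Minimizing the sum of squared errors $\sum_i (y_i - (a x_i + b))^2$ by setting its partial derivatives with respect to a and b to zero gives the standard closed-form solution $a = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $b = \bar{y} - a\bar{x}$. The sketch below implements those two formulas directly; the helper name `fit_line` and the toy data are hypothetical (not from the course), and np.polyfit is used only as an independent cross-check.

```python
import numpy as np


def fit_line(x, y):
    """Hypothetical helper: return (a, b) minimizing sum((y - (a*x + b))**2)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    b = y_mean - a * x_mean
    return a, b


# Hypothetical toy data: y = 2x + 1 plus small Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)

a, b = fit_line(x, y)
# np.polyfit with deg=1 solves the same least squares problem; it returns [slope, intercept]
a_ref, b_ref = np.polyfit(x, y, 1)
print(a, b)
print(np.allclose([a, b], [a_ref, b_ref]))  # True
```

Both routes solve the same minimization, so they agree up to floating-point rounding; the slope and intercept land near the 2 and 1 used to generate the data.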

Linear regression with numpy

First, import the modules.

python


import pandas as pd
from pandas import Series, DataFrame
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline

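Next, load the dataset used in this article (the Boston housing dataset bundled with scikit-learn). Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the code below only runs with an older scikit-learn version.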

python


from sklearn.datasets import load_boston

boston = load_boston()

In this article we use two columns: RM (average number of rooms per dwelling) and target (the house price, which we will rename Price). Normally you would use sns.pairplot or sns.jointplot to look for pairs of variables that appear to be linearly related, but here we simply assume in advance that these two variables have a linear relationship.
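A quick numerical complement to that visual check is the Pearson correlation coefficient, which is close to +1 or -1 when two variables are strongly linearly related and close to 0 when they are not. A minimal sketch with np.corrcoef, using hypothetical synthetic arrays (made-up coefficients, not the Boston data) in place of the RM and Price columns:

```python
import numpy as np

# Hypothetical synthetic data standing in for two DataFrame columns
rng = np.random.default_rng(1)
rooms = rng.uniform(4, 9, size=200)
price = 9 * rooms - 35 + rng.normal(scale=3, size=rooms.size)  # made-up linear relation
noise = rng.normal(size=rooms.size)  # unrelated to rooms

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is the coefficient between the inputs
r_linear = np.corrcoef(rooms, price)[0, 1]
r_unrelated = np.corrcoef(rooms, noise)[0, 1]
print(f"rooms vs price: r = {r_linear:.2f}")
print(f"rooms vs noise: r = {r_unrelated:.2f}")
```

With the DataFrame from this article, the same kind of number could be obtained with `boston_df['RM'].corr(boston_df['Price'])`, since pandas `Series.corr` defaults to the Pearson method.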

python


boston_df = DataFrame(boston.data)
#Give the columns their feature names
boston_df.columns = boston.feature_names
#Add the target as a column named Price, since "target" is not descriptive
boston_df['Price'] = boston.target
#Scatter plot with a fitted regression line (recent seaborn requires x and y as keyword arguments)
sns.lmplot(x='RM', y='Price', data=boston_df)

(Figure: scatter plot of RM against Price with the regression line drawn by sns.lmplot)

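Let's calculate this regression line ourselves with np.linalg.lstsq(X, Y). The coefficient matrix X must be a 2-D array, so we reshape the RM values into rows of the form [x, 1]; the column of ones is what lets lstsq estimate the intercept b alongside the slope a.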
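To make the required shape concrete, here is a small sketch with three hypothetical x values. Each row of the coefficient matrix is [x, 1], so multiplying by a parameter vector [a, b] computes a*x + b for every row. np.column_stack builds the same matrix as a list comprehension without a Python-level loop; the values of a and b below are arbitrary.

```python
import numpy as np

x = np.array([6.5, 6.4, 7.2])  # hypothetical rooms-per-dwelling values

# Coefficient (design) matrix with one row [x, 1] per data point
X_loop = np.array([[value, 1] for value in x], dtype=np.float64)
X_stack = np.column_stack([x, np.ones_like(x)])

print(X_stack.shape)  # (3, 2)
print(np.array_equal(X_loop, X_stack))  # True

# Multiplying by [a, b] evaluates a*x + b for every row
a, b = 2.0, 1.0
predictions = X_stack @ np.array([a, b])
print(predictions)
```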

python


X = boston_df.RM
Y = boston_df.Price
#Build a 2-D array whose rows have the form [x, 1]
X = np.array([[value, 1] for value in X])
#Convert to floating point type
X = X.astype(np.float64)
#The first return value holds the solution [a, b]; rcond=None avoids a FutureWarning on older NumPy
a, b = np.linalg.lstsq(X, Y, rcond=None)[0]

That completes the calculation. Let's plot the result.

python


plt.plot(boston_df.RM, boston_df.Price, 'o')
x = boston_df.RM
plt.plot(x, a*x+b, 'r')

(Figure: scatter plot of RM against Price with the fitted line a*x + b drawn in red)
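As a cross-check connecting the lstsq result to the least squares derivation, the solution $w = [a, b]$ also satisfies the normal equations $X^T X w = X^T Y$. A minimal sketch on hypothetical synthetic data (not the Boston data) that solves them directly with np.linalg.solve and compares the result with np.linalg.lstsq:

```python
import numpy as np

# Hypothetical synthetic data: y = 3x - 2 plus noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 5, size=100)
y = 3 * x - 2 + rng.normal(scale=0.3, size=x.size)

X = np.column_stack([x, np.ones_like(x)])  # rows of the form [x, 1]

# Solve the normal equations (X^T X) w = X^T y
w_normal = np.linalg.solve(X.T @ X, X.T @ y)
# lstsq solves the same least squares problem via an SVD-based routine
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print(w_normal, w_lstsq)
print(np.allclose(w_normal, w_lstsq))  # True
```

For a well-conditioned problem like this the two agree, but lstsq is usually the safer choice in practice, because forming $X^T X$ squares the condition number of X.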

Supplement about np.linalg.lstsq

The official documentation is here: https://numpy.org/doc/stable/reference/generated/numpy.linalg.lstsq.html#numpy.linalg.lstsq

numpy.linalg.lstsq(a, b, rcond='warn')

Parameters:

- a: the coefficient matrix, shape (M, N)
- b: the ordinate ("dependent variable") values, shape (M,) or (M, K)
- rcond: cut-off ratio for small singular values of a
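When b has shape (M, K), lstsq solves K separate least squares problems that share the same coefficient matrix, one per column of b, and the solution has shape (N, K). A small sketch with two hypothetical, noise-free target columns fitted against the same [x, 1] matrix in one call:

```python
import numpy as np

x = np.arange(5, dtype=np.float64)
A = np.column_stack([x, np.ones_like(x)])  # shape (M, N) = (5, 2)

# Two hypothetical target columns: y1 = 2x + 1 and y2 = -x + 4 (exact lines, no noise)
B = np.column_stack([2 * x + 1, -x + 4])  # shape (M, K) = (5, 2)

solution, residuals, rank, sv = np.linalg.lstsq(A, B, rcond=None)
print(solution.shape)  # (2, 2)
print(solution[:, 0])  # slope and intercept for y1, approximately [2, 1]
print(solution[:, 1])  # slope and intercept for y2, approximately [-1, 4]
print(rank)  # 2
```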

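np.linalg.lstsq returns four values: the solution, the sums of squared residuals, the rank of a, and the singular values of a. So np.linalg.lstsq(X, Y)[1] gives the sum of squared residuals; it is an empty array if the rank of a is less than N or if M <= N.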
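To see what that second return value contains, the sketch below compares it with a sum of squared residuals computed by hand, then shows the empty result when M <= N. The data are hypothetical and generated only for illustration.

```python
import numpy as np

# Hypothetical noisy data
rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=30)
y = 0.5 * x + 2 + rng.normal(scale=1.0, size=x.size)
X = np.column_stack([x, np.ones_like(x)])

solution, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
a, b = solution

# For a 1-D y, residuals is a length-1 array holding sum((y - X @ solution)**2)
manual_ssr = np.sum((y - (a * x + b)) ** 2)
print(residuals, manual_ssr)
print(np.allclose(residuals[0], manual_ssr))  # True

# With only two points (M <= N) the line fits exactly and residuals comes back empty
_, res_small, _, _ = np.linalg.lstsq(X[:2], y[:2], rcond=None)
print(res_small.shape)  # (0,)
```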
