2. Multivariate analysis spelled out in Python 6-2. Ridge regression / Lasso regression (scikit-learn) [Ridge regression vs. Lasso regression]

**Here, I compare three regression models: ordinary multiple regression, ridge regression, and lasso regression.**

⑴ Import library

#Data processing / calculation / analysis library
import numpy as np
import pandas as pd

#Graph drawing library
import matplotlib.pyplot as plt
%matplotlib inline

#Machine learning library
import sklearn

⑵ Data acquisition and reading

#Get data
url = 'https://raw.githubusercontent.com/yumi-ito/datasets/master/datasets_auto_4variables_pre-processed.csv'

#Read the acquired data as a DataFrame object
df = pd.read_csv(url, header=None)

#Set column label
df.columns = ['width', 'height', 'horsepower', 'price']

print(df)

#Confirmation of data shape
print('Data shape:', df.shape)

#Confirmation of missing values
print('Number of missing values:{}\n'.format(df.isnull().sum().sum()))

#Data type confirmation
print(df.dtypes)

⑶ Splitting into training data and test data

#Import for model building
from sklearn.linear_model import Ridge, Lasso, LinearRegression

#Import for data splitting
from sklearn.model_selection import train_test_split

#Set the explanatory variables and the objective variable
x = df.drop('price', axis=1)
y = df['price']

#Split into training data and test data
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.5, random_state=0)
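
As a quick sanity check on the split, the sketch below runs the same `train_test_split` call on a synthetic stand-in DataFrame. The random values and the names `demo_df`, `dx_train`, and so on are assumptions made for illustration; only the column names follow the article.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the article's data (assumption: random values,
# only the column names match the original DataFrame)
rng = np.random.default_rng(0)
demo_df = pd.DataFrame(
    rng.normal(size=(200, 4)),
    columns=['width', 'height', 'horsepower', 'price'])

demo_x = demo_df.drop('price', axis=1)
demo_y = demo_df['price']

# test_size=0.5 puts half of the rows into each subset;
# random_state fixes the shuffle so the split is reproducible
dx_train, dx_test, dy_train, dy_test = train_test_split(
    demo_x, demo_y, test_size=0.5, random_state=0)

print(dx_train.shape, dx_test.shape)
print(dy_train.shape, dy_test.shape)
```

With 200 rows, each subset receives 100 rows and the explanatory variables keep their 3 columns.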

⑷ Model generation and evaluation

#Instantiate each model class and store the instances in a dict
models = {
    'linear': LinearRegression(),
    'ridge': Ridge(random_state=0),
    'lasso': Lasso(random_state=0)}

#Initialize the dict that stores the scores (coefficient of determination R^2)
scores = {}

#Fit each model in turn, compute its R^2 score, and store it
for model_name, model in models.items():
    #Fit the model
    model.fit(X_train, Y_train)
    #R^2 score on the training data
    scores[(model_name, 'train')] = model.score(X_train, Y_train)
    #R^2 score on the test data
    scores[(model_name, 'test')] = model.score(X_test, Y_test)

#Convert the dict to a pandas Series and print it
print(pd.Series(scores))

| | Multiple regression | Ridge regression | Lasso regression |
|---|---|---|---|
| R² on training data | 0.733358 | 0.733355 | 0.733358 |
| R² on test data | 0.737069 | 0.737768 | 0.737084 |
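
The printed Series is keyed by `(model, dataset)` tuples, so it can be pivoted into a model-by-dataset layout with `unstack`. A minimal sketch, with the scores copied by hand from the results above; `scores_demo` and `table` are illustrative names, not part of the original code:

```python
import pandas as pd

# Scores copied from the results above, keyed as (model, dataset)
scores_demo = {
    ('linear', 'train'): 0.733358, ('linear', 'test'): 0.737069,
    ('ridge', 'train'): 0.733355, ('ridge', 'test'): 0.737768,
    ('lasso', 'train'): 0.733358, ('lasso', 'test'): 0.737084,
}

# Tuple keys become a two-level MultiIndex;
# unstack(0) moves the first level (the model name) to the columns
table = pd.Series(scores_demo).unstack(0)

# unstack sorts the labels alphabetically, so restore the intended order
table = table.loc[['train', 'test'], ['linear', 'ridge', 'lasso']]

print(table)
```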

**With the default settings the three models score almost identically, so next I change the regularization parameter and compare.**

Regularization parameters
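
For reference, these are the objective functions that scikit-learn's `Ridge` and `Lasso` minimize, as stated in its documentation. Here $w$ is the coefficient vector, $n$ is the number of training samples, and $\alpha$ is the regularization parameter (the `alpha` argument, which defaults to 1.0 in both classes):

$$
\text{Ridge:}\quad \min_{w}\ \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

$$
\text{Lasso:}\quad \min_{w}\ \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

Because the squared-error term is scaled differently in the two objectives and the penalties use different norms, the same value of $\alpha$ does not imply the same strength of regularization in both models.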

#parameter settings
alpha = 10.0

#Initialize each class and store in models
models = {
    'ridge': Ridge(alpha=alpha, random_state=0),
    'lasso': Lasso(alpha=alpha, random_state=0)}

#Initialize the dict that stores the R^2 scores
scores = {}

#Fit each model in turn and store its R^2 scores
for model_name, model in models.items():
    model.fit(X_train, Y_train)
    scores[(model_name, 'train')] = model.score(X_train, Y_train)
    scores[(model_name, 'test')] = model.score(X_test, Y_test)

print(pd.Series(scores))

| λ (alpha) | Ridge (train) | Ridge (test) | Lasso (train) | Lasso (test) |
|---|---|---|---|---|
| 1 | 0.733355 | 0.737768 | 0.733358 | 0.737084 |
| 10 | 0.733100 | 0.743506 | 0.733357 | 0.737372 |
| 100 | 0.721015 | 0.771022 | 0.733289 | 0.740192 |
| 200 | 0.705228 | 0.778607 | 0.733083 | 0.743195 |
| 400 | 0.680726 | 0.779004 | 0.732259 | 0.748795 |
| 500 | 0.671349 | 0.777338 | 0.731640 | 0.751391 |
| 1000 | 0.640017 | 0.767504 | 0.726479 | 0.762336 |
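
Rather than editing `alpha` by hand and rerunning for each row, the comparison can be collected in a single loop. The sketch below is one possible way to do that. It runs on synthetic data from `make_regression` (an assumption, so that the example is self-contained), which means its numbers will not match the table above; swapping in the article's `X_train`, `X_test`, `Y_train`, `Y_test` should give the article's setting. The names `sweep`, `Xd_train`, and so on are illustrative.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): replace with the article's
# X_train, X_test, Y_train, Y_test to compare on the real dataset
X_demo, y_demo = make_regression(
    n_samples=200, n_features=3, noise=10.0, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.5, random_state=0)

# Same alpha values as in the comparison table
alphas = [1.0, 10.0, 100.0, 200.0, 400.0, 500.0, 1000.0]

rows = []
for alpha in alphas:
    row = {'alpha': alpha}
    for name, cls in [('ridge', Ridge), ('lasso', Lasso)]:
        # Fit one model per (alpha, model class) pair
        model = cls(alpha=alpha).fit(Xd_train, yd_train)
        # score() returns the coefficient of determination R^2
        row[name + '_train'] = model.score(Xd_train, yd_train)
        row[name + '_test'] = model.score(Xd_test, yd_test)
    rows.append(row)

# One row per alpha, one column per (model, dataset) combination
sweep = pd.DataFrame(rows).set_index('alpha')
print(sweep)
```

Collecting the results in a DataFrame keeps every alpha in one place, which makes it easier to see where the test score peaks for each model.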
