2. Multivariate analysis spelled out in Python 6-3. Ridge regression / Lasso regression (scikit-learn) [How regularization works]

⑴ Import libraries

#Data processing / calculation / analysis library
import numpy as np
import pandas as pd

#Graph drawing library
import matplotlib.pyplot as plt
%matplotlib inline

#Machine learning library
import sklearn
from sklearn.linear_model import Ridge, Lasso #Classes for building the regression models

#Module that lets matplotlib display Japanese text
!pip install japanize-matplotlib
import japanize_matplotlib

⑵ Data acquisition and reading

#Get data
url = 'https://raw.githubusercontent.com/yumi-ito/sample_data/master/ridge_lasso_50variables.csv'

#Read the acquired data as a DataFrame object
df = pd.read_csv(url)

print(df)

[Image: 2_6_3_01.PNG]

#Create explanatory variable x by deleting the "y" column
x = df.drop('y', axis=1)

#Extract the "y" column to create the objective variable y
y = df['y']
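As a minimal sketch of what the split above does, the following uses a tiny made-up DataFrame (`toy` and its column values are hypothetical, not the article's data). It shows that `drop` returns a new object and leaves the original DataFrame unchanged.

```python
import pandas as pd

# Hypothetical toy data standing in for the real CSV
toy = pd.DataFrame({
    "y":   [1.0, 2.0, 3.0],
    "x_1": [0.1, 0.2, 0.3],
    "x_2": [1.0, 0.0, 1.0],
})

# Explanatory variables: every column except "y"
x_toy = toy.drop('y', axis=1)

# Objective variable: the "y" column only
y_toy = toy['y']

print(x_toy.shape, y_toy.shape)

# drop() did not modify toy itself, so "y" is still there
print(list(toy.columns))
```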

⑶ Generation of regularization parameter λ

#Generate 50 values of λ (alpha)
num_alphas = 50
alphas = np.logspace(-2, 0.7, num_alphas)

print(alphas)

[Image: 2_6_3_02.PNG]

np.log10(alphas)

[Image: 2_6_3_03.PNG]

Logarithmic scale

[Image: 2_6_3_09.PNG]
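A small sketch to check how `np.logspace` lays out the 50 values: the arguments are exponents of 10, so the values run from 10^-2 to 10^0.7 and are equally spaced on the log scale, not on the linear scale. The variable names `same` and `steps` are introduced here only for illustration.

```python
import numpy as np

num_alphas = 50
alphas = np.logspace(-2, 0.7, num_alphas)

# The endpoints are 10**-2 and 10**0.7
print(alphas[0], alphas[-1])

# logspace(a, b, n) is the same as 10 ** linspace(a, b, n)
same = np.allclose(alphas, 10.0 ** np.linspace(-2, 0.7, num_alphas))

# On the log10 scale, neighboring values are a constant step apart
steps = np.diff(np.log10(alphas))
print(same, np.allclose(steps, steps[0]))
```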

⑷ Estimate by ridge regression

#Variable to store regression coefficients
ridge_coefs = []

#Fit ridge regression repeatedly, changing alpha each time
for a in alphas:
    ridge = Ridge(alpha = a, fit_intercept = False)
    ridge.fit(x, y)
    ridge_coefs.append(ridge.coef_)

#Convert the accumulated regression coefficients to a numpy array
ridge_coefs = np.array(ridge_coefs)

print("Array shape:", ridge_coefs.shape)
print(ridge_coefs)

[Image: 2_6_3_05.PNG]
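To see what `Ridge` computes at each alpha, here is a hedged sketch on synthetic data (`X_demo`, `y_demo`, `true_coef`, and the helper `ridge_closed_form` are assumptions for illustration, not the article's data or code). With `fit_intercept=False`, scikit-learn's ridge minimizes ||y - Xw||² + alpha·||w||², whose solution is w = (XᵀX + alpha·I)⁻¹ Xᵀy. The sketch compares that formula with the fitted coefficients and records how the coefficient norm shrinks as alpha grows.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (assumption): 100 samples, 5 explanatory variables
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=100)

def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y for w (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

matches = []
norms = []
for a in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=a, fit_intercept=False).fit(X_demo, y_demo)
    manual = ridge_closed_form(X_demo, y_demo, a)
    matches.append(bool(np.allclose(model.coef_, manual)))
    norms.append(float(np.linalg.norm(model.coef_)))

print(matches)  # whether sklearn and the formula agree at each alpha
print(norms)    # coefficient norm for alpha = 0.01, 1.0, 100.0
```

The norm gets smaller as alpha gets larger, which is the shrinkage that the 50 rows of `ridge_coefs` trace out.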

#Convert alphas to a log scale (-log10)
log_alphas = -np.log10(alphas)

#Specifying the size of the graph area
plt.figure(figsize = (8,6))

#Line graph with λ on the x-axis and coefficients on the y-axis
plt.plot(log_alphas, ridge_coefs)

#Label the curve of explanatory variable x_1
plt.text(max(log_alphas) + 0.1, ridge_coefs[0,0], "x_1", fontsize=13)

#Specify x-axis range
plt.xlim([min(log_alphas) - 0.1, max(log_alphas) + 0.3])

#Axis label
plt.xlabel("Regularization parameter λ(-log10)", fontsize=13)
plt.ylabel("Regression coefficient", fontsize=13)

#Show grid lines
plt.grid()

[Image: 2_6_3_06.PNG]
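A short sketch of how the x-axis of this plot is oriented. Because the axis is -log10(λ), the smallest λ (index 0 of `alphas`) ends up at the right edge, which is why the "x_1" label is placed at `max(log_alphas)` using the coefficient from row 0. Regularization gets stronger toward the left.

```python
import numpy as np

alphas = np.logspace(-2, 0.7, 50)
log_alphas = -np.log10(alphas)

# alphas increases with the index, so -log10(alphas) decreases
print(log_alphas[0], log_alphas[-1])  # roughly 2.0 and -0.7

# The largest x-position belongs to index 0, i.e. the weakest regularization
rightmost = int(np.argmax(log_alphas))
print(rightmost, alphas[rightmost])
```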

⑸ Estimate by lasso regression

#Variable to store regression coefficients
lasso_coefs = []

#Fit lasso regression repeatedly, changing alpha each time
for a in alphas:
    lasso = Lasso(alpha = a, fit_intercept = False)
    lasso.fit(x, y)
    lasso_coefs.append(lasso.coef_)

#Convert the accumulated regression coefficients to a numpy array
lasso_coefs = np.array(lasso_coefs)

print("Array shape:", lasso_coefs.shape)
print(lasso_coefs)

[Image: 2_6_3_07.PNG]
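One way to summarize a coefficient array like this is to count the nonzero coefficients at each alpha. The sketch below does so on synthetic data, since the real CSV is not reproduced here: `X_demo`, `y_demo`, and `true_coef` are assumptions (50 variables, of which only the first truly affects y), and `max_iter=10000` is set only to give the solver room to converge at small alpha. For reference, scikit-learn's `Lasso` minimizes (1 / (2·n_samples))·||y - Xw||² + alpha·||w||₁.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (assumption): only the first of 50 variables matters
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(150, 50))
true_coef = np.zeros(50)
true_coef[0] = 5.0
y_demo = X_demo @ true_coef + rng.normal(size=150)

alphas = np.logspace(-2, 0.7, 50)

# Count the coefficients that are not exactly zero at each alpha
n_nonzero = []
for a in alphas:
    lasso = Lasso(alpha=a, fit_intercept=False, max_iter=10000)
    lasso.fit(X_demo, y_demo)
    n_nonzero.append(int(np.count_nonzero(lasso.coef_)))

print("smallest alpha:", n_nonzero[0], "nonzero coefficients")
print("largest alpha: ", n_nonzero[-1], "nonzero coefficients")
```

At the smallest alpha most coefficients survive, while at the largest alpha almost all of them are exactly zero.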

#Specifying the size of the graph area
plt.figure(figsize = (8,6))

#Line graph with λ on the x-axis and coefficients on the y-axis
plt.plot(log_alphas, lasso_coefs)

#Label the curve of explanatory variable x_1
plt.text(max(log_alphas) + 0.1, lasso_coefs[0,0], "x_1", fontsize=13)

#Specify x-axis range
plt.xlim([min(log_alphas) - 0.1, max(log_alphas) + 0.3])

#Axis label
plt.xlabel("Regularization parameter λ(-log10)", fontsize=13)
plt.ylabel("Regression coefficient", fontsize=13)

#Show grid lines
plt.grid()

[Image: 2_6_3_08.PNG]
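Why lasso curves hit exactly zero while ridge curves only approach it can be checked in a special case. The sketch below assumes a design matrix with orthogonal columns scaled so that XᵀX = n·I (an assumption that does not hold for the article's data; `X_demo`, `y_demo`, `beta`, and `b` are hypothetical). In that case, with b = Xᵀy / n, the lasso solution is the soft-thresholded value sign(b)·max(|b| - alpha, 0), and the ridge solution is the rescaled value b / (1 + alpha / n).

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Orthogonal design (assumption): columns of Q are orthonormal, so X^T X = n * I
rng = np.random.default_rng(1)
n, p = 200, 4
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))
X_demo = np.sqrt(n) * Q

beta = np.array([2.0, -1.0, 0.3, 0.0])
y_demo = X_demo @ beta + rng.normal(scale=0.1, size=n)

# Per-variable least-squares estimate under this design
b = X_demo.T @ y_demo / n

a = 0.5
lasso_coef = Lasso(alpha=a, fit_intercept=False).fit(X_demo, y_demo).coef_
ridge_coef = Ridge(alpha=a, fit_intercept=False).fit(X_demo, y_demo).coef_

# Lasso: subtract alpha from |b| and clip at zero (soft thresholding)
soft = np.sign(b) * np.maximum(np.abs(b) - a, 0.0)

# Ridge: multiply b by a factor slightly below 1 (never exactly zero)
scaled = b / (1.0 + a / n)

print(np.allclose(lasso_coef, soft, atol=1e-6), lasso_coef)
print(np.allclose(ridge_coef, scaled), ridge_coef)
```

Any coefficient with |b| below alpha is set to exactly zero by the lasso, whereas ridge only scales every coefficient down by the same factor.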

Summary
