Many people use scikit-learn in Python to fit Lasso regressions, and I do too. Still, I feel envious whenever I see the coefficient-path plot produced by people using R's glmnet.
R gives you a graph like this. (Reprinted from "How to perform sparse estimation by LASSO with R / glmnet package".)

It is easy to read, isn't it? As $λ$ (called `alpha` in scikit-learn) changes, you can see at a glance how each regression coefficient changes. But this plot is not built into scikit-learn.
I wanted a graph like that too, so this time I wrote a Python script for it. You can hardly do sparse modeling without such a plot: if scikit-learn only hands you a score and the regression coefficients, you never really see what the Lasso's sparsification is doing.
So I built it myself with a for loop.
After trying a few approaches, I settled on a straightforward one: generate a range of values with NumPy, feed each one into `alpha` (i.e. $λ$) in a for loop, refit the Lasso regression, and record the regression coefficients and scores.
```python
# -*- coding: utf-8 -*-
from sklearn import datasets
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso

# Load the Boston housing dataset
# (note: load_boston is deprecated since scikit-learn 1.0 and removed in 1.2)
boston = datasets.load_boston()

alpha_lasso = []
coef_list = []
intercept_list = []
train_score = []
test_score = []

# print(boston['feature_names'])
# Separate the features and the target variable
X = boston['data']
y = boston['target']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0)

# Define the search range for alpha
alpha_range = np.linspace(0.01, 100, 1000)

for i in alpha_range:
    # Fit a Lasso regression model for this alpha
    lasso_regr = Lasso(alpha=i, max_iter=10000)
    lasso_regr.fit(X_train, y_train)

    # Show the results
    print("alpha =", i)
    print("Training data accuracy =", lasso_regr.score(X_train, y_train))
    print("Test data accuracy =", lasso_regr.score(X_test, y_test))

    # Record the results for this alpha
    alpha_lasso.append(i)
    coef_list.append(lasso_regr.coef_)
    intercept_list.append(lasso_regr.intercept_)
    train_score.append(lasso_regr.score(X_train, y_train))
    test_score.append(lasso_regr.score(X_test, y_test))

df_alpha = pd.Series(alpha_lasso, name='alpha')
df_coef = pd.DataFrame(coef_list, columns=boston.feature_names)
df_inter = pd.Series(intercept_list, name='intercept')
df_train_score = pd.Series(train_score, name='train_score')
df_test_score = pd.Series(test_score, name='test_score')

# Now plot alpha against the regression coefficients
plt.plot(df_alpha, df_coef)
plt.xscale('log')
plt.legend(labels=df_coef.columns, loc='lower right', fontsize=7)
plt.xlabel('alpha')
plt.ylabel('coefficient')
plt.title('alpha vs coefficient graph like R/glmnet')
plt.show()

# Now plot alpha against the train/test scores
df_score = pd.concat([df_train_score, df_test_score], axis=1)
plt.plot(df_alpha, df_score)
plt.xscale('log')
plt.legend(labels=df_score.columns, loc='lower right', fontsize=8)
plt.xlabel('alpha')
plt.ylabel('r2_score')
plt.title('alpha vs score (train/test)')
plt.show()
```
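As an aside, scikit-learn can also compute the entire coefficient path in a single call with `sklearn.linear_model.lasso_path`, which avoids the for loop over `alpha` entirely. Here is a minimal sketch on synthetic data (the dataset and alpha grid are stand-ins for the Boston setup above, not the same numbers):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

# Synthetic regression data as a stand-in for the Boston dataset
X, y = make_regression(n_samples=200, n_features=10, noise=10.0,
                       random_state=0)

# Compute the full coefficient path in one call;
# coefs has shape (n_features, n_alphas)
alphas = np.logspace(-2, 2, 100)
alphas_out, coefs, _ = lasso_path(X, y, alphas=alphas)

# The same glmnet-style plot as above, without an explicit loop
plt.plot(alphas_out, coefs.T)
plt.xscale('log')
plt.xlabel('alpha')
plt.ylabel('coefficient')
plt.title('coefficient path via lasso_path')
plt.show()
```

This is much faster than refitting `Lasso` a thousand times, because the path solver reuses the previous solution as a warm start.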
These are the completed graphs.


By comparing the graph that sparsifies the regression coefficients against the score graph like this, you can see at a glance how large `alpha` should be. To me, 0.5 looks better than the default of 1: any larger and the score drops, while any smaller doesn't raise the score but lets in features that the sparsification had judged unnecessary. So 0.5 feels just right.
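Rather than eyeballing the score curve, you can also let cross-validation pick `alpha` with `LassoCV` and compare its choice with your visual judgement. A minimal sketch, again on synthetic stand-in data rather than the Boston dataset:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic data standing in for the Boston dataset
X, y = make_regression(n_samples=200, n_features=10, noise=10.0,
                       random_state=0)

# LassoCV searches the given alpha grid with 5-fold cross-validation
model = LassoCV(alphas=np.linspace(0.01, 100, 1000), cv=5,
                max_iter=10000).fit(X, y)

print('best alpha:', model.alpha_)
print('nonzero coefficients:', np.count_nonzero(model.coef_))
```

The chosen `model.alpha_` marks the spot on the score graph where generalization peaks, which is exactly what the visual comparison above is estimating by eye.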
So if you do sparse modeling with the Lasso, you can now have these two graphs in Python as well.