I wrote an R glmnet-style graph in Python for sparse modeling with Lasso

I wanted to draw a graph in Python that looks like the ones R's glmnet produces.

The problem with Lasso regression in Python

Plenty of people use scikit-learn to run Lasso regression in Python, and I do too. But whenever I see the Lasso graphs made by people who use R's glmnet, I get a little envious. The R graph looks like this (reprinted from "How to perform sparse estimation by LASSO with R / glmnet package"):

[Figure: glmnet Lasso solution path, glmnet-lasso-solution-path.png]

It's easy to understand, isn't it? As $λ$ (`alpha` in Python's scikit-learn) changes, you can see at a glance how the regression coefficients change. But this graph is not something scikit-learn will draw for you.
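As an aside, scikit-learn does ship a `lasso_path` function that can compute the coefficient path in one call, even though it won't draw the glmnet-style plot for you. A minimal sketch, assuming `X_train` and `y_train` are the training data prepared in the script below:

from sklearn.linear_model import lasso_path
import numpy as np

# Compute the Lasso coefficients over a log-spaced grid of alphas in one call
alphas, coefs, _ = lasso_path(X_train, y_train, alphas=np.logspace(-2, 2, 100))
# coefs has shape (n_features, n_alphas): one coefficient path per feature

In this article, though, I build the plot the long way with an explicit loop, which also lets me collect the train/test scores along the way.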

I wanted a graph like that too! So this time I wrote a Python script to make one. You can't really do sparse modeling without a graph like this: if all you get out of scikit-learn is a score and a set of regression coefficients, you never actually see what the sparse modeling done by Lasso is capable of.

I tried to make a glmnet-style graph with Python

So I made it myself with a for loop.

I considered various approaches, but in the end I did it the straightforward way: generate a range of values with numpy, feed each one into alpha (i.e. lambda) inside a for loop, refit the Lasso regression every time, and record the regression coefficients and scores.

# -*- coding: utf-8 -*-
 
from sklearn import datasets
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso


#Load a Boston dataset
boston = datasets.load_boston()

alpha_lasso = []
coef_list = []
intercept_list = []
train_score = []
test_score = []

#print(boston['feature_names'])
#Separate features and objective variables
X = boston['data']
y = boston['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8,random_state = 0)

#Determine the search range

lasso_1 = np.linspace(0.01,100,1000)

for i in lasso_1:
    #Train and create a Lasso regression model
    Lasso_regr = Lasso(alpha = i, max_iter=10000)

    Lasso_regr.fit(X_train,y_train)
    pre_train = Lasso_regr.predict(X_train)
    pre_test = Lasso_regr.predict(X_test)


    #View results
    print("alpha=",i)
    print("Fits training data")
    print("Training data accuracy=", Lasso_regr.score(X_train, y_train))
    print("Fits test data")
    print("Test data accuracy=", Lasso_regr.score(X_test, y_test))

    alpha_lasso.append(i)
    coef_list.append(Lasso_regr.coef_)
    intercept_list.append(Lasso_regr.intercept_)
    train_score.append(Lasso_regr.score(X_train, y_train))
    test_score.append(Lasso_regr.score(X_test, y_test))

df_count = pd.Series(alpha_lasso,name = 'alpha')
df_coef= pd.DataFrame(coef_list,columns = boston.feature_names)
df_inter = pd.Series(intercept_list,name = 'intercept')
df_train_score = pd.Series(train_score,name = 'train_score')
df_test_score = pd.Series(test_score,name = 'test_score')

#Now make a graph of alpha and regression coefficients
plt.plot(df_count,df_coef)
plt.xscale('log')
plt.legend(labels = df_coef.columns,loc='lower right',fontsize=7)
plt.xlabel('alpha')
plt.ylabel('coefficient')
plt.title('alpha vs coefficient graph like R/glmnet')

plt.show()

#Now make a graph of alpha and train/test scores
df_score = pd.concat([df_train_score,df_test_score], axis=1)
plt.plot(df_count,df_score)
plt.xscale('log')
plt.legend(labels = df_score.columns,loc='lower right',fontsize=8)
plt.xlabel('alpha')
plt.ylabel('r2_score')
plt.title('alpha vs score(train/test)')

plt.show()

These are the completed graphs.

[Figure: alpha vs coefficient, co_Figure 2020-08-14 121012.png]

[Figure: alpha vs score (train/test), score_Figure 2020-08-14 121045.png]

By putting the graph of the sparsifying regression coefficients next to the graph of the scores like this, you can see at a glance what `alpha` should be. To me, around 0.5 looks better than the default of 1: make it larger and the score drops, make it smaller and the score doesn't improve while features that sparsification had judged unnecessary creep back in. So 0.5 feels just right.
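If you'd rather not pick `alpha` by eye, scikit-learn's `LassoCV` chooses it by cross-validation. A minimal sketch using the same training data as above (the alpha grid and `cv=5` here are just my assumptions, not part of the original script):

from sklearn.linear_model import LassoCV
import numpy as np

# Let cross-validation pick alpha from a log-spaced grid
lasso_cv = LassoCV(alphas=np.logspace(-2, 2, 100), cv=5, max_iter=10000)
lasso_cv.fit(X_train, y_train)

print("best alpha:", lasso_cv.alpha_)
print("test score:", lasso_cv.score(X_test, y_test))

It's still worth looking at the two graphs, though, since they show why a given alpha works, not just which one scores best.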

If you're doing sparse modeling with Lasso!!

If you're going to run Lasso and do sparse modeling, you can now get these two graphs in Python as well.
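One small tweak worth trying: since the x-axis of both plots is log-scaled, a log-spaced alpha grid covers it more evenly than np.linspace. This is just a suggestion, not what the script above uses:

# 100 alphas spaced evenly on a log scale from 0.01 to 100
lasso_1 = np.logspace(-2, 2, 100)

It also cuts the number of Lasso fits from 1000 to 100, so the loop finishes much faster.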
