Easy Lasso regression analysis with Python (no theory)

This article is for those who just want to run a LASSO regression analysis in Python for now. The parameters are left at their defaults. The data used here is available at https://gist.github.com/tijptjik/9408623

Import module for LASSO regression

Import only Lasso from sklearn.linear_model.

from sklearn.linear_model import Lasso

Import module to split data

Import only train_test_split from sklearn.model_selection.

from sklearn.model_selection import train_test_split

Import the module that handles matrices

Import numpy under the name np.

import numpy as np

Import the module that handles CSV files

Import pandas under the name pd.

import pandas as pd

Import the module to draw the graph

import matplotlib.pyplot as plt

Import the function that computes the mean squared error

from sklearn.metrics import mean_squared_error

Load the CSV

Load wine.csv into df (a data frame).

df=pd.read_csv('wine.csv')

If the file is in another directory, pass its path instead. For example, on Linux:

df=pd.read_csv('Desktop/Documents/wine.csv')
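
As an alternative to reading a CSV file, scikit-learn bundles what appears to be the same UCI wine data. The following is a minimal sketch, assuming scikit-learn 0.23 or later (needed for as_frame=True). The name df_alt is only illustrative. Note that in this copy the class column is called target and is coded 0 to 2 rather than wine_type 1 to 3, and the OD280/OD315 column name is written in lower case.

```python
from sklearn.datasets import load_wine

# Bundled copy of the UCI wine data; nothing is downloaded.
wine = load_wine(as_frame=True)

# 13 feature columns plus a "target" column holding the wine class.
df_alt = wine.frame

print(df_alt.shape)
print(df_alt[["color_intensity", "proline"]].head())
```

The two columns used later in this article, color_intensity and proline, are available under the same names.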

Divide the data for training and testing

Training : Testing = 6 : 4.

df_train, df_test = train_test_split(df, test_size=0.4)
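
train_test_split shuffles the rows randomly, so every run selects different rows and the final error changes from run to run. If reproducible results are wanted, the random_state parameter can be fixed. The sketch below uses ten rows copied from the df_train listing in this article as stand-in data; the names df_demo, demo_train and demo_test are illustrative only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten (color_intensity, proline) pairs copied from the df_train listing,
# so the example runs without the CSV file.
df_demo = pd.DataFrame({
    "color_intensity": [3.40, 7.50, 3.80, 4.35, 4.80, 2.06, 2.60, 3.27, 3.58, 4.28],
    "proline": [510, 880, 428, 520, 515, 345, 580, 680, 830, 1035],
})

# test_size=0.4 keeps 60% of the rows for training and 40% for testing.
# random_state fixes the shuffle so the same rows are selected on every run.
demo_train, demo_test = train_test_split(df_demo, test_size=0.4, random_state=0)

print(len(demo_train), len(demo_test))  # 6 4
```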

When the data is displayed, it looks like this.

df_train=
     wine_type  alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  flavanoids  nonflavanoid_phenols  proanthocyanins  color_intensity   hue  OD280/OD315_of_diluted_wines  proline
106          2    12.25        1.73  2.12               19.0         80           1.65        2.03                  0.37             1.63             3.40  1.00                          3.17      510
157          3    12.45        3.03  2.64               27.0         97           1.90        0.58                  0.63             1.14             7.50  0.67                          1.73      880
75           2    11.66        1.88  1.92               16.0         97           1.61        1.57                  0.34             1.15             3.80  1.23                          2.14      428
142          3    13.52        3.17  2.72               23.5         97           1.55        0.52                  0.50             0.55             4.35  0.89                          2.06      520
83           2    13.05        3.86  2.32               22.5         85           1.65        1.59                  0.61             1.62             4.80  0.84                          2.01      515
..         ...      ...         ...   ...                ...        ...            ...         ...                   ...              ...              ...   ...                           ...      ...
117          2    12.42        1.61  2.19               22.5        108           2.00        2.09                  0.34             1.61             2.06  1.06                          2.96      345
129          2    12.04        4.30  2.38               22.0         80           2.10        1.75                  0.42             1.35             2.60  0.79                          2.57      580
60           2    12.33        1.10  2.28               16.0        101           2.05        1.09                  0.63             0.41             3.27  1.25                          1.67      680
25           1    13.05        2.05  3.22               25.0        124           2.63        2.68                  0.47             1.92             3.58  1.13                          3.20      830
41           1    13.41        3.84  2.12               18.8         90           2.45        2.68                  0.27             1.48             4.28  0.91                          3.00     1035

[106 rows x 14 columns]

Separate the explanatory variable from the objective variable

Put the column you want to use for the analysis in x (the explanatory variable) and the column you want to predict in y (the objective variable). This time, we predict 'proline' from 'color_intensity'.

x_train = df_train[['color_intensity']]
x_test  = df_test[['color_intensity']]

y_train = df_train['proline']
y_test  = df_test['proline']

Create an untrained model

lss = Lasso()
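
Calling Lasso() with no arguments uses scikit-learn's defaults, the most important of which is the regularization strength alpha. A short sketch of how to check and override it follows; the value 0.1 and the names lss_default and lss_weak are arbitrary choices for illustration.

```python
from sklearn.linear_model import Lasso

# With no arguments, the regularization strength alpha is 1.0.
lss_default = Lasso()
print(lss_default.alpha)  # 1.0

# The same parameter can be set explicitly when the model is created.
# 0.1 is an arbitrary illustrative value, not a recommendation.
lss_weak = Lasso(alpha=0.1)
print(lss_weak.get_params()["alpha"])  # 0.1
```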

Train the model

Call fit(explanatory variable, objective variable).

The training result is stored in the model lss created above.

lss.fit(x_train, y_train)
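
After fit, the learned slope and intercept can be read from the model's coef_ and intercept_ attributes. The sketch below fits on ten rows copied from the df_train listing in this article, so the numbers it prints will differ from those of the full data set; demo and lss_demo are illustrative names.

```python
import pandas as pd
from sklearn.linear_model import Lasso

# Ten (color_intensity, proline) pairs copied from the df_train listing.
demo = pd.DataFrame({
    "color_intensity": [3.40, 7.50, 3.80, 4.35, 4.80, 2.06, 2.60, 3.27, 3.58, 4.28],
    "proline": [510, 880, 428, 520, 515, 345, 580, 680, 830, 1035],
})
x_demo = demo[["color_intensity"]]  # 2-D: one column per explanatory variable
y_demo = demo["proline"]            # 1-D: the objective variable

lss_demo = Lasso()
lss_demo.fit(x_demo, y_demo)

# One coefficient per explanatory variable, plus a single intercept.
print("slope:", lss_demo.coef_[0])
print("intercept:", lss_demo.intercept_)
```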

Make predictions

Call predict(data to run the regression on).

Run the prediction on the test data and assign the result to y_pred.

y_pred = lss.predict(x_test)
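
predict also works on new values that are not in the data set, as long as they are passed in the same shape as the training data. The sketch below again fits on ten rows copied from the df_train listing, then predicts proline for a hypothetical color_intensity of 5.0 and checks that the result matches intercept + slope * x computed by hand.

```python
import pandas as pd
from sklearn.linear_model import Lasso

# Ten (color_intensity, proline) pairs copied from the df_train listing.
demo = pd.DataFrame({
    "color_intensity": [3.40, 7.50, 3.80, 4.35, 4.80, 2.06, 2.60, 3.27, 3.58, 4.28],
    "proline": [510, 880, 428, 520, 515, 345, 580, 680, 830, 1035],
})
lss_demo = Lasso().fit(demo[["color_intensity"]], demo["proline"])

# A hypothetical new sample, wrapped in a DataFrame with the same column name.
new_x = pd.DataFrame({"color_intensity": [5.0]})
pred = lss_demo.predict(new_x)

# For a linear model the prediction is simply intercept + slope * x.
manual = lss_demo.intercept_ + lss_demo.coef_[0] * 5.0

print(pred[0], manual)
```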

Plot the results

You can make a scatter plot with plt.scatter(x-axis values, y-axis values). First, plot the actual values (blue dots).

plt.scatter(x_test, y_test)

Create an array running from the minimum to the maximum value of x_test["color_intensity"] in 0.1 increments, and reshape it into a single-column matrix. Then pass it to lss.predict and plot the predicted values (red dots).

x_for_plot = np.arange(np.min(x_test["color_intensity"])
                      ,np.max(x_test["color_intensity"]),0.1).reshape(-1,1)
plt.scatter(x_for_plot,lss.predict(x_for_plot),color="red")

Label settings

plt.xlabel("color_intensity")
plt.ylabel("proline")

Display the plot

plt.show()

Blue is the actual value and red is the predicted value.

(Figure: screenshot 2019-12-01 22.08.18.png)
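
The grid of x values used for the red dots can also be built as a DataFrame with the same column name as the training data. On recent scikit-learn versions this avoids the warning about missing feature names that appears when a model fitted on a DataFrame is given a plain NumPy array. The sketch below is one way to do it, using ten rows copied from the df_train listing as stand-in data; grid and y_grid are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Ten (color_intensity, proline) pairs copied from the df_train listing.
demo = pd.DataFrame({
    "color_intensity": [3.40, 7.50, 3.80, 4.35, 4.80, 2.06, 2.60, 3.27, 3.58, 4.28],
    "proline": [510, 880, 428, 520, 515, 345, 580, 680, 830, 1035],
})
lss_demo = Lasso().fit(demo[["color_intensity"]], demo["proline"])

# Evenly spaced x values from the minimum up to (not including) the maximum.
x_min = demo["color_intensity"].min()
x_max = demo["color_intensity"].max()
grid = pd.DataFrame({"color_intensity": np.arange(x_min, x_max, 0.1)})

# One predicted proline value per grid point.
y_grid = lss_demo.predict(grid)

print(grid.shape, y_grid.shape)
```

grid["color_intensity"] and y_grid can then be passed to plt.scatter, or to plt.plot to draw the prediction as a line.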

Finally, find the mean squared error.

print(mean_squared_error(y_test,y_pred)) # 90027.41397601982 (that's large, lol)
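
The mean squared error is in squared units, which makes it hard to read. Its square root (RMSE) is in the same units as proline; for the value of about 90027 above, that is roughly 300. The sketch below shows the relation on four actual proline values taken from the df_train listing and four made-up predictions; y_true, y_hat and the other names are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([510.0, 880.0, 428.0, 520.0])  # actual values from the listing
y_hat = np.array([600.0, 800.0, 500.0, 500.0])   # made-up predictions

# The library function and the definition "mean of squared differences" agree.
mse = mean_squared_error(y_true, y_hat)
mse_by_hand = np.mean((y_true - y_hat) ** 2)

# Taking the square root brings the error back to the units of y.
rmse = np.sqrt(mse)

print(mse, mse_by_hand, rmse)
```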

Adjusting the parameters (for example, the regularization strength alpha) may improve the accuracy.
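
One simple way to try this is to fit the model with several alpha values and compare the errors. The sketch below assumes the wine data bundled with scikit-learn (load_wine) in place of the CSV, and the list of alpha values is arbitrary. With a single explanatory variable the differences may well be small, and for serious tuning a separate validation set or LassoCV would be more appropriate than comparing on the test data.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Bundled wine data, used here instead of the CSV file.
df_alt = load_wine(as_frame=True).frame
train, test = train_test_split(df_alt, test_size=0.4, random_state=0)

# Fit one model per alpha and record its test error.
results = {}
for alpha in [0.01, 0.1, 1.0, 10.0]:  # arbitrary candidate values
    model = Lasso(alpha=alpha)
    model.fit(train[["color_intensity"]], train["proline"])
    pred = model.predict(test[["color_intensity"]])
    results[alpha] = mean_squared_error(test["proline"], pred)

for alpha, mse in results.items():
    print(alpha, mse)
```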

Below is the code for copying

from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
df=pd.read_csv('wine.csv')
df_train, df_test = train_test_split(df, test_size=0.4)
x_train = df_train[['color_intensity']]
x_test  = df_test[['color_intensity']]

y_train = df_train['proline']
y_test  = df_test['proline']
print(y_train)
lss = Lasso()
lss.fit(x_train, y_train)
y_pred = lss.predict(x_test)

plt.scatter(x_test, y_test)
x_for_plot = np.arange(np.min(x_test["color_intensity"]),np.max(x_test["color_intensity"]),0.1).reshape(-1,1)
plt.scatter(x_for_plot,lss.predict(x_for_plot),color="red")
plt.xlabel("color_intensity")
plt.ylabel("proline")
plt.show()

print(mean_squared_error(y_test,y_pred))
