[PYTHON] Robust linear regression with scikit-learn

Overview

This article shows how to draw a robust linear regression line using Python's machine learning library scikit-learn. The chart object is created with the Python plotting library Altair and displayed in the browser using the application framework [Streamlit](https://qiita.com/keisuke-ota/items/a18f158389f1585a9aa0).

Features of robust linear regression

It is less susceptible to outliers than linear regression using the least squares method.
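The reason can be sketched with the loss function itself. The snippet below is a minimal illustration, assuming the per-sample Huber loss in the form given in the scikit-learn documentation for HuberRegressor (quadratic for small scaled residuals, linear beyond epsilon). The helper name `huber_loss` is hypothetical, and the scale parameter and regularization term that the real estimator also optimizes are left out.

```python
import numpy as np


def huber_loss(z, epsilon=1.35):
    """Per-sample Huber loss for scaled residuals z (illustrative helper).

    Quadratic while |z| < epsilon, linear afterwards, so a large residual
    contributes far less than it would under the squared loss.
    """
    z = np.asarray(z, dtype=float)
    quadratic = z ** 2
    linear = 2 * epsilon * np.abs(z) - epsilon ** 2
    return np.where(np.abs(z) < epsilon, quadratic, linear)


residuals = np.array([0.5, 1.0, 10.0])
squared = residuals ** 2        # least squares penalty
huber = huber_loss(residuals)   # Huber penalty with epsilon = 1.35

for r, s, h in zip(residuals, squared, huber):
    print(f"residual={r:5.1f}  squared={s:7.2f}  huber={h:7.4f}")
```

For the small residuals the two penalties coincide, while for the residual of 10 the squared loss is 100 and the Huber loss is about 25.18, which is why a single distant point pulls the fitted line less.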

Creating a robust linear regression

Create a robust regression line using `HuberRegressor`. Note that a Streamlit app is launched with `streamlit run filename.py`.

streamlit_robust_linear.py


import streamlit as st
import numpy as np
import pandas as pd
import altair as alt
from sklearn.linear_model import HuberRegressor
from sklearn.datasets import make_regression

#Demo data generation

rng = np.random.RandomState(0)
x, y, coef = make_regression(n_samples=200, n_features=1, noise=4.0, coef=True, random_state=0)
# Overwrite the first four samples with outliers
x[:4] = rng.uniform(10, 20, (4, 1))
y[:4] = rng.uniform(10, 20, 4)
df = pd.DataFrame({
    'x_axis': x.reshape(-1),
    'y_axis': y
})

#Set parameters for robust regression

epsilon = st.slider('Select epsilon', 
          min_value=1.00, max_value=10.00, step=0.01, value=1.35)

#Robust regression execution

# X must be 2-D; y is passed as a 1-D array
# (a column vector for y triggers a DataConversionWarning)
huber = HuberRegressor(epsilon=epsilon).fit(
    df['x_axis'].values.reshape(-1, 1),
    df['y_axis'].values
)

#Scatter plot generation

plot = alt.Chart(df).mark_circle(size=40).encode(
    x='x_axis',
    y='y_axis',
    tooltip=['x_axis', 'y_axis']
).properties(
    width=500,
    height=500
).interactive()

#Get the coefficients of robust linear regression

a1 = huber.coef_[0]
b1 = huber.intercept_

#Specify the domain of the regression line

x_min = df['x_axis'].min()
x_max = df['x_axis'].max()

#Creating a regression line

points = pd.DataFrame({
    'x_axis': [x_min, x_max],
    'y_axis': [a1*x_min+b1, a1*x_max+b1],
})

line = alt.Chart(points).mark_line(color='steelblue').encode(
    x='x_axis',
    y='y_axis'
    ).properties(
    width=500,
    height=500
    ).interactive()

#Graph display

st.write(plot+line)

About parameters

`epsilon` is a real number greater than or equal to 1 that controls how strongly outliers influence the fit. The default is 1.35.

[Screenshot of the resulting chart]

The larger `epsilon` is, the more the regression line is affected by outliers.

[Screenshot of the chart with `epsilon = 10`]
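The effect of the parameter can also be checked numerically, without Streamlit or Altair. The following is a rough sketch on synthetic data invented for this illustration (a line with slope 2 plus five deliberately shifted points, not the demo data above). It fits `HuberRegressor` with `epsilon = 1.35` and `epsilon = 10`, then compares the fitted slope and the number of samples flagged in the `outliers_` mask. `max_iter=1000` is only a precaution so the solver has room to converge.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

# Synthetic data (an assumption for this sketch): y = 2x + 1 plus small noise
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
y[-5:] -= 50.0  # shift the last five points far below the line

X = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix

results = {}
for eps in (1.35, 10.0):
    model = HuberRegressor(epsilon=eps, max_iter=1000).fit(X, y)
    results[eps] = {
        "slope": model.coef_[0],
        # outliers_ is a boolean mask of samples treated as outliers
        "n_outliers": int(model.outliers_.sum()),
    }
    print(f"epsilon={eps}: slope={results[eps]['slope']:.3f}, "
          f"flagged outliers={results[eps]['n_outliers']}")
```

On data like this, the fit with the smaller `epsilon` should stay closer to the true slope of 2, while the fit with `epsilon = 10` treats almost every point as an inlier and drifts toward the shifted points.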

Creating a linear regression line by the least squares method

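Replacing `HuberRegressor` with `LinearRegression` produces a linear regression line fitted by the least squares method. Note that `LinearRegression` does not take an `epsilon` parameter, so that argument must be dropped as well.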
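As a rough sketch of that swap, the snippet below reuses the demo data generation from the script above and fits `LinearRegression` instead. The Streamlit slider and the Altair chart are left out here so the example runs as a plain Python script. The endpoint calculation mirrors the one used for the robust line.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Same demo data as in the article: 200 samples, first four replaced by outliers
rng = np.random.RandomState(0)
x, y, coef = make_regression(n_samples=200, n_features=1, noise=4.0,
                             coef=True, random_state=0)
x[:4] = rng.uniform(10, 20, (4, 1))
y[:4] = rng.uniform(10, 20, 4)

# Ordinary least squares: no epsilon argument
ols = LinearRegression().fit(x, y)
a1 = ols.coef_[0]
b1 = ols.intercept_

# Endpoints of the regression line over the observed x range
x_min, x_max = x.min(), x.max()
points = pd.DataFrame({
    'x_axis': [x_min, x_max],
    'y_axis': [a1 * x_min + b1, a1 * x_max + b1],
})

print(f"true coefficient: {float(coef):.3f}")
print(f"least squares slope: {a1:.3f}, intercept: {b1:.3f}")
print(points)
```

The `points` frame can be passed to `alt.Chart(points).mark_line()` in the same way as in the script above to draw the least squares line.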
