[PYTHON] Multiple regression analysis with Keras

I tried a simple multiple regression analysis with deep learning (that is, with Keras). Deep learning tends to be associated with classification problems and reinforcement learning, but that does not mean it cannot be used for regression. Neural networks have long been used for regression, so this is an attempt to do regression analysis with deep learning as well.

The code I wrote is here: https://github.com/shibuiwilliam/keras_regression_sample

What we will do this time

We perform multiple regression analysis using Keras's KerasRegressor API. The data is the diabetes sample dataset provided by scikit-learn. It is often used for regression analysis, and it is small and convenient.

The goal this time is to write down the procedure for regression analysis with deep learning and neural networks. Note, however, that building a regression model with deep learning does not necessarily improve accuracy.

Also note that the regression analysis here is not time-series prediction with an RNN or LSTM.

Supplement: About machine learning

If you map out machine learning and deep learning models very roughly, it looks like this.

1.png

This is certainly not everything, since new papers and models are proposed every day, but it gives a rough picture. What we use this time is a DNN.

Preparation

As preparation, load the data.

# import libraries
import numpy as np
import pandas as pds
from keras.models import Sequential
from keras.layers import Input, Dense, Dropout, BatchNormalization
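# note: keras.wrappers.scikit_learn ships with older Keras releases;
# newer releases removed it in favor of the separate SciKeras package
from keras.wrappers.scikit_learn import KerasRegressor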
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_diabetes

# use diabetes sample data from sklearn
diabetes = load_diabetes()

# load them to X and Y
X = diabetes.data
Y = diabetes.target

The loaded data looks like this.

2.JPG

The data appears to be already normalized. It is a small sample dataset with 442 rows and 10 input variables.

8.JPG

See here for details of the data: http://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf
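As a rough illustration of what "already normalized" means here: according to the scikit-learn dataset description, each of the 10 feature columns is mean-centered and scaled by its standard deviation times the square root of the number of samples, so that each column's sum of squares is 1. The sketch below applies that scaling to made-up data. The helper name center_and_scale and the random array are assumptions for illustration and are not part of the original code.

```python
import numpy as np


def center_and_scale(X):
    """Center each column and scale it so its sum of squares equals 1.

    Hypothetical helper for illustration; it mirrors the scaling described
    in scikit-learn's documentation for the diabetes dataset.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # population standard deviation (ddof=0) times sqrt(n_samples)
    scale = centered.std(axis=0) * np.sqrt(X.shape[0])
    return centered / scale


# made-up data with the same shape as the diabetes features (442 x 10)
rng = np.random.default_rng(0)
raw = rng.normal(loc=50.0, scale=10.0, size=(442, 10))
scaled = center_and_scale(raw)

print(scaled.mean(axis=0).round(6))        # each column mean is close to 0
print((scaled ** 2).sum(axis=0).round(6))  # each column sum of squares is close to 1
```

Running the same two checks on diabetes.data should give similar values, provided the dataset is scaled as documented.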

KerasRegressor

Keras provides an API for regression analysis called KerasRegressor. https://keras.io/ja/scikit-learn-api/

The Keras documentation does not go into much detail, but in short it is a wrapper that lets a Keras model be used as a scikit-learn regressor. Presumably KerasRegressor was created so that Keras models work with scikit-learn's handy evaluation APIs for regression (such as cross_val_score and mean_squared_error).

The neural network model itself is written in plain Keras. First, let's make a simple model (one layer each for the input layer, hidden layer, and output layer).

# create regression model
def reg_model():
    model = Sequential()
    model.add(Dense(10, input_dim=10, activation='relu'))
    model.add(Dense(16, activation='relu'))
    model.add(Dense(1))

    # compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

The model summary looks like this.

3.JPG
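The parameter counts that a summary of this simple model reports can also be worked out by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The following is a small sketch of that arithmetic for the three layers of reg_model(). The helper name dense_params is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_units + n_units


# (inputs, units) for the three Dense layers of reg_model()
layers = [(10, 10), (10, 16), (16, 1)]
per_layer = [dense_params(n_in, n_out) for n_in, n_out in layers]
total = sum(per_layer)

print(per_layer)  # [110, 176, 17]
print(total)      # 303
```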

Up to this point, nothing differs from ordinary Keras. The difference is in how fit is written for training.

There are roughly two ways to train.

  1. Train with the data split into training data and test data
  2. Train with cross-validation

In other words, the usual methods for regression analysis can be used.

Example 1: Training with separate training and test data

Let's train the simple model above with the data split into training data and test data.

# use data split and fit to run the model
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.1, random_state=0)
estimator = KerasRegressor(build_fn=reg_model, epochs=100, batch_size=10, verbose=0)
estimator.fit(x_train, y_train)
y_pred = estimator.predict(x_test)

# show its root mean square error
mse = mean_squared_error(y_test, y_pred)
print("KERAS REG RMSE : %.2f" % (mse ** 0.5))

Finally, the root mean squared error (RMSE) is printed to standard output. The style is scikit-learn-like (though Keras is scikit-learn-like to begin with).
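To make the metric concrete, here is a minimal sketch that computes the root mean squared error by hand: square each error, average, then take the square root. The target and prediction values are made up for illustration and are not results from the model above. The helper name rmse is hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mse = sum(squared_errors) / len(squared_errors)
    return math.sqrt(mse)


# made-up targets and predictions, for illustration only
y_true_demo = [100.0, 150.0, 200.0, 250.0]
y_pred_demo = [110.0, 140.0, 230.0, 250.0]

# squared errors are 100, 100, 900, 0, so MSE = 275
print("RMSE : %.2f" % rmse(y_true_demo, y_pred_demo))  # RMSE : 16.58
```

Because the RMSE is in the same units as the target, it is easier to read than the raw MSE.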

Example 2: Training with cross-validation

Next, let's train with cross-validation.

# use Kfold and cross validation to run the model
seed = 7
np.random.seed(seed)
estimator = KerasRegressor(build_fn=reg_model, epochs=100, batch_size=10, verbose=0)
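# shuffle=True is required for random_state to take effect
# (recent scikit-learn versions raise an error otherwise)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)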

# show its root mean square error
results = cross_val_score(estimator, X, Y, scoring='neg_mean_squared_error', cv=kfold)
mse = -results.mean()
print("KERAS REG RMSE : %.2f" % (mse ** 0.5))

Here, too, the RMSE is printed at the end. Let's line up the results.
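A note on how the cross-validation score turns into an RMSE: with scoring='neg_mean_squared_error', each fold's score is the MSE with its sign flipped, so the code negates the mean before taking the root. The sketch below uses made-up per-fold scores (not results from this article) and also shows that "root of the mean MSE" is not the same number as "mean of the per-fold RMSE".

```python
import math

# made-up per-fold scores in the form returned for
# scoring='neg_mean_squared_error' (illustration only)
neg_mse_scores = [-3600.0, -2500.0, -4900.0, -3025.0, -3600.0]

# approach used in the article: average the MSE over folds, then take the root
mean_mse = -sum(neg_mse_scores) / len(neg_mse_scores)
rmse_of_mean = math.sqrt(mean_mse)

# alternative: take the root per fold, then average
fold_rmses = [math.sqrt(-s) for s in neg_mse_scores]
mean_of_rmse = sum(fold_rmses) / len(fold_rmses)

print("%.2f" % rmse_of_mean)  # 59.37
print("%.2f" % mean_of_rmse)  # 59.00
```

The second value is never larger than the first, so it is worth stating which one is being reported when comparing results.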

9.JPG

Well, there is not much difference between the two.

Let's try deepening the network layer

I have done multiple regression analysis with a simple neural network so far. Now let's try deepening the network layer.

# create deep learning like regression model
def deep_reg_model():
    model = Sequential()
    model.add(Dense(10, input_dim=10, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    model.add(Dense(256, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    model.add(Dense(128, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    model.add(Dense(64, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.2))
    model.add(Dense(1))

    # compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

While I was at it, I also added batch normalization and dropout.

6.JPG
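For readers unfamiliar with Dropout(0.2): during training it randomly sets a fraction of the activations to zero and scales the remaining ones by 1 / (1 - rate), while at inference it passes the inputs through unchanged. Below is a minimal NumPy sketch of that behavior (often called inverted dropout). It is not Keras's actual implementation, and the function name dropout_forward is hypothetical.

```python
import numpy as np


def dropout_forward(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of entries while training."""
    if not training:
        return x  # inference: inputs pass through unchanged
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob  # True where the unit is kept
    return np.where(mask, x / keep_prob, 0.0)


rng = np.random.default_rng(0)
activations = np.ones((4, 8))

train_out = dropout_forward(activations, rate=0.2, training=True, rng=rng)
infer_out = dropout_forward(activations, rate=0.2, training=False, rng=rng)

print(np.unique(train_out))  # kept units become 1 / 0.8 = 1.25, dropped ones 0
print(infer_out.mean())      # 1.0
```

The rescaling keeps the expected activation the same in both modes, which is why no adjustment is needed at prediction time.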

Let's train it.

# use data split and fit to run the model
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.1, random_state=0)
estimator = KerasRegressor(build_fn=deep_reg_model, epochs=100, batch_size=10, verbose=0)
estimator.fit(x_train, y_train)
y_pred = estimator.predict(x_test)

# show its root mean square error
mse = mean_squared_error(y_test, y_pred)
print("KERAS REG RMSE : %.2f" % (mse ** 0.5))


# use Kfold and cross validation to run the model
seed = 7
np.random.seed(seed)
estimator = KerasRegressor(build_fn=deep_reg_model, epochs=100, batch_size=10, verbose=0)
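# shuffle=True is required for random_state to take effect
# (recent scikit-learn versions raise an error otherwise)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)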

# show its root mean square error
results = cross_val_score(estimator, X, Y, scoring='neg_mean_squared_error', cv=kfold)
mse = -results.mean()
print("KERAS REG RMSE : %.2f" % (mse ** 0.5))


10.JPG

The result is not much different from the simple network. Considering the computation time, there is little point in deepening the network here.

In closing

I tried multiple regression analysis with KerasRegressor. Plenty of people have probably tried the same thing, but the reason few examples turn up on Google is probably that the accuracy does not improve dramatically (just a throwaway guess). With larger and more complex data I might be able to say something different, so I will try again if I find suitable data.

References

https://keras.io/ja/scikit-learn-api/
http://machinelearningmastery.com/regression-tutorial-keras-deep-learning-library-python/
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html
http://qiita.com/TomokIshii/items/f355d8e87d23ee8e0c7a
http://s0sem0y.hatenablog.com/entry/2016/05/22/215529
