[PYTHON] Automatic brute force machine learning (regression analysis) -This greatly reduces the time for parameter tuning-

RegressionAutomaticSearch

I created a program that runs regression analysis with several machine learning models while sweeping their parameters by brute force, so you can find the best model and parameter combination without tuning by hand. This time, I will try to predict house prices in Boston. The models used are LinearRegression, DecisionTree, RandomForest, and AdaBoost.

Creating a virtual environment

Create venv

C:\RegressionAutomaticSearch>py -m venv venv

Apply venv

C:\RegressionAutomaticSearch>.\venv\Scripts\activate.bat
(venv) C:\RegressionAutomaticSearch>

Package installation

Package update

(venv) C:\RegressionAutomaticSearch>python -m pip install --upgrade pip

Bulk installation of required packages

(venv) C:\RegressionAutomaticSearch>pip install -r requirements.txt

Data set replacement

Change the path below to point to the dataset you want to run the regression on.


###########################
# read datasets
#
# index_col=0: use this when the leftmost column of the CSV is an index
df = pd.read_csv('./datasets/boston_datasets.csv', index_col=0)

The contents look like this. The program reads a single file that holds both the explanatory variables and the objective variable.

      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  PTRATIO       B  LSTAT  MONEY
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0     15.3  396.90   4.98   24.0
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0     17.8  396.90   9.14   21.6
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0     17.8  392.83   4.03   34.7
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0     18.7  394.63   2.94   33.4
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0     18.7  396.90   5.33   36.2

Change objective variable

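Drop the objective variable from the data frame so that only the explanatory variables remain. To predict a different column, change the column name here.

# Use everything except MONEY as the explanatory variables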
boston_X = df.drop("MONEY", axis=1)
X = boston_X.values
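
As a rough sketch of this step, the split into explanatory variables and objective variable could look like the following. The helper name `split_features_target` and the `y` extraction are my assumptions for illustration and are not taken from the repository. The small data frame reuses three rows and two of the columns from the table above.

```python
import pandas as pd


def split_features_target(df, target="MONEY"):
    """Return (X, y) as NumPy arrays; `target` is the objective variable column."""
    X = df.drop(target, axis=1).values  # everything except the target column
    y = df[target].values               # the target column only
    return X, y


# A few values copied from the table above (RM, LSTAT, and MONEY columns only).
df = pd.DataFrame({
    "RM": [6.575, 6.421, 7.185],
    "LSTAT": [4.98, 9.14, 4.03],
    "MONEY": [24.0, 21.6, 34.7],
})

X, y = split_features_target(df)
print(X.shape, y.shape)
```

To predict a different column, only the `target` argument needs to change.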

Parameter adjustment

Adjust the parameter ranges in the lists passed to each model. Each value in a list produces one candidate model.

    def model_import(self):
        models_names = [ self.LinearRegression(), 
                         self.DecisionTreeRegressor(list(range(2, 30, 2))),
                         self.RandomForestRegressor(list(range(2, 30, 2)), list(range(20, 200, 20))),
                         self.AdaBoostRegressor(list(range(20, 200, 20)))]
        models = []
        names  = []
        for model_, name_ in models_names:
            if isinstance(model_, list):
                models.extend(model_)
                names.extend(name_)
            else:
                models.append(model_)
                names.append(name_)
        return models, names 
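
The loop above works because each helper returns either a single `(model, name)` pair or a `(list of models, list of names)` pair. The helpers themselves are not shown in this article, so here is a minimal sketch of how they might be written. I am assuming that the first list holds `max_depth` values, that the second holds `n_estimators` values, and that RandomForest takes every combination of the two. The function names are hypothetical.

```python
from itertools import product

from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


def linear_regression():
    # Single model: returns (model, name), handled by the `else` branch.
    return LinearRegression(), "LinearRegression"


def decision_tree_grid(max_depths):
    # Assumption: each value in the list is a max_depth setting.
    models = [DecisionTreeRegressor(max_depth=d) for d in max_depths]
    names = [f"DecisionTree_depth{d}" for d in max_depths]
    return models, names


def random_forest_grid(max_depths, n_estimators_list):
    # Assumption: every (max_depth, n_estimators) combination is tried.
    combos = list(product(max_depths, n_estimators_list))
    models = [RandomForestRegressor(max_depth=d, n_estimators=n) for d, n in combos]
    names = [f"RandomForest_depth{d}_n{n}" for d, n in combos]
    return models, names


def collect(models_names):
    # Same flattening logic as model_import above.
    models, names = [], []
    for model_, name_ in models_names:
        if isinstance(model_, list):
            models.extend(model_)
            names.extend(name_)
        else:
            models.append(model_)
            names.append(name_)
    return models, names


models, names = collect([
    linear_regression(),
    decision_tree_grid(list(range(2, 30, 2))),
    random_forest_grid(list(range(2, 30, 2)), list(range(20, 200, 20))),
])
print(len(models))  # 1 + 14 + 14 * 9 = 141 candidate models
```

Under these assumptions the number of candidates grows quickly, so narrowing the ranges is the easiest way to shorten the run time.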

Perform regression

Execute main.py.

(venv) C:\RegressionAutomaticSearch>python main.py  
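
For readers who want to see the idea before opening the repository, a brute-force search of this kind can be sketched as below. This is not the actual main.py. It uses synthetic data from `make_regression` as a stand-in for the Boston CSV, a shortened model list, and my own choice of train/test split and selection by lowest MSE.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption: the real program reads the CSV instead).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A shortened candidate list: one linear model plus trees of several depths.
depths = list(range(2, 10, 2))
models = [LinearRegression()] + [
    DecisionTreeRegressor(max_depth=d, random_state=0) for d in depths
]
names = ["LinearRegression"] + [f"DecisionTree_depth{d}" for d in depths]

# Fit every candidate and record its error and coefficient of determination.
results = []
for name, model in zip(names, models):
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results.append((name, mean_squared_error(y_test, pred), r2_score(y_test, pred)))

# Pick the candidate with the lowest mean squared error on the test split.
best_name, best_mse, best_r2 = min(results, key=lambda r: r[1])
print(best_name, best_mse, best_r2)
```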

Regression results

The result plots and a CSV file containing the error (MSE) and the coefficient of determination (R2) for each model are written to the result folder.

(Figures: fig_demo.PNG, result_MSE_R2.PNG)
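
For reference, the two metrics can be computed by hand as in the sketch below. These are the standard definitions of mean squared error and the coefficient of determination, not code from the repository, and the sample values are made up purely to show the arithmetic.

```python
def mean_squared_error(y_true, y_pred):
    # Average of the squared differences between true and predicted values.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    # 1 - (residual sum of squares) / (total sum of squares around the mean).
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up toy values, only to illustrate the calculation.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]

mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mse, r2)
```

A lower MSE and an R2 closer to 1 both indicate a better fit, so the best model is usually the one that wins on both.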

Source code

The source code is available on GitHub at the link below. There are still areas that need adjustment, so I will update it from time to time.

https://github.com/upamasaki/RegressionAutomaticSearch
