[PYTHON] I tried to predict Boston real estate prices with PyCaret

Introduction

I tried out PyCaret, a machine learning library that was released recently. It automates feature analysis of the data and performance comparison across multiple models, so I expect it to significantly reduce the time data scientists spend on this work.

This time, I will use PyCaret on a regression problem: predicting Boston real estate prices.

Previous articles:

  1. I tried to classify wine quality with PyCaret
  2. I tried to predict Titanic survival with PyCaret

1. Install PyCaret

Run the command below to install PyCaret. I use Anaconda, and I created a new virtual environment dedicated to PyCaret before installing. Installing into an existing conda-managed environment may cause errors (probably due to conflicts between pip and conda).

pip install pycaret

2. Data acquisition

PyCaret provides several open source datasets through get_data(). The list of available datasets is at the link below. https://pycaret.org/get-data/#datasets

This time we will use the Boston Real Estate Price Dataset.

from pycaret.datasets import get_data
dataset = get_data('boston')

Let's take a look at the contents of the data using profile_report() from pandas-profiling.

import pandas_profiling
dataset.profile_report()

The Boston real estate dataset has 506 rows and 14 columns. The variables are described below.

  1. crim: Per capita crime rate by town

  2. zn: Proportion of residential land zoned for lots over 25,000 square feet

  3. indus: Proportion of non-retail business acres per town

  4. chas: Charles River dummy variable (= 1 if the tract bounds the river; 0 otherwise)

  5. nox: Nitrogen oxide concentration (parts per 10 million)

  6. rm: Average number of rooms per dwelling

  7. age: Proportion of owner-occupied units built before 1940 (the dataset dates from 1978)

  8. dis: Weighted average of distances to five Boston employment centers

  9. rad: Index of accessibility to radial highways

  10. tax: Full-value property tax rate per $10,000

  11. ptratio: Pupil-teacher ratio by town

  12. black: 1000 (Bk - 0.63)^2, where Bk is the proportion of Black residents in the town

  13. lstat: Percentage of the population of lower status

  14. medv (target variable): Median value of owner-occupied homes (in $1000s)
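
The formula for the black variable is easy to misread, so here is a small worked example. It is only a sketch of the arithmetic, and the function name black_feature is hypothetical, not something PyCaret or the dataset provides. It shows that the transform is symmetric around 0.63, so two different proportions can map to the same feature value.

```python
def black_feature(bk: float) -> float:
    """Return 1000 * (bk - 0.63)^2 for a proportion bk between 0 and 1."""
    return 1000 * (bk - 0.63) ** 2


# Two proportions that are equally far from 0.63 give (numerically) the same value.
low = black_feature(0.53)
high = black_feature(0.73)
print(round(low, 6), round(high, 6))

# The value is zero at bk = 0.63 and grows as bk moves away from it.
print(round(black_feature(0.63), 6), round(black_feature(0.0), 6))
```

Because of this symmetry, the original proportion cannot be recovered from the feature value alone.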

3. Data preprocessing

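Use sample() to split the dataset: 90% for modeling and 10% held out as unseen test data. The held-out rows are found by dropping the sampled rows' original index labels, so the indexes are reset only after the split.

data = dataset.sample(frac=0.9, random_state=786)
data_unseen = dataset.drop(data.index).reset_index(drop=True)
data = data.reset_index(drop=True)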

print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions: ' + str(data_unseen.shape))

Result

Data for Modeling: (455, 14)
Unseen Data For Predictions: (51, 14)
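
To make the split sizes concrete, here is a minimal pure-Python sketch of the same idea: sample 90% of the row positions, then take everything that was not sampled as the held-out set. The function holdout_split is hypothetical, and Python's random module will not pick the same rows as pandas even with the same seed; only the sizes and the disjointness carry over.

```python
import random


def holdout_split(n_rows, frac=0.9, seed=786):
    """Split row positions 0..n_rows-1 into a sampled set and the remainder."""
    rng = random.Random(seed)
    n_sampled = round(n_rows * frac)
    sampled = rng.sample(range(n_rows), n_sampled)
    sampled_set = set(sampled)
    # The remainder is defined by the ORIGINAL positions of the sampled rows.
    remainder = [i for i in range(n_rows) if i not in sampled_set]
    return sampled, remainder


train_idx, unseen_idx = holdout_split(506)
print(len(train_idx), len(unseen_idx))  # 455 51
```

The key point is that the remainder is computed against the original row positions. If the sampled rows were renumbered 0 to 454 first, "everything not sampled" would no longer refer to the right rows.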

Use setup() to preprocess the data. Specify the target variable with the argument target = 'medv'.

from pycaret.regression import *
exp_reg101 = setup(data=data, target='medv', session_id=12)

4. Model comparison

Use compare_models() to train multiple regression models on the dataset and summarize the results in a table. This is very useful when deciding which regression model to use.

PyCaret provides more than 10 types of regression models; they are listed at the link below.

https://pycaret.org/regression/

compare_models()

The CatBoost Regressor scored RMSE = 3.1399 and R^2 = 0.859. Since the aim here is to evaluate PyCaret itself, the rest of the article proceeds with Linear Regression (R^2 = 0.6739), which ranked 8th.
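
For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how RMSE and R^2 are defined. The functions and the sample numbers are illustrative assumptions, not PyCaret's implementation and not values from the Boston dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: typical size of the error, in the target's units."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 minus (residual variation / total variation)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [20.0, 25.0, 30.0, 35.0]     # made-up prices in $1000s
predicted = [21.0, 24.0, 31.0, 34.0]  # made-up predictions, each off by 1.0

print(rmse(actual, predicted))
print(r_squared(actual, predicted))
```

A lower RMSE and an R^2 closer to 1 both indicate a better fit, which is why CatBoost ranks above Linear Regression in the comparison table.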

5. Generation of analytical model

Select a regression model and train it with create_model(). This time, we will use the Linear Regression model.

lr = create_model('lr')

The mean R^2 was 0.6739 (10-fold cross-validation).
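
The idea behind a 10-fold average can be sketched in a few lines. This is an illustrative sketch only: k_fold_indices and mean_cv_score are hypothetical helpers, the folds are contiguous and unshuffled for simplicity, and PyCaret's actual cross-validation is not reproduced here.

```python
def k_fold_indices(n_rows, k=10):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, valid
        start += size


def mean_cv_score(n_rows, score_fold, k=10):
    """Average a per-fold score; score_fold(train_idx, valid_idx) is supplied by the caller."""
    scores = [score_fold(train, valid) for train, valid in k_fold_indices(n_rows, k)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(25, k=10))
print([len(valid) for _, valid in folds])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

Each row is used for validation exactly once, and the reported score is the mean over the folds.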

6. Tuning the analytical model

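Tune the model with tune_model(). The call below passes the model ID 'lr', as in the PyCaret version used for this article; from PyCaret 2.0 onward, tune_model() expects the trained model object instead, i.e. tune_model(lr).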

tuned_lr = tune_model('lr')

The mean R^2 was 0.6739 before tuning and 0.6739 after tuning, so there was no improvement. For Linear Regression, tune_model() does not seem to offer much.
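
One plausible reason for the unchanged score is that ordinary least squares has a closed-form solution with no regularization strength or similar setting to search over. The sketch below fits a one-variable line this way; fit_line and the data are made up for illustration and are not what PyCaret runs internally.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


rooms = [4.0, 5.0, 6.0, 7.0, 8.0]       # made-up "rm" values
price = [11.0, 16.0, 21.0, 26.0, 31.0]  # made-up "medv" values on the line 5 * rm - 9

intercept, slope = fit_line(rooms, price)
print(intercept, slope)  # -9.0 5.0
```

The fitted coefficients are fully determined by the data, so there is little for a hyperparameter search to change.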

7. Visualization of analytical model

Visualize the analysis results using plot_model().

plot_model(tuned_lr)

8. Evaluation of analytical model

evaluate_model() lets you run multiple evaluations from one place.

evaluate_model(lr)

Clicking one of the buttons that evaluate_model() displays shows the corresponding evaluation result.

9. Forecast

After finalizing the model with finalize_model(), make predictions with predict_model(). The held-out test data (here, data_unseen) is used for prediction.

final_lr = finalize_model(tuned_lr)
unseen_predictions = predict_model(final_lr, data=data_unseen)
unseen_predictions.head()

The Label column holds the predicted values. The medv column holds the actual values.

10. Summary

  1. We worked through a regression problem (Boston real estate prices) with PyCaret.

10.1 List of PyCaret functions used for analysis

  1. Data preprocessing: setup()
  2. Compare models: compare_models()
  3. Generate analytical model: create_model()
  4. Tuning: tune_model()
  5. Visualization: plot_model()
  6. Evaluation: evaluate_model()
  7. Prediction: finalize_model(), predict_model()

11. References

  1. PyCaret Home Page, http://www.pycaret.org/
  2. PyCaret Classification, https://pycaret.org/classification/
  3. I tried using PyCaret at the fastest speed, https://qiita.com/s_fukuzawa/items/5dd40a008dac76595eea
  4. I tried to classify the quality of wine by PyCaret, https://qiita.com/kotai2003/items/c8fa7e55230d0fa0cc8e
  5. I tried to predict the survival of Titanic with PyCaret, https://qiita.com/kotai2003/items/a377f45ddee9829ed2c5
