2. Multivariate analysis spelled out in Python 7-3. Decision tree [regression tree]

⑴ Import libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.tree import DecisionTreeRegressor #A class that creates a regression tree model

⑵ Data acquisition and reading

#Note: load_boston was removed in scikit-learn 1.2, so this step requires an earlier version
from sklearn.datasets import load_boston
boston_dataset = load_boston()

Build a regression tree model that predicts house prices from the 13 explanatory variables that characterize each house.

#Store explanatory variables in DataFrame
boston = pd.DataFrame(boston_dataset.data, columns=boston_dataset.feature_names)

print(boston.head()) #Display the first 5 rows
print(boston.columns) #Show the column names
print(boston.shape) #Check the shape

2_7_3_01.PNG

#Add objective variable
boston['MEDV'] = boston_dataset.target

print(boston.head()) #Display the first 5 rows
print(boston.shape) #Reconfirm the shape (one more column than before)

2_7_3_02.PNG

⑶ Data splitting

#Convert the dataset to a NumPy array
array = boston.values

#Separate the explanatory variables (first 13 columns) from the objective variable (last column)
X = array[:,0:13]
Y = array[:,13]

#Import the function that splits the data
from sklearn.model_selection import train_test_split

#Split the data: 70% for training, 30% for testing (random_state fixes the shuffle)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1234)
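
As a quick check of what the split produces, the following minimal sketch runs the same call on a random stand-in array that only has the same shape as the data above (506 rows, 13 explanatory variables plus 1 objective variable). The stand-in values and the names `X_train2` and `X_test2` are assumptions for illustration, not the Boston data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in with the same shape as the dataset: 506 rows, 13 features + 1 target
rng = np.random.default_rng(0)
array = rng.normal(size=(506, 14))
X, Y = array[:, 0:13], array[:, 13]

# Same call as in the article: 30% of the rows go to the test set
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=1234)
print(X_train.shape, X_test.shape)

# Repeating the call with the same random_state reproduces the same partition
X_train2, X_test2, _, _ = train_test_split(
    X, Y, test_size=0.3, random_state=1234)
print(np.array_equal(X_test, X_test2))
```

The test set size is rounded up (30% of 506 is 151.8), so the split is 354 training rows and 152 test rows.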

⑷ Construction of regression tree model

#Instantiate the model (max_leaf_nodes limits the tree to at most 20 leaves)
reg = DecisionTreeRegressor(max_leaf_nodes=20)

#Train the model on the training data
model = reg.fit(X_train, Y_train)
print(model)

2_7_3_03.PNG
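
To see what the fitted tree does when it predicts, here is a minimal sketch on synthetic one-dimensional data. The data and the names `X_demo`, `y_demo`, and `demo_model` are assumptions for illustration. With the default squared-error criterion, every sample that falls into the same leaf gets the same prediction, namely the mean of the training targets in that leaf.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (not the housing data): a noisy sine curve
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

# A small tree so the leaves are easy to inspect
demo_model = DecisionTreeRegressor(max_leaf_nodes=5, random_state=0)
demo_model.fit(X_demo, y_demo)
print("number of leaves:", demo_model.get_n_leaves())

# apply() returns the index of the leaf each sample ends up in
leaf_ids = demo_model.apply(X_demo)
pred = demo_model.predict(X_demo)

# Compare each leaf's prediction with the mean target of its training samples
for leaf in np.unique(leaf_ids):
    in_leaf = leaf_ids == leaf
    print(leaf, in_leaf.sum(), y_demo[in_leaf].mean(), pred[in_leaf][0])
```

This is why a regression tree's output is a step function: a tree with at most 20 leaves can produce at most 20 distinct predicted prices.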

⑸ Evaluation of regression tree model

➀ Check the validity of the predictions

#Import Python standard pseudo-random number module
import random
random.seed(1)

#Randomly select the row index (id) of one sample
id = random.randrange(0, X.shape[0], 1)
print(id)

2_7_3_04.PNG

#Extract the relevant sample from the original dataset
x = X[id]
x = x.reshape(1,13)

#Predict house prices from explanatory variables
YHat = model.predict(x)

#Convert the explanatory variables of the selected sample to a DataFrame
df = pd.DataFrame(x, columns = boston_dataset.feature_names)
#Add the predicted value
df["Predicted Price"] = YHat
print(df)

2_7_3_05.PNG

#Show the original record, including the actual MEDV, for comparison
boston.iloc[id]

2_7_3_06.PNG

➁ Check the coefficient of determination as an indicator of generalization performance

#Import the function that calculates the coefficient of determination
from sklearn.metrics import r2_score

#Predict prices for the test data
YHat = model.predict(X_test)

2_7_3_07.PNG

r2 = r2_score(Y_test, YHat)
print("R^2 = ", r2)

2_7_3_08.PNG
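
For reference, the coefficient of determination is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand and compares it with `r2_score`. The values in `y_true` and `y_pred` are hypothetical numbers chosen for illustration, not results from the model above.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical observed and predicted values, for illustration only
y_true = np.array([24.0, 21.6, 34.7, 33.4, 36.2])
y_pred = np.array([25.0, 20.0, 33.0, 35.0, 30.0])

# Residual sum of squares: error left over after prediction
ss_res = np.sum((y_true - y_pred) ** 2)
# Total sum of squares: spread of the observations around their mean
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1.0 - ss_res / ss_tot
print("manual  :", r2_manual)
print("sklearn :", r2_score(y_true, y_pred))

# Predicting the mean for every sample gives R^2 = 0 by construction
baseline = np.full_like(y_true, y_true.mean())
print("baseline:", r2_score(y_true, baseline))
```

A value near 1 means the predictions explain most of the variance in the test data, 0 means they do no better than always predicting the mean, and negative values are possible when they do worse.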

⑹ Visualization of regression tree model

#Import sklearn tree module
from sklearn import tree

#Module to display images in Notebook
from IPython.display import Image

#Module for visualizing the decision tree model
import pydotplus

#Convert the decision tree model to DOT data
#(class_names applies only to classification trees, so it is omitted here)
dot_data = tree.export_graphviz(model,
                                out_file = None,
                                feature_names = boston_dataset.feature_names,
                                filled = True)

#Draw a diagram
graph = pydotplus.graph_from_dot_data(dot_data)  

#View diagram
Image(graph.create_png())

2_7_3_09.PNG
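
If Graphviz or pydotplus is not available, a text rendering of the split rules may be enough. The following is a minimal sketch using scikit-learn's `export_text` on synthetic data. The data, the feature names `x0` and `x1`, and the names `demo_model` and `rules` are assumptions for illustration and are unrelated to the housing model above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data: the target depends on x0 only, x1 is pure noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(100, 2))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=100)

demo_model = DecisionTreeRegressor(max_leaf_nodes=4, random_state=0)
demo_model.fit(X_demo, y_demo)

# Render the split rules and the leaf values as indented plain text
rules = export_text(demo_model, feature_names=["x0", "x1"])
print(rules)
```

Each indented line is one split condition, and each line starting with `value:` is a leaf holding the predicted value for samples that reach it.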
