[PYTHON] LightGBM/XGBoost tree structure visualization memo

Introduction

These are my notes from trying dtreeviz and plot_tree to visualize the tree structure of LightGBM and XGBoost models.

Environment

The code was run in the following environment.

$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.13.6
BuildVersion:	17G14042

I used a Jupyter Notebook.

The version of the notebook server is: 5.7.8
The server is running on this version of Python:
Python 3.7.3 (default, Mar 27 2019, 16:54:48) 
[Clang 4.0.1 (tags/RELEASE_401/final)]

Model building

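I used the Boston housing dataset bundled with scikit-learn, since the goal here is only to visualize tree structures. The data has 506 rows and 13 columns, contains no nulls, and every column is of float type, so I built the models on it as is, without any feature engineering. (Note that load_boston was removed in scikit-learn 1.2, so the code below needs an older version.)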

import pandas as pd
import sklearn.datasets as skd

data = skd.load_boston()

df_X = pd.DataFrame(data.data, columns=data.feature_names)
df_y = pd.DataFrame(data.target, columns=['y'])

df_X.info()
#<class 'pandas.core.frame.DataFrame'>
#RangeIndex: 506 entries, 0 to 505
#Data columns (total 13 columns):
#CRIM       506 non-null float64
#ZN         506 non-null float64
#INDUS      506 non-null float64
#CHAS       506 non-null float64
#NOX        506 non-null float64
#RM         506 non-null float64
#AGE        506 non-null float64
#DIS        506 non-null float64
#RAD        506 non-null float64
#TAX        506 non-null float64
#PTRATIO    506 non-null float64
#B          506 non-null float64
#LSTAT      506 non-null float64
#dtypes: float64(13)
#memory usage: 51.5 KB

Next, I build the LightGBM model, leaving the hyperparameters almost entirely at their defaults.


import lightgbm as lgb
from sklearn.model_selection import train_test_split

df_X_train, df_X_test, df_y_train, df_y_test = train_test_split(df_X, df_y, test_size=0.2, random_state=4)

lgb_train = lgb.Dataset(df_X_train, df_y_train)
lgb_eval = lgb.Dataset(df_X_test, df_y_test)

params = {
    'seed':4,
    'metric':'rmse'}

lgbm = lgb.train(params,
                lgb_train,
                valid_sets=lgb_eval,
                num_boost_round=200,
                early_stopping_rounds=20,
                verbose_eval=50)

#Training until validation scores don't improve for 20 rounds
#[50]	valid_0's rmse: 3.58803
#[100]	valid_0's rmse: 3.39545
#[150]	valid_0's rmse: 3.31867
#[200]	valid_0's rmse: 3.28222
#Did not meet early stopping. Best iteration is:
#[192]	valid_0's rmse: 3.27283
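
The log above says early stopping was not met even though the best iteration was 192. With early_stopping_rounds=20, training stops only after 20 consecutive rounds without improvement, and only 8 rounds remained after round 192. The following is a minimal sketch of that rule in plain Python, not LightGBM's actual implementation; the function name and the sample scores are made up for illustration.

```python
def best_iteration_with_patience(scores, patience):
    """Return (best_iteration, stopped_early) for a lower-is-better metric.

    Iterations are 1-based, matching the [N] labels in the training log.
    """
    best_score = float("inf")
    best_iter = 0
    for i, score in enumerate(scores, start=1):
        if score < best_score:
            best_score, best_iter = score, i
        elif i - best_iter >= patience:
            # No improvement for `patience` consecutive rounds: stop here.
            return best_iter, True
    # The rounds ran out before the patience window elapsed.
    return best_iter, False


# Made-up validation scores, for illustration only.
scores = [5.0, 4.0, 3.5, 3.6, 3.7, 3.8]
print(best_iteration_with_patience(scores, patience=3))   # (3, True)
print(best_iteration_with_patience(scores, patience=20))  # (3, False)
```

The second call mirrors the situation in the log: the best score comes too close to the end of the run for the patience window to elapse, so training simply finishes.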

Next, I build the XGBoost model.

import xgboost as xgb

dtrain = xgb.DMatrix(df_X_train, label=df_y_train)
dtest = xgb.DMatrix(df_X_test, label=df_y_test)

params = {
    'objective': 'reg:squarederror',
    'eval_metric': 'rmse',
}

evals = [(dtrain, 'train'), (dtest, 'eval')]
evals_result = {}

xgbm = xgb.train(params,
                dtrain,
                num_boost_round=200,
                early_stopping_rounds=20,
                evals=evals,
                evals_result=evals_result
                )

#(earlier output omitted)
#[80]	train-rmse:0.05752	eval-rmse:3.96890
#[81]	train-rmse:0.05430	eval-rmse:3.96797
#[82]	train-rmse:0.05194	eval-rmse:3.96835

dtreeviz

dtreeviz is a tool for visualizing the trees of decision-tree-based models. The installation procedure is described on the official GitHub page, but when I ran `pip install dtreeviz`, the xgboost installation that it triggers internally failed with an error, so dtreeviz could not be installed. I therefore installed xgboost on its own first, referring to [here](https://qiita.com/TakuyaToda/items/b0f91617c253cd79da8e), and that succeeded.

After installing xgboost, I ran `pip install dtreeviz` again, and this time the dtreeviz installation completed successfully.

First, I tried dtreeviz on the LightGBM model:

from dtreeviz.trees import *

viz = dtreeviz(lgbm,
               x_data=df_X_train,
               y_data=df_y_train['y'],
               target_name='y',
               feature_names=df_X_train.columns.tolist(),
               tree_index=0)

viz.save('./lgb_tree.svg')

#ValueError: Tree model must be in (DecisionTreeRegressor, DecisionTreeClassifier, xgboost.core.Booster, but was Booster

So it turns out that dtreeviz, at least in the version I installed, cannot be used with LightGBM models in the first place.
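
The error message ends with "but was Booster", which is confusing because LightGBM and XGBoost both name their model class Booster. As a rough sketch, the class's module can be inspected to tell them apart before calling dtreeviz. The helper names below are hypothetical, the supported packages are simply read off the error message above, and the stand-in classes exist only so the example runs without either library installed.

```python
# Assumption: the supported packages are inferred from the error message above.
SUPPORTED_PACKAGES = {"sklearn", "xgboost"}


def describe_model(model):
    """Return the fully qualified class name, e.g. 'xgboost.core.Booster'."""
    cls = type(model)
    return f"{cls.__module__}.{cls.__qualname__}"


def looks_supported(model):
    """Return True if the model's class comes from a supported package."""
    top_level_package = type(model).__module__.split(".")[0]
    return top_level_package in SUPPORTED_PACKAGES


# Stand-in classes that only imitate the module names of the real ones.
FakeLgbBooster = type("Booster", (), {"__module__": "lightgbm.basic"})
FakeXgbBooster = type("Booster", (), {"__module__": "xgboost.core"})

for model in (FakeLgbBooster(), FakeXgbBooster()):
    print(describe_model(model), looks_supported(model))
```

In a notebook, calling describe_model(lgbm) and describe_model(xgbm) on the real models should make the difference between the two Booster classes visible in the same way.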

Regrouping, I visualized the XGBoost model instead.

from dtreeviz.trees import *

viz = dtreeviz(xgbm,
               x_data=df_X_train,       # x_data is a DataFrame
               y_data=df_y_train['y'],  # y_data is a Series
               target_name='y',
               feature_names=df_X_train.columns.tolist(),  # list of str
               tree_index=0)  # for an ensemble model, omitting tree_index raises an error

viz.save('./xgb_tree.svg')

dtreeviz writes its output in SVG format, so I opened the file in Inkscape and converted it to PDF. I used Inkscape version 1.0.1. For the installation, I referred to here. The result is shown below.

xgb_tree.png
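
I did the conversion by hand in the Inkscape GUI, but it can probably be scripted as well. The following is a sketch that assumes the Inkscape 1.x command line, where --export-filename sets the output file and the output type is guessed from its extension. It also assumes the inkscape executable is on PATH, which may not be the case for the macOS app bundle. The function names are my own.

```python
import shutil
import subprocess


def inkscape_command(svg_path, out_path):
    """Build an Inkscape 1.x command line; the output type follows the extension."""
    return ["inkscape", f"--export-filename={out_path}", str(svg_path)]


def convert_svg(svg_path, out_path):
    """Run the conversion if an inkscape executable is on PATH.

    Returns True if Inkscape ran successfully, False if it was not found.
    """
    if shutil.which("inkscape") is None:
        return False
    subprocess.run(inkscape_command(svg_path, out_path), check=True)
    return True


# Only prints the command; call convert_svg(...) to actually run it.
print(inkscape_command("xgb_tree.svg", "xgb_tree.pdf"))
```

Changing the output name to xgb_tree.png should produce a PNG instead, under the same assumptions.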

plot_tree (graphviz)

Both LightGBM and XGBoost provide a function called plot_tree that visualizes the tree structure. It uses graphviz internally, so graphviz needs to be installed. The installation method is described here, and `brew install graphviz` should be enough, but in my environment it failed with the following error:

Error: graphviz: no bottle available!
You can try to install from source with e.g.
  brew install --build-from-source graphviz
Please note building from source is unsupported. You will encounter build
failures with some formulae. If you experience any issues please create pull
requests instead of asking for help on Homebrew's GitHub, Twitter or any other
official channels.

Since the installation failed, I followed the error message and ran

$ brew install --build-from-source graphviz

which completed the installation. First, I visualize the tree structure of the LightGBM model.


import matplotlib.pyplot as plt

# The model trained above is named lgbm
ax = lgb.plot_tree(lgbm, tree_index=0, figsize=(20, 20), show_info=['split_gain'])
plt.show()
graph = lgb.create_tree_digraph(lgbm, tree_index=0, format='png', name='Tree')
graph.render(view=True)

The obtained output is shown below.

Tree.gv.png

Next, let's visualize XGBoost.

import matplotlib.pyplot as plt

ax = xgb.plot_tree(xgbm, num_trees=0, figsize=(20, 20))
plt.show()
graph = xgb.to_graphviz(xgbm, num_trees=0)
graph.render(view=True, format='png')

When executed, the figure below is obtained.

Source.gv.png
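
Both plot_tree functions depend on the Graphviz dot executable being installed on the system, not only on the Python package. As a small sketch, the following checks whether dot can be found on PATH before plotting. The function name is my own, and the check assumes dot prints its version in response to the -V flag.

```python
import shutil
import subprocess


def graphviz_dot_version():
    """Return the version string reported by `dot -V`, or None if dot is missing."""
    dot = shutil.which("dot")
    if dot is None:
        return None
    # Graphviz writes the version to stderr, so read both streams to be safe.
    result = subprocess.run([dot, "-V"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()


version = graphviz_dot_version()
if version is None:
    print("dot not found on PATH; install graphviz first")
else:
    print(version)
```

If this reports that dot is missing even after the brew installation, the shell that launched Jupyter Notebook may simply have a different PATH.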

Summary

I tried dtreeviz and plot_tree on LightGBM and XGBoost models. dtreeviz also displays scatter plots of the data, which I thought made the tree easier to picture. However, it is a little inconvenient that the algorithms it supports are limited.
