A memo from trying dtreeviz and plot_tree to visualize the tree structures of LightGBM and XGBoost models.
The environment I used is as follows.
$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.13.6
BuildVersion: 17G14042
I ran everything in Jupyter Notebook.
The version of the notebook server is: 5.7.8
The server is running on this version of Python:
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)]
Since the goal here is only to visualize the tree structure, I used scikit-learn's Boston housing dataset. It has 506 rows and 13 columns, all of type float64 with no nulls, so I built the models on it as-is without any feature engineering.
import pandas as pd
import sklearn.datasets as skd
data = skd.load_boston()  # load_boston was removed in scikit-learn 1.2, so this needs an older scikit-learn
df_X = pd.DataFrame(data.data, columns=data.feature_names)
df_y = pd.DataFrame(data.target, columns=['y'])
df_X.info()
#<class 'pandas.core.frame.DataFrame'>
#RangeIndex: 506 entries, 0 to 505
#Data columns (total 13 columns):
#CRIM 506 non-null float64
#ZN 506 non-null float64
#INDUS 506 non-null float64
#CHAS 506 non-null float64
#NOX 506 non-null float64
#RM 506 non-null float64
#AGE 506 non-null float64
#DIS 506 non-null float64
#RAD 506 non-null float64
#TAX 506 non-null float64
#PTRATIO 506 non-null float64
#B 506 non-null float64
#LSTAT 506 non-null float64
#dtypes: float64(13)
#memory usage: 51.5 KB
Building the LightGBM model. The hyperparameters are mostly left at their defaults.
import lightgbm as lgb
from sklearn.model_selection import train_test_split
df_X_train, df_X_test, df_y_train, df_y_test = train_test_split(df_X, df_y, test_size=0.2, random_state=4)
lgb_train = lgb.Dataset(df_X_train, df_y_train)
lgb_eval = lgb.Dataset(df_X_test, df_y_test)
params = {
'seed':4,
'metric':'rmse'}
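# Note: early_stopping_rounds and verbose_eval were removed from lgb.train() in LightGBM 4.0;
# on 4.x, pass callbacks=[lgb.early_stopping(20), lgb.log_evaluation(50)] instead.
lgbm = lgb.train(params,
                 lgb_train,
                 valid_sets=lgb_eval,
                 num_boost_round=200,
                 early_stopping_rounds=20,  # stop if validation RMSE does not improve for 20 rounds
                 verbose_eval=50)           # print the validation score every 50 rounds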
#Training until validation scores don't improve for 20 rounds
#[50] valid_0's rmse: 3.58803
#[100] valid_0's rmse: 3.39545
#[150] valid_0's rmse: 3.31867
#[200] valid_0's rmse: 3.28222
#Did not meet early stopping. Best iteration is:
#[192] valid_0's rmse: 3.27283
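The log follows the usual early-stopping rule: training stops once the validation metric has failed to improve for `early_stopping_rounds` consecutive rounds, and the best iteration is the round with the lowest validation RMSE. Below is a minimal pure-Python sketch of that rule, assuming lower scores are better. The function name and the scores are hypothetical; this only illustrates the rule and is not LightGBM's own implementation.

```python
def early_stopping(scores, patience):
    """Replay the early-stopping rule on per-round validation scores (lower is better).

    Returns (best_round, stopped_round) with rounds numbered from 1;
    stopped_round is None if `patience` rounds without improvement never occurred.
    """
    best_score = float("inf")
    best_round = 0
    for rnd, score in enumerate(scores, start=1):
        if score < best_score:
            # New best: this resets the "rounds without improvement" count
            best_score, best_round = score, rnd
        elif rnd - best_round >= patience:
            # No improvement for `patience` rounds in a row: stop here
            return best_round, rnd
    return best_round, None


# Hypothetical validation RMSEs; the best value is at round 3
scores = [3.9, 3.6, 3.4, 3.5, 3.45, 3.41, 3.42]
print(early_stopping(scores, patience=3))  # (3, 6)
print(early_stopping(scores, patience=5))  # (3, None)
```

With `patience=20` and a cap of 200 rounds, a best round of 192 leaves only 8 rounds to go, so the stopping condition can never be met. This matches "Did not meet early stopping" in the log.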
Building the XGBoost model.
import xgboost as xgb
dtrain = xgb.DMatrix(df_X_train, label=df_y_train)
dtest = xgb.DMatrix(df_X_test, label=df_y_test)
params = {
'objective': 'reg:squarederror',
'eval_metric': 'rmse',
}
evals = [(dtrain, 'train'), (dtest, 'eval')]  # early stopping monitors the last entry ('eval')
evals_result = {}
xgbm = xgb.train(params,
dtrain,
num_boost_round=200,
early_stopping_rounds=20,
evals=evals,
evals_result=evals_result
)
# (output omitted)
#[80] train-rmse:0.05752 eval-rmse:3.96890
#[81] train-rmse:0.05430 eval-rmse:3.96797
#[82] train-rmse:0.05194 eval-rmse:3.96835
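Here the training RMSE falls to about 0.05 while the evaluation RMSE stays near 3.97, a sign that the trees fit the training data far more closely than they generalize. The `evals_result` dict passed to `xgb.train` is filled with per-round metrics, keyed first by the names given in `evals` and then by metric name. The helper below is a hypothetical sketch (run on toy numbers, not the actual results) that pulls out the best round and the train/eval gap at that round. On the real model, `xgbm.best_iteration` should also give the best round directly.

```python
def summarize(evals_result, train_key="train", eval_key="eval", metric="rmse"):
    """Return (best_round, train_score, eval_score) at the round where the
    eval metric is lowest. Rounds are 0-indexed, as in XGBoost's log."""
    eval_scores = evals_result[eval_key][metric]
    # Index of the first minimum of the eval metric
    best_round = min(range(len(eval_scores)), key=eval_scores.__getitem__)
    train_score = evals_result[train_key][metric][best_round]
    return best_round, train_score, eval_scores[best_round]


# Hypothetical numbers in the same nested shape as evals_result
toy = {
    "train": {"rmse": [8.0, 4.0, 1.0, 0.5, 0.2]},
    "eval": {"rmse": [8.5, 5.0, 3.5, 3.6, 3.7]},
}
best, tr, ev = summarize(toy)
print(best, tr, ev, ev - tr)  # 2 1.0 3.5 2.5
```

A large gap between the two scores at the best round is one rough indicator of overfitting.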
dtreeviz
A library for visualizing the trees of decision-tree-based models.
The installation method is described on the official GitHub page, but when I ran `pip install dtreeviz`, it tried to install xgboost internally and failed with an error, so the installation did not complete. So, referring to [here](https://qiita.com/TakuyaToda/items/b0f91617c253cd79da8e), I installed xgboost on its own first, which succeeded.
After installing xgboost, I ran `pip install dtreeviz` again, and dtreeviz installed successfully.
When I tried to use dtreeviz on the LightGBM model:
from dtreeviz.trees import *
viz = dtreeviz(lgbm,
x_data=df_X_train,
y_data=df_y_train['y'],
target_name='y',
feature_names=df_X_train.columns.tolist(),
tree_index=0)
viz.save('./lgb_tree.svg')
#ValueError: Tree model must be in (DecisionTreeRegressor, DecisionTreeClassifier, xgboost.core.Booster, but was Booster
So dtreeviz (at least the version I installed) does not support LightGBM models in the first place: it only accepts scikit-learn decision trees and XGBoost Boosters.
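The error message is confusing because LightGBM's model class is also named `Booster`, and the message prints only the short class name. Printing the fully qualified name makes the difference visible: for the models here it should show `lightgbm.basic.Booster` for LightGBM and `xgboost.core.Booster` for XGBoost. A minimal sketch follows; the helper name is hypothetical, and the demo uses local stand-in classes so it runs without either library installed.

```python
def qualified_type_name(obj):
    """Return 'module.QualifiedClassName' for an object's type, which
    distinguishes classes that share the same short name."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


class Booster:  # stand-in for one library's Booster class
    pass


def make_other_booster():
    class Booster:  # a second, unrelated class with the same short name
        pass
    return Booster()


a, b = Booster(), make_other_booster()
print(type(a).__name__ == type(b).__name__)              # True
print(qualified_type_name(a) == qualified_type_name(b))  # False
```

In the notebook, `qualified_type_name(lgbm)` and `qualified_type_name(xgbm)` would show which library each model comes from.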
Moving on, let's visualize the XGBoost model instead.
from dtreeviz.trees import *
viz = dtreeviz(xgbm,
x_data=df_X_train, # x_data is DataFrame
y_data=df_y_train['y'], # y_data is Series
target_name='y',
feature_names=df_X_train.columns.tolist(), # list of str
tree_index=0)  # required for ensemble models; omitting it raises an error
viz.save('./xgb_tree.svg')
dtreeviz outputs SVG, so I opened the file in Inkscape (version 1.0.1) and converted it to PDF. For installing Inkscape, I referred to here. The result is shown below.
plot_tree (graphviz)
Both LightGBM and XGBoost provide a plot_tree function for visualizing tree structures.
These functions use graphviz internally, so graphviz must be installed.
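Roughly speaking, this kind of function walks the nodes of one tree and hands graphviz a graph description in its DOT language, which graphviz then lays out and renders. Below is a minimal pure-Python sketch of that idea. It is not either library's actual code. It assumes a nested dict shaped like a node in LightGBM's `Booster.dump_model()['tree_info'][i]['tree_structure']` (internal nodes with `split_index`, `split_feature`, `threshold`, `left_child`, `right_child`; leaves with `leaf_index`, `leaf_value`). The toy tree, its values, and the feature names are hypothetical.

```python
def to_dot(tree, feature_names):
    """Build a graphviz DOT string for one tree in LightGBM dump_model() shape.
    The left child is the '<=' (yes) branch, the right child the '>' (no) branch."""
    lines = ["digraph Tree {"]

    def node_id(node):
        # Internal nodes and leaves are numbered separately in the dump
        if "split_index" in node:
            return f"split{node['split_index']}"
        return f"leaf{node['leaf_index']}"

    def visit(node):
        nid = node_id(node)
        if "leaf_value" in node:
            lines.append(f'  {nid} [label="leaf {node["leaf_index"]}: {node["leaf_value"]}"];')
            return
        name = feature_names[node["split_feature"]]
        lines.append(f'  {nid} [label="{name} <= {node["threshold"]}"];')
        for child, label in ((node["left_child"], "yes"), (node["right_child"], "no")):
            lines.append(f'  {nid} -> {node_id(child)} [label="{label}"];')
            visit(child)

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-split tree; feature 0 = LSTAT, feature 1 = RM
toy_tree = {
    "split_index": 0, "split_feature": 1, "threshold": 6.9,
    "left_child": {
        "split_index": 1, "split_feature": 0, "threshold": 14.4,
        "left_child": {"leaf_index": 0, "leaf_value": 23.1},
        "right_child": {"leaf_index": 1, "leaf_value": 14.9},
    },
    "right_child": {"leaf_index": 2, "leaf_value": 37.2},
}
dot = to_dot(toy_tree, ["LSTAT", "RM"])
print(dot)
```

If the graphviz Python package is installed, `graphviz.Source(dot).render(view=True)` should render the resulting text as an image.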
According to the installation instructions here, the following should be enough:
brew install graphviz
However, in my environment it failed with this error:
Error: graphviz: no bottle available!
You can try to install from source with e.g.
brew install --build-from-source graphviz
Please note building from source is unsupported. You will encounter build
failures with some formulae. If you experience any issues please create pull
requests instead of asking for help on Homebrew's GitHub, Twitter or any other
official channels.
So, as the error message suggests, I built it from source:
$ brew install --build-from-source graphviz
This installed graphviz successfully. First, let's visualize the tree structure of the LightGBM model.
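import matplotlib.pyplot as plt
# Draw the first tree (tree_index=0) with matplotlib, showing the gain of each split
ax = lgb.plot_tree(lgbm, tree_index=0, figsize=(20, 20), show_info=['split_gain'])
plt.show()
# Alternatively, build a graphviz Digraph for the same tree and render it to a PNG file
graph = lgb.create_tree_digraph(lgbm, tree_index=0, format='png', name='Tree')
graph.render(view=True)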
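To read a plotted LightGBM tree by hand, start at the root and follow the left branch whenever the feature value is `<=` the split threshold. The leaf you reach is that tree's contribution, and for this regression model the prediction should be the sum of those leaf values over all trees. Below is a minimal sketch of the traversal. It assumes the node layout of LightGBM's `dump_model()` output (`split_feature`, `threshold`, `left_child`, `right_child`, `leaf_value`) and ignores missing-value and categorical splits. The toy tree and its values are hypothetical.

```python
def predict_one(node, row):
    """Follow one tree in LightGBM dump_model() shape for a single sample.
    `row` is indexed by feature number. Assumes numerical '<=' splits only
    (no missing values or categorical splits)."""
    while "leaf_value" not in node:
        go_left = row[node["split_feature"]] <= node["threshold"]
        node = node["left_child"] if go_left else node["right_child"]
    return node["leaf_value"]


# Hypothetical tree; feature 0 = LSTAT, feature 1 = RM
toy_tree = {
    "split_feature": 1, "threshold": 6.9,
    "left_child": {
        "split_feature": 0, "threshold": 14.4,
        "left_child": {"leaf_value": 23.1},
        "right_child": {"leaf_value": 14.9},
    },
    "right_child": {"leaf_value": 37.2},
}
print(predict_one(toy_tree, [10.0, 6.2]))  # 23.1 (RM <= 6.9, then LSTAT <= 14.4)
print(predict_one(toy_tree, [20.0, 6.2]))  # 14.9
print(predict_one(toy_tree, [20.0, 7.5]))  # 37.2

# Summing over several (hypothetical) trees gives the ensemble output
trees = [toy_tree, {"leaf_value": 0.5}]
print(sum(predict_one(t, [10.0, 6.2]) for t in trees))
```

On a real model, `lgbm.dump_model()['tree_info'][0]['tree_structure']` should give the root node of the first tree in this shape.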
The obtained output is shown below.
Next, let's visualize XGBoost.
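import matplotlib.pyplot as plt
# xgb.plot_tree draws onto a matplotlib Axes, so set the figure size when creating the Axes
fig, ax = plt.subplots(figsize=(20, 20))
xgb.plot_tree(xgbm, num_trees=0, ax=ax)
plt.show()
# Alternatively, build a graphviz Digraph for the first tree and render it to a PNG file
graph = xgb.to_graphviz(xgbm, num_trees=0)
graph.format = 'png'
graph.render(view=True)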
Running this produces the figure below.
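The nodes drawn by `xgb.plot_tree` and `xgb.to_graphviz` correspond to the per-tree text that `Booster.get_dump()` returns. Assuming the default dump format without statistics (split lines like `0:[feature<threshold] yes=1,no=2,missing=1`, leaf lines like `1:leaf=value`), a small parser can reproduce how one sample travels down a tree. Note that XGBoost sends a value to the "yes" branch when it is strictly less than the threshold, unlike the `<=` rule in LightGBM. The dump text below is hypothetical and not taken from the model above.

```python
import re

# Split line: "<id>:[<feature><<threshold>] yes=<id>,no=<id>,missing=<id>"
SPLIT = re.compile(r"(\d+):\[(.+)<([^\]]+)\] yes=(\d+),no=(\d+),missing=(\d+)")
# Leaf line: "<id>:leaf=<value>"
LEAF = re.compile(r"(\d+):leaf=(\S+)")


def parse_dump(text):
    """Parse one tree from XGBoost's plain-text dump into {node_id: node dict}."""
    nodes = {}
    for line in text.strip().splitlines():
        line = line.strip()  # depth is encoded by leading tabs; not needed here
        m = SPLIT.match(line)
        if m:
            nid, feat, cond, yes, no, missing = m.groups()
            nodes[int(nid)] = {"feature": feat, "cond": float(cond),
                               "yes": int(yes), "no": int(no), "missing": int(missing)}
            continue
        m = LEAF.match(line)
        if not m:
            raise ValueError(f"unrecognized line: {line!r}")
        nodes[int(m.group(1))] = {"leaf": float(m.group(2))}
    return nodes


def predict_one(nodes, row):
    """Walk the parsed tree for one sample; `row` maps feature name -> value,
    and an absent or None value follows the 'missing' branch."""
    node = nodes[0]
    while "leaf" not in node:
        value = row.get(node["feature"])
        if value is None:
            nxt = node["missing"]
        elif value < node["cond"]:  # strict '<' goes to the "yes" branch
            nxt = node["yes"]
        else:
            nxt = node["no"]
        node = nodes[nxt]
    return node["leaf"]


dump = """0:[RM<6.9] yes=1,no=2,missing=1
\t1:[LSTAT<14.4] yes=3,no=4,missing=3
\t\t3:leaf=1.2
\t\t4:leaf=-0.8
\t2:leaf=3.5
"""
nodes = parse_dump(dump)
print(predict_one(nodes, {"RM": 6.2, "LSTAT": 10.0}))  # 1.2
print(predict_one(nodes, {"RM": 6.9}))                 # 3.5 (6.9 is not < 6.9)
```

For a full prediction, XGBoost adds the leaf values from all trees to a base score; on the real model, `xgbm.get_dump()` should return one such string per tree.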
I tried dtreeviz and plot_tree on LightGBM and XGBoost models. dtreeviz draws scatter plots of the training data at each split, which makes the tree easy to picture intuitively. However, it is somewhat inconvenient because it supports only a limited set of models.