[PYTHON] Using Graphviz with Jupyter Notebook

Introduction

In this article, I, someone who had never used Graphviz before, try using Graphviz in Jupyter Notebook. There is already plenty of information online about how to set up and use Graphviz, including two setup guides that I referred to.

Either guide works, so if one of them looks right for you, just follow it. This time, I combined the two a little to build my environment.

What is Graphviz

First, let's take a brief look at Graphviz. The following is a summary of what I found on a few sites.

Graphviz (short for Graph Visualization Software) is a tool for drawing graphs. It converts a text file written in a data description language called the dot language into an image file. It is also used to draw decision trees in machine learning, and it is available on multiple platforms (Windows, Mac, Linux). Incidentally, dot is also the name of the Graphviz program that lays out directed graphs.

(Figure: a directed graph and an undirected graph)

Source: Statistical text analysis (6) -Word network analysis-

The figure above shows the difference between a directed graph and an undirected graph. Put into words, the question is whether each connection from one vertex to another has a specific direction, that is, whether the edges in the figure are drawn as arrows. For example, hyperlinks form a directed graph, while a train route map forms an undirected graph: clicking a hyperlink does not give you a link back to the original page (let's ignore the browser's back button), but on a train line you can travel back to the station you came from.
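To make the distinction concrete, here is a small Python sketch that writes both kinds of graph as dot-language text. The page and station names are made up, and to_dot is a hypothetical helper; the only dot-language rules it relies on are that "digraph" with "->" declares directed edges and "graph" with "--" declares undirected ones.

```python
# Build dot-language source for a directed and an undirected graph.
# In dot, "digraph" + "->" declares directed edges; "graph" + "--" undirected ones.

def to_dot(edges, directed):
    """Return dot source for the given list of (from, to) edges."""
    keyword = "digraph" if directed else "graph"
    arrow = "->" if directed else "--"
    lines = [f"{keyword} G {{"]
    for src, dst in edges:
        lines.append(f'    "{src}" {arrow} "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical examples: pages joined by hyperlinks, stations on a train line.
links = [("Page A", "Page B"), ("Page B", "Page C")]
stations = [("Station 1", "Station 2"), ("Station 2", "Station 3")]

directed_src = to_dot(links, directed=True)
undirected_src = to_dot(stations, directed=False)
print(directed_src)
print(undirected_src)

# Saved as a file such as links.dot, the text can be rendered with the dot
# command-line tool:  dot -Tpng links.dot -o links.png
```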

Graphviz installation

I tried two installation methods this time. Note that when I visited the download page of the official website, it looked different from the one shown on the sites mentioned above.

  • Downloading the installer from the official website
  • Downloading the zip file

Download from the official website

First, open the official Graphviz website and click "Download". Under Windows, click "Graphviz Windows packages". This takes you to a GitHub page, where you select the file "2.38 yaml". If you open the link given in that file, the installer downloads automatically. Then launch the installer and follow its steps; installation walkthroughs are easy to find online if you need one. Whether to install for "Everyone" depends on how many user accounts your computer has.
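Once the installer finishes, it is worth checking from a notebook cell that the dot program can actually be found. The sketch below assumes dot is either on PATH or that you pass the full path to dot.exe yourself; graphviz_version is a hypothetical helper, and it returns None when dot cannot be found.

```python
import shutil
import subprocess

def graphviz_version(dot_path=None):
    """Return the banner printed by "dot -V", or None if dot is not found."""
    # shutil.which accepts either a bare command name (searched on PATH)
    # or a full path, and returns None if no executable is there.
    exe = shutil.which(dot_path or "dot")
    if exe is None:
        return None
    # dot prints its version banner to stderr rather than stdout.
    result = subprocess.run([exe, "-V"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()

# Hypothetical usage: search PATH, or pass the full path to dot.exe instead.
print(graphviz_version())
```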

Download the zip file

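You can also download the zip file from the download site. Once it is downloaded, all you have to do is unzip it. Because nothing is installed this way, you will need the full path to dot.exe later when drawing the graph.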
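With the zip version, nothing is added to PATH, so you have to point to dot.exe yourself. If you are not sure where it ended up after unzipping, a short search like the one below can find it. find_dot_executable is a hypothetical helper, and the folder passed to it is only an example modeled on the path used later in this article.

```python
from pathlib import Path

def find_dot_executable(root):
    """Search an unzipped Graphviz folder for the dot executable.

    Returns the first match as a string, or None if nothing is found.
    """
    root = Path(root)
    if not root.is_dir():
        return None
    # Windows builds ship dot.exe; other platforms name the binary plain "dot".
    for name in ("dot.exe", "dot"):
        for candidate in sorted(root.rglob(name)):
            if candidate.is_file():
                return str(candidate)
    return None

# Hypothetical location: replace with the folder you unzipped the archive into.
print(find_dot_executable(r"C:\Users\user\anaconda3\bin\release"))
```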

Implementation

This time, we will visualize a decision tree trained on the Titanic survival dataset. The code is based on [this article](https://qiita.com/5sigma_AAA/items/0c23907da9330681147b#%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%81%AE%E5%AE%9F%E8%A3%85). I split it into separate cells, Jupyter-style.

First, import the required libraries and load the data. (Replace "user" in the path with your own account name.)

from sklearn.tree import DecisionTreeClassifier 
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc, accuracy_score

import pandas as pd

from sklearn.tree import export_graphviz
import pydotplus 
from io import StringIO
from IPython.display import Image

#Data reading
train = pd.read_csv("/Users/user/jupyter/train.csv") 

Real-world datasets usually contain missing values, so check for them first.

train.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2

Age, Cabin, and Embarked have missing values. Cabin will not be used this time, so I ignore it. Missing Age values are filled with the median of Age, and missing Embarked values with the most frequent value (the mode), "S".

train["Age"] = train["Age"].fillna(train["Age"].median())
train["Embarked"] = train["Embarked"].fillna("S")
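Instead of typing "S" by hand, the fill values can also be computed from the data, which keeps the code correct if the data changes. This is a sketch on a tiny made-up DataFrame (fill_missing is a hypothetical helper); applied to the real train.csv, mode() also returns "S".

```python
import numpy as np
import pandas as pd

def fill_missing(df):
    """Fill Age with its median and Embarked with its most frequent value."""
    df = df.copy()  # leave the caller's DataFrame untouched
    df["Age"] = df["Age"].fillna(df["Age"].median())
    # mode() can return several values on a tie; take the first one.
    df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
    return df

# Tiny made-up sample, not the real Titanic data.
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 38.0, 26.0],
    "Embarked": ["S", "C", np.nan, "S"],
})
filled = fill_missing(sample)
print(filled)
```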

The decision tree needs numeric input, so convert the string columns 'Sex' and 'Embarked' to integers. Here I assign the numbers manually, but you could also convert them into dummy variables.

#Conversion of categorical variables
train['Sex'] = train['Sex'].apply(lambda x: 1 if x == 'male' else 0)
train['Embarked'] = train['Embarked'].map( {'S': 0 , 'C':1 , 'Q':2}).astype(int)
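For reference, the dummy-variable approach mentioned above can be sketched with pandas.get_dummies, shown here on three made-up rows. Each category becomes its own 0/1 column, which avoids implying an order such as S < C < Q; for a decision tree, the manual integer codes used above usually work fine too, since the tree only splits on thresholds.

```python
import pandas as pd

# Made-up rows with the raw string values found in the Titanic data.
raw = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})

# One 0/1 column per category; dtype=int keeps the values as integers
# (recent pandas versions otherwise return booleans).
dummies = pd.get_dummies(raw, columns=["Sex", "Embarked"], dtype=int)
print(dummies)
```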

Drop the columns that will not be used.

train = train.drop(['Cabin','Name','PassengerId','Ticket'],axis =1)

Preprocessing is now complete, so check for missing values once more.

train.isnull().sum()

Survived    0
Pclass      0
Sex         0
Age         0
SibSp       0
Parch       0
Fare        0
Embarked    0

The missing values are gone. Next, split the data into a training set and a test set, holding out 30% of the rows for testing.

#Divided into training data and test data
train_X = train.drop('Survived',axis = 1)
train_y = train.Survived
(train_X , test_X , train_y , test_y) = train_test_split(train_X, train_y , test_size = 0.3 , random_state = 0)
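As a quick check on what test_size=0.3 means in practice: the Kaggle train.csv has 891 rows, and scikit-learn rounds the test share up, so 268 rows go to the test set and 623 to training. The sketch below reproduces the split sizes with dummy row indices instead of the real data.

```python
from sklearn.model_selection import train_test_split

# The Kaggle Titanic train.csv has 891 rows; use 891 dummy row indices here.
rows = list(range(891))
train_rows, test_rows = train_test_split(rows, test_size=0.3, random_state=0)

# scikit-learn rounds the test share up: ceil(0.3 * 891) = ceil(267.3) = 268.
print(len(train_rows), len(test_rows))  # 623 268
```

With 268 test passengers, the accuracy of 0.8208955223880597 reported below corresponds to 220 correct predictions (220 / 268).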

Build the model and train it, limiting the tree depth to 3 levels.

#train
model = DecisionTreeClassifier(max_depth=3,random_state = 0)
model.fit(train_X , train_y)

#accuracy
pred = model.predict(test_X)
# ROC curve and AUC computed from the hard 0/1 predictions (AUC is not printed here)
fpr, tpr, thresholds = roc_curve(test_y, pred, pos_label=1)
roc_auc = auc(fpr, tpr)
print("accuracy=", accuracy_score(test_y, pred))

accuracy=0.8208955223880597
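Two quick checks on the trained model can be sketched as well. First, get_depth() and get_n_leaves() confirm that max_depth=3 really limits the tree (a depth-3 binary tree has at most 2**3 = 8 leaves). Second, the evaluation code above passes the hard 0/1 output of predict to roc_curve, so the ROC curve has a single interior point and auc(fpr, tpr) reflects only one threshold; an AUC based on predict_proba uses every leaf probability as a threshold. The sketch uses synthetic data from make_classification as a stand-in for the Titanic features; with the real data, the score version would be roc_auc_score(test_y, model.predict_proba(test_X)[:, 1]).

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Titanic features (7 columns, like train_X).
X, y = make_classification(n_samples=500, n_features=7, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

demo_model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# Check 1: the fitted tree respects max_depth=3, so it has at most 8 leaves.
depth = demo_model.get_depth()
leaves = demo_model.get_n_leaves()
print("depth:", depth, "leaves:", leaves)

# Check 2: AUC from hard 0/1 labels (what the code above measures) ...
auc_labels = roc_auc_score(y_te, demo_model.predict(X_te))
# ... versus AUC from the predicted probability of class 1.
auc_scores = roc_auc_score(y_te, demo_model.predict_proba(X_te)[:, 1])
print("AUC from labels:", auc_labels, "AUC from probabilities:", auc_scores)
```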

Now draw the tree with Graphviz.

#In-memory text buffer that export_graphviz can write to as if it were a file
dot_data = StringIO() 

export_graphviz( 
    model, out_file=dot_data, 
    feature_names=train_X.columns,
    class_names=["Death", "Survival"]
) 

graph = pydotplus.graph_from_dot_data(dot_data.getvalue()) 
#Absolute path to dot.exe inside the folder where you installed or unzipped Graphviz
graph.progs = {'dot': u"C:\\Users\\user\\anaconda3\\bin\\release\\bin\\dot.exe"}
#Visualize in notebook
Image(graph.create_png()) 

(Figure: the decision tree drawn by Graphviz)

I was able to draw a decision tree.
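As an aside, the StringIO buffer is not strictly required: when out_file=None, export_graphviz returns the dot-language text as a string. The sketch below shows this on a tiny made-up dataset, without pydotplus; filled=True is an extra option not used above that colors each node by its majority class.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Tiny made-up data with the same kind of columns as the Titanic example.
demo_X = pd.DataFrame({
    "Sex": [1, 0, 1, 0, 1, 0, 1, 0],
    "Age": [22, 38, 26, 35, 54, 2, 27, 14],
    "Pclass": [3, 1, 3, 1, 1, 3, 3, 2],
})
demo_y = [0, 1, 0, 1, 0, 1, 0, 1]

demo_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(demo_X, demo_y)

# With out_file=None, export_graphviz returns the dot source as a string,
# so no StringIO buffer is needed.
dot_source = export_graphviz(
    demo_tree,
    out_file=None,
    feature_names=list(demo_X.columns),
    class_names=["Death", "Survival"],
    filled=True,  # color nodes by majority class
)

# Keep a copy on disk; it can be rendered later with: dot -Tpng tree.dot -o tree.png
with open("tree.dot", "w", encoding="utf-8") as f:
    f.write(dot_source)
```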

Conclusion

I used Jupyter this time, but the same approach works in other IDEs. The hardest part was actually finding the installer on the official website.
