[PYTHON] Creating a decision tree with scikit-learn

I had been building decision trees with Weka, but converting the data to ARFF was a hassle, so I built one with scikit-learn, the Python machine learning library, instead. Installing scikit-learn itself is explained thoroughly on other websites.

Creating the decision tree object itself is fairly easy. When you want to draw the tree, install Graphviz with brew (Mac). At the time of writing, pydot did not work with the updated pyparsing library, so please downgrade pyparsing:

sudo pip install -U pydot pyparsing==1.5.7

I don't know how to do this on Windows (said in a small voice).

tree_ex.py


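# -*- coding: utf-8 -*-

# Limitations noticed so far:
# - Null (missing) values cannot be used -> what should I do?
# - yes/no is encoded as 1/-1
# - Character strings cannot be used
from io import StringIO  # Python 3; sklearn.externals.six is no longer shipped with scikit-learn

import pydot
from sklearn import tree

if __name__ == '__main__':

    # Each row of X is one sample with two features
    X = [
        [0, 1],
        [0, -1],
        [1, 1]
        ]
    Y = [1, 2, 3]  # Labels, corresponding to the rows of X in order from the top
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X, Y)  # This completes the decision tree object

    # Boilerplate for drawing: export the tree as DOT text, then render it with pydot
    dot_data = StringIO()
    tree.export_graphviz(clf, out_file=dot_data)
    graph = pydot.graph_from_dot_data(dot_data.getvalue())
    if isinstance(graph, list):  # newer pydot versions return a list of graphs
        graph = graph[0]
    graph.write_pdf("tree_ex.pdf")

    # pre = clf.predict([[0, 1]])  # predict expects a list of samples
    # print(pre)  # The result is [1]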

X is the data and Y holds the label for each row of X. The fit function ties X and Y together to train the tree (probably).

Once fit has been applied, clf is both the decision tree object and a trained classifier. You can find out which class new data belongs to with the commented-out predict call.
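As a minimal sketch of the fit and predict steps on their own, assuming a recent scikit-learn on Python 3: the data is the same toy data as in tree_ex.py, while random_state and the name new_samples are my additions for illustration.

```python
from sklearn import tree

# Same toy data as tree_ex.py: three samples with two features each.
X = [[0, 1], [0, -1], [1, 1]]
Y = [1, 2, 3]  # Y[i] is the class label of X[i]

# random_state (an addition here) fixes the random choices made while
# growing the tree, so repeated runs build the same tree.
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, Y)

# predict takes a list of samples, so even a single sample
# has to be wrapped in an outer list.
new_samples = [[0, 1], [1, 1]]
predictions = clf.predict(new_samples)
print(predictions)
```

Because the tree is grown without any depth limit and the three training rows are all different, samples identical to a training row are assigned that row's label.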

The rest is boilerplate that calls pydot to draw the tree.
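To see what that drawing step actually hands to pydot, the sketch below asks export_graphviz for the DOT text directly. It assumes a recent scikit-learn, where out_file=None makes the function return the DOT source as a string. Neither pydot nor Graphviz is needed for this part.

```python
from sklearn import tree

# Same toy data as tree_ex.py.
X = [[0, 1], [0, -1], [1, 1]]
Y = [1, 2, 3]
clf = tree.DecisionTreeClassifier(random_state=0).fit(X, Y)

# With out_file=None, export_graphviz returns the DOT source as a string
# instead of writing it to a file-like object.
dot_text = tree.export_graphviz(clf, out_file=None)

# The DOT text is a plain description of nodes and edges that
# Graphviz (through pydot) later turns into a picture.
print(dot_text)
```

Printing the DOT text is also a quick way to check the tree when Graphviz is not installed.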

The code above produces the following drawing:

tree_ex.png

Weka's decision tree output is hard to read, so I tried making a decision tree in Python. Creating the tree itself is very easy, and the result is easy to read. However, I ran into the following problems:

・ The branching conditions are not displayed (I could not get them to show, for lack of skill)

・ Questionnaire items whose answer is one of several categories, such as 1, 2, or 3, cannot be split properly (yes and no can be expressed as [1, -1])

・ Null values are not accepted

・ Character strings are not accepted
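One possible workaround for the categorical, null, and string limitations is to encode the answers as numbers before fitting. The sketch below is not something the original code does. It assumes a recent scikit-learn with OneHotEncoder, and the questionnaire data, the labels, and the "missing" placeholder are all hypothetical values made up for illustration.

```python
from sklearn import tree
from sklearn.preprocessing import OneHotEncoder

# Hypothetical questionnaire: each row is one respondent, each column one item.
# None marks an unanswered item.
raw_answers = [
    ["yes", "A"],
    ["no", "B"],
    ["yes", "C"],
    ["no", None],
]
labels = [1, 2, 3, 2]

# Turn missing answers into an explicit category of their own.
filled = [[a if a is not None else "missing" for a in row] for row in raw_answers]

# One-hot encoding creates one 0/1 column per (item, answer) pair,
# so strings and multi-valued answers become numeric features.
encoder = OneHotEncoder(handle_unknown="ignore")
X = encoder.fit_transform(filled)

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, labels)

# New data has to go through the same encoder before predict.
new_row = encoder.transform([["yes", "C"]])
prediction = clf.predict(new_row)
print(prediction)
```

Treating a missing answer as its own category is only one choice; whether it suits real questionnaire data depends on why the answers are missing.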

For now, I feel that Weka is easier to use. Maybe I need to pass some extra argument options?
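On the question of argument options: export_graphviz does accept feature_names and class_names, which put readable names into each node's label. The sketch below assumes a recent scikit-learn, and the names question_1, group_1, and so on are hypothetical, invented only for this example.

```python
from sklearn import tree

# Same toy data as tree_ex.py.
X = [[0, 1], [0, -1], [1, 1]]
Y = [1, 2, 3]
clf = tree.DecisionTreeClassifier(random_state=0).fit(X, Y)

# Hypothetical names, invented for this example only.
feature_names = ["question_1", "question_2"]
class_names = ["group_1", "group_2", "group_3"]  # in ascending order of the labels in Y

dot_text = tree.export_graphviz(
    clf,
    out_file=None,                # return the DOT text as a string
    feature_names=feature_names,  # used in the split condition of each node
    class_names=class_names,      # adds the majority class to each node
    filled=True,                  # color nodes by majority class
)
print(dot_text)
```

The resulting DOT text can be passed to pydot.graph_from_dot_data in the same way as in tree_ex.py.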
