[PYTHON] How to interactively draw a machine learning pipeline with scikit-learn and save it in HTML

In this article, I will explain how to use the interactive pipeline visualization introduced in scikit-learn v0.23, and how to save it as HTML and make use of it.

Environment

The implementation code for this article is available here: https://github.com/YutaroOgawa/Qiita/tree/master/sklearn

Implementation

[1] Version update

First, as of September 2020, Google Colaboratory comes with scikit-learn v0.22, so update it to v0.23.

```python
!pip install scikit-learn==0.23.2
```

After updating with pip, restart the runtime via "Runtime" -> "Restart runtime" in Google Colaboratory. (This loads the new v0.23 of scikit-learn that was installed with pip.)
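After the restart, a quick check can confirm that the runtime actually picked up the upgraded package. This is a minimal sketch. It reads the standard `sklearn.__version__` attribute, and extracting the major and minor numbers with a regular expression is simply one convenient way to compare versions (an assumption, not something the article prescribes).

```python
import re

import sklearn

# Extract (major, minor) from the version string, e.g. "0.23.2" -> (0, 23)
match = re.match(r"(\d+)\.(\d+)", sklearn.__version__)
version = (int(match.group(1)), int(match.group(2)))
print("scikit-learn", sklearn.__version__)

# set_config(display="diagram") requires v0.23 or later
assert version >= (0, 23), "Restart the runtime so the upgraded package is loaded"
```

If the assertion fails on Colaboratory, the old v0.22 is most likely still loaded in memory, and restarting the runtime should fix it.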

[2] Pipeline construction

As an example, build a **machine learning pipeline** that combines preprocessing and a machine learning model, as follows.

[Perform the necessary imports]

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
```


[Build a pipeline]

```python
# Preprocessing for numerical data (fill missing values with the median, then standardize)
num_proc = make_pipeline(SimpleImputer(strategy='median'), StandardScaler())

# Preprocessing for categorical data (fill missing values with "missing", then one-hot encode)
cat_proc = make_pipeline(
    SimpleImputer(strategy='constant', fill_value='missing'),
    OneHotEncoder(handle_unknown='ignore'))

# Create the column-wise preprocessor
preprocessor = make_column_transformer((num_proc, ('feat1', 'feat3')),
                                       (cat_proc, ('feat0', 'feat2')))

# Combine preprocessing and the machine learning model into one pipeline
clf = make_pipeline(preprocessor, LogisticRegression())
```
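To check that the pipeline above actually runs end to end, you can fit it on a small DataFrame. This is a minimal sketch. The toy values and the target `y` are hypothetical data made up for illustration, and only the column names `feat0` to `feat3` come from the pipeline above. The sketch repeats the imports and the pipeline definition so it runs on its own. It passes the column names as lists, which `make_column_transformer` also accepts. The categorical columns are forced to `dtype=object` so that `np.nan` is treated as a missing value.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: column names match the pipeline, values are made up
X = pd.DataFrame({
    "feat0": pd.Series(["a", "b", np.nan, "a", "b", "c"], dtype=object),
    "feat1": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "feat2": pd.Series(["x", "y", "x", np.nan, "y", "x"], dtype=object),
    "feat3": [0.5, np.nan, 1.5, 2.0, 2.5, 3.0],
})
y = np.array([0, 1, 0, 1, 0, 1])

num_proc = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
cat_proc = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(handle_unknown="ignore"))
preprocessor = make_column_transformer((num_proc, ["feat1", "feat3"]),
                                       (cat_proc, ["feat0", "feat2"]))
clf = make_pipeline(preprocessor, LogisticRegression())

# Imputation, scaling and encoding all run inside fit/predict
clf.fit(X, y)
pred = clf.predict(X)
print(list(clf.named_steps))  # step names generated by make_pipeline
print(pred)
```

Because the imputers live inside the pipeline, the medians are learned from the training data during `fit` and reused at `predict` time. This keeps preprocessing consistent between training and inference.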

[3] Interactively visualize the pipeline

Visualizing the pipeline interactively is simple: just add `sklearn.set_config(display="diagram")`.

[Interactive visualization]

```python
from sklearn import set_config

# Setting to display the pipeline as a diagram
set_config(display="diagram")

# Draw the pipeline (evaluate clf as the last expression in a cell)
clf
```


The pipeline is then drawn in the output cell of Jupyter Notebook (Google Colaboratory), as shown below.
![pipe_sklearn3.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/191401/b43a1142-4172-94c6-42e7-fa93445ee81c.png)


Clicking an element in the pipeline diagram expands it interactively and shows that element's detailed settings.
(The figure below shows the result of clicking SimpleImputer in pipeline-2 of the column preprocessing step, to check how missing values are handled.)

![pipe_sklearn.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/191401/0ab138e6-8ade-33af-ca56-68291d827465.png)
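The `display` option is a global setting, so it stays in effect for every estimator displayed afterwards. The following is a minimal sketch of checking and switching it with `get_config` and `set_config` from scikit-learn's public API, where `"text"` switches back to the plain-text representation. Note that `repr()` always returns text, because the diagram is only rendered by the notebook front end. Newer scikit-learn releases may already use `"diagram"` as the default.

```python
from sklearn import get_config, set_config
from sklearn.linear_model import LogisticRegression

# Turn on the diagram representation used in notebooks
set_config(display="diagram")
print(get_config()["display"])

# Switch back to the plain-text representation
set_config(display="text")
print(get_config()["display"])

# repr() returns the text form regardless of the display setting
print(repr(LogisticRegression()))
```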


## How to save the pipeline as HTML

As a reader pointed out in the comments, you can also save this interactive pipeline as HTML.

I had been thinking, "It's a bit of a shame if this only works in Jupyter Notebook...", so this was very welcome information.

Thank you, @DataSkywalker.

To do this, run the following:

```python
from sklearn.utils import estimator_html_repr

# Write the interactive pipeline diagram to an HTML file
with open('my_estimator.html', 'w', encoding='utf-8') as f:
    f.write(estimator_html_repr(clf))
```

The HTML for the interactive pipeline is then saved as my_estimator.html.

On Google Colaboratory, you can then download my_estimator.html by running the following. (The HTML file includes its CSS styles and was about 300 lines long.)

```python
# Download from Google Colaboratory
from google.colab import files
files.download('my_estimator.html')
```

It seems you can paste this HTML into documents and other materials to explain the pipeline interactively.

You can link to it from a Markdown file, or you can convert the Markdown file to HTML and then merge the two. (Embedding the HTML directly into a Markdown file as-is seems to be difficult...?)
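One way to combine the diagram with your own documentation is to embed the fragment returned by `estimator_html_repr` in a complete HTML page. This is a minimal sketch under stated assumptions. The helper `write_pipeline_page`, the page title, the description text and the file name `pipeline_doc.html` are all hypothetical. A simpler `make_pipeline(StandardScaler(), LogisticRegression())` stands in for the article's pipeline so that the sketch is self-contained.

```python
import html
from pathlib import Path

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import estimator_html_repr

# Stand-in pipeline; in practice, pass the clf built earlier
clf = make_pipeline(StandardScaler(), LogisticRegression())


def write_pipeline_page(estimator, path, title, description):
    """Hypothetical helper: write a standalone HTML page embedding the diagram."""
    fragment = estimator_html_repr(estimator)  # an HTML fragment, not a full page
    page = f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>
<body>
<h1>{html.escape(title)}</h1>
<p>{html.escape(description)}</p>
{fragment}
</body>
</html>
"""
    Path(path).write_text(page, encoding="utf-8")
    return page


page = write_pipeline_page(clf, "pipeline_doc.html", "Model pipeline",
                           "Scaling followed by logistic regression.")
```

The user-supplied title and description are passed through `html.escape`, while the scikit-learn fragment is inserted as-is so its styles and markup keep working.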

[Figure: document_example.gif]

All the related files are also available here: https://github.com/YutaroOgawa/Qiita/tree/master/sklearn

## Summary

With scikit-learn v0.23 or later, just adding `sklearn.set_config(display="diagram")` lets you visualize your pipeline interactively (and save it as HTML).

Please try it ♪


### Remarks

**【Writer】** Yutaro Ogawa, Development Group, AI Transformation Center, Dentsu International Information Services (ISID). Main book: "Learn while making! Development Deep Learning by PyTorch", among others (see "Details of self-introduction").

【Twitter】 Focusing on IT/AI and business/management, I share articles I find interesting and my impressions of new books I have recently read. If you want to keep up with these fields, please follow me ♪ (Lots of information from overseas sources.)

Yutaro Ogawa@ISID_AI_team

[Other] The "AI Transformation Center Development Team" that I lead is looking for members. If you are interested, please apply via this page; we look forward to your application.

[Sokumen-kun] If applying right away feels like too big a step, you can first have a casual interview via "Sokumen-kun". Please use this as well ♪ https://sokumenkun.com/2020/08/17/yutaro-ogawa/

[Disclaimer] The content of this article represents the author's own opinions and is not the official view of the company to which the author belongs.


(References)
https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_0_23_0.html
https://towardsdatascience.com/9-things-you-should-know-about-scikit-learn-0-23-9426d8e1772c
