[PYTHON] Understand the contents of sklearn's pipeline

What is this

I had been using sklearn's Pipeline from time to time, but when I called pipeline.fit_transform(X, y) I became curious about what processing actually happens inside the pipeline. So I read the official documentation [^1] and the source code [^2] and decided to organize what I found.

The questions I had are written as comments in the code below. Some people may think "That's obvious!", but I was genuinely curious, so I looked into it.

# Question 1: Is fit_transform called on the transformers and fit on the estimator?
# Question 2: What should I do if I want to pass parameters to a transformer or the estimator at this point?
# Question 3: What requirements must my own estimator / transformer meet to be used in a pipeline?
pipe.fit(X, y)

# Question 4: Is fit_transform called on the transformers and predict on the estimator?
pipe.predict(X)

1. What is a pipeline

When an estimator for classification or regression is used in a machine learning project, it is often combined with transformers that preprocess the data. Pipeline is a class that bundles the whole flow, from data transformation through training and prediction, into a single estimator.

1.1. Example of using pipeline

A pipeline is built from a list of (key, value) tuples. The key is the name of a transformer / estimator and the value is the transformer / estimator object itself; the list is passed to Pipeline as steps. An example is shown below.

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn import datasets

#Preparation of sample data
iris = datasets.load_iris()
X, y = iris.data, iris.target

#Creating a pipeline
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
pipe = Pipeline(steps=estimators)

pipe.fit(X, y)
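
As a small sketch of how the (key, value) tuples are used afterwards: each step can be looked up by its key through named_steps, and the fitted pipeline behaves like a single estimator. The step names simply mirror the example above; the variable names step_names and y_pred are introduced here for illustration.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

pipe = Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])
pipe.fit(X, y)

# The keys given in the tuples become the step names.
step_names = [name for name, _ in pipe.steps]
print(step_names)  # ['reduce_dim', 'clf']

# Each step object can be looked up by its key.
print(type(pipe.named_steps['clf']).__name__)  # SVC

# The fitted pipeline predicts like a single estimator.
y_pred = pipe.predict(X)
print(y_pred.shape)  # (150,)
```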


2. Estimator / transformer requirements

You may want to put your own estimator / transformer into a pipeline. This section describes the requirements it must meet. They differ depending on whether the step is the last one in the pipeline (final_estimator) or any earlier one (not_final_estimator).

- final_estimator: has a fit method
- not_final_estimator: has a transform method, plus either a fit method or a fit_transform method

Further methods are required depending on which pipeline method is called, but the above is the minimum.
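
The requirements can be illustrated with a minimal sketch. CenterTransformer and MajorityClassifier are hypothetical classes written for this example. Inheriting from BaseEstimator is not one of the requirements listed above; it is an assumption added here so that the classes also work with the fitted-state and parameter checks in recent sklearn versions.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.pipeline import Pipeline


class CenterTransformer(BaseEstimator):
    """Hypothetical intermediate step: defines fit and transform, no fit_transform."""

    def fit(self, X, y=None):
        self.mean_ = np.asarray(X).mean(axis=0)
        return self  # returning self allows fit(...).transform(...) to be chained

    def transform(self, X):
        return np.asarray(X) - self.mean_


class MajorityClassifier(BaseEstimator):
    """Hypothetical final step: fit is the minimum; predict is needed only for pipeline.predict."""

    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.majority_ = values[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([0, 1, 1])

pipe = Pipeline(steps=[('center', CenterTransformer()), ('clf', MajorityClassifier())])
pipe.fit(X, y)
print(pipe.predict(X))  # [1 1 1]
```

If predict were removed from MajorityClassifier, pipe.fit would still work, but pipe.predict would no longer be available.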

3. Processing in pipeline

As in the code in 1.1, I checked what happens inside the pipeline when pipeline.fit and pipeline.predict are called [^3]. The pipeline methods that are likely to be used most often are summarized below. From the left, the columns are the pipeline method, the parameters passed to it, the method called on each not_final_estimator, and the method called on the final_estimator.

| pipeline | Parameters | not_final_estimator | final_estimator |
|---|---|---|---|
| fit | X, y=None, **fit_params | fit_transform | fit |
| fit_transform | X, y=None, **fit_params | fit_transform | fit_transform |
| predict | X, **predict_params | transform | predict |
| fit_predict | X, y=None, **fit_params | fit_transform | fit_predict |
| score | X, y=None, sample_weight=None | transform | score |
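
One way to check the fit and predict rows yourself is to record which methods the pipeline calls. The following is a rough sketch; SpyTransformer, SpyEstimator, and the calls list are hypothetical names made up for this check and are not part of sklearn.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.pipeline import Pipeline

calls = []  # (step kind, method name) in the order the pipeline invokes them


class SpyTransformer(BaseEstimator):
    """Hypothetical transformer that records every call and returns X unchanged."""

    def fit(self, X, y=None):
        calls.append(('transformer', 'fit'))
        self.fitted_ = True
        return self

    def transform(self, X):
        calls.append(('transformer', 'transform'))
        return X

    def fit_transform(self, X, y=None):
        calls.append(('transformer', 'fit_transform'))
        self.fitted_ = True
        return X


class SpyEstimator(BaseEstimator):
    """Hypothetical final estimator that records every call."""

    def fit(self, X, y=None):
        calls.append(('estimator', 'fit'))
        self.fitted_ = True
        return self

    def predict(self, X):
        calls.append(('estimator', 'predict'))
        return np.zeros(len(X))


X = np.arange(6.0).reshape(3, 2)
y = np.array([0, 1, 0])
pipe = Pipeline(steps=[('t', SpyTransformer()), ('e', SpyEstimator())])

pipe.fit(X, y)
fit_calls = list(calls)

calls.clear()
pipe.predict(X)
predict_calls = list(calls)

print(fit_calls)      # [('transformer', 'fit_transform'), ('estimator', 'fit')]
print(predict_calls)  # [('transformer', 'transform'), ('estimator', 'predict')]
```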

The points to be noted are listed below.

- If a step does not define a fit_transform method, its fit method and transform method are called in order.
- Unlike fit_transform, there is no such fallback for fit_predict: an error occurs if the final estimator does not define a fit_predict method.
- **fit_params can be passed in the form "target step name (the key of the tuple)__parameter name".
  - Example: pipeline.fit(X, y, key1__param1=True)
- Unlike **fit_params, **predict_params can only be passed to the predict method of the final_estimator. Specify the parameter name of that predict method as it is, without a prefix.
  - Example: pipeline.predict(X, param1=True)
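
The two ways of passing parameters can be sketched as follows. This is only an illustration under assumptions: SVC with sample_weight and BayesianRidge with return_std were chosen here because their fit / predict methods accept those arguments, and the step names are arbitrary.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# fit: "<step name>__<parameter name>" sends sample_weight to the fit of step 'clf' only.
clf_pipe = Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])
weights = np.ones(len(y))
clf_pipe.fit(X, y, clf__sample_weight=weights)

# predict: the keyword is handed to the final estimator's predict as-is, without a prefix.
reg_pipe = Pipeline(steps=[('reduce_dim', PCA(n_components=2)), ('reg', BayesianRidge())])
reg_pipe.fit(X, y)
mean, std = reg_pipe.predict(X, return_std=True)
print(mean.shape, std.shape)  # (150,) (150,)
```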

As an aside, a sklearn-compliant model should not be designed to receive its settings as arguments when fit is executed. It is therefore better to avoid passing parameters through **fit_params as much as possible. Sklearn-compliant models are described in detail here.
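
One way to follow this advice, shown here as an assumption rather than something taken from the sources above, is to set such values before fit through set_params. It uses the same "step name__parameter name" form, so fit itself only receives the data.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

pipe = Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])

# Settings are fixed before fit, so fit is called with X and y only.
pipe.set_params(reduce_dim__n_components=2, clf__C=10.0)
pipe.fit(X, y)

print(pipe.named_steps['reduce_dim'].n_components)  # 2
print(pipe.get_params()['clf__C'])                  # 10.0
```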

[^1]: User Guide
[^2]: Source code
[^3]: pipeline documentation
