I had been using sklearn's Pipeline from time to time, but when I called pipeline.fit_transform(X, y) I became curious about what processing actually happens inside the pipeline, so I read the official documentation [^1] and the source code [^2] and decided to organize what I found.
The questions I had are written as comments in the code below. Some readers may think "That's obvious!", but I was genuinely curious, so I looked into it.
    # Question 1: Is fit_transform called on the transformers and fit on the estimator?
    # Question 2: What should I do if I want to pass parameters to a transformer or the estimator at this point?
    # Question 3: What requirements must my own estimator / transformer meet to be used in a pipeline?
    pipe.fit(X, y)

    # Question 4: Is fit_transform called on the transformers and predict on the estimator?
    pipe.predict(X)
When an estimator that performs classification or regression is used in a machine learning project, it is often combined with a transformer. Pipeline is provided as a way to bundle the whole flow, from data transformation through training and prediction, into a single estimator.
A pipeline is built from a list of (key, value) tuples. Each key is the name of a step and each value is the estimator / transformer object for that step; the list is passed to Pipeline as steps. An example is shown below.
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC
    from sklearn.decomposition import PCA
    from sklearn import datasets

    # Prepare sample data
    iris = datasets.load_iris()
    X, y = iris.data, iris.target

    # Create the pipeline
    estimators = [('reduce_dim', PCA()), ('clf', SVC())]
    pipe = Pipeline(steps=estimators)

    # Training
    pipe.fit(X, y)

    # Prediction
    pipe.predict(X)
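The keys are also how each step is addressed afterwards. As a small sketch, assuming the same `reduce_dim` / `clf` step names as above (the `n_components=2` setting is only an illustrative choice of mine), a step can be looked up by its name:

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Same step names as in the example above; n_components=2 is an illustrative choice
pipe = Pipeline(steps=[('reduce_dim', PCA(n_components=2)), ('clf', SVC())])

# Each step can be looked up by its key
pca_step = pipe.named_steps['reduce_dim']
clf_step = pipe['clf']  # indexing the pipeline by step name also works

# The step names, in the order they will be executed
step_names = [name for name, _ in pipe.steps]
print(step_names)
```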
You may want to put your own estimator / transformer in a pipeline. This section describes the requirements it must meet in that case. The requirements differ depending on whether the step is the last one in the pipeline (final_estimator) or any earlier one (not_final_estimator).
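- final_estimator: must have a fit method
- not_final_estimator: must have a transform method, together with either fit or fit_transform (a step that has only fit_transform and no transform is rejected when the pipeline validates its steps)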
Further methods are required depending on which pipeline method you call, but the above is the minimum that must be satisfied.
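As a rough illustration of these minimum requirements, here is a sketch with two classes of my own. `Centerer` and `MajorityClassifier` are hypothetical names, not part of sklearn. Inheriting from `BaseEstimator` and the mixins is not part of the minimum described above; I assume it here only because it is the usual convention and keeps the classes compatible with recent sklearn versions.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin
from sklearn.pipeline import Pipeline


class Centerer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: subtracts the column means learned in fit."""

    def fit(self, X, y=None):
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) - self.mean_


class MajorityClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical final estimator: always predicts the most frequent class."""

    def fit(self, X, y):
        classes, counts = np.unique(y, return_counts=True)
        self.classes_ = classes
        self.majority_ = classes[np.argmax(counts)]
        return self

    def predict(self, X):
        # predict is not needed for pipe.fit, only for pipe.predict
        return np.full(len(X), self.majority_)


X = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
y = np.array([1, 1, 0])

pipe = Pipeline(steps=[('center', Centerer()), ('clf', MajorityClassifier())])
pipe.fit(X, y)
pred = pipe.predict(X)
print(pred)
```

`Centerer` defines only fit and transform; fit_transform comes from `TransformerMixin`, which simply calls the two in order.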
As in the code in 1.1., I checked what happens inside the pipeline when pipeline.fit and pipeline.predict are called [^3]. The pipeline methods that are likely to be used most often are summarized below. The columns are, from the left: the pipeline method, the parameters passed to it, the method called on each not_final_estimator, and the method called on the final_estimator.
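| pipeline method | Parameters | Called on not_final_estimator | Called on final_estimator |
|---|---|---|---|
| fit | X, y=None, \*\*fit_params | fit_transform | fit |
| fit_transform | X, y=None, \*\*fit_params | fit_transform | fit_transform |
| fit_predict | X, y=None, \*\*fit_params | fit_transform | fit_predict |
| score | X, y=None, sample_weight=None | transform | score |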
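To convince myself of the fit row, the same calls can be written out by hand and compared with the pipeline's result. This is only a sketch: it assumes the iris data and the PCA / SVC steps from the earlier example, with `n_components=2` chosen for illustration, and it relies on predict applying transform on the earlier steps and predict on the last one.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# Via the pipeline
pipe = Pipeline(steps=[('reduce_dim', PCA(n_components=2)), ('clf', SVC())])
pipe.fit(X, y)
pipe_pred = pipe.predict(X)

# The same sequence of calls written out by hand
pca, svc = PCA(n_components=2), SVC()
X_reduced = pca.fit_transform(X, y)          # fit: fit_transform on not_final_estimator
svc.fit(X_reduced, y)                        # fit: fit on final_estimator
manual_pred = svc.predict(pca.transform(X))  # predict: transform, then predict

same = np.array_equal(pipe_pred, manual_pred)
print(same)
```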
The points to be noted are listed below.
- If a fit_transform method is not defined, the fit method and the transform method are executed in order.
- Unlike fit_transform, an error occurs if the fit_predict method is not defined.
- \*\*fit_params can be passed in the form "target step name (the key of the tuple)__parameter name":
    pipeline.fit(X, y, key1__param1=True)
- Unlike \*\*fit_params, \*\*predict_params can only pass parameters to the predict method called on the final_estimator. Just specify the parameter name of the predict method as it is:
    pipeline.predict(X, param1=True)
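The last two points can be tried with estimators that ship with sklearn. The sketch below is my own choice of example, not something from the documentation: it assumes `SVC.fit` accepting `sample_weight` and `BayesianRidge.predict` accepting `return_std`, and the step names `scale`, `clf` and `reg` are arbitrary.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# fit_params: '<step name>__<parameter name>' sends sample_weight to SVC.fit only
weights = np.ones(len(y))
clf_pipe = Pipeline(steps=[('scale', StandardScaler()), ('clf', SVC())])
clf_pipe.fit(X, y, clf__sample_weight=weights)

# predict_params: given without a prefix and passed to the final estimator's predict
# (here the fourth iris feature is regressed on the first three, just as toy data)
reg_pipe = Pipeline(steps=[('scale', StandardScaler()), ('reg', BayesianRidge())])
reg_pipe.fit(X[:, :3], X[:, 3])
mean, std = reg_pipe.predict(X[:, :3], return_std=True)
print(mean.shape, std.shape)
```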
As an aside, a sklearn-compliant model should not be designed to receive its parameters when the fit method is executed. It is therefore better to avoid passing parameters with \*\*fit_params as much as possible. sklearn-compliant models are described in detail in here.
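One way to follow this advice is to take the value as a constructor argument and set it before fitting. The sketch below uses a hypothetical `ThresholdClassifier` of my own with a made-up `threshold` parameter; `set_params` with the "step name__parameter name" form is the part that comes from sklearn.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ThresholdClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical classifier: predicts 1 when the first feature exceeds threshold."""

    def __init__(self, threshold=0.0):
        # The parameter is stored in __init__, not received by fit
        self.threshold = threshold

    def fit(self, X, y):
        # fit receives only data; here it just records the class labels
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        return (np.asarray(X)[:, 0] > self.threshold).astype(int)


X = np.array([[-1.0], [0.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline(steps=[('scale', StandardScaler()), ('clf', ThresholdClassifier())])
# Set the value before fitting instead of passing it through fit
pipe.set_params(clf__threshold=0.5)
pipe.fit(X, y)
pred = pipe.predict(X)
print(pred)
```

Because the value lives on the estimator itself, it also shows up in `pipe.get_params()` under the key `clf__threshold`.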