While reading the 1st-place code from the Mercari competition, I came across make_pipeline and didn't really understand FunctionTransformer, so I looked into both.
make_pipeline → Combines [preprocessing + training + prediction] into a single estimator, which shortens the code.
FunctionTransformer → Turns an arbitrary function into a transformer. This is needed because every intermediate step of a Pipeline must be a transformer: at minimum, an object with fit and transform methods.
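As a minimal sketch of the idea (not from the competition code), wrapping an ordinary function such as numpy.log1p with FunctionTransformer gives it the fit/transform interface a Pipeline step needs:

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Wrap a plain function so it gains fit/transform methods
log_transformer = FunctionTransformer(np.log1p, validate=True)

X = np.array([[0.0, 1.0], [2.0, 3.0]])
log_transformer.fit(X)  # a no-op here, but required by the Pipeline API
result = log_transformer.transform(X)  # identical to np.log1p(X)
print(result)
```

Because the wrapped object now has fit and transform, it can be dropped into a Pipeline like any built-in transformer.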
In the example below, PCA() runs first and SVC is then fitted on its output, so preprocessing and classification execute as one sequence of operations.
```python
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn import datasets

# Prepare sample data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Create the pipeline
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
pipe = Pipeline(steps=estimators)

# Train
pipe.fit(X, y)

# Predict
pipe.predict(X)
```
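The same pipeline can be written more compactly with make_pipeline, which generates the step names automatically from the lowercased class names ('pca', 'svc') instead of taking explicit (name, estimator) tuples:

```python
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Step names are derived from the class names automatically
pipe = make_pipeline(PCA(), SVC())
pipe.fit(X, y)
preds = pipe.predict(X)
print([name for name, _ in pipe.steps])  # ['pca', 'svc']
```

This is exactly the shorthand the Mercari code relies on: no step names to invent, just the estimators in order.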
A partial excerpt from the 1st-place code in the Mercari competition:
```python
from operator import itemgetter

from sklearn.pipeline import make_pipeline, make_union, Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer as Tfidf

def on_field(f: str, *vec) -> Pipeline:
    return make_pipeline(FunctionTransformer(itemgetter(f), validate=False), *vec)

vectorizer = make_union(
    on_field('name', Tfidf(max_features=100000, token_pattern=r'\w+')),
    on_field('text', Tfidf(max_features=100000, token_pattern=r'\w+', ngram_range=(1, 2))),
    on_field(['shipping', 'item_condition_id'],
             # to_records is a helper defined elsewhere in the original code
             FunctionTransformer(to_records, validate=False), DictVectorizer()),
    n_jobs=4)
```
Here, make_pipeline chains an itemgetter instance with Tfidf. Wrapping itemgetter in FunctionTransformer turns it into a custom transformer, so the string-extraction step (itemgetter pulls out the relevant field) and the vectorization run as a single sequence of steps.
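As a small illustration with made-up data (not the competition's dataset), itemgetter('name') applied to a DataFrame is just column selection, and FunctionTransformer makes that selection a pipeline step feeding the vectorizer:

```python
import pandas as pd
from operator import itemgetter
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy data standing in for the Mercari listings
df = pd.DataFrame({
    'name': ['red shirt', 'blue jeans', 'red dress'],
    'price': [10, 20, 30],
})

# itemgetter('name')(df) is equivalent to df['name']; wrapping it in
# FunctionTransformer (validate=False, so the DataFrame passes through
# untouched) lets TfidfVectorizer receive only the text column.
pipe = make_pipeline(
    FunctionTransformer(itemgetter('name'), validate=False),
    TfidfVectorizer(token_pattern=r'\w+'),
)
features = pipe.fit_transform(df)
print(features.shape)  # (number of documents, vocabulary size)
```

The on_field helper in the excerpt is just this pattern factored out: one field selector followed by whatever vectorizers are passed in.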