[PYTHON] Basic machine learning procedure: ④ Classifier learning + ensemble learning

Introduction

"Basic machine learning procedure: ① Classification model" laid out the procedure for creating a basic classification model. This time, I focus on training the classifiers: selecting which classifiers to use and combining them with ensemble learning.

Procedure so far

- Basic machine learning procedure: ① Classification model
- Basic machine learning procedure: ② Prepare data
- Basic machine learning procedure: ③ Compare and examine feature selection methods

Analytical environment

- Google BigQuery
- Google Colaboratory

Target data

As in ① Classification model, the purchase data is stored in a table with the following structure.

id   result  product1  product2  product3  product4  product5
001  1       2500      1200      1890      530       null
002  0       750       3300      null      1250      2000

0. Target classifier

At first I intended to compare the performance of the classifiers, but it is hard to say that any single one is definitively the best. Each has its own characteristics, and I think it matters to make the best use of them, so for now I train the following four classifiers.

- RandomForestClassifier: random forest (light, fast, and reasonably accurate)
- LogisticRegression: logistic regression (included because this is a binary 0/1 classification)
- KNeighborsClassifier: k-nearest neighbors (a simple, easy-to-understand model)
- LGBMClassifier: LightGBM (a recent trend; higher accuracy)

1. Learning of each classifier

Prepare the classifiers listed above. Three lists hold the names, the estimators, and the parameter grids, all in the same order. weights, which is used later, holds each classifier's weight in the ensemble.

# Classifiers

import numpy as np  # needed for np.logspace in the LogisticRegression grid
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from lightgbm import LGBMClassifier

model_names=["RandomForestClassifier", "LogisticRegression", "KNeighborsClassifier", "LGBMClassifier"]
estimators=[RandomForestClassifier(), LogisticRegression(), KNeighborsClassifier(), LGBMClassifier()]
parameters=[
  {
    'n_estimators': [5, 10, 50, 100, 300],
    'max_depth': [5, 10, 20],
  },
  {
    'C': list(np.logspace(0, 4, 10))
  },
  {
    'weights': ['uniform','distance'],
    'n_neighbors': [3,5,10,20,30,50],
  },
  {
    'objective': ['binary'],
    'learning_rate': [0.01, 0.03, 0.05], 
    'n_estimators': [100, 150, 200], 
    'max_depth':[4, 6, 8]      
  }
]

weights=[1,1,1,1]
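
To get a feel for how much work these grids imply, the sketch below counts the parameter combinations per classifier. It is only an illustration using the standard library: the C values are rebuilt without NumPy to mirror np.logspace(0, 4, 10), and CV_FOLDS = 5 is an assumption that matches the cv=5 used in the grid search later in this article.

```python
from math import prod

# Same grids as above; C mirrors np.logspace(0, 4, 10) without needing NumPy.
grids = {
    "RandomForestClassifier": {
        "n_estimators": [5, 10, 50, 100, 300],
        "max_depth": [5, 10, 20],
    },
    "LogisticRegression": {
        "C": [10 ** (4 * i / 9) for i in range(10)],
    },
    "KNeighborsClassifier": {
        "weights": ["uniform", "distance"],
        "n_neighbors": [3, 5, 10, 20, 30, 50],
    },
    "LGBMClassifier": {
        "objective": ["binary"],
        "learning_rate": [0.01, 0.03, 0.05],
        "n_estimators": [100, 150, 200],
        "max_depth": [4, 6, 8],
    },
}

CV_FOLDS = 5  # assumption: same fold count as the grid search in this article


def count_candidates(grid):
    """Number of combinations in the Cartesian product of one grid."""
    return prod(len(values) for values in grid.values())


candidates = {name: count_candidates(grid) for name, grid in grids.items()}
# Each candidate is fitted once per fold (a final refit on all data comes on top).
fits = {name: n * CV_FOLDS for name, n in candidates.items()}

for name in grids:
    print(name, "candidates:", candidates[name], "CV fits:", fits[name])
```

The LightGBM grid is the largest here, which is one reason running the four searches in parallel can pay off.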

Next, train the individual models defined above. Running them one after another takes a long time, so I run them in parallel, following the article "Parallel processing with scikit-learn's Parallel".

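from sklearn.model_selection import GridSearchCV
# sklearn.externals.joblib was removed from scikit-learn; import joblib directly
from joblib import Parallel, delayed

models = []

def tuneParams(n):
  # Grid-search the n-th estimator over its parameter grid with 5-fold CV

  estimator = estimators[n]
  param = parameters[n]
  
  clf = GridSearchCV(
      estimator,
      param_grid=param,
      cv=5
      )
  
  clf = clf.fit(train_features, train_target)
  # best_estimator_ is refitted on the full training data with the best parameters
  model = clf.best_estimator_

  return model

# Tune all classifiers in parallel; results come back in the same order as estimators
model = Parallel(n_jobs=-1)( delayed(tuneParams)(n) for n in range(len(estimators)) )
# models becomes a one-element list, so the tuned estimators are in models[0]
models.append(model)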

The models list now holds, in models[0], the best estimator found for each classifier, each already fitted with its best parameters.
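
The shape of models is easy to get wrong, so here is a small sketch of it. It deliberately swaps joblib for the standard library's ThreadPoolExecutor, and tune_stub is a hypothetical stand-in for tuneParams that returns a label instead of a fitted model. What it is meant to show is that results come back in input order, so pairing them with model_names by position is safe.

```python
from concurrent.futures import ThreadPoolExecutor

model_names = ["RandomForestClassifier", "LogisticRegression",
               "KNeighborsClassifier", "LGBMClassifier"]


def tune_stub(n):
    # Hypothetical stand-in for tuneParams(n): no training, just a label.
    return f"best {model_names[n]}"


# pool.map yields results in the order of the inputs, like the Parallel call above.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(tune_stub, range(len(model_names))))

models = []
models.append(results)  # same nesting as in the article: one list inside models

# Name/estimator pairs in the form the ensemble step expects.
paired = list(zip(model_names, models[0]))
for name, tuned in paired:
    print(name, "->", tuned)
```

If the extra nesting is not wanted, assigning the result list directly to models would also work, but the later code would then have to use models instead of models[0].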

2. Ensemble learning

Ensemble learning combines the tuned classifiers into a single model. I implemented it with reference to the article "Touch the scikit-learn ensemble learning 'VotingClassifier'".

The idea is to pass the models created in step 1 to VotingClassifier. The function below evaluates both the ensemble and each individual classifier.

from collections import defaultdict
from sklearn.ensemble import VotingClassifier
import sklearn.metrics as metrics

def modelingEnsembleLearning(train_features, test_features, train_target, test_target, models):

  # Confusion matrices keyed by model name ('voting' for the ensemble)
  mss = defaultdict(list)

  # models[0] holds the tuned estimators in the same order as model_names
  voting = VotingClassifier(estimators=list(zip(model_names, models[0])), voting='soft', weights=weights)
  voting.fit(train_features, train_target)

  # Predict with the ensemble
  pred_target = voting.predict(test_features)
  ms = metrics.confusion_matrix(test_target.astype(int), pred_target.astype(int))
  mss['voting'].append(ms)

  # Predict with each individual classifier fitted inside the ensemble
  for name, estimator in voting.named_estimators_.items():
      pred_target = estimator.predict(test_features)
      ms = metrics.confusion_matrix(test_target.astype(int), pred_target.astype(int))
      mss[name].append(ms)
      
  return voting, mss

voting, mss = modelingEnsembleLearning(train_features, test_features, train_target, test_target, models)
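
What voting='soft' does with the weights can be sketched in plain Python: the class probabilities predicted by each classifier are averaged using the weights, and the class with the highest average wins. The probabilities below are made-up numbers for a single sample, and soft_vote is a hypothetical helper written for this illustration, not part of scikit-learn.

```python
def soft_vote(probas, weights):
    """Weighted average of per-classifier class probabilities, then argmax.

    probas: one [p(class 0), p(class 1)] list per classifier
    weights: one weight per classifier
    """
    total = sum(weights)
    n_classes = len(probas[0])
    avg = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: avg[c])
    return avg, label


# Made-up probabilities from four classifiers for one sample.
probas = [[0.40, 0.60], [0.55, 0.45], [0.70, 0.30], [0.20, 0.80]]

avg_equal, label_equal = soft_vote(probas, [1, 1, 1, 1])
avg_skewed, label_skewed = soft_vote(probas, [1, 1, 3, 1])

print("equal weights:", [round(p, 4) for p in avg_equal], "->", label_equal)
print("third classifier x3:", [round(p, 4) for p in avg_skewed], "->", label_skewed)
```

With equal weights the sample is assigned to class 1, while tripling the weight of the third classifier flips it to class 0, which is why the weights list is worth tuning once the individual results are known.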

3. Model evaluation

Finally, evaluate the model. The code here is unchanged from the start of this series.

from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

accuracy=accuracy_score(test_target, voting.predict(test_features))
precision=precision_score(test_target.astype(int), voting.predict(test_features).astype(int))
recall=recall_score(test_target.astype(int), voting.predict(test_features).astype(int))

print("Voting")
print("Accuracy : ", accuracy*100, "%")
print("Precision : ", precision*100, "%")
print("Recall : ", recall*100, "%")
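
The same three metrics can also be derived from the confusion matrices collected in mss, which makes it easy to compare the individual classifiers with the ensemble. The sketch below assumes the binary layout returned by scikit-learn's confusion_matrix (rows are true labels, columns are predicted labels, so [[TN, FP], [FN, TP]]). The matrix values are made up, and metrics_from_confusion is a hypothetical helper.

```python
def metrics_from_confusion(cm):
    """Accuracy, precision, and recall from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Guard against division by zero when there are no positive predictions or labels.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up confusion matrix for illustration only.
cm = [[50, 10],
      [5, 35]]

accuracy, precision, recall = metrics_from_confusion(cm)
print("Accuracy : ", accuracy * 100, "%")
print("Precision : ", precision * 100, "%")
print("Recall : ", recall * 100, "%")
```

Applied to the real results, the call would look like metrics_from_confusion(mss[name][0]) for each name in mss.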

In my run, LGBMClassifier simply had the highest accuracy and RandomForestClassifier the highest recall. However, the ensemble may offer the best overall balance. When that is the case, I use the ensemble as the model.

Conclusion

Starting with "Basic machine learning procedure: ① Classification model", I have worked through the overall procedure and then dug into the individual steps, as far as my understanding of classification models goes. I would like to treat this as a stopping point for classification models and move on to the next step.
