[PYTHON] xgboost: An effective machine learning model for tabular data

xgboost is a high-performance model for classification and regression, based on gradient boosting of decision trees. It is very popular on Kaggle.
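
To make "gradient boosting of trees" concrete, the following is a minimal sketch in plain Python. It is not xgboost's actual algorithm: it assumes squared error, a single numeric feature, and depth-1 trees (stumps), and the names fit_stump, boost, and mse as well as the learning rate of 0.3 are illustrative choices. Each round fits a new stump to the residuals left by the ensemble so far, which for squared error are the negative gradient of the loss.

```python
# Toy gradient boosting for squared error with depth-1 trees (stumps).
# Illustrative only: xgboost itself is far more elaborate.

def fit_stump(xs, residuals):
    """Find the threshold split on a 1-D feature that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the last value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Fit each stump to the residuals left by the ensemble so far."""
    base = sum(ys) / len(ys)  # start from the mean of the targets
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on (xs, ys)."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data with three rough plateaus
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]

one_round = boost(xs, ys, n_rounds=1)
many_rounds = boost(xs, ys, n_rounds=20)
print(mse(one_round, xs, ys), mse(many_rounds, xs, ys))
```

The training error shrinks as rounds are added, which is the core idea. xgboost builds on it with deeper trees, regularization, and second-order gradient information, among other things.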

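Installation (see the official documentation)

The commands below build xgboost from source. Recent releases are also published on PyPI, so pip install xgboost is usually sufficient.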

@mac


$ cd <workspace>
$ git clone --recursive https://github.com/dmlc/xgboost
$ cd xgboost; cp make/minimum.mk ./config.mk; make -j4
$ cd python-package; sudo python setup.py install

@ubuntu


$ cd <workspace>
$ git clone --recursive https://github.com/dmlc/xgboost
$ cd xgboost; make -j4
$ cd python-package; sudo python setup.py install
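
After installing, it can help to confirm that the interpreter you plan to use actually sees the package. The following is a small sketch of such a check; the helper name xgboost_available is hypothetical, and the check only looks the package up on the import path.

```python
import importlib.util


def xgboost_available():
    """Return True if the xgboost package can be found on the import path."""
    return importlib.util.find_spec("xgboost") is not None


if xgboost_available():
    import xgboost
    print("xgboost version:", xgboost.__version__)
else:
    print("xgboost is not importable from this Python interpreter")
```

If the second message appears, the package was probably installed for a different Python than the one running the script.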

Usage 1: Regression model

regressor.py


import xgboost as xgb
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error


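# Load the data
# Note: load_boston was removed in scikit-learn 1.2, so this script needs an older release
boston = load_boston()
# Split by position: the first 400 rows for training, the rest for testing
X_train, X_test = boston.data[:400], boston.data[400:]
y_train, y_test = boston.target[:400], boston.target[400:]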

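# Create an xgboost regression model
reg = xgb.XGBRegressor()

# Hyperparameter search: cross-validated grid search over tree depth and number of trees
reg_cv = GridSearchCV(reg, {'max_depth': [2, 4, 6], 'n_estimators': [50, 100, 200]}, verbose=1)
reg_cv.fit(X_train, y_train)
print(reg_cv.best_params_, reg_cv.best_score_)

# Retrain on the full training set with the best parameters
# (with the default refit=True, reg_cv.best_estimator_ holds an equivalent model)
reg = xgb.XGBRegressor(**reg_cv.best_params_)
reg.fit(X_train, y_train)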

# Save and load the trained model
# import pickle
# with open("model.pkl", "wb") as f:
#     pickle.dump(reg, f)
# with open("model.pkl", "rb") as f:
#     reg = pickle.load(f)

# Evaluate the trained model: mean squared error on the training and test sets
pred_train = reg.predict(X_train)
pred_test = reg.predict(X_test)
print(mean_squared_error(y_train, pred_train))
print(mean_squared_error(y_test, pred_test))

# Plot feature importances as a horizontal bar chart, sorted in ascending order
import pandas as pd
import matplotlib.pyplot as plt
importances = pd.Series(reg.feature_importances_, index=boston.feature_names)
importances = importances.sort_values()
importances.plot(kind="barh")
plt.title("Feature importance in the xgboost model")
plt.show()

(Figure: boston_importance.png, the feature importance bar chart produced by the script above)
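
The hyperparameter search in regressor.py tries every combination of the listed values. As a rough illustration of how much work that is, the sketch below enumerates the same grid with itertools. The fold count of 5 is an assumption: it is the default cv in scikit-learn 0.22 and later, and older releases used 3.

```python
from itertools import product

# Same grid as in regressor.py
param_grid = {'max_depth': [2, 4, 6], 'n_estimators': [50, 100, 200]}

# Every combination of the listed values is one candidate model
names = sorted(param_grid)
candidates = [dict(zip(names, values))
              for values in product(*(param_grid[name] for name in names))]

n_folds = 5  # assumption: the cv default in scikit-learn 0.22 and later
for params in candidates:
    print(params)
print(len(candidates), "candidates x", n_folds, "folds =",
      len(candidates) * n_folds, "fits")
```

Each candidate is trained once per fold, so widening the grid multiplies the training time. The best_score_ printed by the script is the mean cross-validated score of the best candidate, which for a regressor is R^2 unless a different scoring argument is passed.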

Usage 2: Classification model

classifier.py


import xgboost as xgb
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix, classification_report

# Load the data
digits = load_digits()
# Split by position: the first 1000 samples for training, the rest for testing
X_train, X_test = digits.data[:1000], digits.data[1000:]
y_train, y_test = digits.target[:1000], digits.target[1000:]

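# Create an xgboost classification model
clf = xgb.XGBClassifier()

# Hyperparameter search: cross-validated grid search over tree depth and number of trees
clf_cv = GridSearchCV(clf, {'max_depth': [2, 4, 6], 'n_estimators': [50, 100, 200]}, verbose=1)
clf_cv.fit(X_train, y_train)
print(clf_cv.best_params_, clf_cv.best_score_)

# Retrain on the full training set with the best parameters
# (with the default refit=True, clf_cv.best_estimator_ holds an equivalent model)
clf = xgb.XGBClassifier(**clf_cv.best_params_)
clf.fit(X_train, y_train)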

# Save and load the trained model
# import pickle
# with open("model.pkl", "wb") as f:
#     pickle.dump(clf, f)
# with open("model.pkl", "rb") as f:
#     clf = pickle.load(f)

# Evaluate the trained model on the test set
pred = clf.predict(X_test)
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))

#              precision    recall  f1-score   support
# 
#           0       0.94      0.97      0.96        79
#           1       0.90      0.79      0.84        80
#           2       0.99      0.88      0.93        77
#           3       0.89      0.82      0.86        79
#           4       0.94      0.90      0.92        83
#           5       0.92      0.95      0.93        82
#           6       0.95      0.97      0.96        80
#           7       0.96      0.96      0.96        80
#           8       0.82      0.91      0.86        76
#           9       0.79      0.90      0.84        81
# 
# avg / total       0.91      0.91      0.91       797
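
The precision, recall, and f1-score columns in the report above are all derived from the confusion matrix. The following sketch recomputes them in plain Python. The function name per_class_metrics and the 3-class matrix are hypothetical and are not taken from the digits run.

```python
def per_class_metrics(matrix):
    """Compute (precision, recall, f1) per class from a confusion matrix.

    matrix[i][j] counts samples whose true class is i and predicted class is j,
    the same layout that sklearn.metrics.confusion_matrix returns.
    """
    n = len(matrix)
    results = []
    for k in range(n):
        true_positive = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum: predicted as k
        actual_k = sum(matrix[k])                          # row sum: truly k
        precision = true_positive / predicted_k if predicted_k else 0.0
        recall = true_positive / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


# Hypothetical 3-class confusion matrix, not taken from the digits run
toy = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
for label, (p, r, f) in enumerate(per_class_metrics(toy)):
    print(label, round(p, 2), round(r, 2), round(f, 2))
```

Precision asks how many of the samples predicted as a class really belong to it, while recall asks how many samples of a class were found. That is why a digit can score high on one and noticeably lower on the other, as classes 1 and 9 do in the report.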
