[PYTHON] How to set up XGBoost using Optuna

Let's set up regression with XGBoost.

For XGBoost itself, please refer to another page, "What is Xgboost? A memo about the main parameters": https://qiita.com/2357gi/items/913af8b813b069617aad

Beyond that, I referred to the parameter descriptions on the official site: https://xgboost.readthedocs.io/en/latest/parameter.html

I tried putting in various parameters, but tree-based models tend to overfit, so I think it is better to set the parameters that control overfitting firmly. In XGBoost these are `lambda` and `alpha`, but when setting them from Python you specify them with a `reg_` prefix, as `reg_lambda` and `reg_alpha` (see the sketch just below).
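
As a minimal sketch of this naming difference (the values below are arbitrary placeholders, not tuned settings), the native booster parameters and the scikit-learn wrapper arguments correspond like this:

import xgboost as xgb

# Native learning API (xgb.train): the params dict uses 'lambda' and 'alpha' as-is
params = {'eta': 0.1, 'lambda': 1.0, 'alpha': 0.5}

# scikit-learn wrapper: the same settings take the reg_ prefix
regr = xgb.XGBRegressor(eta=0.1, reg_lambda=1.0, reg_alpha=0.5)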

# Set the objective function for Optuna
# These are the parameters for the gbtree booster.
def objective(trial):
    eta = trial.suggest_loguniform('eta', 1e-8, 1.0)
    gamma = trial.suggest_loguniform('gamma', 1e-8, 1.0)
    max_depth = trial.suggest_int('max_depth', 1, 20)
    min_child_weight = trial.suggest_loguniform('min_child_weight', 1e-8, 1.0)
    max_delta_step = trial.suggest_loguniform('max_delta_step', 1e-8, 1.0)
    subsample = trial.suggest_uniform('subsample', 0.0, 1.0)
    reg_lambda = trial.suggest_uniform('reg_lambda', 0.0, 1000.0)
    reg_alpha = trial.suggest_uniform('reg_alpha', 0.0, 1000.0)

    regr = xgb.XGBRegressor(eta=eta, gamma=gamma, max_depth=max_depth,
                            min_child_weight=min_child_weight, max_delta_step=max_delta_step,
                            subsample=subsample, reg_lambda=reg_lambda, reg_alpha=reg_alpha)

    # Evaluate with 5-fold cross-validation and return the mean R^2
    score = cross_val_score(regr, X_train, y_train, cv=5, scoring="r2")
    r2_mean = score.mean()
    print(r2_mean)

    return r2_mean

# Find the optimal values with Optuna
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=500)

# Fit with the tuned hyperparameters
optimised_rf = xgb.XGBRegressor(eta=study.best_params['eta'], gamma=study.best_params['gamma'],
                                max_depth=study.best_params['max_depth'], min_child_weight=study.best_params['min_child_weight'],
                                max_delta_step=study.best_params['max_delta_step'], subsample=study.best_params['subsample'],
                                reg_lambda=study.best_params['reg_lambda'], reg_alpha=study.best_params['reg_alpha'])

optimised_rf.fit(X_train, y_train)
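
Incidentally, since the parameter names suggested in `objective` match the `XGBRegressor` keyword arguments one-to-one, the tuned model can also be rebuilt more compactly by unpacking `study.best_params` (an equivalent alternative to the explicit version above):

# Equivalent shortcut: unpack the best parameters found by Optuna directly
optimised_rf = xgb.XGBRegressor(**study.best_params)
optimised_rf.fit(X_train, y_train)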

This is the result for the Boston dataset.

(Figure: xgboost_Figure 2020-08-08 185911.png, a scatter plot of predicted vs. measured values for the Boston dataset)

That's all for the explanation. The full script is below.

# -*- coding: utf-8 -*-
 
from sklearn import datasets
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
import pandas as pd
import optuna
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score
 
# Load the Boston dataset
boston = datasets.load_boston()

# print(boston['feature_names'])
# Separate the features and the target variable
X = boston['data']
y = boston['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)

# Set the objective function for Optuna
# These are the parameters for the gbtree booster.
def objective(trial):
    eta = trial.suggest_loguniform('eta', 1e-8, 1.0)
    gamma = trial.suggest_loguniform('gamma', 1e-8, 1.0)
    max_depth = trial.suggest_int('max_depth', 1, 20)
    min_child_weight = trial.suggest_loguniform('min_child_weight', 1e-8, 1.0)
    max_delta_step = trial.suggest_loguniform('max_delta_step', 1e-8, 1.0)
    subsample = trial.suggest_uniform('subsample', 0.0, 1.0)
    reg_lambda = trial.suggest_uniform('reg_lambda', 0.0, 1000.0)
    reg_alpha = trial.suggest_uniform('reg_alpha', 0.0, 1000.0)

    regr = xgb.XGBRegressor(eta=eta, gamma=gamma, max_depth=max_depth,
                            min_child_weight=min_child_weight, max_delta_step=max_delta_step,
                            subsample=subsample, reg_lambda=reg_lambda, reg_alpha=reg_alpha)

    # Evaluate with 5-fold cross-validation and return the mean R^2
    score = cross_val_score(regr, X_train, y_train, cv=5, scoring="r2")
    r2_mean = score.mean()
    print(r2_mean)

    return r2_mean
 
# Find the optimal values with Optuna
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=500)

# Fit with the tuned hyperparameters
optimised_rf = xgb.XGBRegressor(eta=study.best_params['eta'], gamma=study.best_params['gamma'],
                                max_depth=study.best_params['max_depth'], min_child_weight=study.best_params['min_child_weight'],
                                max_delta_step=study.best_params['max_delta_step'], subsample=study.best_params['subsample'],
                                reg_lambda=study.best_params['reg_lambda'], reg_alpha=study.best_params['reg_alpha'])

optimised_rf.fit(X_train, y_train)
# View the results
print("Fit on training data")
print("Training data R^2 =", optimised_rf.score(X_train, y_train))
pre_train = optimised_rf.predict(X_train)
print("Fit on test data")
print("Test data R^2 =", optimised_rf.score(X_test, y_test))
pre_test = optimised_rf.predict(X_test)
 
# Display the graph
plt.scatter(y_train, pre_train, marker='o', color="blue", label="train")
plt.scatter(y_test, pre_test, marker='o', color="red", label="test")
plt.title('boston')
plt.xlabel('measurement')
plt.ylabel('predicted')
# Fine-tune the position of the text here
x = 30
y1 = 12
y2 = 10
s1 = "train_r2 = " + str(optimised_rf.score(X_train, y_train))
s2 = "test_r2 = " + str(optimised_rf.score(X_test, y_test))
plt.text(x, y1, s1)
plt.text(x, y2, s2)
 
plt.legend(loc="upper left", fontsize=14)
plt.show()

