[PYTHON] Try to predict cherry blossoms with xgboost

Uses data from March of the previous year through February of the current year. (Python, Beginner, Machine learning)

1. Purpose

An article reported that an AI-based sakura (cherry blossom) forecast used xgboost, so I tried predicting the cherry blossom flowering date with xgboost myself. https://www.businessinsider.jp/post-186528

2. Conclusion

The result was mediocre, which shows how good the AI sakura forecast above is. [Figure: 無題.png] The factors with the largest effect on the flowering date were the annual average temperature, the sunshine hours in July, the rainfall in August, and the lowest temperature in October. The annual average temperature makes sense, but the July sunshine hours, the August rainfall, and the October minimum temperature surprised me.

3. Data source

https://www.data.jma.go.jp/gmd/risk/obsdl/index.php I downloaded the data above from the Japan Meteorological Agency and processed it before use.

4. Code explanation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from xgboost import XGBRegressor, plot_importance
from sklearn.model_selection import GridSearchCV, KFold
from tqdm import tqdm_notebook
path="./"
train = pd.read_csv(path+"kikou5train.csv")
x_test = pd.read_csv(path+"kikou5test.csv")

y_train = train.kaika.copy()
x_train = train.drop("kaika", axis=1)
train_id = x_train.Id
x_train.head()
date	avtmp3	maxtmp3	mintmp3	ame3	nisho3	joki3	kumo3	avtmp4	maxtmp4	...	kumo13	avtmp14	maxtmp14	mintmp14	ame14	nisho14	joki14	kumo14	kaika	TotalInc
Id																					
1	1961	8.2	21.9	-0.4	106.6	181.1	6.7	6.3	14.9	26.0	...	3.8	5.9	24.5	-2.6	13.5	195.0	4.5	4.1	NaN	193.6
2	1962	8.2	18.8	-0.8	65.5	189.8	6.3	4.7	14.1	24.5	...	2.0	4.8	15.3	-4.1	21.3	199.9	4.1	4.9	NaN	182.3

Concatenate the training and test data

df = pd.concat([x_train, x_test])
df.head()

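Feature engineering: add the sum of the twelve monthly average temperatures (a proxy for the annual average temperature)

# Columns avtmp3 ... avtmp14 hold the twelve monthly average temperatures
avtmp_cols = [f"avtmp{m}" for m in range(3, 15)]
# skipna=False keeps the NaN behavior of adding the columns one by one
df["TotalInc"] = df[avtmp_cols].sum(axis=1, skipna=False)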
df.head()
date	avtmp3	maxtmp3	mintmp3	ame3	nisho3	joki3	kumo3	avtmp4	maxtmp4	...	kumo13	avtmp14	maxtmp14	mintmp14	ame14	nisho14	joki14	kumo14	kaika	TotalInc
0	1980	8.2	21.2	1.3	173.5	157.5	6	6.2	13.6	24	...	2.9	5.3	17.2	-3.5	38	157.3	4.6	5.5	NaN	183.4
1 rows × 87 columns
x_train = df[df.Id.isin(train_id)].set_index("Id")
x_test = df[~df.Id.isin(train_id)].set_index("Id")
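
The concatenate, engineer, then split pattern above can be tried on a toy frame. This is only a minimal sketch, not the article's data: the two-month column set (`avtmp3`, `avtmp4`), the values, and the helper name `add_total_temperature` are assumptions made for illustration.

```python
import pandas as pd


def add_total_temperature(df, months):
    """Return a copy of df with TotalInc = sum of the avtmp<month> columns.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    cols = [f"avtmp{m}" for m in months]
    out = df.copy()
    # skipna=False: a missing month makes the total NaN instead of a partial sum
    out["TotalInc"] = out[cols].sum(axis=1, skipna=False)
    return out


# Toy stand-ins for the training and test CSV files (values are made up)
toy_train = pd.DataFrame({"Id": [1, 2], "avtmp3": [8.2, 8.2], "avtmp4": [14.9, 14.1]})
toy_test = pd.DataFrame({"Id": [100], "avtmp3": [9.6], "avtmp4": [13.6]})
train_ids = toy_train["Id"]

# Engineer the feature once on the combined frame ...
combined = add_total_temperature(
    pd.concat([toy_train, toy_test], ignore_index=True), months=[3, 4]
)

# ... then split back by Id, exactly as the notebook does
toy_x_train = combined[combined["Id"].isin(train_ids)].set_index("Id")
toy_x_test = combined[~combined["Id"].isin(train_ids)].set_index("Id")
```

Engineering the feature on the combined frame guarantees that the training and test sets end up with identical columns; the split by `Id` only works if the Ids in the two files do not overlap.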

Optimal hyperparameter search

random_state = 0
params = {
          "learning_rate": [0.01, 0.05, 0.1],
          "min_child_weight": [0.1],
          "gamma": [0],
          "reg_alpha": [0],
          "reg_lambda": [1],
          "max_depth": [3, 5, 7],
          "max_delta_step": [0],
          "random_state": [random_state],
          "n_estimators": [50, 100, 200],
          }
reg = XGBRegressor()
cv = KFold(n_splits=3, shuffle=True, random_state=random_state)
reg_gs = GridSearchCV(reg, params, cv=cv)
reg_gs.fit(x_train, y_train)
GridSearchCV(cv=KFold(n_splits=3, random_state=0, shuffle=True),
             estimator=XGBRegressor(base_score=None, booster=None,
                                    colsample_bylevel=None,
                                    colsample_bynode=None,
                                    colsample_bytree=None, gamma=None,
                                    gpu_id=None, importance_type='gain',
                                    interaction_constraints=None,
                                    learning_rate=None, max_delta_step=None,
                                    max_depth=None, min_child_weight=None,
                                    missing=nan, monoto...
                                    num_parallel_tree=None, random_state=None,
                                    reg_alpha=None, reg_lambda=None,
                                    scale_pos_weight=None, subsample=None,
                                    tree_method=None, validate_parameters=None,
                                    verbosity=None),
             param_grid={'gamma': [0], 'learning_rate': [0.01, 0.05, 0.1],
                         'max_delta_step': [0], 'max_depth': [3, 5, 7],
                         'min_child_weight': [0.1],
                         'n_estimators': [50, 100, 200], 'random_state': [0],
                         'reg_alpha': [0], 'reg_lambda': [1]})
display(reg_gs.best_params_)
# No scoring argument was given, so this is the mean cross-validated R^2
display(reg_gs.best_score_)
ax = plot_importance(reg_gs.best_estimator_, importance_type="gain")
# Enlarge the figure so that all feature names are readable
ax.figure.set_size_inches(18, 18)
{'gamma': 0,
 'learning_rate': 0.1,
 'max_delta_step': 0,
 'max_depth': 5,
 'min_child_weight': 0.1,
 'n_estimators': 50,
 'random_state': 0,
 'reg_alpha': 0,
 'reg_lambda': 1}
0.36250088820449333
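
Because no `scoring` argument is passed, `GridSearchCV` ranks the candidates with the estimator's default `score` method, which for regressors is R². The 0.3625 above is therefore a mean cross-validated R², not an error in days. The sketch below shows one way to make the search report RMSE directly, using `scoring="neg_root_mean_squared_error"`. Synthetic data from `make_regression` and scikit-learn's `GradientBoostingRegressor` stand in for the article's CSV files and `XGBRegressor`; both are assumptions to keep the example self-contained, and the grid is shrunk to keep it fast.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic regression data (assumption: stands in for the weather features)
X, y = make_regression(n_samples=60, n_features=8, noise=5.0, random_state=0)

# A reduced grid: 2 learning rates x 2 depths x 1 tree count = 4 candidates
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [3, 5],
    "n_estimators": [50],
}
cv = KFold(n_splits=3, shuffle=True, random_state=0)

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),  # stand-in for XGBRegressor
    param_grid,
    cv=cv,
    # scikit-learn maximizes scores, so RMSE is reported with a negative sign
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)

cv_rmse = -search.best_score_  # flip the sign back to a positive RMSE
n_candidates = len(search.cv_results_["params"])
```

With this scorer, the cross-validated score is in the same units as the target, which makes it directly comparable to the RMSE computed on the test data later in the article.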

Forecast

y_pred3 = reg_gs.predict(x_test)

Evaluate the error against the true labels

y_true = pd.read_csv(path+"kikou5test.csv")
preds = pd.DataFrame({"pred3": y_pred3})
df_out = pd.concat([y_true, preds], axis=1)
df_out.head()
Id	date	avtmp3	maxtmp3	mintmp3	ame3	nisho3	joki3	kumo3	avtmp4	...	avtmp14	maxtmp14	mintmp14	ame14	nisho14	joki14	kumo14	kaika	pred3	loss3
0	100	1966	9.6	21.6	1.2	99.9	150.4	7.0	6.6	13.6	...	4.9	19.1	-4.0	43.8	162.6	5.1	5.0	30	29.816103	0.033818

RMSE

df_out["loss3"] = (df_out.kaika - df_out.pred3)**2
df_out.iloc[:, -3:].mean()
kaika    24.909091
pred3    26.849123
loss3    23.966188
dtype: float64
from sklearn.metrics import mean_squared_error, mean_absolute_error
#RMSE
rmse_kaika = np.sqrt(mean_squared_error(df_out.kaika, df_out.pred3))
rmse_kaika
4.895527368155607
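
The numbers printed above are consistent with each other: the RMSE is the square root of the mean of the `loss3` column. The short sketch below checks that arithmetic using only the values shown in the output; the function name `rmse` is a hypothetical helper added for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Mean of the loss3 column as printed above (mean squared error)
mean_loss3 = 23.966188
rmse_from_mean_loss = math.sqrt(mean_loss3)

# The single test row shown above: kaika = 30, pred3 = 29.816103
row_loss = (30 - 29.816103) ** 2

# For a single observation, RMSE reduces to the absolute error
single_row_rmse = rmse([30], [29.816103])
```

Here `rmse_from_mean_loss` reproduces the reported 4.8955 and `row_loss` reproduces the 0.033818 shown in the `loss3` column of the first test row.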

The RMSE of the predicted flowering date is just under 5 days (about 4.9). The flowering date turned out to be more predictable than I expected, but the accuracy is still mediocre.
