[PYTHON] I tried to implement time series prediction with GBDT

Introduction

I have previously written up methods for time series analysis and for regression models, so if you are interested, please refer to those articles as well.

GBDT Time Series Forecast

The Python code is shown below.

#Import required libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
#Jupyter Notebook magic (remove this line when running as a plain script)
%matplotlib inline

#Statistical model
import statsmodels.api as sm

# GBDT
from sklearn.ensemble import GradientBoostingRegressor

#Make the graph landscape
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 6

# https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/AirPassengers.html
df = pd.read_csv('AirPassengers.csv')

#Convert to float type
df['#Passengers'] = df['#Passengers'].astype('float64')
df = df.rename(columns={'#Passengers': 'Passengers'})

#Make it a datetime type and index it
df.Month = pd.to_datetime(df.Month)
df = df.set_index("Month")

#Check the contents of the data
df.head()

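Next, plot the correlogram (autocorrelation function).

#Autocorrelation graph
#Pass ax so the plot is drawn on the sized figure instead of leaving it empty
fig, ax = plt.subplots(figsize=(12, 8))
fig = sm.graphics.tsa.plot_acf(df["Passengers"], lags=30, ax=ax)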

#Visualize partial autocorrelation
#Pass ax so the plot is drawn on the sized figure instead of leaving it empty
fig, ax = plt.subplots(figsize=(12, 8))
fig = sm.graphics.tsa.plot_pacf(df["Passengers"], lags=20, ax=ax)

The partial autocorrelation plot for this data shows a correlation at an interval of 12 months. In other words, the data has seasonal (annual) periodic fluctuations.
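As a rough numerical check of this kind of seasonality, the sketch below compares the autocorrelation at lag 6 and lag 12. It is only an illustration: the series is synthetic (a made-up trend plus a 12-month cycle), not the AirPassengers data, and it uses the plain autocorrelation from pandas `Series.autocorr` rather than the partial autocorrelation plotted above.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend + 12-month cycle + small noise
rng = np.random.default_rng(0)
t = np.arange(144)
values = 0.5 * t + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, size=t.size)
series = pd.Series(values, index=pd.date_range("2000-01-01", periods=144, freq="MS"))

# Remove the trend with a first difference so the seasonal pattern stands out
detrended = series.diff().dropna()

# Autocorrelation at half a cycle (6 months) and at a full cycle (12 months)
lag6 = detrended.autocorr(lag=6)
lag12 = detrended.autocorr(lag=12)
print(f"lag 6: {lag6:.2f}, lag 12: {lag12:.2f}")
```

With a 12-month cycle, the value at lag 12 should be strongly positive and the value at lag 6 negative, which is the same pattern that motivates using 12 months of history as features.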

Next, create lag features that hold the values of each of the previous 12 months.

for i in range(1, 13):
    df[f'shift{i}'] = df['Passengers'].shift(i)

pd.concat([df.head(13), df.tail(3)], axis=0, sort=False)
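
To make the effect of `shift` concrete, here is a minimal sketch on a toy series. The values and the column name `y` are made up for illustration; only the `shift` call mirrors the code above.

```python
import pandas as pd

# Toy series standing in for the Passengers column (values are made up)
toy = pd.DataFrame(
    {"y": [10.0, 20.0, 30.0, 40.0, 50.0]},
    index=pd.date_range("2000-01-01", periods=5, freq="MS"),
)

# shift(i) moves values down by i rows, so each row sees the value from i months earlier
for i in range(1, 3):
    toy[f"shift{i}"] = toy["y"].shift(i)

print(toy)
```

The first i rows of each `shift{i}` column are NaN because no earlier value exists. This is why rows with missing values have to be dropped before training.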

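Next, create a first-difference column, a feature often used for time series data. The difference is taken on shift1, so it uses only values from before the month being predicted.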

df['deriv1'] = df['shift1'].diff(1)
df[['Passengers', 'deriv1']].head()

Next, create a second-difference column by applying the difference twice.

df['deriv2'] = df['shift1'].diff(1).diff(1)
df[['Passengers', 'deriv2']].head()
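
The following sketch shows what the first and second differences of the shifted series contain. The numbers are made up; the variable names `d1` and `d2` are placeholders for the `deriv1` and `deriv2` columns.

```python
import pandas as pd

# Toy target values (made up); the increments are 3, 6, 9, 12
y = pd.Series([10.0, 13.0, 19.0, 28.0, 40.0])

# Work on the previous month's value so no feature uses the current target
prev = y.shift(1)

# First difference: y[t-1] - y[t-2]
d1 = prev.diff(1)

# Second difference: the change of the first difference
d2 = prev.diff(1).diff(1)

print(pd.DataFrame({"y": y, "d1": d1, "d2": d2}))
```

At row 4, `d1` is 28 - 19 = 9 and `d2` is 9 - 6 = 3. Neither value involves y at row 4 itself.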

Finally, add rolling statistics over the previous 12 months (mean, median, maximum, minimum) as explanatory variables.

df['mean'] = df['shift1'].rolling(12).mean()
df['median'] = df['shift1'].rolling(12).median()
df['max'] = df['shift1'].rolling(12).max()
df['min'] = df['shift1'].rolling(12).min()
df[['Passengers', 'mean', 'median', 'max', 'min']][12:24]
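
Computing the rolling window on `shift1` rather than on the target itself keeps the current month out of its own features. The sketch below illustrates the difference on made-up numbers with a window of 3 instead of 12. The names `past_mean` and `leaky_mean` are hypothetical.

```python
import pandas as pd

# Toy target with a jump in the last row (values are made up)
y = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])
window = 3

# Window over the shifted series: uses only the three values before each row
past_mean = y.shift(1).rolling(window).mean()

# Window over the raw series: includes the current row, i.e. the value to predict
leaky_mean = y.rolling(window).mean()

print(pd.DataFrame({"y": y, "past_mean": past_mean, "leaky_mean": leaky_mean}))
```

In the last row `past_mean` is (2 + 3 + 4) / 3 = 3, while `leaky_mean` is pulled up by the 100 that the model is supposed to predict.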

Now we make predictions with GBDT.

#Delete missing value data
df = df.dropna()
df.head()

x = df.drop('Passengers', axis=1)
y = df['Passengers']

#Create training data and evaluation data
#shuffle=False keeps time order: the last 20% of the months become the evaluation data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=False)

#Standardize data
sc = StandardScaler()
sc.fit(x_train) #Standardized with training data
x_train_std = sc.transform(x_train)
x_test_std = sc.transform(x_test)

#Model learning
GBDT = GradientBoostingRegressor()
GBDT.fit(x_train_std, y_train)

#Forecast
y_pred = GBDT.predict(x_test_std)

#Attach the evaluation dates to the predictions so they are plotted at the right positions
y_ = pd.Series(y_pred, index=y_test.index).sort_index()

plt.figure(figsize=(10,5))
plt.plot(y, label='original')
plt.plot(y_, '--', label='predict')
plt.legend()
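
For time series, the evaluation rows are usually taken from the end of the period so that the model is never trained on months that come after the ones it is tested on. The sketch below shows one way to do such a split by hand and to score predictions with RMSE. The helpers `time_ordered_split` and `rmse` are hypothetical names, the data is a toy series, and a naive "previous value" forecast stands in for the model's predictions.

```python
import numpy as np
import pandas as pd


def time_ordered_split(x, y, test_size=0.2):
    """Hypothetical helper: keep the last `test_size` fraction of rows for evaluation."""
    n_test = int(np.ceil(len(x) * test_size))
    split = len(x) - n_test
    return x.iloc[:split], x.iloc[split:], y.iloc[:split], y.iloc[split:]


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy data: y grows by 1 each month, shift1 is the previous month's value
idx = pd.date_range("2000-01-01", periods=10, freq="MS")
x_toy = pd.DataFrame({"shift1": np.arange(10, dtype=float)}, index=idx)
y_toy = pd.Series(np.arange(1, 11, dtype=float), index=idx)

x_tr, x_te, y_tr, y_te = time_ordered_split(x_toy, y_toy, test_size=0.2)

# Every training month comes before every evaluation month
print(x_tr.index.max() < x_te.index.min())

# Naive forecast "this month = last month" as a stand-in for model predictions
print(rmse(y_te, x_te["shift1"]))
```

scikit-learn's `train_test_split` gives the same kind of split when called with `shuffle=False`. Comparing the model's RMSE on the evaluation rows with that of a naive forecast is one simple way to see whether the features add anything.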

Conclusion

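Thank you for reading to the end. In this article, I predicted time series data using a regression model (GBDT). When a regression model is used this way, creating and selecting features is important, because the model sees the time structure only through features such as lags, differences, and rolling statistics.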
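
One way to get a first impression of which features matter is the `feature_importances_` attribute of a fitted `GradientBoostingRegressor`. The sketch below is only an illustration on synthetic data: the feature names `informative` and `noise` are made up, and the target depends on the first feature only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data: the target depends on the first feature, the second is pure noise
rng = np.random.default_rng(0)
n = 200
informative = rng.normal(size=n)
noise = rng.normal(size=n)
X = np.column_stack([informative, noise])
y = 3.0 * informative + rng.normal(scale=0.1, size=n)

model = GradientBoostingRegressor(random_state=0)
model.fit(X, y)

# Importances are normalized to sum to 1; larger means the feature was used more in splits
for name, importance in zip(["informative", "noise"], model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```

Applied to the model in this article, the same attribute could be paired with `x.columns` to see how the lag, difference, and rolling-statistic features rank against each other.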

If you find anything that needs correcting, I would appreciate it if you let me know.
