[PYTHON] Challenge to future sales forecast: ④ Time series analysis considering seasonality with statsmodels

Introduction

In the previous articles, we used the ARIMA and ARIMAX models from time series analysis to forecast future sales.

- Challenge to future sales forecast: ① What is time series analysis?
- Challenge to future sales forecast: ② Time series analysis using PyFlux
- Challenge to future sales forecast: ③ Parameter tuning of PyFlux

However, the accuracy did not improve. On the assumption that seasonality has not been taken into account sufficiently, I would next like to apply the SARIMA model, that is, an ARIMA model that includes seasonality.

However, SARIMA does not seem to be available in PyFlux, which we used until last time, so I will use statsmodels instead, with "[Predict the transition of TV Asahi's viewing rate with the SARIMA model](https://qiita.com/mshinoda88/items/749131478bfefc9bf365#sarima%E3%83%A2%E3%83%87%E3%83%AB%E5%AD%A3%E7%AF%80%E8%87%AA%E5%B7%B1%E5%9B%9E%E5%B8%B0%E5%92%8C%E5%88%86%E7%A7%BB%E5%8B%95%E5%B9%B3%E5%9D%87%E3%83%A2%E3%83%87%E3%83%AB)" as a reference.
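For reference, a SARIMA model with non-seasonal orders (p, d, q), seasonal orders (P, D, Q) and seasonal period s is commonly written as below, using the backshift operator B (B y_t = y_{t-1}). This is standard textbook notation with the sign convention that statsmodels also uses; it is not taken from the referenced article. The seasonal orders (P, D, Q) correspond to what the code later in this article calls (sp, sd, sq).

```latex
\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\, y_t
  = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t

% with the polynomials
\phi_p(B)    = 1 - \phi_1 B - \dots - \phi_p B^p
\Phi_P(B^s)  = 1 - \Phi_1 B^{s} - \dots - \Phi_P B^{Ps}
\theta_q(B)  = 1 + \theta_1 B + \dots + \theta_q B^q
\Theta_Q(B^s) = 1 + \Theta_1 B^{s} + \dots + \Theta_Q B^{Qs}
```

Setting P = D = Q = 0 recovers the ordinary ARIMA(p, d, q) model used in the earlier articles.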

Analytical environment

Google Colaboratory

Target data

As in the previous articles, the data consists of daily sales, with temperature (average, maximum, minimum) as explanatory variables.

| Date | Sales amount | Average temperature | Highest temperature | Lowest temperature |
|---|---|---|---|---|
| 2018-01-01 | 7,400,000 | 4.9 | 7.3 | 2.2 |
| 2018-01-02 | 6,800,000 | 4.0 | 8.0 | 0.0 |
| 2018-01-03 | 5,000,000 | 3.6 | 4.5 | 2.7 |
| 2018-01-04 | 7,800,000 | 5.6 | 10.0 | 2.6 |
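As a minimal sketch of the shape the modelling code expects, the four sample rows above could be held in a pandas DataFrame indexed by date. The column name `kingaku` (sales amount) is the one used by the code in this article; the temperature column names are hypothetical placeholders, and the real data is of course loaded from a file rather than typed in.

```python
import pandas as pd

# The four sample rows shown above. "kingaku" is the sales column used by
# the modelling code; the temperature column names are hypothetical.
df = pd.DataFrame(
    {
        "kingaku": [7_400_000, 6_800_000, 5_000_000, 7_800_000],
        "temp_avg": [4.9, 4.0, 3.6, 5.6],
        "temp_max": [7.3, 8.0, 4.5, 10.0],
        "temp_min": [2.2, 0.0, 2.7, 2.6],
    },
    # A daily DatetimeIndex with an explicit frequency
    index=pd.date_range("2018-01-01", periods=4, freq="D", name="date"),
)

print(df)
```

Giving the index an explicit daily frequency is a design choice here: it lets time series libraries work with dates without having to guess the spacing of the observations.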

ARIMA model (SARIMA) considering seasonality

The source data is prepared in the same way as last time. Let's build a model right away; statsmodels can be used in much the same way as PyFlux.

We also tune the parameters as we did last time. Because this is SARIMA, the seasonal parameters (sp, sd, sq) are added.

The following parameters are also set:

- enforce_stationarity: whether to transform the AR parameters to enforce stationarity
- enforce_invertibility: whether to transform the MA parameters to enforce invertibility
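Of the seasonal parameters, the differencing order sd is perhaps the easiest to make concrete: with a seasonal period of s, one seasonal difference replaces each value with its change from s steps earlier. The sketch below uses a toy series (illustrative values, not the sales data) and pandas' `Series.diff`; it only illustrates the idea and is not what statsmodels runs internally.

```python
import pandas as pd

# Toy series: a pattern that repeats every 4 steps, rising by 1 each cycle
# (illustrative values, not the sales data).
y = pd.Series([10, 20, 30, 40, 11, 21, 31, 41, 12, 22, 32, 42], dtype=float)

s = 4  # seasonal period, matching the 4 in seasonal_order=(sp, sd, sq, 4)

# One seasonal difference (sd = 1): y_t - y_{t-s}
seasonal_diff = y.diff(s)

# The first s values have no earlier season to subtract from, so they are NaN.
print(seasonal_diff.tolist())
```

After the seasonal difference, the repeating pattern is gone and only the constant cycle-to-cycle change remains, which is the kind of series the AR and MA terms are then fitted to.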

import itertools

import pandas as pd
import statsmodels.api as sm

def optimisation_sarima(df, target):
  # Exclusive upper bounds of the search ranges for the non-seasonal orders
  max_p = 4
  max_d = 4
  max_q = 4

  # Exclusive upper bounds of the search ranges for the seasonal orders
  max_sp = 2
  max_sd = 2
  max_sq = 2

  results = []

  # Fit one model per combination of orders and record its AIC
  for p, d, q, sp, sd, sq in itertools.product(
      range(max_p), range(max_d), range(max_q),
      range(max_sp), range(max_sd), range(max_sq)):

    model = sm.tsa.SARIMAX(
        df[target], order=(p, d, q),
        seasonal_order=(sp, sd, sq, 4),
        enforce_stationarity=False,
        enforce_invertibility=False
    )
    x = model.fit()

    print("AR:", p, " I:", d, " MA:", q, "SAR:", sp, "SI:", sd, "SMA:", sq, " AIC:", x.aic)

    results.append([p, d, q, sp, sd, sq, x.aic])

  # Collect all results into a single table
  return pd.DataFrame(results, columns=['p', 'd', 'q', 'sp', 'sd', 'sq', 'aic'])

# 'kingaku' is the sales amount column of df
df_optimisations = optimisation_sarima(df, 'kingaku')
df_optimisations[df_optimisations.aic == df_optimisations.aic.min()]

This displays the parameter combination with the lowest AIC.

| p | d | q | sp | sd | sq | aic |
|---|---|---|---|---|---|---|
| 2.0 | 0.0 | 3.0 | 1.0 | 1.0 | 1.0 | 11056.356866 |
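The grid search in `optimisation_sarima` fits one model for every combination of (p, d, q, sp, sd, sq), which is why it takes a while to run. The sketch below only counts the combinations, using the same ranges as the function; it does not fit anything.

```python
import itertools

# Same ranges as in optimisation_sarima: range(0, max_*) excludes the
# upper bound, so max_p = 4 means the largest AR order tried is 3.
grid = list(itertools.product(
    range(4), range(4), range(4),   # p, d, q
    range(2), range(2), range(2),   # sp, sd, sq
))

n_models = len(grid)
print(n_models)           # 4 * 4 * 4 * 2 * 2 * 2 combinations
print(grid[0], grid[-1])  # first and last combination tried
```

Widening any one range multiplies the total, so it may be worth narrowing ranges (for example the differencing order d) before extending the seasonal ones.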

Set the selected parameters in the model and fit it again to see how the model is evaluated.


# Refit the model with fixed orders and display the fit summary
sarima = sm.tsa.SARIMAX(
    df.kingaku, order=(3, 0, 3),
    seasonal_order=(1, 1, 1, 4),
    enforce_stationarity=False,
    enforce_invertibility=False
).fit()

sarima.summary()

You should see a result similar to the following:

Statespace Model Results
Dep. Variable:	kingaku	No. Observations:	363
Model:	SARIMAX(3, 0, 3)x(1, 1, 1, 4)	Log Likelihood	-5416.395
Date:	Tue, 03 Mar 2020	AIC	10850.790
Time:	11:18:46	BIC	10885.537
Sample:	01-03-2018	HQIC	10864.619
- 12-31-2018		
Covariance Type:	opg		
coef	std err	z	P>|z|	[0.025	0.975]
ar.L1	0.7365	0.132	5.583	0.000	0.478	0.995
ar.L2	-0.3535	0.165	-2.145	0.032	-0.677	-0.031
ar.L3	-0.5178	0.132	-3.930	0.000	-0.776	-0.260
ma.L1	-0.4232	0.098	-4.315	0.000	-0.615	-0.231
ma.L2	-0.0282	0.096	-0.295	0.768	-0.216	0.159
ma.L3	0.6885	0.068	10.140	0.000	0.555	0.822
ar.S.L4	0.4449	0.091	4.903	0.000	0.267	0.623
ma.S.L4	-0.7696	0.057	-13.547	0.000	-0.881	-0.658
sigma2	1.489e+12	6.05e-14	2.46e+25	0.000	1.49e+12	1.49e+12
Ljung-Box (Q):	777.09	Jarque-Bera (JB):	44.86
Prob(Q):	0.00	Prob(JB):	0.00
Heteroskedasticity (H):	1.09	Skew:	0.60
Prob(H) (two-sided):	0.63	Kurtosis:	4.27


Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 3.14e+41. Standard errors may be unstable.
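As a quick check on how to read the summary above, AIC can be recomputed by hand from the log likelihood with the usual definition AIC = 2k - 2 ln L, where k is the number of estimated parameters. The sketch below assumes k counts every row of the coefficient table, including sigma2.

```python
# Values read from the summary above
log_likelihood = -5416.395

# Estimated parameters: 3 AR + 3 MA + 1 seasonal AR + 1 seasonal MA + sigma2
k = 3 + 3 + 1 + 1 + 1

# AIC = 2k - 2 ln L
aic = 2 * k - 2 * log_likelihood
print(round(aic, 3))
```

The result agrees with the AIC of 10850.790 reported in the summary, which shows that each additional parameter costs 2 AIC points and has to earn that back through a higher log likelihood.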

The AIC doesn't seem to have changed that much... Let's look at the graph.

import matplotlib.pyplot as plt

# In-sample prediction over the training period
ts_pred = sarima.predict()

# Plot the actual data and the predicted values
plt.figure(figsize=(15, 10))
plt.plot(df.kingaku, label='DATA')
plt.plot(ts_pred, label='SARIMA', color='red')
plt.legend(loc='best')
plt.show()

[Figure: actual sales (DATA, blue) and SARIMA predictions (red)]

Blue is the actual data and red is the model's prediction. The model can now follow the ups and downs of ordinary periods, but it cannot predict the extreme values such as those at the end of the year. The values at the very beginning of the year are also odd, probably because the model has no earlier observations to draw on at the start of the sample.
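The comparison above is made by eye. One way to put a number on it would be an error metric such as RMSE or MAPE, computed after skipping the unreliable predictions at the start of the sample. The sketch below is only an illustration under assumptions: the choice of metrics, the `burn_in` length and the predicted values are all made up, and only the actual values come from the sample table earlier in this article.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Actual values: the first four sales figures from the sample table.
# Predicted values: made-up numbers for illustration only.
actual = [7_400_000, 6_800_000, 5_000_000, 7_800_000]
predicted = [0, 6_500_000, 5_600_000, 7_300_000]

burn_in = 1  # hypothetical: number of initial predictions to ignore

print(rmse(actual[burn_in:], predicted[burn_in:]))
print(mape(actual[burn_in:], predicted[burn_in:]))
```

With the real data, `actual` would be `df.kingaku` and `predicted` would be `ts_pred`; tracking the same metric across the articles in this series would make it easier to tell whether a new model is actually an improvement.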

Conclusion

It feels like taking three steps forward and 2.5 steps back. Next, I will think about how to improve the model.
