[Python] Time series analysis 4: Construction of SARIMA model

Aidemy 2020/10/29

Introduction

Hello, this is Yope! I am a liberal arts student, but I became interested in the possibilities of AI, so I enrolled at the AI-specialized school "Aidemy" to study. I would like to share the knowledge I gained there, so I am summarizing it on Qiita. I am very happy that so many people read my previous summary article. Thank you! This is the fourth post in my time series analysis series. Nice to meet you.

What to learn this time
- Construction of the SARIMA model

About the SARIMA model

- The __SARIMA model__ extends ARIMA(p, d, q), which models a differenced series, so that it can also handle time series that have a __seasonal cycle__.
- In addition to the ARIMA parameters, the SARIMA model has the seasonal parameters __(sp, sd, sq, s)__.
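For reference, one common way to write the full model uses the backshift operator B (defined by B y_t = y_{t-1}). The polynomial symbols φ, Φ, θ, Θ are standard textbook notation, not taken from the course material: φ_p and θ_q are the ordinary AR and MA polynomials of degree p and q, Φ_{sp} and Θ_{sq} are seasonal polynomials of degree sp and sq in B^s, and ε_t is white noise.

$$
\phi_p(B)\,\Phi_{sp}(B^{s})\,(1-B)^{d}\,(1-B^{s})^{sd}\,y_t = \theta_q(B)\,\Theta_{sq}(B^{s})\,\varepsilon_t
$$

Setting sp = sd = sq = 0 removes every seasonal factor and leaves the plain ARIMA(p, d, q) model.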

sp, sd, sq
- sp, sd, and sq are called the __"seasonal autoregressive order", "seasonal differencing order", and "seasonal moving average order"__, respectively, and play basically the same roles as p, d, and q in ARIMA.
- The difference is that they describe how the data is influenced by data from past seasonal periods.
- The remaining SARIMA parameter, __"s"__, is the __length of the seasonal cycle__. For a 12-month cycle, set s = 12.
- sp and sq indicate how many cycles back the influencing "past seasonal periods" reach (lags s, 2s, ...), and sd is how many times the seasonal difference at lag s is taken.
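To make sd and s concrete, here is a minimal sketch of one seasonal difference (sd = 1, s = 12) using pandas' `Series.diff`. The monthly series is made up for illustration (a repeating yearly pattern plus a linear trend), not real data.

```python
import numpy as np
import pandas as pd

# Made-up monthly series: the same 12-month pattern every year plus a linear trend
index = pd.date_range("2000-01-01", periods=48, freq="MS")
pattern = np.tile(np.arange(12, dtype=float), 4)
trend = 0.5 * np.arange(48)
y = pd.Series(pattern + trend, index=index)

s = 12  # seasonal cycle length

# One seasonal difference (sd = 1): y_t - y_{t-s}; the first s values become NaN
seasonal_diff = y.diff(s).dropna()

print(seasonal_diff.unique())  # → [6.]
```

Because the yearly pattern repeats exactly, subtracting the value from 12 months earlier removes it, leaving only the trend's 12-month increase (0.5 × 12 = 6.0) at every point.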

Parameter determination

- Appropriate values have to be chosen for the parameters above.
- An __information criterion__ is used for this. This time I use __"BIC"__ (Bayesian Information Criterion), although I will not cover it in detail here.
- The lower the BIC value, the more appropriate the parameter values.

Visualization of autocorrelation coefficient / partial autocorrelation coefficient

- __Partial autocorrelation__ is the autocorrelation between two values __with the influence of the data between them removed__.
- For example, the partial autocorrelation between y1 and y7 is obtained by removing the influence of y2 through y6, which lie between them.
- Visualizing the partial autocorrelation lets us choose a suitable value for the __parameter "s"__.
- In the plot, the partial autocorrelation becomes large at the lag corresponding to the __period__, so that lag should be used as s.

- The partial autocorrelation is visualized (graphed) as follows: __sm.graphics.tsa.plot_pacf(data)__

- Code (visualizing the correlation of the wine sales data)
![Screenshot 2020-10-29 14.27.54.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/81b92856-d4b8-1d98-03e7-28261cb8cabc.png)

- Result
![Screenshot 2020-10-29 14.28.23.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/3e5a9a67-eb31-5839-b557-ebc383e407e3.png)

Building a SARIMA model

- Construction procedure
① __Data reading__: pd.read_csv()
② __Data organization__: pd.date_range()
③ __Data visualization__: sm.graphics.tsa.plot_pacf()
④ __Identifying the data cycle (s)__: from the plot in ③
⑤ __Setting the parameters other than s__: determine them with BIC
--New from here--
⑥ __Model construction__: __sm.tsa.statespace.SARIMAX().fit()__
⑦ __Data prediction / visualization__: __predict()__ / plt.show()
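Steps ① and ② of this procedure can be sketched as follows. The CSV content, the column name `sales`, and the start date are hypothetical; an in-memory CSV replaces the real file so that the snippet runs on its own.

```python
import io

import pandas as pd

# ① Data reading: an in-memory CSV stands in for the real file
#    (the column name "sales" and its values are hypothetical)
csv_text = "sales\n" + "\n".join(str(v) for v in range(100, 136))
df = pd.read_csv(io.StringIO(csv_text))

# ② Data organization: attach a monthly DatetimeIndex built with pd.date_range
#    (the start date is an assumption for illustration)
df.index = pd.date_range("1980-01-01", periods=len(df), freq="MS")
sales = df["sales"].astype(float)

print(sales.head(3))
print(sales.index.freqstr)  # → MS (month start)
```

Giving the series a DatetimeIndex with a frequency is what later allows dates to be passed to predict().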

- The arguments of SARIMAX() in step ⑥ (model construction) are as follows: __SARIMAX(data, order=(p, d, q), seasonal_order=(sp, sd, sq, s))__

- predict() in step ⑦ (data prediction) is called as follows. Only the start of the forecast needs to be a time contained in the time series data; the end can lie beyond it. __model.predict("prediction start", "prediction end")__

- Full code run (step ⑤ is omitted)
![Screenshot 2020-10-29 14.31.57.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/cfbcbcc9-9b7b-6a92-d24d-327dfa3f1d97.png)

- Result
![Screenshot 2020-10-29 14.32.20.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/63b27173-80e3-81fd-6f4f-d68be98d37b5.png)

Summary

- The __SARIMA model__ extends ARIMA(p, d, q), which models a differenced series, to time series that have a seasonal cycle.
- The SARIMA model adds the seasonal parameters __(sp, sd, sq, s)__.
- For sp, sd, and sq (and the other parameters besides s), check appropriate values with BIC, an __information criterion__.
- s is determined by plotting the __partial autocorrelation__.
- Pass these parameters to __sm.tsa.statespace.SARIMAX().fit()__ to build the model, and make predictions with __predict()__.

That's all for this time. Thank you for reading to the end.
