Time series analysis using a general Gaussian state-space model using Python [Implementation example considering exogenous and seasonality]

What to do in this article

-** I will describe how to implement time series analysis of a general Gauss state-space model using Python ** --Implement using stats models --Implementation when seasonality and exogenousness are taken into consideration --The theoretical part is limited to the reference article. --No detailed examination (cannot be done ...) -** "Basics of Time Series Analysis and State Space Models Theory and Implementation Learned with R and Stan" I read it, but I was in trouble because I didn't understand R! Write in python! !! Person **

Introduction

I am studying time series analysis using ** "Basics of Time Series Analysis and State Space Models: Theory and Implementation Learned with R and Stan" ** (so-called Hayabusa). However, since I am used to Python, I skipped the implementation in R and implemented it by referring to the following article on the blog of the author of the book. --State space model by Python: https://logics-of-blue.com/python-state-space-models/

In this article, I was able to understand how to implement a general Gaussian state-space model using Python, but it was not as effective as the implementation method when it was exogenous. Therefore, we will use ** separate data to summarize the implementation and effects of exogenous data. ** **

Data preparation

--Using TEPCO's 2019 electricity demand data: http://www.tepco.co.jp/forecast/index-j.html

python


df = pd.read_csv("data/juyo-2019.csv",encoding="shift-jis")
df.head()

Since the data is hourly, reorganize it into daily data. You can do it right away with the groupby function. Then correct the index of the data to date.

python


df["date"] = pd.to_datetime(df["date"],format="%Y/%m/%d")
df["power"] = df["power"].values.astype(int)
daily_data = df.groupby("date")["power"].sum().reset_index()
daily_data = daily_data.set_index(["date"])
rcParams["figure.figsize"] = [12,5]
daily_data.plot()

ダウンロード (5).png

Looking at the data, we can see that there are ** periodic jaggedness **. Since this fluctuates every 7 days, it can be inferred that it is a cycle related to the value day of the week. In other words, electricity demand changes depending on whether it is a holiday or not.

In addition, it is presumed that not only the days of the week but also holidays affect the power demand. Holidays do not occur periodically and can be modeled as extrinsic.

** In summary, we will try to model more accurately by modeling the following. --Periodic: Occurs every 7 days on the day of the week --Extrinsic: Occurs randomly on holidays **

When seasonality and exogenous are not considered

First, let's model without considering seasonality, that is, the day of the week. Model using a local linear model that does not consider seasonality and exogenousness. For this modeling,

python


#Local linear trend model
mod_trend = sm.tsa.UnobservedComponents(daily_data,"local linear trend")
res_trend = mod_trend.fit(method="bfgs")

rcParams["figure.figsize"] = 15,15
fig = res_trend.plot_components()

ダウンロード (8).png

In the figure, the top is the actual data, the second is the state, and the third is the trend data. ** You can see that the trend is messed up without considering the periodicity. ** **

When considering seasonality

Next, when considering periodicity. All you have to do is put the seasonal argument in Unobserved Components. Since the day of the week is taken into consideration here, the argument is set to 7. A graph showing seasonality has been added to the fourth of the figure. You can see that it fluctuates periodically.

python


#Local linear trend model with seasonality
mod_season_trend = sm.tsa.UnobservedComponents(daily_data.power,
                                              "local linear trend",
                                              seasonal = 7)

res_season_trend = mod_season_trend.fit(
    method='bfgs', 
    maxiter=1000, 
)

rcParams["figure.figsize"] = 15,20
fig = res_season_trend.plot_components()

ダウンロード (9).png

** The big difference from the previous one is that the trend data is no longer jagged. ** ** By removing the periodicity, the trend can be seen more accurately than before. (I'm worried that the first data per month in the periodic graph is high ...)

You can also see that the level graph has decreased significantly at the beginning of May. This is probably due to the GW rather than the state changing. So next, let's consider a model when holidays are incorporated as extrinsic.

When seasonality and exogenous are taken into consideration

Finally, it is a model when exogenous is also taken into consideration. Consider holidays as extrinsic. Import jpholiday to put the holiday flag. (Here is the code when using google colab) Add a holiday column that will be 1 for holidays.

Since Saturday and Sunday are closed from the beginning, the holiday flag is not set.

python


#Put a holiday flag, in the case of google colab
!pip install jpholiday
import jpholiday

#Holiday flag
daily_data["holiday"] = 0
for i in range(len(daily_data)):
  if daily_data.index[i].dayofweek != 6: #Only if not Sunday
    date_i = daily_data.index[i] #date
    if jpholiday.is_holiday(date_i): #For holidays
      daily_data.loc[date_i,"holiday"] = 1
daily_data.head()

If it's only a holiday, you can't consider consecutive holidays. Manually rewrite the winter vacation from December to January, the GW in May, and the Obon holiday in August to 1.

python


list1 = daily_data.query("20190101 <= index <= 20190104").index
list2 = daily_data.query("20190429 <= index <= 20190503").index
list3 = daily_data.query("20190812 <= index <= 20190816").index
list4 = daily_data.query("20191230 <= index <= 20191231").index

for i in list1:
  daily_data.loc[i,"holiday"]=1
for i in list2:
  daily_data.loc[i,"holiday"]=1
for i in list3:
  daily_data.loc[i,"holiday"]=1
for i in list4:
  daily_data.loc[i,"holiday"]=1

Finally, let's create a local linear trend model with exogenous and plot it. ** Put exog in the argument and specify the holiday flag created earlier. ** **

python


#Local linear trend model with seasonality and exogenousness
mod_season_exog_trend = sm.tsa.UnobservedComponents(
    daily_data.power,
    "local linear trend",
    seasonal=7,
    exog=daily_data.holiday
)

res_season_exog_trend = mod_season_exog_trend.fit(
    method="bfgs",
    maxiter=1000, 
)

#Drawing estimated state / trend / seasonal effects
rcParams['figure.figsize'] = 15, 20
fig = res_season_exog_trend.plot_components()

ダウンロード (10).png

Focusing on the beginning of May, the decrease is ** slightly ** suppressed compared to the case of periodicity alone. However, it seems that this decrease is not completely suppressed only by adding the holiday flag.

It seems that further improvement is needed, but this was the limit of my current ability. .. ..

Summary

--Implementing seasonal and extrinsic models using stats model --Implemented seasonality as a day of the week and extrinsic as a holiday for electricity demand. --The periodic model was highly effective and could be implemented. —— Holidays alone are not enough for extrinsic models and need further improvement

Thank you for reading for me until the end.

Reference article

--State space model by Python: https://logics-of-blue.com/python-state-space-models/ --Quick trend extraction: Introduction to time series analysis with statsmodels in Python: https://data.gunosy.io/entry/statsmodel_trend

Recommended Posts

Time series analysis using a general Gaussian state-space model using Python [Implementation example considering exogenous and seasonality]
[Python] Implementation of clustering using a mixed Gaussian model
General Gaussian state-space model in Python
Python: Time Series Analysis: Stationarity, ARMA / ARIMA Model
Introduction to Time Series Analysis ~ Seasonal Adjustment Model ~ Implemented in R and Python
Python: Time Series Analysis
[In-Database Python Analysis Tutorial with SQL Server 2017] Step 5: Training and saving a model using T-SQL
Graph time series data in Python using pandas and matplotlib
[Statistics] [Time series analysis] Plot the ARMA model and grasp the tendency.
Implement a model with state and behavior (3) --Example of implementation by decorator
Time series analysis 2 Stationary, ARMA / ARIMA model
I tried time series analysis! (AR model)
Recommendation tutorial using association analysis (python implementation)
Time series analysis 4 Construction of SARIMA model
Time series analysis # 6 Spurious regression and cointegration
Challenge to future sales forecast: ④ Time series analysis considering seasonality by Stats Models