[PYTHON] Challenge to future sales forecast: ⑤ Time series analysis by Prophet

Introduction

In the previous posts, I forecast future sales with the ARIMA model of time series analysis. I tried various adjustments, but the model has few parameters to tune and the accuracy did not improve.

- Challenge to future sales forecast: ① What is time series analysis?
- Challenge to future sales forecast: ② Time series analysis using PyFlux
- Challenge to future sales forecast: ③ Parameter tuning of PyFlux
- Challenge to future sales forecast: ④ Time series analysis considering seasonality by Stats Models

So instead of ARIMA, I would like to follow the current trend and learn Deep Learning. However, starting from scratch all at once is difficult, so this time I will use Prophet, a time series analysis library published by Facebook that is often the first name to come up when people talk about time series analysis.

I wrote the code while referring to the following articles, but in some places it did not work as I expected. Has the library version changed since they were written?

- Introduction to Prophet [Python] Facebook Time Series Prediction Tool
- Future prediction of time series data using AI prophet on facebook

Come to think of it, Prophet was released in 2017. I had gone all this time without knowing about it...

Analytical environment

Google Colaboratory

Target data

As in the previous posts, the data consists of daily sales, with temperature (average, maximum, minimum) as explanatory variables.

date       | Sales amount | Average temperature | Highest temperature | Lowest Temperature
2018-01-01 | 7,400,000    | 4.9                 | 7.3                 | 2.2
2018-01-02 | 6,800,000    | 4.0                 | 8.0                 | 0.0
2018-01-03 | 5,000,000    | 3.6                 | 4.5                 | 2.7
2018-01-04 | 7,800,000    | 5.6                 | 10.0                | 2.6

1. Original data creation

The process of pulling data from BigQuery into Pandas is the same as before. However, since I am predicting the future this time, I create two frames: the past 2 years (df) for training and the following 1 month (df_future).

After that, the date column must be converted to the datetime64 type. In addition, Prophet expects fixed column names: rename the date column to ds and the value to be predicted (here the sales amount) to y.

import pandas as pd

query = """
SELECT * 
FROM `myproject.mydataset.mytable`
WHERE CAST(Date AS TIMESTAMP) between CAST("{from_day}" AS TIMESTAMP) AND CAST("{to_day}" AS TIMESTAMP)
ORDER BY p_date
"""

df = pd.io.gbq.read_gbq(query.format(from_day="2017-01-01",to_day="2018-12-31"), project_id="myproject", dialect="standard")
df_future = pd.io.gbq.read_gbq(query.format(from_day="2019-01-01",to_day="2019-01-31"), project_id="myproject", dialect="standard")

#Convert the date column to datetime64 and rename the columns to the names Prophet expects.
#df_future gets the same treatment because it is merged on ds later.
for frame in (df, df_future):
  frame['date'] = pd.to_datetime(frame['date'], format='%Y-%m-%d')
  frame.rename(columns={'Sales amount': 'y','date': 'ds'}, inplace=True)
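
As a minimal sketch of the preparation step described above, the toy frame below is converted to the ds / y layout. The values are copied from the sample table, not taken from the BigQuery result. It assumes the dates arrive as YYYY-MM-DD strings and uses pd.to_datetime for the conversion; the helper name to_prophet_frame is made up for illustration.

```python
import pandas as pd

# Toy stand-in for the BigQuery result (values taken from the sample table).
raw = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04"],
    "Sales amount": [7400000, 6800000, 5000000, 7800000],
    "Average temperature": [4.9, 4.0, 3.6, 5.6],
})

def to_prophet_frame(frame):
    """Return a copy with a datetime64 'ds' column and the target renamed to 'y'."""
    out = frame.copy()
    out["date"] = pd.to_datetime(out["date"], format="%Y-%m-%d")
    return out.rename(columns={"date": "ds", "Sales amount": "y"})

prepared = to_prophet_frame(raw)
print(prepared.dtypes)
```

Returning a copy instead of renaming in place keeps the raw frame available for re-running the cell, which is convenient in a notebook.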

2. Model learning

Create a Prophet model and add holidays, an extra seasonality, and explanatory variables to it.

#The package was named fbprophet before version 1.0; on older environments use "from fbprophet import Prophet"
from prophet import Prophet

#Use a logistic (saturating) trend instead of the default linear one
model = Prophet(growth='logistic', daily_seasonality=False)

#Add the public holidays of a country by specifying its code
model.add_country_holidays(country_name="JP")

#Add a monthly seasonality component
model.add_seasonality(name='monthly', period=30.5, fourier_order=5)

#Extra explanatory variables (regressors) to add to the model
features_list =["Average temperature","Highest temperature","Lowest Temperature"]

for f in features_list:
  model.add_regressor(f)

#Logistic growth requires a cap column holding the upper limit of y
df['cap']=15000000

model.fit(df)

This trains the model. Because components are easy to add and remove, it seems a good idea to experiment with different combinations while training.
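
To make the add_seasonality settings (period=30.5, fourier_order=5) a little more concrete: as far as I understand, Prophet represents each seasonality as a Fourier series, that is, pairs of sine and cosine terms with the given period. The following NumPy sketch builds such features. It is not Prophet's own code, and the function name fourier_features is made up for illustration.

```python
import numpy as np

def fourier_features(t_days, period, order):
    """Build sin/cos columns for harmonics 1..order of the given period.

    t_days: time in days (1-D array-like); returns shape (len(t_days), 2 * order).
    """
    t = np.asarray(t_days, dtype=float)
    columns = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)

# Same settings as the 'monthly' seasonality: period 30.5 days, order 5.
features = fourier_features(np.arange(61), period=30.5, order=5)
print(features.shape)
```

A larger fourier_order lets the seasonal curve change more quickly within one period, at the cost of a higher risk of overfitting.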

Then apply the resulting model to future data.

#Forecast horizon: 30 days here
future = model.make_future_dataframe(periods=30, freq='D')
future["cap"]=15000000

#predict needs the added regressors (temperature etc.), so merge df_future into future before calling it
future=pd.merge(future, df_future, on="ds")
df_forecast = model.predict(future)
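
One detail worth checking here: as far as I know, make_future_dataframe returns the training dates followed by the new ones by default, while pd.merge performs an inner join. The merge therefore keeps only the dates present in both frames. The sketch below reproduces this with plain pandas date ranges standing in for Prophet's output; the names ending in _demo and the constant temperature are hypothetical.

```python
import pandas as pd

# History: the two training years; horizon: 30 extra days, as with periods=30.
history = pd.date_range("2017-01-01", "2018-12-31", freq="D")
horizon = pd.date_range("2019-01-01", periods=30, freq="D")
future_demo = pd.DataFrame({"ds": history.append(horizon)})

# Hypothetical stand-in for df_future: the 31 days of January with one regressor.
df_future_demo = pd.DataFrame({
    "ds": pd.date_range("2019-01-01", "2019-01-31", freq="D"),
    "Average temperature": 5.0,
})

# pd.merge defaults to how="inner", so only dates found in both frames survive.
merged = pd.merge(future_demo, df_future_demo, on="ds")
print(len(future_demo), len(merged))
print(merged["ds"].min(), merged["ds"].max())
```

Under these assumptions the forecast covers 1 to 30 January only: the training period is dropped by the join, and 31 January is missing because the horizon is 30 days.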

The prediction result is now stored in df_forecast. Looking at the contents, the predicted value is in the yhat column, and the range of the prediction is given by yhat_lower and yhat_upper. The frame also contains the separate contributions of the trend, seasonality, temperature, and so on.
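
As a small illustration of how yhat, yhat_lower and yhat_upper can be read together, the sketch below checks how often the actual value falls inside the predicted range. All numbers are hypothetical; only the column names follow Prophet's output.

```python
import pandas as pd

# Hypothetical forecast rows; only the column names match Prophet's output.
forecast_demo = pd.DataFrame({
    "ds": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"]),
    "yhat": [7000000.0, 6500000.0, 5200000.0],
    "yhat_lower": [6000000.0, 5500000.0, 4200000.0],
    "yhat_upper": [8000000.0, 7500000.0, 6200000.0],
})
# Hypothetical actual sales for the same days.
actual_demo = pd.DataFrame({
    "ds": forecast_demo["ds"],
    "y": [7400000.0, 7800000.0, 5000000.0],
})

check = forecast_demo.merge(actual_demo, on="ds")
# True where the actual value lies between yhat_lower and yhat_upper (inclusive).
check["inside"] = check["y"].between(check["yhat_lower"], check["yhat_upper"])
coverage = check["inside"].mean()
print(check[["ds", "yhat", "y", "inside"]])
print(coverage)
```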

3. Visualize prediction results

Let's plot the results so that they are easy to read. Since actual sales for the forecast month are already available, the forecast can be compared with the actual results.

from matplotlib import pyplot as plt
%matplotlib inline

df_output=pd.merge(df_forecast, df_future, on="ds")

#In the current version, plotting raised an error without the following line
pd.plotting.register_matplotlib_converters()

df_output.plot(figsize=(18, 12), x="ds", y=["yhat","y"])

[Figure: forecast (yhat) and actual sales (y) plotted against ds]

The forecast (yhat) runs slightly high, but it follows the ups and downs of the actual sales fairly well.

You can also plot the trend and the periodic components separately.

model.plot_components(df_forecast)
plt.show()

[Figure: output of model.plot_components (trend, holidays, weekly, monthly, and so on)]

- Holidays: Coming-of-Age Day stands out as a sharp spike.
- Weekly: sales are, as expected, higher on weekends (Saturday and Sunday).
- Monthly: the curve is wavy. Does it mean that sales are high at the end and the beginning of the month?

Conclusion

Not everything was straightforward: the columns must be named ds and y, and the code from earlier articles raised errors in places. Once it was working, though, it was very simple to run.

The calculation is not included in the code above, but comparing y and yhat, the error over the month is within about 10%, so I feel the forecast is good enough to use.
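
For reference, here is one way such an error could be computed. This is only a sketch: the article does not say which definition was used, so both the error of the period total and the mean absolute percentage error (MAPE) are shown, and the numbers are hypothetical rather than the article's data.

```python
import pandas as pd

# Hypothetical daily actuals (y) and forecasts (yhat); not the article's data.
result_demo = pd.DataFrame({
    "y":    [7400000.0, 6800000.0, 5000000.0, 7800000.0],
    "yhat": [7800000.0, 7000000.0, 5600000.0, 8000000.0],
})

# Error of the period total: how far the summed forecast is from the summed sales.
total_y = result_demo["y"].sum()
total_error = (result_demo["yhat"].sum() - total_y) / total_y

# Mean absolute percentage error (MAPE) over the individual days.
mape = ((result_demo["yhat"] - result_demo["y"]).abs() / result_demo["y"]).mean()

print(f"total error: {total_error:.1%}, MAPE: {mape:.1%}")
```

The total error lets daily over- and under-estimates cancel out, while MAPE does not, so the two can differ noticeably on the same forecast.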

This time I used the sales amount of the whole store. In the future, I would like to look for targets that can be forecast more accurately, such as the number of visitors or the sales amount of a specific category.
