[PYTHON] Challenges for future sales forecasts: (1) What is time series analysis?

Introduction

Recently, the retail industry has also been buzzing about big data and AI, and I get all sorts of requests from various departments every day. Lately, many of them come from the store department and concern future sales: "I want you to predict next month's sales", "How much should we sell next week?", "Should we run additional sales promotions next month?"

Previously, the target was simply 105% of the same month of the previous year, but the declining birthrate and aging population, inbound tourism, abnormal weather, and other changes in the world have made year-on-year comparison useless. So I want to know how much a store will sell if it operates as usual, and use that as a baseline for deciding how much to add on top through events and advertising.

Which analysis should I use?

I work with data now, but I'm no specialist, so I'm not familiar with complex statistical methods. So at first I tried to make predictions using regression analysis with information on weather, sales promotions, and nearby events, but the accuracy did not improve at all...

While researching various approaches, I learned that there is something called "time series analysis", which is used for things like predicting stock prices.

What is time series analysis?

I would like to organize my understanding of time series analysis while referring to "Statistics to understand all humankind" and "[Predict TV Asahi's audience rating trends with a SARIMA model](https://qiita.com/mshinoda88/items/749131478bfefc9bf365)". (I'm sorry if I have made mistakes. Please point them out, without any difficult formulas...)

1. Time series analysis includes past sales among the predictor variables

In the regression analysis I originally built, I was trying to explain sales using variables entirely separate from sales itself:

$$\text{Earnings} = a_1 \times \text{temperature} + a_2 \times \text{promotional expenses} + \cdots$$

However, if one day's sales are 10 million yen, how much will the next day's sales be? They won't be 1 million yen, nor will they be 100 million yen. More likely they will be around 12 million or 8 million yen; I don't think they will stray far from the previous day's sales.

Therefore, the idea is to improve accuracy by using past sales as explanatory variables, as follows.

$$\text{Earnings}_n = a_1 \,\text{Earnings}_{n-1} + a_2 \,\text{Earnings}_{n-2} + \cdots$$

It seems that this is called AR (autoregressive).
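
As a minimal sketch of the AR idea, the snippet below fits the simplest case, where today's sales depend only on yesterday's, by ordinary least squares. The intercept `c`, the function names, and the sales figures are assumptions made up for illustration; a real analysis would use a library and more lags.

```python
def fit_ar1(series):
    """Least-squares fit of y[n] = c + a1 * y[n-1]; returns (c, a1)."""
    prev = series[:-1]  # sales on day n-1
    curr = series[1:]   # sales on day n
    count = len(prev)
    mean_prev = sum(prev) / count
    mean_curr = sum(curr) / count
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    a1 = cov / var
    c = mean_curr - a1 * mean_prev
    return c, a1


def forecast_next(series, c, a1):
    """One-step-ahead forecast from the most recent observation."""
    return c + a1 * series[-1]


# Hypothetical daily sales in millions of yen, generated without noise
# from y[n] = 2 + 0.8 * y[n-1] so that the fit is easy to check.
sales = [5.0]
for _ in range(9):
    sales.append(2.0 + 0.8 * sales[-1])

c, a1 = fit_ar1(sales)
next_day = forecast_next(sales, c, a1)
print(f"c={c:.3f}, a1={a1:.3f}, next-day forecast={next_day:.3f}")
```

Because the toy data contains no noise, the fitted coefficients come back as the values used to generate it; with real sales data they would only be estimates.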

2. Time series analysis considers past errors

In the autoregressive model of point 1, if last month's sales were higher than expected, we can suppose that some demand was pulled forward, so sales may fall this month. Taking such past errors into account can be expressed as:

$$\text{Earnings}_n = b_1 \,\text{error}_n + b_2 \,\text{error}_{n-1} + \cdots$$

It seems that this is called MA (moving average).
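
To see how a past error feeds into the next forecast, here is a small sketch of a first-order MA model. It assumes the common textbook form `y[n] = mu + e[n] + b * e[n-1]` (a baseline level `mu` plus errors), which is slightly different from the simplified formula above; `mu`, `b`, and the sales numbers are hypothetical, and in practice the coefficients would be estimated by a library rather than chosen by hand.

```python
def ma1_forecasts(series, mu, b):
    """One-step forecasts for y[n] = mu + e[n] + b * e[n-1].

    The error before the first observation is assumed to be 0.
    Returns (forecasts for each observed point, forecast for the next point).
    """
    prev_error = 0.0
    forecasts = []
    for actual in series:
        predicted = mu + b * prev_error
        forecasts.append(predicted)
        prev_error = actual - predicted  # how far off we were this period
    next_forecast = mu + b * prev_error
    return forecasts, next_forecast


# Hypothetical monthly sales: the second month came in 20 above the baseline.
monthly_sales = [100.0, 120.0]
# A negative b expresses "last month's surplus was borrowed from this month".
forecasts, next_forecast = ma1_forecasts(monthly_sales, mu=100.0, b=-0.5)
print(forecasts, next_forecast)
```

With these numbers, the surplus of 20 in the second month lowers the forecast for the third month below the baseline, which is the pulled-forward demand effect described above.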

3. A time series does not simply repeat the exact same cycle

It would be easy if the same cycle simply repeated, but real-world time series are not like that. In technical terms, this is apparently called a "non-stationary process".

It seems that, in addition to short-term movements, we should consider upward and downward trends over the medium to long term.

Points 1 to 3 together make up the ARIMA (AutoRegressive Integrated Moving Average) model. I think it's cool how AR and MA are joined together.
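
The "Integrated" part of ARIMA deals with trends by modeling the differences between consecutive values instead of the raw values, and then adding the forecast differences back up. The sketch below shows only that differencing step; forecasting the differences by their plain average is a stand-in I chose for illustration (ARIMA would apply AR and MA terms there), and the sales figures are hypothetical.

```python
def difference(series):
    """First difference: the change from each point to the next."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def integrate(last_level, future_diffs):
    """Turn forecast differences back into forecast levels."""
    levels = []
    level = last_level
    for step in future_diffs:
        level += step
        levels.append(level)
    return levels


# Hypothetical monthly sales with a steady upward trend.
sales = [100.0, 103.0, 107.0, 110.0, 114.0]

diffs = difference(sales)            # the trend is gone; only changes remain
mean_diff = sum(diffs) / len(diffs)  # simple stand-in for an AR/MA forecast
forecast = integrate(sales[-1], [mean_diff] * 2)
print(diffs, forecast)
```

The differenced series hovers around a constant value even though the original keeps rising, which is why differencing is a standard way to make a trending series easier to model.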

4. Since it is a time series, seasonality must be taken into consideration.

Even after all this, accuracy still does not improve. But anyone in retail knows why. There is seasonality, such as sales being sluggish every year in February and September, and I haven't taken that into account.

Seasonality itself also comes in several cycles.

- Day of the week: at stores where people stock up on their days off, sales rise on Saturdays and Sundays.
- Day of the month: after the 25th, or payday, slightly more expensive items sell and sales rise.
- Month of the year: as mentioned above, sales decline in February and September.

It seems that the SARIMA model can take these cycles into consideration.
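
SARIMA adds seasonal terms at a fixed period, including differencing against the same point in the previous cycle. The sketch below is not SARIMA itself; it only illustrates that seasonal differencing step for a weekly cycle, plus a "repeat the last cycle" baseline forecast. The function names and the two weeks of sales figures are hypothetical.

```python
def seasonal_difference(series, period):
    """Return series[n] - series[n - period] for every n >= period."""
    return [series[n] - series[n - period] for n in range(period, len(series))]


def seasonal_naive_forecast(series, period, horizon):
    """Baseline forecast that simply repeats the last full cycle."""
    last_cycle = series[-period:]
    return [last_cycle[h % period] for h in range(horizon)]


# Hypothetical daily sales (millions of yen), Monday to Sunday, for two weeks.
week1 = [8, 8, 8, 8, 9, 12, 13]
week2 = [9, 9, 9, 9, 10, 13, 14]
sales = week1 + week2

# Comparing each day with the same weekday one week earlier removes the
# weekend peak and leaves only the week-over-week change.
weekly_diff = seasonal_difference(sales, period=7)
next_three_days = seasonal_naive_forecast(sales, period=7, horizon=3)
print(weekly_diff, next_three_days)
```

Using `period=7` covers the day-of-week cycle only; monthly or yearly cycles would need a different period and much more history.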

5. We have to consider factors other than time series

So far we have looked at time series elements, but I would also like to incorporate one-off factors.

- Weather: not just rain, but also the recent abnormal weather.
- Events: if there is a sports day or festival near the store, that alone will greatly increase sales.
- Competition: if a rival store opens nearby, sales will drop by a steady amount, on the order of 10%, from then on.

It seems that the ARIMAX model considers these external variables.
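
As a rough sketch of the idea, the code below fits a stripped-down version: yesterday's sales plus a single external variable (an event flag), estimated by least squares. This is an AR(1) model with one exogenous regressor, not a full ARIMAX (no differencing or MA terms); the function names, the event flags, and the noise-free toy data are all assumptions for illustration.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_arx1(sales, event):
    """Least-squares fit of sales[n] = c + a1 * sales[n-1] + beta * event[n]."""
    rows = [[1.0, sales[n - 1], float(event[n])] for n in range(1, len(sales))]
    targets = sales[1:]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: 1 marks a day with a festival near the store.
event = [0, 0, 1, 0, 0, 1, 0, 1, 0, 0]
sales = [10.0]
for n in range(1, len(event)):
    # Noise-free toy process, so the fit can be checked by eye.
    sales.append(2.0 + 0.5 * sales[-1] + 3.0 * event[n])

c, a1, beta = fit_arx1(sales, event)
print(f"c={c:.3f}, a1={a1:.3f}, event effect={beta:.3f}")
```

The coefficient `beta` is the estimated uplift on event days after accounting for the previous day's sales; weather or competitor openings could be added as further columns in the same way.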

Sites to refer to for implementing these in Python

- State-space model by Python: a model that adds data interpretation to the ARIMA model is apparently called a state space model.
- Time series data prediction library: PyFlux. There seems to be a library called PyFlux that can implement ARIMA, ARIMAX, and state space models.
- I understand this time: RNN, LSTM edition
- Forecasting airline passenger numbers next month with RNN

These last two seem to be neural-network approaches.

In conclusion

Sorry that this post was all text. From next time onward, I will actually try out time series analysis.
