Last time, in "Challenge to future sales forecast: ② Time series analysis using PyFlux", I built ARIMA and ARIMAX models with PyFlux.
However, the accuracy was not very good. I had been picking parameters such as the AR and MA orders by trial and error, so perhaps that was the problem. Still, choosing values that are "statistically sound" is a high hurdle for me (sweat).
So I looked for something like GridSearch in scikit-learn. I found that "[Predict the transition of TV Asahi's viewing rate with the SARIMA model](https://qiita.com/mshinoda88/items/749131478bfefc9bf365#sarima%E3%83%A2%E3%83%87%E3%83%AB%E5%AD%A3%E7%AF%80%E8%87%AA%E5%B7%B1%E5%9B%9E%E5%B8%B0%E5%92%8C%E5%88%86%E7%A7%BB%E5%8B%95%E5%B9%B3%E5%9D%87%E3%83%A2%E3%83%87%E3%83%AB)" implements parameter tuning for time series analysis with StatsModels, so I wrote my own version with reference to it.
Google Colaboratory
As in the previous post, the data is daily sales, with temperature (average, maximum, minimum) available as explanatory variables.
Date | Sales amount | Average temperature | Highest temperature | Lowest temperature |
---|---|---|---|---|
2018-01-01 | 7,400,000 | 4.9 | 7.3 | 2.2 |
2018-01-02 | 6,800,000 | 4.0 | 8.0 | 0.0 |
2018-01-03 | 5,000,000 | 3.6 | 4.5 | 2.7 |
2018-01-04 | 7,800,000 | 5.6 | 10.0 | 2.6 |
Below is the program from the previous post. The parameters to tune are ar, ma, and integ.
import pyflux as pf
model = pf.ARIMA(data=df, ar=5, ma=5, integ=1, target='Sales amount', family=pf.Normal())
x = model.fit('MLE')
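To make the integ parameter concrete: it is the number of times the series is differenced before the AR and MA terms are fitted. Here is a minimal sketch in plain Python, using the first four sales values from the table above. The helper name `difference` is made up for illustration and is not part of PyFlux.

```python
def difference(series, times=1):
    """Apply first-order differencing `times` times (hypothetical helper, not a PyFlux function)."""
    values = list(series)
    for _ in range(times):
        # Each pass replaces the series with the change between consecutive values.
        values = [b - a for a, b in zip(values, values[1:])]
    return values

# First four daily sales figures from the table above.
sales = [7_400_000, 6_800_000, 5_000_000, 7_800_000]

once = difference(sales, times=1)   # the series an integ=1 model works on
twice = difference(sales, times=2)  # the series an integ=2 model works on
print(once)
print(twice)
```

Each pass shortens the series by one observation, so a large integ leaves less data for the AR and MA terms.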
I keep calling this parameter tuning, but each parameter simply takes an integer, so I just loop over the candidate values.
import pandas as pd
import pyflux as pf

def optimisation_arima(df, target):
    # Exclusive upper bounds: each loop tries the orders 0 to 3.
    max_p = 4
    max_d = 4
    max_q = 4
    rows = []
    for p in range(max_p):
        for d in range(max_d):
            for q in range(max_q):
                # Fit one ARIMA(p, d, q) model by maximum likelihood and record its AIC.
                model = pf.ARIMA(data=df, ar=p, ma=q, integ=d, target=target, family=pf.Normal())
                x = model.fit('MLE')
                print("AR:", p, " I:", d, " MA:", q, " AIC:", x.aic)
                rows.append({'p': p, 'd': d, 'q': q, 'aic': x.aic})
    # DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows.
    return pd.DataFrame(rows, columns=['p', 'd', 'q', 'aic'])
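The three nested loops can also be flattened into a single list of candidates. The sketch below uses `itertools.product` and only enumerates the (p, d, q) combinations without fitting any model. The function name `candidate_orders` is an assumption made for this example.

```python
from itertools import product

def candidate_orders(max_p=4, max_d=4, max_q=4):
    """Return every (p, d, q) combination, in the same order as the nested loops."""
    return list(product(range(max_p), range(max_d), range(max_q)))

orders = candidate_orders()
print(len(orders))   # number of models the grid search will fit
print(orders[:5])    # q varies fastest, then d, then p
```

Seeing the count up front is useful because every candidate means one more maximum likelihood fit, and the grid grows multiplicatively with each upper bound.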
Call it like this:
df_output = optimisation_arima(df, "Sales amount")
and the results are printed. PyFlux offers several evaluation criteria, but here I use AIC (the smaller the value, the better the model).
AR: 0 I: 0 MA: 0 AIC: 11356.163772323638
AR: 0 I: 0 MA: 1 AIC: 11262.28357561013
AR: 0 I: 0 MA: 2 AIC: 11218.453940684196
AR: 0 I: 0 MA: 3 AIC: 11171.121950637687
AR: 0 I: 1 MA: 0 AIC: 11462.586538415879
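For reference, AIC is generally defined as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximised likelihood. The sketch below uses that textbook formula. How PyFlux counts k internally is an assumption here, and the log-likelihood values are made up purely for illustration.

```python
def aic(log_likelihood, n_params):
    """Textbook AIC: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical numbers: the larger model fits slightly better (higher log-likelihood)
# but pays a penalty of 2 for every extra parameter.
small = aic(log_likelihood=-5600.0, n_params=3)
large = aic(log_likelihood=-5599.5, n_params=6)
print(small, large)
```

In this made-up case the smaller model wins, because the gain in fit does not cover the penalty for three additional parameters.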
The AR / I / MA combination with the lowest AIC can then be selected as the best parameter set. Note that the function's result was stored in df_output, so that is the frame to filter.
df_output[df_output.aic == df_output.aic.min()]
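Looking only at the single minimum hides how close the runners-up are. Here is a sketch in plain Python that ranks the candidates by AIC, using only the five result lines shown above (rounded to two decimals). In the notebook, the same ranking would come from sorting df_output by its aic column.

```python
# (p, d, q, aic) for the five results printed above, rounded to two decimals.
results = [
    (0, 0, 0, 11356.16),
    (0, 0, 1, 11262.28),
    (0, 0, 2, 11218.45),
    (0, 0, 3, 11171.12),
    (0, 1, 0, 11462.59),
]

# Sort ascending by AIC so the best candidate comes first.
ranked = sorted(results, key=lambda row: row[3])
best = ranked[0]
print("best so far:", best)
for p, d, q, score in ranked:
    # Gap to the best AIC: small gaps mean the models are hard to tell apart.
    print(p, d, q, round(score - best[3], 2))
```

Among these five lines, the MA order 3 model has the lowest AIC. The full grid may of course contain a better one.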
Since the accuracy last time was a crushing defeat, I tried brute-force parameter tuning.
However, the accuracy of the resulting model did not improve.
I need to think about the next improvement plan.