Is it possible to predict the future from past temperatures? As a diversion, I tried GluonTS, the Gluon toolkit for probabilistic time series modeling.
To use it, you need the mxnet and gluonts libraries (see this article). If you installed Python with Anaconda or similar, you can install them right away with pip.
$ pip install -U pip
$ pip install mxnet
$ pip install gluonts
If pip itself is out of date, the source code below may fail with an error, so it is a good idea to update pip first.
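Before going further, it can help to confirm what actually got installed. The following is a minimal sketch using only the standard library (Python 3.8 or later); `installed_version` is a hypothetical helper name I made up for this check, not part of pip or GluonTS.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the packages this article relies on
for name in ("pip", "mxnet", "gluonts"):
    print(name, installed_version(name))
```

A `None` next to mxnet or gluonts means the corresponding `pip install` did not succeed in the Python environment the notebook is using.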
I wanted the temperature data as a CSV file, so I downloaded it from the Japan Meteorological Agency's Past Meteorological Data Download site.
As an example, I downloaded two years of data for Saga City, Saga Prefecture, from August 14, 2018 to August 14, 2020. I took two full years so that the model could learn the yearly cycle.
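As a quick sanity check on the size of the dataset, the number of daily records in that period can be counted with the standard library. This is only a sketch, and it assumes the download has exactly one row per day with no gaps.

```python
from datetime import date

# Date range stated above (both endpoints included)
start = date(2018, 8, 14)
end = date(2020, 8, 14)

# The range covers February 29, 2020, so the second year contributes 366 days
n_days = (end - start).days + 1
print(n_days)  # 732
```

With 732 daily values, holding out the last 7 days leaves 725 for training.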
# Confirmed to work in Jupyter Notebook
%matplotlib inline
import datetime
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# First, read the temperature data for Saga Prefecture
tempera_data = pd.read_excel("saga_weather_data_20200815.xlsx")  # converted to an Excel file for easier processing
tempera_data["time"] = pd.to_datetime(tempera_data["date"])
Currently, the data looks like this.
We will use the "time" and "average temperature (℃)" columns of this DataFrame.
# Take a copy so that a column can be added later without a SettingWithCopyWarning
analysis_data = tempera_data[["time", "average temperature (℃)"]].copy()
Let's visualize what kind of data it is.
plt.figure(figsize=(15, 5))
plt.plot(analysis_data["time"], analysis_data["average temperature (℃)"])
plt.grid(True)
plt.show()
To predict the most recent 7 days, I used days 1 to 725 for training and days 726 onward for evaluation. The evaluation window is the orange part in the figure below.
tmp_time = np.arange(0, len(analysis_data))
analysis_data["re_time"] = tmp_time  # replace the dates with a day index
plt.figure(figsize=(15, 5))
# Area you want to infer (shaded part)
plt.axvspan(725, len(tmp_time), color="orange")
plt.plot(tmp_time, analysis_data["average temperature (℃)"])
plt.xlabel("[day]")
plt.grid(True)
plt.show()
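The index arithmetic behind this split can be sketched on its own. The example below uses a made-up sinusoidal series in place of the real temperatures and assumes 732 daily records; only the split logic is meant to carry over.

```python
import numpy as np

predict_length = 7            # number of held-out days
n_days = 732                  # assumed number of daily records
days = np.arange(n_days)      # 0-based day index, like tmp_time above

# Synthetic yearly cycle standing in for the real data (values are invented)
temps = 17.0 + 10.0 * np.sin(2 * np.pi * (days + 91) / 365.25)

split = n_days - predict_length   # first index of the evaluation window
train, held_out = temps[:split], temps[split:]
print(len(train), len(held_out))  # 725 7
```

Because the index is 0-based, `split = 725` marks the 726th day, which is where the orange band starts.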
Define the training and evaluation data.
from gluonts.dataset.common import ListDataset
# make_train: everything except the last predict_length days
predict_length = 7
training_data = ListDataset(
    [{"start": analysis_data["time"].values[0],
      "target": analysis_data.iloc[:len(tmp_time) - predict_length, 1]}],
    freq="24H")
# make_test: the full series, including the last predict_length days
test_data = ListDataset(
    [{"start": analysis_data["time"].values[0],
      "target": analysis_data.iloc[:len(tmp_time), 1]}],
    freq="24H")
Define an estimator for training. I set the parameters by referring to the reference article and the official tutorial.
from gluonts.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.trainer import Trainer
estimator = SimpleFeedForwardEstimator(
    freq="24H",
    context_length=20,
    prediction_length=predict_length,  # match the 7-day horizon defined above
    trainer=Trainer(epochs=300,
                    batch_size=32,
                    learning_rate=0.001))
predictor = estimator.train(training_data=training_data)
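To get a feel for what `context_length` and `prediction_length` mean, here is a rough sketch of how one series breaks down into (past, future) pairs. It is an illustration under my own assumptions, not GluonTS internals: as far as I understand, GluonTS samples such windows at random during training rather than enumerating them all, and the horizon is set to 7 days here purely for the example.

```python
import numpy as np

context_length = 20      # past values fed to the network per window
prediction_length = 7    # future values to predict per window (assumed horizon)

series = np.arange(725, dtype=float)   # stand-in for the 725 training values

window = context_length + prediction_length
# Enumerate every contiguous slice of length `window`, one per row
starts = np.arange(len(series) - window + 1)
windows = np.stack([series[s:s + window] for s in starts])

# First 20 columns are the input, remaining 7 are the target
past, future = windows[:, :context_length], windows[:, context_length:]
print(past.shape, future.shape)  # (699, 20) (699, 7)
```

Each row pairs 20 consecutive days with the 7 days that follow them, which is the kind of example a feed-forward forecaster learns from.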
Let's visualize the inference result.
from gluonts.dataset.util import to_pandas
# predict() forecasts the prediction_length steps that follow the end of each series it is given
for test_entry, forecast in zip(test_data, predictor.predict(test_data)):
    plt.figure(figsize=(15, 5))
    to_pandas(test_entry).plot(linewidth=2)
    forecast.plot(color='g', prediction_intervals=[50.0, 90.0])
    plt.legend(["observations", "median prediction", "90% confidence interval", "50% confidence interval"],
               loc='lower left')
    plt.grid(which='both')
Although the forecast has a lot of spread, it seems that some degree of prediction is possible.
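The median line and the 50% and 90% bands in a plot like this are summaries of many sampled future paths. The sketch below shows the idea with NumPy on invented sample paths; the number of paths (100) and the temperature values are arbitrary choices for illustration and are not taken from the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake sample paths: 100 simulated futures over a 7-day horizon (values are invented)
samples = 28.0 + rng.normal(scale=1.5, size=(100, 7))

# Per-day median across the sample paths
median = np.quantile(samples, 0.5, axis=0)
# A central 50% interval spans the 25th to 75th percentiles, a 90% one the 5th to 95th
lo50, hi50 = np.quantile(samples, [0.25, 0.75], axis=0)
lo90, hi90 = np.quantile(samples, [0.05, 0.95], axis=0)

for day in range(samples.shape[1]):
    print(f"day {day + 1}: median {median[day]:.1f}, "
          f"50% [{lo50[day]:.1f}, {hi50[day]:.1f}], "
          f"90% [{lo90[day]:.1f}, {hi90[day]:.1f}]")
```

A wide 90% band relative to the day-to-day movement of the median is what "a lot of spread" looks like in numbers.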
Thank you very much!