Google has begun publishing COVID-19 forecast data on its dashboard. This is something of a rehash, but I tried to see what Prophet would predict. I used the data on the number of infected people in Japan that is available on Kaggle.
- Implementation date: November 24, 2020
- Package: Prophet
Data set
Period: 2020/2/6 to 2020/11/20 (it seems to be updated every 3 days)
Values of the Location column:
- Domestic: domestic cases
- Airport: airport inspection
- Returnee: returnees
Columns used:
- Positive: number of positives (cumulative)
- Tested: number of people tested (cumulative)
There are other columns, but missing values were scattered through them, so this time I use only Positive for Domestic and Airport.
import numpy as np
import pandas as pd
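# The package was named fbprophet at the time of writing; from Prophet 1.0 onward it is imported as `from prophet import Prophet`
from fbprophet import Prophet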
from fbprophet.plot import add_changepoints_to_plot
df = pd.read_csv('covid_jpn_total_1124.csv')
df_dom = df[df['Location'] == 'Domestic']
#print(df_dom.isnull().sum())
df_air = df[df['Location'] == 'Airport']
#print(df_air.isnull().sum())
df_air = df_air.dropna(how='any')
print(df_air.describe())
The raw data is cumulative, so take the difference between consecutive rows.
pos_def: number of positives per day
test_def: number of tests per day (I intend to use it to predict the negative rate, but I will not use it this time)
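# Work on a copy so that adding columns does not trigger SettingWithCopyWarning
df_dom = df_dom.copy()
# Positives per day: each cumulative value minus the previous row's value (first row set to 0)
curr = np.array(df_dom.iloc[1:, 2])
prev = np.array(df_dom.iloc[:-1, 2])
df_dom['pos_def'] = np.append([0], curr - prev)
# Tests per day, computed the same way from column 3
curr = np.array(df_dom.iloc[1:, 3])
prev = np.array(df_dom.iloc[:-1, 3])
df_dom['test_def'] = np.append([0], curr - prev)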
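The same differencing can be written more compactly with pandas' `Series.diff()`. The following is only a minimal sketch on made-up cumulative counts; the `toy` frame and its numbers are illustrative and are not taken from the Kaggle file.

```python
import pandas as pd

# Hypothetical cumulative counts standing in for the Positive and Tested columns
toy = pd.DataFrame({
    "Positive": [10, 15, 15, 27],
    "Tested": [100, 180, 260, 400],
})

# diff() subtracts the previous row; the first row has no predecessor,
# so its NaN is replaced with 0 to match the hand-written version
toy["pos_def"] = toy["Positive"].diff().fillna(0).astype(int)
toy["test_def"] = toy["Tested"].diff().fillna(0).astype(int)

print(toy)
```

If the cumulative column itself contains missing values, the `astype(int)` step would need to be dropped or the gaps handled first.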
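Prepare a DataFrame in the format Prophet expects (a date column ds and a value column y)
# pos_def was computed on df_dom above, so the Prophet input is built from df_dom.
# To forecast airport positives instead, compute pos_def for df_air the same way and use it here.
df_test = pd.DataFrame()
df_test['ds'] = pd.to_datetime(df_dom['DS'])  # 'DS' is the date column name used in the original code; adjust it if your CSV differs
df_test['y'] = df_dom['pos_def']
print(df_test)
df_test['y'].plot()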
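Prophet expects exactly two input columns, a datetime `ds` and a numeric `y`. As a rough sketch of how that preparation could be wrapped up, the helper below (`to_prophet_frame`, a hypothetical name, as are the sample column names and values) converts the types, drops rows whose value cannot be parsed, and sorts by date.

```python
import pandas as pd

def to_prophet_frame(df, date_col, value_col):
    """Hypothetical helper: return a two-column frame (ds, y) sorted by date."""
    out = pd.DataFrame({
        "ds": pd.to_datetime(df[date_col]),
        "y": pd.to_numeric(df[value_col], errors="coerce"),
    })
    # Drop rows whose value could not be parsed, then order chronologically
    return out.dropna(subset=["y"]).sort_values("ds").reset_index(drop=True)

# Made-up input with an out-of-order date and one unparseable value
raw = pd.DataFrame({
    "Date": ["2020-11-03", "2020-11-01", "2020-11-02"],
    "pos_def": [30, 10, "n/a"],
})
frame = to_prophet_frame(raw, "Date", "pos_def")
print(frame)
```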
Fit the Prophet model to the prepared data and run the prediction over the history plus the next 30 days
m = Prophet(yearly_seasonality=False, weekly_seasonality=True, daily_seasonality=True)
m.fit(df_test)
future = m.make_future_dataframe(periods=30, freq='D', include_history=True)
#future.tail()
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
Plot the result
fig = m.plot(forecast, figsize=(20, 10))
# Overlay the detected changepoints; the return value is a list of artists, not an Axes
a = add_changepoints_to_plot(fig.gca(), m, forecast)
ax = fig.gca()
ax.set_title("Positive", size=16)
ax.set_xlabel("date", size=16)
ax.set_ylabel("# Positives", size=16)
ax.tick_params(axis="x", labelsize=14)
ax.tick_params(axis="y", labelsize=14)
The black dots are the actual data (ground truth). The light blue region shows the upper and lower bounds of the uncertainty interval (80% with Prophet's default interval_width, since the code above does not change it). As the graph shows, the model traces the data closely. The forecast is that the third wave will start to subside once the number of infected people exceeds 3,000 in early December.
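One way to put a number on how well the band follows the data is to measure what fraction of the observations fall between `yhat_lower` and `yhat_upper`. The sketch below uses made-up values; in practice the `merged` frame (a name assumed here) would come from joining the forecast to the observed data on `ds`.

```python
import pandas as pd

# Hypothetical values standing in for observed y joined to Prophet's interval columns
merged = pd.DataFrame({
    "y":          [100, 120,  90, 300, 150],
    "yhat_lower": [ 80, 100,  95, 200, 100],
    "yhat_upper": [130, 140, 140, 280, 180],
})

# True where the observation lies inside the interval (bounds inclusive)
inside = merged["y"].between(merged["yhat_lower"], merged["yhat_upper"])
coverage = inside.mean()

print(f"{coverage:.0%} of observations fall inside the interval")
```

A coverage far below the nominal interval width would suggest the band is too narrow for this data.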
That may look too optimistic, but it is what the data yields when the forecast is based only on past changes in the number of infected people. Prophet does capture seasonal fluctuations, but because the data used covers less than a year, it contains no pattern of infections rising because it is winter. If the coronavirus stays widespread for three or four years, I think such a pattern will show up in the data, though I hope it does not come to that. Accuracy cannot be expected unless other explanatory variables *1 are added to make the model multivariate.
For comparison, Google's current (11/24) forecast is shown in the figure below. As with the Prophet result, the number exceeds 3,000 in early December, but in Google's forecast it keeps rising steadily after that.
Incidentally, Prophet's forecast for positives found in airport inspections is shown in the figure below.
In a few weeks I will rerun the forecast with the same code and compare it with this result. My guess is that the number will be around 4,000 people.
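For that later comparison, one simple approach is to join the saved forecast to the newly observed counts on `ds` and compute error measures such as MAE and MAPE. This is only a sketch with made-up numbers; `forecast_old`, `actual`, and the values in them are assumptions for illustration.

```python
import pandas as pd

# Hypothetical saved forecast (ds, yhat) and later observations (ds, y)
dates = pd.to_datetime(["2020-12-01", "2020-12-02", "2020-12-03"])
forecast_old = pd.DataFrame({"ds": dates, "yhat": [100.0, 150.0, 200.0]})
actual = pd.DataFrame({"ds": dates, "y": [110.0, 150.0, 160.0]})

# Keep only the dates present in both frames
compared = forecast_old.merge(actual, on="ds", how="inner")
compared["abs_err"] = (compared["y"] - compared["yhat"]).abs()

mae = compared["abs_err"].mean()                      # mean absolute error
mape = (compared["abs_err"] / compared["y"]).mean()   # mean absolute percentage error

print(f"MAE: {mae:.1f}, MAPE: {mape:.1%}")
```

MAPE is undefined when an observed value is 0, so days with no cases would need to be excluded or a different measure used.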