Overview

Google has begun publishing COVID-19 forecast data on its dashboard. This is a rehash of that idea, but I wanted to see what Prophet would predict. I used Prophet with the data on the number of infected people in Japan that is available on Kaggle.

- Implementation: November 24, 2020
- Package: Prophet

Predict the number of infected people with Prophet

Dataset

Period: 2020/2/6 to 2020/11/20 (it seems to be updated every 3 days)

- Domestic: domestic cases
- Airport: airport testing
- Returnee: returnees
- Positive: number of positives
- Tested: number of people tested

There are other columns, but they have scattered missing values, so this time I will use only Positive for Domestic and Airport.

import numpy as np 
import pandas as pd 
from fbprophet import Prophet
from fbprophet.plot import add_changepoints_to_plot

df = pd.read_csv('covid_jpn_total_1124.csv')
df_dom = df[df['Location'] == 'Domestic']
#print(df_dom.isnull().sum())
df_air = df[df['Location'] == 'Airport']
#print(df_air.isnull().sum())

df_air = df_air.dropna(how='any')
print(df_air.describe())

The raw data is cumulative, so take the day-over-day difference.

- pos_def: positives per day
- test_def: people tested per day (I intend to use it to predict the positive rate, but I will not use it this time)

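# The filtered frames are slices of df, so copy them before adding columns
df_dom = df_dom.copy()
df_air = df_air.copy()

def daily_diff(s):
    d = s.diff()      # cumulative count -> change from the previous row
    d.iloc[:1] = 0    # the first day has no previous value
    return d

# Column positions 2 and 3 hold the cumulative Positive and Tested counts
for d in (df_dom, df_air):
    d['pos_def'] = daily_diff(d.iloc[:, 2])
    d['test_def'] = daily_diff(d.iloc[:, 3])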
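
To make the differencing step concrete, here is a minimal sketch on made-up cumulative counts (the numbers and the name `cumulative` are illustrative, not taken from the Kaggle data). It uses pandas `Series.diff()` for the subtraction and shows how a downward correction in a cumulative series turns up as a negative daily value. Clipping at zero is just one possible treatment and is an assumption, not something the original code does.

```python
import pandas as pd

# Made-up cumulative totals for five consecutive days (illustrative only).
cumulative = pd.Series(
    [10, 15, 15, 14, 20],
    index=pd.date_range("2020-11-01", periods=5, freq="D"),
)

# Day-over-day difference; the first day has no predecessor, so it is set to 0,
# the same convention as in the code above.
daily = cumulative.diff().fillna(0).astype(int)

# A lowered cumulative total produces a negative daily count.
# Clipping at 0 is an assumption made for this sketch.
daily_clipped = daily.clip(lower=0)

print(daily.tolist())
print(daily_clipped.tolist())
```

Checking for negative values like this before fitting is cheap and shows whether the source data was revised along the way.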

Prepare a DataFrame in the format Prophet expects

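# pos_def was computed on df_dom above, so build the Prophet input from the domestic series.
# To forecast airport positives instead, compute pos_def for df_air and use it here.
df_test = pd.DataFrame()
df_test['ds'] = pd.to_datetime(df_dom['DS'])
df_test['y'] = df_dom['pos_def']
print(df_test)
df_test.iloc[:,1].plot()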
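
Prophet expects its input as a DataFrame with a datetime column named `ds` and a numeric column named `y`. As a sketch, a small helper can do the renaming along with a few sanity checks. The helper name `to_prophet_frame` and the sample column names are hypothetical and only serve the illustration.

```python
import pandas as pd

def to_prophet_frame(df, date_col, value_col):
    """Return a two-column frame (ds, y) sorted by date. Hypothetical helper."""
    out = pd.DataFrame({
        "ds": pd.to_datetime(df[date_col]),
        "y": pd.to_numeric(df[value_col], errors="coerce"),
    })
    # Duplicate dates would hand Prophet two observations for the same day.
    if out["ds"].duplicated().any():
        raise ValueError("duplicate dates in input")
    return out.sort_values("ds").reset_index(drop=True)

# Made-up sample rows, deliberately out of order.
sample = pd.DataFrame({
    "Date": ["2020-11-03", "2020-11-01", "2020-11-02"],
    "pos_def": [7, 3, 5],
})
frame = to_prophet_frame(sample, "Date", "pos_def")
print(frame)
```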

Fit the Prophet model to the prepared data and run the prediction over the observed period plus the next 30 days

m = Prophet(yearly_seasonality=False, weekly_seasonality=True, daily_seasonality=True)
m.fit(df_test)
future = m.make_future_dataframe(periods=30, freq='D', include_history=True)
#future.tail()
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

Plot the result

fig = m.plot(forecast, figsize=(20, 10))
# Overlay the detected trend changepoints on the forecast plot
add_changepoints_to_plot(fig.gca(), m, forecast)
ax = fig.gca()
ax.set_title("Positive", size=16)
ax.set_xlabel("date", size=16)
ax.set_ylabel("# Positives", size=16)
ax.tick_params(axis="x", labelsize=14)
ax.tick_params(axis="y", labelsize=14)

Black dots are the actual data (ground truth). The light blue region shows the upper and lower limits of the uncertainty interval, which is 80% wide with Prophet's default setting (interval_width=0.8) as used here. As the graph shows, the model traces the historical data closely. It predicts that the third wave will subside after the number of infected people exceeds 3,000 in early December.

That is probably too optimistic, but it is the prediction the model makes from past changes in the number of infected people alone. Prophet can capture seasonal fluctuations, but since the data covers less than a year, it cannot learn that the number of infected people tends to rise in winter. If the coronavirus keeps circulating for three or four years, I think such a tendency would become visible in the data, though I hope it does not come to that. Accuracy cannot be expected unless other explanatory variables (*1) are added and the model is made multivariate.

*1: Perhaps several variables that change over a month or two, such as the positive rate, the number of severely ill people, the number of discharged people, and the number of people entering the country. The poor quality and incompetence of politicians and officials is constant, so it is useless in data science.
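
The idea of adding explanatory variables can be sketched with Prophet's `add_regressor`. This is only an outline under assumptions: the regressor is a daily positive rate computed from `pos_def` and `test_def`, the helper names `build_regressor_frame` and `forecast_with_regressor` are hypothetical, and the sample numbers are made up. Prophet needs the regressor's values for the forecast dates as well, so the last observed rate is simply carried forward here, which is a crude placeholder and not a real forecast of that variable.

```python
import numpy as np
import pandas as pd

def build_regressor_frame(dates, pos_def, test_def):
    """Build a Prophet-style frame with y and a pos_rate regressor (hypothetical helper)."""
    frame = pd.DataFrame({
        "ds": pd.to_datetime(dates),
        "y": np.asarray(pos_def, dtype=float),
    })
    tested = np.asarray(test_def, dtype=float)
    # Days with no tests get NaN instead of a division by zero.
    safe_tested = np.where(tested > 0, tested, np.nan)
    rate = pd.Series(frame["y"].to_numpy() / safe_tested)
    # Fill gaps from the previous day, or with 0 at the very start.
    frame["pos_rate"] = rate.ffill().fillna(0.0)
    return frame

def forecast_with_regressor(frame, periods=30):
    """Fit Prophet with pos_rate as an extra regressor and forecast `periods` days."""
    try:
        from prophet import Prophet      # package name from version 1.0 on
    except ImportError:
        from fbprophet import Prophet    # older package name, as used in this article
    m = Prophet(yearly_seasonality=False, weekly_seasonality=True)
    m.add_regressor("pos_rate")
    m.fit(frame)
    future = m.make_future_dataframe(periods=periods, freq="D")
    # Assumption: carry the last observed rate forward into the forecast window.
    future = future.merge(frame[["ds", "pos_rate"]], on="ds", how="left")
    future["pos_rate"] = future["pos_rate"].ffill()
    return m.predict(future)

# Made-up numbers, only to show the regressor column being built.
demo = build_regressor_frame(
    ["2020-11-01", "2020-11-02", "2020-11-03"],
    pos_def=[10, 20, 5],
    test_def=[100, 0, 25],
)
print(demo)
```

With real data, `forecast_with_regressor(frame)` would return the usual forecast table. How useful it is depends entirely on how well the future values of the regressor can be estimated.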

For comparison, Google's forecast as of 11/24 is shown in the figure below. As in the Prophet result, the number exceeds 3,000 in early December, but in Google's forecast it keeps increasing steadily after that.

Incidentally, Prophet's prediction of positive cases found in airport testing is shown in the figure below.

In a few weeks I will rerun the prediction with the same code and compare it with this result. My guess is that it will be around 4,000 people.
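
The planned comparison could be scored along these lines. This is a sketch on made-up numbers: `saved_forecast` stands in for the `forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]` rows saved today, `actual` stands in for the daily counts observed later, and the function name `score_forecast` is hypothetical.

```python
import pandas as pd

def score_forecast(saved_forecast, actual):
    """Join an earlier forecast with later actuals and summarise the error."""
    merged = saved_forecast.merge(actual, on="ds", how="inner")
    abs_err = (merged["yhat"] - merged["y"]).abs()
    # Share of days on which the actual value fell inside the forecast interval.
    inside = merged["y"].between(merged["yhat_lower"], merged["yhat_upper"])
    return {
        "n_days": len(merged),
        "mae": abs_err.mean(),
        "coverage": inside.mean(),
    }

# Made-up forecast and actual values for four days (illustrative only).
dates = pd.date_range("2020-12-01", periods=4, freq="D")
saved_forecast = pd.DataFrame({
    "ds": dates,
    "yhat": [3000.0, 3100.0, 3200.0, 3300.0],
    "yhat_lower": [2500.0, 2600.0, 2700.0, 2800.0],
    "yhat_upper": [3500.0, 3600.0, 3700.0, 3800.0],
})
actual = pd.DataFrame({"ds": dates, "y": [2900.0, 3300.0, 3800.0, 4000.0]})

result = score_forecast(saved_forecast, actual)
print(result)
```

The mean absolute error says how far off the point forecast was on average, and the coverage shows whether the shaded interval was wide enough for the days that were actually observed.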

[PYTHON] Predict the number of people infected with COVID-19 with Prophet
