I am not an infectious disease expert, so please read this with that in mind.
The novel coronavirus disease (COVID-19), first reported in Wuhan, Hubei Province, China in December 2019, has spread to Japan, and the number of infected people is increasing. On April 7, 2020, a state of emergency was declared in Japan, and people were asked to refrain from going out. Although working from home is becoming more widespread, there are reports that people are still not staying home enough. So I would like to predict how the growth in the number of infected people changes depending on how much people refrain from going out.
This time, we predict the number of infected people with regression, a machine learning technique. There are various regression algorithms; here we use scikit-learn's Support Vector Regression (SVR).
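For reference, here is a sketch of what SVR optimizes, written in the standard ε-insensitive form. The symbols $w$, $b$, $\phi$, $\xi_i$ are textbook notation and do not come from this post; $C$, $\varepsilon$, $\gamma$ and $d$ correspond to the `C`, `epsilon`, `gamma` and `degree` arguments of `sklearn.svm.SVR`, and $r$ is `coef0`, which defaults to 0.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right) \\
\text{subject to}\quad & y_i - w^\top\phi(x_i) - b \le \varepsilon + \xi_i, \\
& w^\top\phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i,\ \xi_i^* \ge 0 .
\end{aligned}
\qquad
K_{\text{poly}}(x, x') = \left(\gamma\,\langle x, x'\rangle + r\right)^{d}
```

One consequence worth keeping in mind: with a single feature $x$ and $r = 0$, the fitted function $f(x) = \sum_i \alpha_i K(x_i, x) + b$ collapses to $f(x) = c\,x^{d} + b$ for some constant $c$, so a degree-5 polynomial kernel on the day index alone can only produce a fifth-power curve.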
I used covid_19_clean_complete.csv from Kaggle's "COVID-19 Complete Dataset (Updated every 24hrs)". The dataset covers the whole world, but only the number of infected people in Japan is used. For Japan, the data runs from 2020/1/22 to 2020/4/9. The figure below shows the cumulative number of infected people and the number of new infections per day.
There is no good nationwide data on how much people in Japan are actually refraining from going out. However, the Tokyo Metropolitan Government's new coronavirus infection control site publishes data on changes in the number of Toei Subway users. We use this Toei Subway data, although it covers only Tokyo and is only weekly. Unfortunately, it is distributed as a PDF, so I typed it into a CSV file by hand (crying). The infected-person data on the same site is distributed as CSV, so I wish the Toei Subway figures were too. Below is a graph of the rate of increase / decrease in the number of Toei Subway users.
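The daily new-infection series is simply the day-to-day difference of the cumulative count. Below is a minimal sketch in plain Python, using the first nine cumulative values for Japan that are listed in this post. The function name `daily_new` is hypothetical; the full listing computes the same thing with pandas `shift`.

```python
def daily_new(cumulative):
    """Return day-to-day differences of a cumulative series.

    The first day has no previous value, so it is reported as 0.
    """
    if not cumulative:
        return []
    return [0] + [today - prev for prev, today in zip(cumulative, cumulative[1:])]


# Cumulative confirmed cases in Japan, 2020/1/22 through 2020/1/30.
confirmed = [2, 2, 2, 2, 4, 4, 7, 7, 11]
new_cases = daily_new(confirmed)
print(new_cases)
```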
Day | Shinjuku 7 o'clock | Shinjuku 8 o'clock | Shinjuku 9 o'clock | Shibuya 7 o'clock | Shibuya 8 o'clock | Shibuya 9 o'clock | Tokyo 7 o'clock | Tokyo 8 o'clock | Tokyo 9 o'clock |
---|---|---|---|---|---|---|---|---|---|
2020/1/31 | 1.88% | -2.96% | 0.39% | 0.57% | -4.58% | -1.86% | -1.49% | -1.93% | 0.44% |
2020/2/7 | 0.18% | -1.03% | 2.06% | -0.56% | -4.05% | 1.65% | 1.15% | 0.84% | 1.97% |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
The following data is extracted from Kaggle's number of infected people data.
Index | Day | Number of infected people |
---|---|---|
1 | 2020/1/22 | 2 |
2 | 2020/1/23 | 2 |
3 | 2020/1/24 | 2 |
4 | 2020/1/25 | 2 |
5 | 2020/1/26 | 4 |
6 | 2020/1/27 | 4 |
7 | 2020/1/28 | 7 |
8 | 2020/1/29 | 7 |
9 | 2020/1/30 | 11 |
10 | 2020/1/31 | 5 |
... | ... | ... |
79 | 2020/4/9 | 4667 |
The explanatory variable X is the day index counted from 0: [0, 1, 2, 3, ..., 78]. The prediction target Y is the cumulative number of infected people: [2, 2, 2, 2, 4, 4, 7, ...].
The data from 2020/1/22 to 2020/4/9 is split in chronological order: the first 90% is training data and the last 10% is test data. The regression model was trained on the training data and evaluated on the test data. The prediction matches the training portion almost exactly, but deviates somewhat on the test portion. A simple regression on the day index alone cannot predict correctly. We also forecast 10 days into the future.
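The chronological 90% / 10% split can be sketched without any library. The helper name `chronological_split` is hypothetical; it rounds the test size up, which is my understanding of how `train_test_split(..., test_size=0.10, shuffle=False)` in the listing behaves.

```python
import math


def chronological_split(values, test_fraction=0.10):
    """Split a time-ordered sequence into (train, test) without shuffling.

    The test part is the tail of the sequence, rounded up to a whole sample.
    """
    n_test = math.ceil(len(values) * test_fraction)
    n_train = len(values) - n_test
    return values[:n_train], values[n_train:]


# Day index 0..78 stands for 2020/1/22 through 2020/4/9 (79 days).
days = list(range(79))
train_days, test_days = chronological_split(days)
print(len(train_days), len(test_days))
print(test_days[0], test_days[-1])
```

Because the order is preserved, the model is always evaluated on days that come after everything it was trained on, which is the situation a forecast actually faces.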
The simple forecast used only the Kaggle data; next we add the rate of decrease in the number of Toei Subway users. The effect of self-restraint is said to appear after about two weeks. Therefore, each day's number of infected people is paired with the Toei Subway change rate from two weeks earlier to make the forecast.
As before, the data from 2020/1/22 to 2020/4/9 is split in chronological order into the first 90% for training and the last 10% for testing, and the regression model was trained on the former and evaluated on the latter. This time the prediction matches the actual values closely on both the training and the test portions. In addition, we forecast 10 days into the future and show the result with a thick red line. The model predicts that the number of infected people will keep increasing rapidly.
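One way to express the two-week lag is to re-key each weekly subway observation to the date 14 days later and look it up by case date. This is only a sketch under that assumption: the names `LAG`, `lagged_rate` and `feature_row` are hypothetical, and the listing itself joins on a pre-shifted date column in the hand-made CSV. The two rates are the Shinjuku 7 o'clock values from the subway table in this post.

```python
from datetime import date, timedelta

LAG = timedelta(days=14)  # assumed delay between behavior and case counts

# Shinjuku 7 o'clock change rates (%) from the subway table in this post.
subway_rate = {
    date(2020, 1, 31): 1.88,
    date(2020, 2, 7): 0.18,
}

# Re-key each weekly observation to the date on which it is assumed to take effect.
lagged_rate = {day + LAG: rate for day, rate in subway_rate.items()}


def feature_row(day_index, day):
    """Build [subway rate from two weeks earlier, day index].

    Days without a lagged observation get 0.0, mirroring fillna(0) in the listing.
    """
    return [lagged_rate.get(day, 0.0), day_index]


start = date(2020, 1, 22)
rows = [feature_row(i, start + timedelta(days=i)) for i in range(79)]
print(rows[23])  # 2020/2/14 carries the rate observed on 2020/1/31
```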
We used the prediction model to evaluate how the effect of self-restraint shows up in the prediction. Because the change in Toei Subway ridership is assumed to take effect two weeks later, I simulated what would happen if the ridership change rate had been 40 percentage points lower than it actually was from two weeks ago (March 27) onward. On April 3, the number of subway users was actually down by about 40%, depending on the time of day. Lowering that by a further 40 points makes this a simulation of a reduction of about 80%.
The model predicted that if the number of subway users could have been reduced by about 80% as of March 27, two weeks ago, the pace of increase in the number of infected people would have slowed significantly. Seeing the effect of staying home through predictions like this may make you more willing to stay home. However, the effect only becomes visible after more than two weeks, so patience is required.
Kaggle: https://www.kaggle.com/imdevskp/corona-virus-report
Tokyo Metropolitan Government's new coronavirus infection control site: https://stopcovid19.metro.tokyo.lg.jp/
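The scenario arithmetic is a fixed shift of every change rate by 40 percentage points. A minimal sketch follows; the helper `lower_by_points` is hypothetical and the single input value is illustrative, taken only from the "about 40%" stated above rather than from the actual CSV.

```python
def lower_by_points(rates, points=40.0):
    """Return a copy of the change rates (%) lowered by a fixed number of percentage points."""
    return {name: rate - points for name, rate in rates.items()}


# Illustrative value only: ridership on April 3 was down "about 40%".
observed = {"Shinjyuku_8": -40.0}
scenario = lower_by_points(observed)
print(scenario)
```

The shifted rates would then replace the observed ones in the feature matrix before calling the trained model's `predict`.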
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
import numpy as np
import pandas as pd
from datetime import datetime
import copy
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

def read_confirmed():
    # Load the Kaggle data and keep only the rows for Japan.
    data = pd.read_csv('covid_19_clean_complete.csv')
    data_japan = data[data.loc[:, 'Country/Region']=='Japan']
    data_japan = data_japan.reset_index(drop=True)
    data_japan = data_japan.loc[:, ['Date', 'Confirmed', 'Deaths', 'Recovered']]
    # Daily new cases = difference of the cumulative count (0 for the first day).
    data_japan['New_confirmed'] = (data_japan['Confirmed'] - data_japan['Confirmed'].shift(1)).fillna(0)
    return data_japan

def read_subway(data_japan):
    # Join the hand-entered Toei Subway data onto the case data by date.
    subway = pd.read_csv('20200409_subway.csv')
    data_japan1 = pd.merge(data_japan, subway, left_on='Date', right_on='Date3', how="outer")
    data_japan1 = data_japan1.drop(["Date1", "Date2", "Date3"], axis=1)
    data_japan1 = data_japan1.loc[:78, :]
    # Days without subway data are filled with 0.
    data_japan1 = data_japan1.fillna(0)
    data_japan1["x"] = data_japan1.index
    return data_japan1

def plot1(data_japan):
    # x=data_japan.loc[:, 'Date']
    x = np.arange(len(data_japan))
    plt.plot(x, data_japan.loc[:, 'Confirmed'], label='confirmed')
    plt.plot(x, data_japan.loc[:, 'New_confirmed'], label='New_confirmed')
    plt.title('Japan')
    plt.legend()
    plt.savefig('japan_confirmed.png')
    plt.cla()

def predict_svr1(data_japan):
    # Simple forecast: the day index is the only feature.
    y = data_japan['Confirmed']
    x = np.arange(len(data_japan)).reshape((-1,1))
    Xtrain, Xtest, Ytrain, Ytest = train_test_split(x, y, test_size=0.10, shuffle=False)
    svm_confirmed = SVR(shrinking=True, kernel='poly',gamma=0.01, epsilon=1,degree=5, C=0.1)
    svm_confirmed.fit(Xtrain, Ytrain)
    Ytrain_pred = svm_confirmed.predict(Xtrain)
    Ytest_pred = svm_confirmed.predict(Xtest)
    # Future forecast: the 10 days after the last test day.
    # Take the last day index as a scalar so np.arange receives plain integers.
    last_x = int(Xtest[-1, 0])
    Xtest2 = np.arange(last_x + 1, last_x + 11).reshape((-1, 1))
    Ytest_pred2 = svm_confirmed.predict(Xtest2)
    # plot
    plt.plot(np.arange(len(data_japan)), data_japan.loc[:, 'Confirmed'], label="confirmed", color='blue')
    plt.plot(Xtrain, Ytrain_pred, '--', label="train_pred", color='red')
    plt.plot(Xtest, Ytest_pred, label="test_pred", color='red', linewidth=1)
    plt.plot(Xtest2, Ytest_pred2, label="pred2", color='red', linewidth=3)
    plt.legend()
    plt.title('Japan')
    plt.savefig('japan_redict1.png')
    plt.cla()

def predict_svr2(data_japan):
    # Features: subway change rates (3 stations x 3 hours) plus the day index x.
    subway_cols = ['Shinjyuku_7', 'Shinjyuku_8', 'Shinjyuku_9', 'Shibuya_7', 'Shibuya_8', 'Shibuya_9', 'Tokyo_7', 'Tokyo_8', 'Tokyo_9']
    y = data_japan["Confirmed"]
    x = data_japan[subway_cols + ['x']]
    Xtrain, Xtest, Ytrain, Ytest = train_test_split(x, y, test_size=0.10, shuffle=False)
    Xtest = Xtest.reset_index(drop=True)
    svm_confirmed = SVR(shrinking=True, kernel='poly',gamma=0.01, epsilon=1,degree=5, C=0.1)
    svm_confirmed.fit(Xtrain, Ytrain)
    Ytrain_pred = svm_confirmed.predict(Xtrain)
    Ytest_pred = svm_confirmed.predict(Xtest)
    # Day indices for the future period (same length as the test set).
    last = len(Xtest) - 1
    future_x = np.arange(Xtest.loc[last, 'x'] + 1, Xtest.loc[last, 'x'] + 1 + len(Xtest))
    # Future Prediction 1: subway rates stay at their last observed values.
    Xtest2 = copy.deepcopy(Xtest)
    Xtest2.loc[:, 'x'] = future_x
    for col in subway_cols:
        Xtest2[col] = Xtest.loc[last, col]
    Ytest_pred2 = svm_confirmed.predict(Xtest2)
    # Future Prediction 2: subway rates fall linearly by `reduce` over the period.
    Xtest3 = copy.deepcopy(Xtest)
    Xtest3.loc[:, 'x'] = future_x
    reduce = -0.1
    num = len(Xtest3)
    diff = reduce / (num - 1)
    for col in subway_cols:
        Xtest3.loc[:, col] = [Xtest.loc[last, col] + diff * i for i in range(num)]
    Ytest_pred3 = svm_confirmed.predict(Xtest3)
    # plot
    plt.plot(np.arange(len(data_japan)), data_japan.loc[:, 'Confirmed'], label="confirmed", color='blue')
    plt.plot(Xtrain['x'], Ytrain_pred, '--', label="train_pred", color='red')
    plt.plot(Xtest['x'], Ytest_pred, label="test_pred", color='red', linewidth=1)
    plt.plot(Xtest2['x'], Ytest_pred2, label="pred2", color='red', linewidth=3)
    plt.plot(Xtest2['x'], Ytest_pred3, label="pred3", color='green', linewidth=3)
    plt.legend()
    plt.title('Japan')
    plt.savefig('japan_redict2.png')

def main():
    # Read the infected-person data
    data_japan = read_confirmed()
    data_japan.to_csv('data_japan.csv', index=False)
    plot1(data_japan)
    # Prediction 1: day index only
    predict_svr1(data_japan)
    # Read the subway data and join it onto the infected-person data
    data_japan = read_subway(data_japan)
    # Prediction 2: day index plus subway change rates
    predict_svr2(data_japan)

if __name__ == '__main__':
    main()