[PYTHON] I tried to predict the number of coronavirus infections, taking into account the effect of people refraining from going out

Introduction

I am not an infectious disease expert, so please read this with that in mind.

The novel coronavirus disease (COVID-19), which broke out in Wuhan, Hubei Province, China in December 2019, has spread to Japan, and the number of infected people is increasing. On April 7, 2020, a state of emergency was declared in Japan, and people were asked to refrain from going out. Working from home is becoming more widespread, but there are reports that people are still not staying home enough. So I would like to predict how the growth in the number of infected people changes depending on how much people refrain from going out.

Prediction algorithm

This time, I predict the number of infected people using regression, a machine learning technique. There are various regression algorithms; here I use scikit-learn's Support Vector Regression (SVR).
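As a rough illustration of what fitting an SVR looks like, here is a minimal sketch on synthetic data. The curve, the variable names, and the hyperparameters are all assumptions made up for this example; they are not the data or settings used in this article.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in for a cumulative case curve (made-up values, not real data).
days = np.arange(30).reshape(-1, 1)        # explanatory variable: day index
cases = 0.05 * days.ravel() ** 2 + 2.0     # target: smooth quadratic growth

# Same estimator family as in this article; these hyperparameters are
# illustrative only and have not been tuned.
model = SVR(kernel="poly", degree=2, gamma="scale", C=100.0, epsilon=0.1)
model.fit(days, cases)

fitted = model.predict(days)
print(fitted.shape)  # one prediction per input row
```

The input must be a 2-D array (one row per sample), which is why the day index is reshaped before fitting.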

Data

Number of infected people data

I used covid_19_clean_complete.csv from Kaggle's COVID-19 Complete Dataset (Updated every 24hrs). The file contains data from all over the world, but I use only the figures for Japan, which cover 2020/1/22 to 2020/4/9. The figure below shows the cumulative number of infected people and the number of newly infected people per day.

japan_confirmed.png
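The daily number of new infections can be derived from the cumulative count by differencing. A small sketch, using the first nine cumulative values from the table in the Simple prediction section; the variable names are my own:

```python
import pandas as pd

# First nine cumulative counts for Japan, as listed in this article.
confirmed = pd.Series([2, 2, 2, 2, 4, 4, 7, 7, 11], name="Confirmed")

# Difference between consecutive days; the first day has no predecessor,
# so its value is filled with 0.
new_confirmed = confirmed.diff().fillna(0).astype(int)
print(new_confirmed.tolist())  # [0, 0, 0, 0, 2, 0, 3, 0, 4]
```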

Self-restraint effect

There is no good nationwide data on how much people in Japan are actually refraining from going out. However, the Tokyo Metropolitan Government's new coronavirus infection control site publishes data on changes in the number of Toei Subway passengers. I use this Toei Subway data, although it is only available weekly. The data is distributed as a PDF, so I typed it into a CSV file by hand (sob). The infection data on the same site is distributed as CSV, so I wish the Toei Subway ridership figures were distributed as CSV as well. Below are the data and a graph of the rate of increase/decrease in the number of Toei Subway passengers.

| Date | Shinjuku 7 o'clock | Shinjuku 8 o'clock | Shinjuku 9 o'clock | Shibuya 7 o'clock | Shibuya 8 o'clock | Shibuya 9 o'clock | Tokyo 7 o'clock | Tokyo 8 o'clock | Tokyo 9 o'clock |
|---|---|---|---|---|---|---|---|---|---|
| 2020/1/31 | 1.88% | -2.96% | 0.39% | 0.57% | -4.58% | -1.86% | -1.49% | -1.93% | 0.44% |
| 2020/2/7 | 0.18% | -1.03% | 2.06% | -0.56% | -4.05% | 1.65% | 1.15% | 0.84% | 1.97% |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |

subway.png
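A hand-typed CSV with percentage strings needs to be converted to numbers before it can be used as a feature. The sketch below assumes a hypothetical file layout (a `Date` column plus one column per station and hour) and uses only the two rows shown in the table above. Carrying each weekly value forward to the following days is an alternative I chose for illustration; the program at the end of this article fills the missing days with 0 instead.

```python
import io
import pandas as pd

# Hypothetical CSV layout; only three of the nine columns are shown.
csv_text = """Date,Shinjuku_7,Shinjuku_8,Shinjuku_9
2020/1/31,1.88%,-2.96%,0.39%
2020/2/7,0.18%,-1.03%,2.06%
"""
subway = pd.read_csv(io.StringIO(csv_text))
subway["Date"] = pd.to_datetime(subway["Date"], format="%Y/%m/%d")

# Convert strings such as "1.88%" into fractions such as 0.0188.
for col in subway.columns.drop("Date"):
    subway[col] = subway[col].str.rstrip("%").astype(float) / 100.0

# Expand weekly rows to daily rows, repeating each weekly value
# until the next observation.
daily = subway.set_index("Date").resample("D").ffill()
print(len(daily))  # 8 daily rows, from 2020-01-31 to 2020-02-07
```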

Evaluation method

Simple prediction

The following data was extracted from the Kaggle infection data.

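| Index | Date | Number of infected people |
|---|---|---|
| 1 | 2020/1/22 | 2 |
| 2 | 2020/1/23 | 2 |
| 3 | 2020/1/24 | 2 |
| 4 | 2020/1/25 | 2 |
| 5 | 2020/1/26 | 4 |
| 6 | 2020/1/27 | 4 |
| 7 | 2020/1/28 | 7 |
| 8 | 2020/1/29 | 7 |
| 9 | 2020/1/30 | 11 |
| 10 | 2020/1/31 | 5 |
| ... | ... | ... |
| 79 | 2020/4/9 | 4667 |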

The explanatory variable X is the zero-based day index [0, 1, 2, 3, ..., 78]. The prediction target Y is the number of infected people [2, 2, 2, 2, 4, 4, 7, ...].

Of the data from 2020/1/22 to 2020/4/9, the first 90% is used as training data and the last 10% as test data. The regression model was trained on the training data and evaluated on the test data. The prediction matches the actual values almost exactly on the training portion, but deviates somewhat on the test portion. A simple regression on the day index alone cannot predict correctly. I also predicted 10 days into the future.

japan_redict1.png
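The chronological 90%/10% split and the 10-day horizon can be sketched as follows. The target here is a placeholder, so only the index arithmetic is meaningful:

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(79).reshape(-1, 1)  # day index 0..78 (79 days)
y = np.arange(79)                 # placeholder target, not real case counts

# shuffle=False keeps time order: the earliest days train, the latest days test.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.10, shuffle=False
)
print(len(x_train), len(x_test))  # 71 8

# Day indices for the 10 days following the last test day.
last_day = x_test[-1, 0]
future_days = np.arange(last_day + 1, last_day + 11).reshape(-1, 1)
print(future_days.ravel().tolist())  # [79, 80, 81, 82, 83, 84, 85, 86, 87, 88]
```

Shuffling would let the model see days from the end of the period during training, which would make the test score look better than a real forecast could be.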

Prediction considering self-restraint from going out

The simple prediction used only the Kaggle data. Here I add the rate of decrease in the number of Toei Subway passengers to the features. It is said that the effect of staying home appears after about two weeks. Therefore, each day's number of infected people is paired with the Toei Subway rate of decrease from two weeks earlier.
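One way to pair each day with the subway rate from two weeks earlier is to shift the feature column by 14 rows. This is a sketch under my own assumptions (daily rows, made-up rates in percentage points, hypothetical column names); it is not necessarily how the CSV used in this article encodes the lag.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2020-03-01", periods=21, freq="D")
df = pd.DataFrame({
    "Date": dates,
    "subway_rate": -1.0 * np.arange(21),  # made-up rates: 0, -1, -2, ...
})

# Row i now carries the rate observed 14 days before row i.
df["subway_rate_lag14"] = df["subway_rate"].shift(14)

print(df.loc[20, ["subway_rate", "subway_rate_lag14"]].tolist())  # [-20.0, -6.0]
```

The first 14 rows have no lagged value (NaN) and would need to be dropped or filled before training.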

As before, of the data from 2020/1/22 to 2020/4/9, the first 90% is used as training data and the last 10% as test data. The regression model was trained on the training data and evaluated on the test data. This time the prediction matches the actual values closely on both the training portion and the test portion. The forecast for the following 10 days is shown with a thick red line. The model predicts that the number of infected people will keep increasing rapidly.

japan_redict2.png

Next, I used the prediction model to see how the effect of staying home shows up in the forecast. Since the Toei Subway rate of increase/decrease takes effect after two weeks, I simulated what would happen if the rate had been 40% lower than it actually was from two weeks earlier (March 27) onward. The number of subway passengers on April 3 had actually decreased by about 40%, depending on the time of day. Lowering it by a further 40% makes this a simulation of a reduction of about 80%.
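The scenario arithmetic can be sketched as shifting every subway-rate feature by the same amount before asking the model for a prediction. The helper name `lower_rates`, the column name, and the two sample rows are assumptions for illustration; decreases are represented as negative fractions.

```python
import pandas as pd

def lower_rates(features, rate_cols, extra):
    """Return a copy of `features` with `extra` added to each rate column."""
    scenario = features.copy()
    scenario[rate_cols] = scenario[rate_cols] + extra
    return scenario

# Hypothetical feature rows: an observed drop of about 40% plus the day index.
observed = pd.DataFrame({"Shinjuku_8": [-0.40, -0.40], "x": [72, 73]})

# A further 40-point reduction gives a drop of about 80%.
scenario = lower_rates(observed, ["Shinjuku_8"], extra=-0.40)
print(scenario["Shinjuku_8"].tolist())  # [-0.8, -0.8]
```

Working on a copy keeps the observed features intact, so the baseline and the scenario can be predicted side by side.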

japan_redict2A.png

The model predicted that if subway ridership had been cut by about 80% as of March 27, two weeks earlier, the pace of increase in the number of infected people would have slowed significantly. Seeing the effect of staying home through predictions like this may make it easier to feel motivated to do so. However, the effect only becomes visible after two weeks or more, so patience is required.

References

Kaggle: https://www.kaggle.com/imdevskp/corona-virus-report
Tokyo Metropolitan Government's new coronavirus infection control site: https://stopcovid19.metro.tokyo.lg.jp/

Program

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import copy

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

def read_confirmed():
    data = pd.read_csv('covid_19_clean_complete.csv')
    data_japan = data[data.loc[:, 'Country/Region']=='Japan']
    data_japan = data_japan.reset_index(drop=True)
    data_japan = data_japan.loc[:, ['Date', 'Confirmed', 'Deaths', 'Recovered']]
    data_japan['New_confirmed'] = (data_japan['Confirmed'] - data_japan['Confirmed'].shift(1)).fillna(0)
    return data_japan

def read_subway(data_japan):
    subway = pd.read_csv('20200409_subway.csv')
    data_japan1 = pd.merge(data_japan, subway, left_on='Date', right_on='Date3', how="outer")
    data_japan1 = data_japan1.drop(["Date1", "Date2", "Date3"], axis=1)
    data_japan1 = data_japan1.loc[:78, :]
    data_japan1 = data_japan1.fillna(0)
    data_japan1["x"] = data_japan1.index
    return data_japan1

def plot1(data_japan):
    # x=data_japan.loc[:, 'Date']
    x = np.arange(len(data_japan))
    plt.plot(x, data_japan.loc[:, 'Confirmed'], label='confirmed')
    plt.plot(x, data_japan.loc[:, 'New_confirmed'], label='New_confirmed')
    plt.title('Japan')
    plt.legend()
    plt.savefig('japan_confirmed.png')
    plt.cla()

def predict_svr1(data_japan):
    y = data_japan['Confirmed']
    x = np.arange(len(data_japan)).reshape((-1,1))
    Xtrain, Xtest, Ytrain, Ytest = train_test_split(x, y, test_size=0.10, shuffle=False)
    svm_confirmed = SVR(shrinking=True, kernel='poly',gamma=0.01, epsilon=1,degree=5, C=0.1)
    svm_confirmed.fit(Xtrain, Ytrain)
    Ytrain_pred = svm_confirmed.predict(Xtrain)
    Ytest_pred = svm_confirmed.predict(Xtest)

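    # Future forecast: the 10 days after the last test day.
    # Xtest is 2-D, so take the scalar day index explicitly.
    last_day = Xtest[-1, 0]
    Xtest2 = np.arange(last_day + 1, last_day + 11).reshape((-1, 1))
    Ytest_pred2 = svm_confirmed.predict(Xtest2)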

    #plot
    plt.plot(np.arange(len(data_japan)), data_japan.loc[:, 'Confirmed'], label="confirmed", color='blue')
    plt.plot(Xtrain, Ytrain_pred, '--', label="train_pred", color='red')
    plt.plot(Xtest, Ytest_pred, label="test_pred", color='red', linewidth=1)
    plt.plot(Xtest2, Ytest_pred2, label="pred2", color='red', linewidth=3)
    plt.legend()
    plt.title('Japan')
    plt.savefig('japan_redict1.png')
    plt.cla()

def predict_svr2(data_japan):
    y = data_japan["Confirmed"]
    x = data_japan[['Shinjyuku_7', 'Shinjyuku_8', 'Shinjyuku_9', 'Shibuya_7', 'Shibuya_8', 'Shibuya_9', 'Tokyo_7', 'Tokyo_8', 'Tokyo_9', 'x']]
    Xtrain, Xtest, Ytrain, Ytest = train_test_split(x, y, test_size=0.10, shuffle=False)
    Xtest = Xtest.reset_index(drop=True)
    svm_confirmed = SVR(shrinking=True, kernel='poly',gamma=0.01, epsilon=1,degree=5, C=0.1)
    svm_confirmed.fit(Xtrain, Ytrain)
    Ytrain_pred = svm_confirmed.predict(Xtrain)
    Ytest_pred = svm_confirmed.predict(Xtest)

    # Future prediction 1: extend the day index and hold the subway
    # rates constant at their last observed values
    Xtest2 = copy.deepcopy(Xtest)
    last = len(Xtest) - 1
    Xtest2.loc[:, 'x'] = np.arange(Xtest.loc[last, 'x'] + 1, Xtest.loc[last, 'x'] + 1 + len(Xtest))
    for col in Xtest.columns.drop('x'):
        Xtest2[col] = Xtest.loc[last, col]
    Ytest_pred2 = svm_confirmed.predict(Xtest2)

    # Future prediction 2: same day index, but lower each subway rate
    # linearly by a total of `reduce` over the forecast period
    Xtest3 = copy.deepcopy(Xtest)
    Xtest3.loc[:, 'x'] = np.arange(Xtest.loc[last, 'x'] + 1, Xtest.loc[last, 'x'] + 1 + len(Xtest))
    reduce = -0.1
    num = len(Xtest3)
    diff = reduce / (num - 1)
    for col in Xtest.columns.drop('x'):
        Xtest3.loc[:, col] = [Xtest.loc[last, col] + diff * i for i in range(num)]
    Ytest_pred3 = svm_confirmed.predict(Xtest3)

    #plot
    plt.plot(np.arange(len(data_japan)), data_japan.loc[:, 'Confirmed'], label="confirmed", color='blue')
    plt.plot(Xtrain['x'], Ytrain_pred, '--', label="train_pred", color='red')
    plt.plot(Xtest['x'], Ytest_pred, label="test_pred", color='red', linewidth=1)
    plt.plot(Xtest2['x'], Ytest_pred2, label="pred2", color='red', linewidth=3)
    plt.plot(Xtest3['x'], Ytest_pred3, label="pred3", color='green', linewidth=3)
    plt.legend()
    plt.title('Japan')
    plt.savefig('japan_redict2.png')

def main():
    #Infected person data read
    data_japan = read_confirmed()
    data_japan.to_csv('data_japan.csv', index=False)
    plot1(data_japan)
    #Prediction 1
    predict_svr1(data_japan)

    # Read the subway ridership data and merge it with the infection data
    data_japan = read_subway(data_japan)
    #Prediction 2
    predict_svr2(data_japan)

if __name__ == '__main__':
    main()
