[PYTHON] I tried to predict the infection of new pneumonia using the SIR model: ☓ Wuhan edition ○ Hubei edition

Recently, a new type of pneumonia has been spreading. I had previously done research with a mathematical model of infectious diseases called the SIR model, so I applied it to the new pneumonia. Mathematical models can be used to predict how an infectious disease will spread.

This time, I focus on the center of the outbreak, ~~Wuhan~~ ***Hubei Province*** (the province that includes Wuhan), to model the spread of the infection and predict its future course.

Keywords: Epidemiology, SIR model, nCoV-2019, New pneumonia, New coronavirus

SIR model

The SIR model describes how the number of infected people changes over time with a system of differential equations (an easy-to-understand explanation is also available on [Wikipedia](https://ja.wikipedia.org/wiki/SIR model)). In the SIR model, each person is in one of three states with respect to the infectious disease:

  1. People who can become infected: S
  2. People who are infected: I
  3. People who have recovered from the infection and gained immunity, or who have died: R

Writing S(t), I(t), and R(t) for the numbers of people at time t who can become infected, who are infected, and who have recovered (or died), respectively, the SIR model is

$$
\begin{aligned}
\dot{S}(t) &= -\beta S(t)I(t),\\
\dot{I}(t) &= \beta S(t)I(t) - \gamma I(t),\\
\dot{R}(t) &= \gamma I(t).
\end{aligned}
$$

Here, β is the infection rate and γ is the recovery rate (including the mortality rate). New infections occur at a rate proportional to the infection rate β, the number of people S(t) who can become infected, and the number of infected people I(t).
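As a minimal sketch (not the code from the repository linked at the end), the equations above can be integrated numerically with SciPy's `solve_ivp`. The function names `sir_rhs` and `simulate_sir` and the parameter values in the usage line are hypothetical, chosen only for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp


def sir_rhs(t, y, beta, gamma):
    """Right-hand side of the SIR equations: (dS/dt, dI/dt, dR/dt)."""
    S, I, R = y
    new_infections = beta * S * I  # proportional to beta, S(t) and I(t)
    removals = gamma * I           # recoveries plus deaths
    return [-new_infections, new_infections - removals, removals]


def simulate_sir(beta, gamma, S_init, I_init, R_init, days):
    """Integrate the SIR model and return daily values of t, S, I and R."""
    t_eval = np.arange(days + 1)
    sol = solve_ivp(sir_rhs, (0, days), [S_init, I_init, R_init],
                    args=(beta, gamma), t_eval=t_eval, rtol=1e-8, atol=1e-6)
    S, I, R = sol.y  # each row holds one compartment at the times in t_eval
    return sol.t, S, I, R


# Hypothetical values for illustration only (not the fitted Hubei parameters)
N = 1e7
t, S, I, R = simulate_sir(beta=3e-8, gamma=0.1, S_init=N - 100, I_init=100, R_init=0, days=60)
```

Passing the infection and recovery terms through named variables keeps the code term-by-term identical to the three equations.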

Note that people who have died can no longer spread the infection, so they are counted together with recovered people in R.

Here, $S(t) + I(t) + R(t) = N$ is constant and equals the population of the region. This time, I use the population of ~~Wuhan~~ Hubei Province.
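Adding the three equations term by term shows why N does not change over time:

$$
\frac{d}{dt}\bigl(S(t) + I(t) + R(t)\bigr) = -\beta S(t)I(t) + \bigl(\beta S(t)I(t) - \gamma I(t)\bigr) + \gamma I(t) = 0.
$$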

Using the infection data for the new pneumonia in ~~Wuhan~~ Hubei Province, I learn the infection rate β and the recovery rate γ and predict how the infection will develop there.
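The fitting procedure itself is not spelled out here (the actual code is in the repository linked at the end). One common approach, sketched below as an assumption, is nonlinear least squares with SciPy's `least_squares`: integrate the model for candidate parameters and minimize the mismatch with the observed I and R. Searching over log β and log γ, and comparing counts on a log scale, are my own choices to cope with β being many orders of magnitude smaller than γ; `sir_curves` and `fit_sir` are hypothetical helper names.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares


def sir_curves(beta, gamma, N, I_init, R_init, t_obs):
    """Solve the SIR model and return I(t) and R(t) at the observation times."""
    def rhs(t, y):
        S, I, R = y
        return [-beta * S * I, beta * S * I - gamma * I, gamma * I]
    y0 = [N - I_init - R_init, I_init, R_init]
    sol = solve_ivp(rhs, (t_obs[0], t_obs[-1]), y0, t_eval=t_obs, rtol=1e-8, atol=1e-6)
    return sol.y[1], sol.y[2]


def fit_sir(t_obs, I_obs, R_obs, N, x0=(1e-8, 0.05)):
    """Estimate (beta, gamma) by least squares on observed I(t) and R(t)."""
    I_obs = np.asarray(I_obs, dtype=float)
    R_obs = np.asarray(R_obs, dtype=float)

    def residuals(log_params):
        beta, gamma = np.exp(log_params)  # search in log space: beta ~ 1e-8, gamma ~ 0.1
        I_fit, R_fit = sir_curves(beta, gamma, N, I_obs[0], R_obs[0], t_obs)
        # log1p puts early (small) and late (large) counts on an equal footing
        return np.concatenate([
            np.log1p(np.maximum(I_fit, 0.0)) - np.log1p(I_obs),
            np.log1p(np.maximum(R_fit, 0.0)) - np.log1p(R_obs),
        ])

    # diff_step keeps finite-difference steps well above the ODE solver's noise
    result = least_squares(residuals, np.log(x0), diff_step=1e-3)
    beta_hat, gamma_hat = np.exp(result.x)
    return beta_hat, gamma_hat
```

For Hubei, `t_obs` would be the days counted from January 22, `I_obs` and `R_obs` the observed series, and `N` the 2017 population. The estimates depend on the data and can depend on the initial guess `x0`.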

Data used

The infection data are taken from here, a dataset published on Kaggle. The population data for ~~Wuhan~~ Hubei Province are the 2017 demographic figures described here.

Learning parameters with the SIR model

The course of the infection in ~~Wuhan~~ Hubei Province from January 22 to February 4, 2020 is shown below.

Hubei_time_dose.png

The blue line is the number of infected people, the orange line is the number of deaths, and the green line is the number of people who have recovered. It seems a little strange to me that the number of recoveries and the number of deaths are about the same.

I fitted the SIR model to these data.

SIR_model_Hubei.png

Here, "Recovered people" is the sum of the number of people who have recovered and the number of deaths. The blue dots are the observed numbers of infected people, and the blue line is the SIR model fit. The orange dots and orange line are the measured and predicted numbers of recovered people.
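Combining recoveries and deaths into R, as described above, takes one line in pandas. This is a sketch under assumptions: the column names `Date`, `Confirmed`, `Deaths`, and `Recovered` are hypothetical stand-ins for the columns of the Kaggle file, each row is assumed to hold cumulative counts for one date in one region, and taking the currently infected count as confirmed minus removed is my assumption, not necessarily what the original analysis did.

```python
import pandas as pd


def to_sir_observations(df):
    """Build I and R series from cumulative counts (hypothetical column names)."""
    daily = df.set_index("Date").sort_index()         # assumes one row per date
    removed = daily["Recovered"] + daily["Deaths"]    # R: recovered + deaths, as in the text
    infected = daily["Confirmed"] - removed           # I: assumed = currently infected cases
    return pd.DataFrame({"I": infected, "R": removed})
```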

The model appears to fit the data well enough.

Predicting the future of the infection in ~~Wuhan~~ Hubei Province

Since the fit looks good enough, I used the learned parameters to predict how the infection in Hubei Province will develop.

The figure below shows the forecast for the 10 days after February 4.

Hubei_10days_future.png

The points are measured values and the lines are predicted values. According to the SIR model, the number of infected people keeps increasing.

Next is the forecast for one year from February 4th.

Hubei_365days_future.png

The infection doesn't seem to stop at all.
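A forecast like the 10-day and one-year ones above amounts to integrating the model past the last observation with the learned parameters. In this minimal sketch (an assumption, not the author's code) the integration restarts from the last observed state; `forecast_sir` and its arguments are hypothetical names, and β and γ would come from the fitting step.

```python
import numpy as np
from scipy.integrate import solve_ivp


def forecast_sir(beta, gamma, N, I_last, R_last, days):
    """Project I(t) and R(t) forward `days` days from the last observed state."""
    def rhs(t, y):
        S, I, R = y
        return [-beta * S * I, beta * S * I - gamma * I, gamma * I]
    y0 = [N - I_last - R_last, I_last, R_last]  # S follows from the constant total N
    t_eval = np.arange(days + 1)                # one value per day, day 0 = last observation
    sol = solve_ivp(rhs, (0, days), y0, t_eval=t_eval, rtol=1e-8, atol=1e-6)
    return sol.y[1], sol.y[2]
```

Calling it with `days=365` gives a one-year projection. In the SIR model I(t) keeps growing as long as βS(t) > γ, that is, until S(t) falls below N/R0, so a large R0 means the growth only stops after most of the population has been infected.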

Discussion

- Using the SIR model, I tried to predict the spread of the new pneumonia in Hubei Province, but the model predicts that the spread will not stop.
- The likely cause is that the number of people who have recovered in ~~Wuhan~~ Hubei Province is not measured accurately.
- If the recovery rate γ can be measured accurately from other data, it should be possible to make better predictions.

Future tasks

Next, I would like to predict the spread of infection throughout China based on traffic volume.

Postscript: Basic reproduction number R0

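The infectivity of a disease is evaluated by the basic reproduction number R0, the average number of people that one infected person infects while almost everyone is still susceptible. At the start of an outbreak S ≈ N, so each infected person causes about βN new infections per unit time and stays infected for 1/γ on average, which gives R0 = βN/γ. Equivalently, R0 is the ratio of the dimensionless infection rate β_hat to the dimensionless recovery rate γ_hat. For Hubei Province, the basic reproduction number is

$$
R_0 = \frac{\hat{\beta}}{\hat{\gamma}} = \frac{\beta N^2}{\gamma N} = \frac{\beta N}{\gamma} \approx 17.54.
$$

This value is about the same as that of airborne diseases such as measles. I suspect this high infectivity comes from the recovery rate being underestimated, as mentioned in the discussion.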
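The formula is simple enough to check directly. The function name below is hypothetical and the numbers are illustrative only, not the fitted Hubei values; the point is how strongly R0 depends on γ.

```python
def basic_reproduction_number(beta, gamma, N):
    """R0 = beta * N / gamma, i.e. beta_hat / gamma_hat = beta * N**2 / (gamma * N)."""
    return beta * N / gamma


# Hypothetical numbers for illustration, not the fitted Hubei values
print(basic_reproduction_number(3e-8, 0.1, 1e7))  # about 3
print(basic_reproduction_number(3e-8, 0.2, 1e7))  # about 1.5: doubling gamma halves R0
```

This makes the point in the discussion concrete: if the true recovery rate is k times the fitted γ, the true R0 is k times smaller than the estimate.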

Code

https://github.com/yuji0001/2020nCoV_analysis
